
Automatic recognition of license plates

Study group IN6-621


Henrik Hansen, Anders Wang Kristensen,
Morten Porsborg Køhler, Allan Weber Mikkelsen,
Jens Mejdahl Pedersen and Michael Trangeled
May 26, 2002


Faculty of Engineering and Science
Aalborg University
Institute of Electronic Systems

TITLE: Automatic recognition of license plates

PROJECT PERIOD: February 5 - May 31, 2002

PROJECT GROUP: IN6-621

GROUP MEMBERS:
Henrik Hansen
Anders Wang Kristensen
Morten Porsborg Køhler
Allan Weber Mikkelsen
Jens Mejdahl Pedersen
Michael Trangeled

SUPERVISOR: Thomas Moeslund

NUMBER OF COPIES: 9
REPORT PAGES: 123
APPENDIX PAGES: 13
TOTAL PAGES: 136

SYNOPSIS:
This report describes the analysis, design and implementation of a system for automatic recognition of license plates, intended as a subsystem for automatic speed control. The input to the system is a series of color images of moving vehicles, and the output consists of the registration number of the license plate.
Extraction of the desired information is done in three steps. First, the license plate is extracted from the original image, then the seven characters are isolated, and finally each character is identified using statistical pattern recognition and correlation.
The algorithms were developed using a set of training images, and tested on images taken under varying conditions. The final program is capable of extracting the desired information in a high percentage of the test images.


Preface

This report has been written as a 6th semester project at the Institute of
Electronic Systems at Aalborg University. The main theme of the semester
is gathering and description of information, and the goal is to collect physical
data, represent these symbolically, and demonstrate techniques for processing
these data. The report is mainly intended for the external examiner and the supervisor, along with future students of 6th-semester Informatics.
This report includes analysis, design and test of a system designed to au-
tomatically recognize license plates from color images. Source code and the
corresponding executable program are included on the attached CD (see Appendix C for the full contents of the CD, as well as instructions for use).
We would like to thank the Aalborg Police Department for information used
in the report, and for their guided tour of the present speed control system.
Also, we would like to thank our supervisor on this project, Thomas Moeslund.

——————————— ———————————
Henrik Hansen Anders Wang Kristensen

——————————— ———————————
Morten Porsborg Køhler Allan Weber Mikkelsen

——————————— ———————————
Jens Mejdahl Pedersen Michael Trangeled
Contents

Introduction

I Analysis

1 Traffic control
   1.1 Current system
       1.1.1 Disadvantages of the current system
   1.2 Improved system
   1.3 The components of the system
       1.3.1 Differences from the existing system
       1.3.2 Project focus
   1.4 License plates
       1.4.1 Dimensions
       1.4.2 Layout
       1.4.3 Mounting and material
   1.5 System definition
   1.6 Delimitation
       1.6.1 License plate formats
       1.6.2 Video processing
       1.6.3 Identification of driver
       1.6.4 Transportation of data
       1.6.5 Quality of decisions

II Design

2 License plate extraction
   2.1 Introduction
   2.2 Hough transform
       2.2.1 Edge detection
       2.2.2 Line detection using Hough transform
       2.2.3 Line segment extraction
       2.2.4 Candidate region extraction
       2.2.5 Strengths and weaknesses
   2.3 Template matching
       2.3.1 Template construction
       2.3.2 Cross correlation
       2.3.3 Normalized cross correlation
       2.3.4 Strengths and weaknesses
   2.4 Region growing
       2.4.1 License plate features
       2.4.2 Preprocessing
       2.4.3 Extracting the regions
       2.4.4 Strengths and weaknesses
   2.5 Combining the methods
       2.5.1 Scale invariance
       2.5.2 Rotation invariance
       2.5.3 Lighting invariance
       2.5.4 Summarizing the differences
       2.5.5 Candidate selection
   2.6 Summary

3 Character isolation
   3.1 Introduction
   3.2 Isolating the characters
       3.2.1 Static bounds
       3.2.2 Pixel count
       3.2.3 Connected components
       3.2.4 Improving image quality
       3.2.5 Combined strategy
   3.3 Summary

4 Character identification
   4.1 Introduction
   4.2 Template matching
   4.3 Statistical pattern recognition
   4.4 Features
       4.4.1 Area
       4.4.2 End points
       4.4.3 Circumference
       4.4.4 Compounds
       4.4.5 The value of each pixel
   4.5 Feature space dimensionality
       4.5.1 Result using SEPCOR
   4.6 Decision theory
       4.6.1 Bayes decision rule
       4.6.2 Discriminant functions
       4.6.3 Test of normal distribution
       4.6.4 Parameter estimation
   4.7 Comparing the identification strategies
   4.8 Summary

III Test

5 Extraction test
   5.1 Introduction
   5.2 Region growing and Hough transform
       5.2.1 Criteria of success
       5.2.2 Test data
       5.2.3 Test description
       5.2.4 Results
   5.3 Correlation
       5.3.1 Criteria of success
       5.3.2 Test data
       5.3.3 Test description
       5.3.4 Results
   5.4 Combined method
       5.4.1 Criteria of success
       5.4.2 Test data
       5.4.3 Test description
       5.4.4 Result of test
   5.5 Summary

6 Isolation test
   6.1 Introduction
   6.2 Criteria of success
   6.3 Test description
   6.4 Test data
   6.5 Connected components
   6.6 Pixel count
   6.7 Static bounds
   6.8 Combined method
       6.8.1 Summary

7 Identification test
   7.1 Introduction
   7.2 Criteria of success
   7.3 Test data
   7.4 Feature based identification
       7.4.1 Test description
       7.4.2 Result of test using Euclidean distance
       7.4.3 Result of test using Mahalanobis distance
   7.5 Identification through correlation
       7.5.1 Test description
       7.5.2 Result of test
   7.6 Summary

8 System test
   8.1 Introduction
   8.2 Criteria of success
   8.3 Test data
   8.4 Test description
   8.5 Results
   8.6 Summary

IV Conclusion

9 Conclusion

V Appendix

A Video camera
   A.1 Physical components

B Visiting the police force
   B.1 The system at the precinct
   B.2 The system in the field
   B.3 Evaluation of the current system

C Contents of CD
   C.1 The program

VI Literature
List of Figures

1.1 Equipment inside the mobile control station
1.2 Workflow of the current system
1.3 Workflow of the improved system
1.4 The tasks of the system
1.5 Data abstraction level
1.6 A standard rectangular Danish license plate
2.1 Image used for examples in Chapter 2
2.2 Overview of the Hough transform method
2.3 Horizontal and vertical edges
2.4 Example image and corresponding Hough transform
2.5 Example license plate and reconstructed line segments
2.6 Images with different lighting
2.7 Template created through opacity-merging of the probabilities
2.8 The layout of the cross correlation
2.9 The steps of region growing
2.10 Binary image
2.11 Largest contained rectangle
2.12 Rectangle from maximum and minimum values
2.13 Selection of the most probable candidate
3.1 Isolating characters
3.2 Example showing statically set character bounds
3.3 Horizontal projections of the actual characters
3.4 An iteration in the process of finding connected components
3.5 Horizontal and vertical projections
3.6 Combination of character isolation procedures
4.1 Projections of all the digits
4.2 The neighbors of p1
4.3 Example where the number of 0-1 transitions is 2, Z(p1) = 2
4.4 The steps of the thinning algorithm
4.5 Result of the thinning algorithm
4.6 Two digits with different circumferences
4.7 Compound example
4.8 Result of applying a Laplacian filter
4.9 Number of features used based on maximum correlation
4.10 Decision regions
4.11 The covariance matrix equals the identity
4.12 The features are statistically independent
4.13 Positive and negative correlation
4.14 Histogram plot showing approximately a normal distribution
4.15 Normal plot produced in Matlab
4.16 χ²-test
5.1 Examples of falsely accepted regions
6.1 Example of unsuccessful and successful isolations
6.2 Plate that fails, using connected components
7.1 Identification percentage plotted against number of features
7.2 Error in character identification
7.3 Result of identification using training set as training
7.4 Result of identification using test set as training
7.5 The size problem illustrated
7.6 The brightness problem exaggerated
8.1 Good quality license plates wrongly identified
8.2 Poor quality license plates correctly identified
A.1 Color aliasing using the Bayer pattern
A.2 CCD-chip with registers
List of Tables

1.1 Differences between the systems
1.2 Areas of the improved system addressed in this project
2.1 Strengths of the Hough transform method
2.2 Weaknesses of the Hough transform method
2.3 Strengths of the template matching method
2.4 Weaknesses of the template matching method
2.5 Strengths of the region growing method
2.6 Weaknesses of the region growing method
2.7 Demands for license plate extraction method
3.1 Strengths of different methods
4.1 Class distribution
5.1 Results for region growing and Hough transform
5.2 Region selection by correlation
5.3 Test of selecting most probable region
6.1 Result from the connected component test
6.2 Result from the pixel count test
6.3 Result from the static bounds test
6.4 Result from the combined method
7.1 Result of the test on the test set
7.2 Result of the test on the training set
7.3 Result of the tests on the training sets
7.4 Result of the tests
7.5 Identifying the characters using templates
8.1 Overall result for the system
9.1 Main results obtained


Introduction

Many traffic accidents in Denmark are caused by high speed, and reports show that speed control reduces speeds because it has a deterrent effect on drivers [8]. Therefore a new and more efficient form of speed control has been introduced: a partially automatic control where a photo of the offender is taken, instead of the traditional method where the police had to pull over each driver.
Although the current system is far more effective than the old one, it still has some shortcomings. The goal of this project is to investigate the possibility of creating an improved system which alleviates some of these shortcomings. We want to examine the possibility of automating the workflow of the system, since the major disadvantage of the current system is its use of manual labor for tasks that we believe could nowadays be handled just as efficiently by computers. A secondary goal of this project is to explore the possibility of replacing parts of the costly equipment currently used with more reasonably priced equipment.
Part I

Analysis

Chapter 1
Traffic control

The goal of this chapter is to create a system definition which describes the sys-
tem we want to develop. To identify the shortcomings of the current partially
automatic speed control system, and to identify the parts of the system which
arguably can be further automated, both the current system and a proposal
for an improved system will be described in this chapter.

1.1 Current system


In the current system, one police officer is required to operate an automatic speed
control station. The equipment is placed in anonymous vehicles and can easily
be moved. This mobility makes it possible to monitor traffic in larger areas
with the same equipment. When setting up the equipment at a new location,
several routines must be performed before the automatic control can take place.
The camera must be manually calibrated, and the system needs input such as
the allowed speed of the monitored road. Figure 1.1 shows the equipment
inside the control station. The picture was taken during an excursion with
Aalborg Police Department [7], which is described in Appendix B.
When a vehicle approaches the control station, the speed is measured by
radar and if the vehicle is speeding, a photo is automatically taken. The photo
shows a front view of the vehicle, so that the driver and license plate can
be identified. For each picture taken, the speed of the vehicle is stored on a
portable PC for later reference. The operator only takes action when special vehicles such as buses pass; when this occurs, the system must be told that a different speed limit applies to the given vehicle.
Figure 1.1: Equipment inside the mobile control station

The film with the offenders is developed at a central laboratory, after which a manual registration of the license plates is made and the driver of the car receives a fine. The registration is performed in the police district where the pictures were originally taken; there, the police receive a digitized version of the processed film. The entire procedure is illustrated in Figure 1.2.
Two persons are required for the manual registration of the license plates. The first person uses a software tool for optimizing the pictures, so that driver and plate appear clearly. The license plate is then registered along with the speed of the vehicle for each picture. If the vehicle was not speeding, or if the face of the driver does not appear clearly, the picture is discarded. The second person verifies that the characters of the license plate have been correctly identified. If the vehicle is identifiable, a fine is issued to the offender. The entire process takes about 3-4 days from the time the picture is taken until the fine is received by the offender.

1.1.1 Disadvantages of the current system


Although the current system provides efficiency far superior to that of manually pulling over vehicles, certain aspects prevent it from being optimal. For one, the pictures are initially taken on regular film and later digitized; this means transport to and from the central laboratory, as well as the work required to digitize the pictures. Besides, the camera and speed measurement equipment is rather expensive. A fully equipped vehicle costs approximately 1,000,000 DKR [7].


Figure 1.2: Workflow of the current system (manned control station with radar and camera; film developed at a central laboratory; manual registration and manual verification of the license plates)

Compared to the initial cost of the vehicle, this is an extra 700,000 DKR for the speed control equipment itself. Apart from the cost of materials, the manual registration of license plates requires that the personnel are trained in the use of the previously mentioned software tool.

1.2 Improved system


This section introduces the solution we propose, describes its architecture and
shows how it differs from the existing system.
Figure 1.3 shows the steps of the improved system. Basically the system
can be divided into two parts. The first involves monitoring the scene by
recording a video sequence of the road. The second part is the automated
analysis of the gathered information.
The right-hand side of Figure 1.3 shows this situation, with a camera placed near the side of the road recording the cars passing by. The camera could either be placed temporarily as part of a time-limited surveillance post, or be permanently installed.
The second part of the mode of operation is an off-line analysis of the video sequence, in which potential traffic offenders are identified.


Figure 1.3: Workflow of the improved system (an unmanned control station records a video sequence of passing vehicles; registration happens automatically by off-line analysis)

This is done using image analysis on a computer, with the sequence as input. The left-hand side of Figure 1.3 shows this part.
The purpose of the image analysis is twofold. First, it should be determined for each car in the sequence whether an actual traffic violation took place; this means determining the speed of the vehicles based on the video sequence. Secondly, any offender should be identified by reading the license plate of the car, so that they can later be held responsible.
Developing reliable systems for extracting the information needed for speed estimation and vehicle identification is very important. In the case of estimating the speed of vehicles, the consequence of concluding that a vehicle drove too fast when in reality it did not is much greater than that of letting a real offender slip by.
In the case of identifying license plates, a wrong conclusion is even worse, since it would mean that an innocent person is fined for the actions of someone else. Therefore, the algorithms must be designed to refuse to make a decision if the estimated probability of being correct is below some predetermined threshold. An alternative would be a manual decision in cases of doubt. Since the current system requires a manual verification, it is reasonable to assume that this person could perform the same task, verifying the computer's output instead of that of a co-worker.

1.3 The components of the system


Figure 1.4 shows the details of the latter of the two tasks described in Section
1.2. The first task, the recording of the video sequence, is not shown. The
recorded video sequence is considered input to the system. For a brief intro-
duction to digital cameras and some basic optics, Appendix A will give the
reader an understanding of image-related concepts such as resolution, as well
as some insight into the physical components of the camera.
Figure 1.4: The tasks of the system (get next subsequence containing a vehicle; determine speed; if a speed violation occurred: select optimal frame, extract license plate, isolate characters, identify characters)

Basically the system consists of a sequence of six tasks that are ideally carried out once for each vehicle in the film. The first task is to extract the next subsequence of the film containing a vehicle. This subsequence is then analyzed and the speed of the vehicle is determined. If no speed violation took place, no further processing is necessary, and the next subsequence is fetched for processing.
If a violation took place, the license plate of the vehicle must be identified.
The part of the system responsible for this is shown in the righthand side of
Figure 1.4 in the dashed box. The input to this subsystem is a video sequence

21
Chapter 1. Traffic control

with a vehicle exceeding the speed limit. The subsystem consists of four tasks;
the first task selects the optimal frame of the incoming video sequence. The
second task then extracts the region believed to contain the license plate.
The third task isolates the seven characters and the last task identifies the
individual characters.
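Viewed as software, the subsystem is a straight pipeline of these four tasks. The sketch below, in Python, is purely structural: the four callables are stand-ins for the algorithms developed in the design chapters, not part of the report's implementation.

    def recognize_plate(frames, select_frame, extract_plate,
                        isolate_chars, identify_char):
        """Run the four recognition tasks on a subsequence with a speeding vehicle.

        The four callables are placeholders for the algorithms of Chapters 2-4.
        """
        frame = select_frame(frames)        # task 1: pick the best single image
        plate = extract_plate(frame)        # task 2: subimage with the license plate
        characters = isolate_chars(plate)   # task 3: seven character images
        return [identify_char(c) for c in characters]  # task 4: registration number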

Figure 1.5: Data abstraction level (input video sequence → single image → subimage containing only the license plate → 7 images with the characters → the seven characters of the license plate)

Figure 1.5 shows the four tasks of the system. The right half of the figure shows the input and output of these steps, all the way from the input video sequence to the final identified characters of the license plate. Alternatively, this progression can be viewed as the gradual reduction or suppression of unwanted information from the information-carrying signal: from a video sequence containing vast amounts of irrelevant information down to abstract symbols in the form of the characters of a license plate.

1.3.1 Differences from the existing system


The architecture introduced above differs from the existing system in several respects; the two most important differences are mentioned here. The first is that the manual identification of the vehicle's license plate has been eliminated and is now carried out automatically by a computer. The second is that the expensive still-frame camera and radar equipment have been replaced by a video camera, and therefore a later video processing step must be introduced for determining the speed of the vehicles.
Table 1.1 shows a summary of the differences between the existing system,
and the system that will be investigated in this report.


Topic                  Current system   Improved system
Speed determination    Radar            Postprocessing step
Plate identification   Manually         Automatically

Table 1.1: Differences between the systems

1.3.2 Project focus


In this project we set aside the steps that precede the extraction of the license plate from the best-suited frame. The entire issue of whether or not the vehicles are speeding will not be addressed. The remainder of this report is concerned only with developing algorithms for extracting the information needed to identify the characters. The input is an image of a vehicle assumed to be speeding.

Component                          In project
Speed
  Recording video
  Speed determination
Recognition
  Selecting optimal frame
  Extracting license plate          ✓
  Isolating characters in plate     ✓
  Identifying characters            ✓

Table 1.2: Areas of the improved system addressed in this project

Before summing up the requirements for the system, an essential component, the Danish license plate, will be described in order to establish the characteristics that will later form the basis for important design choices.

1.4 License plates


The system for catching speed violators depends primarily on recognizing the front-side license plate. This section describes the characteristics of a standard Danish license plate, based on the reference guide for Danish vehicle inspection [2].

1.4.1 Dimensions
All license plates must be either rectangular or square. The dimensions of the plate can vary depending on the vehicle type, but the most common license plate is the rectangular one shown in Figure 1.6. It has the dimensions (w × h) 504 × 120 mm and has all 7 characters written in a single line.

Figure 1.6: A standard rectangular Danish license plate

Some vehicles use 'square' plates with the characters written in two lines, but according to the guidelines for vehicle inspection [2], all front-side plates must be rectangular, so this is the only license plate shape this project will consider. Other license plate dimensions are allowed for vehicles such as tractors and motorbikes.

1.4.2 Layout

The front-side license plate on all Danish vehicles is either white enclosed by a red frame, or yellow enclosed by a black frame. All standard license plates have two capital letters ranging from A to Z, followed by five digits ranging from 0 to 9. Besides the standard license plate, there is the option of buying a so-called wish-plate, or customized license plate, where the vehicle owner can choose freely from 1 to 7 letters or digits to fill the plate. All letters and digits are written in black, on both yellow and white license plates.
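Because a standard plate is always two capital letters followed by five digits, a recognized character string can be validated against this format with a simple pattern check. A minimal sketch in Python; the helper name is ours, not the report's:

    import re

    # Two capital letters A-Z followed by five digits 0-9 (standard plates only;
    # customized plates are excluded from this project, see Section 1.6.1).
    STANDARD_PLATE = re.compile(r"[A-Z]{2}[0-9]{5}")

    def is_standard_plate(text: str) -> bool:
        """Check whether a recognized 7-character string fits the standard format."""
        return STANDARD_PLATE.fullmatch(text) is not None

    # Example: is_standard_plate("RL39312") -> True (the plate in Figure 1.2)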

1.4.3 Mounting and material

The front-side license plate must be mounted horizontally and in an upright position seen from the side of the car; in other words, the textual part must face directly forward. The plate may not be changed in shape, decorated or embellished, and it may not be covered by any means. Bolts used for mounting may not affect the readability of the plate and must be painted the same color as the mounting point on the plate. The reference guide says nothing about the material, but the plates are generally made of a reflective material for increased readability.


1.5 System definition


The primary goal of this project is to investigate how the existing system for catching speed violators can be automated, using a video sequence from an inexpensive video camera.
Once the video sequence has been recorded, the analysis can be divided into two major parts. The first part is to have the system determine the speed of the vehicle in question. If it is determined that a speed violation took place, the second part of the analysis is carried out: the recognition of the license plate. The extraction of the license plate, the isolation of the individual characters, and the recognition of these characters are the topics of the remainder of this report.
Given that the previous steps were carried out successfully, the system can then extract the name and address of the offender and issue a fine, completely without human interaction.

This project will focus on the design of algorithms used for extracting the license plate from a single image, isolating the characters of the plate, and identifying the individual characters.

1.6 Delimitation
This section concludes the chapter by summing up the delimitations for the
system. The delimitations are grouped by subject.

1.6.1 License plate formats


This project does not consider square or customized plates. The vast majority
of license plates currently in use are of the standard rectangular shape and are
not customized. In a full-blown implementation of the system all license plate
types should of course be handled.

1.6.2 Video processing


The license plate recognition developed in this report operates on single images,
whereas a full implementation would also need to extract a suitable frame from
the input video sequence. No investigation is performed into techniques for determining vehicle speed from a video sequence, nor into the extraction of a suitable frame from the sequence.


1.6.3 Identification of driver


As mentioned in Section 1.1 the driver must be clearly identifiable before a
fine can be issued. This report does not describe methods for automatically
determining whether this condition holds, but only how to provide an output
of the characters on the license plate.

1.6.4 Transportation of data


The question of how the film sequences are transferred to the processing com-
puter before the analysis is beyond the scope of this project. It is however rea-
sonable to imagine that permanent surveillance posts would have some sort of
static network connection, whereas temporary posts would periodically require
manual changes of the recording media. Alternatively some kind of wireless
LAN could be employed.

1.6.5 Quality of decisions


As described in Section 1.2, a key element in making the proposed system useful is that, in cases of doubt when identifying the characters (that is, when the probability of the best guess being correct is below some threshold), the system should refuse to make a decision. This issue is not addressed in this project; instead, the best guess is always used as the final result.

Part II

Design

Chapter 2
License plate extraction

[Chapter-opening pipeline figure: the Extract license plate step takes a single image and produces a subimage containing only the license plate]

2.1 Introduction
Before isolating the characters of the license plate in the image, it is advanta-
geous to extract the license plate.
This chapter presents three different extraction strategies. First the theory
behind each method is developed, then the strengths and weaknesses are sum-
marized and finally, in the last section of this chapter, the three strategies are
combined. The strategies are Hough transform, template matching and region
growing. The common goal of all three strategies is, given an input image, to produce a number of candidate regions that with high probability contain a license plate.
The purpose of this chapter is not to develop three competing methods,
since it is unlikely that one method will prove better in all cases. Instead the
goal is to have the strengths of each method complement the weaknesses of the
other methods and in this way improve the overall efficiency of the system.
The methods presented in this chapter are all based on several assumptions
concerning the shape and appearance of the license plate. The assumptions
are listed in the following:

1. The license plate is a rectangular region of an easily discernible color.

2. The width-height relationship of the license plate is known in advance.

3. The orientation of the license plate is approximately aligned with the axes.

4. Orthogonality is assumed, meaning that a straight line in the scene is also straight in the image and not optically distorted.

The first two assumptions are trivially fulfilled, since the license plate is
bright white or yellow and the size is known (see Section 1.4). The last two
assumptions are based on the fact that the camera should always be aligned
with the road. This is necessary in order to reduce perspective distortion.
Several other objects commonly found in the streets of urban areas fit the
above description as well. These are primarily signs of various kinds. This
should be taken into account when finding locations for placing the camera
in the first place. Even if some of these objects are mistakenly identified,
this is not a great problem since later processing steps will cause these to be
discarded.
Throughout the chapter the same source image is used in all examples.
This image is shown in Figure 2.1. The methods have been developed using
36 training images, and they can all be found on the attached CD-ROM along
with 72 images used for testing.

2.2 Hough transform


This section presents a method for extracting license plates based on the Hough
transform. As shown in Figure 2.2 the algorithm behind the method consists
of five steps.


Figure 2.1: Source image

The first step is to threshold the gray scale source image. Then the resulting
image is passed through two parallel sequences, in order to extract horizontal
and vertical line segments respectively.
The first step in both of these sequences is to extract edges. The result is
a binary image with edges highlighted. This image is then used as input to
the Hough transform, which produces a list of lines in the form of accumulator
cells. These cells are then analyzed and line segments are computed.
Finally, the lists of horizontal and vertical line segments are combined, and any rectangular region matching the dimensions of a license plate is kept as a candidate region. These candidate regions are the output of the algorithm.

2.2.1 Edge detection


Before computing the edges, the source image is thresholded. The choice of
optimal threshold value is highly dependent on lighting conditions present
when the source images were taken. The value chosen in this project was
based on the training images.
The first step is to detect edges in the source image. This operation in effect
reduces the amount of information contained in the source image by removing
everything but edges in either horizontal or vertical direction. This is highly
desirable since it also reduces the number of points the Hough transform has
to consider.
The edges are detected using spatial filtering; the convolution kernels used are shown in Equation (2.1).

Figure 2.2: Overview of the Hough transform method (thresholding of the gray-scale source image; horizontal and vertical edge detection; Hough transform producing accumulator cells; line segment extraction; candidate region extraction)

Since horizontal and vertical edges must be detected separately, two kernels are needed:

$$R_{\text{horizontal}} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \qquad R_{\text{vertical}} = \begin{bmatrix} -1 & 1 \end{bmatrix} \qquad (2.1)$$

The choice of kernels was partly based on experiments and partly on the fact that they produce edges with a thickness of a single pixel, which is desirable input to the Hough transform.
Figure 2.3 shows the two kernels applied to the inverted, thresholded source image (the image is inverted to make it easier to show in this report). Since the two filters are high-pass filters which approximate the partial derivatives in the horizontal or vertical direction, a value of positive one in the resulting image corresponds to a transition from black to white, and a value of minus one corresponds to the opposite situation. Since it does not matter which transition it is, the absolute value of each pixel is taken before proceeding.
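For illustration, the thresholding and the kernels of Equation (2.1) reduce to a few lines of NumPy, since each kernel is just a difference between neighboring pixels. A minimal sketch, not the report's implementation; the threshold value is a stand-in for the one chosen from the training images.

    import numpy as np

    def detect_edges(gray, threshold=128):
        """Threshold a gray-scale image and extract horizontal and vertical edges."""
        binary = (gray >= threshold).astype(np.int8)        # 0/1 binary image
        # The kernels of Equation (2.1) amount to differences between neighboring
        # pixels; the absolute value keeps both black-to-white and white-to-black
        # transitions, as described in the text.
        horizontal_edges = np.abs(np.diff(binary, axis=0))  # [-1, 1]^T kernel
        vertical_edges = np.abs(np.diff(binary, axis=1))    # [-1, 1] kernel
        return horizontal_edges, vertical_edges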


Figure 2.3: Horizontal and vertical edges

2.2.2 Line detection using Hough transform


The Hough transform [4] is a method for detecting lines in binary images. It was developed as an alternative to the brute-force approach to finding lines, which is computationally expensive.
A brute-force approach to finding all lines in an image with n points would be to form all lines between pairs of points and then check whether the remaining points lie near each line. Since there are roughly n(n − 1)/2 such lines and roughly n comparisons per line, this is an O(n³) operation [4].
In contrast, the Hough transform performs in linear time. It works by rewriting the general equation for a line through (xi, yi) as:

$$y_i = a x_i + b \quad (2.2) \qquad \Longleftrightarrow \qquad b = -x_i a + y_i \quad (2.3)$$

For a fixed (xi , yi ), Equation (2.3) yields a line in parameter space and a
single point on this line corresponds to a line through (xi , yi ) in the original im-
age. Finding lines in an image now simply corresponds to finding intersections
between lines in parameter space.
In practice, Equation (2.3) is never used since the parameter a approaches
∞ as the line becomes vertical. Instead the following form is used:

$$x \cos\theta + y \sin\theta = \rho \qquad (2.4)$$

In Equation (2.4) the θ parameter is the angle between the normal to the
line and the x-axis and the ρ parameter is the perpendicular distance between
the line and the origin. This is also illustrated in Figure 2.4. Also in contrast
to the previous method, where points in the image corresponded to lines in

parameter space, in the form shown in Equation (2.4) points correspond to


sinusoidal curves in the ρθ plane.
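As a small worked example (with illustrative numbers, not taken from the report), the point (x1, y1) = (3, 4) maps to the curve

$$\rho(\theta) = 3\cos\theta + 4\sin\theta$$

in parameter space. At θ = 0 this gives ρ = 3, the vertical line x = 3 through the point; at θ = π/2 it gives ρ = 4, the horizontal line y = 4. Every other point on the curve likewise names one line through (3, 4).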

Figure 2.4: Example image and corresponding Hough transform (two points (x1, y1) and (x2, y2) map to sinusoidal curves in the (ρ, θ) plane; the curves intersect at (ρ1, θ1), the parameters of the line through both points)

As shown in Figure 2.4, the range of θ is ±π/2, and for an image with a resolution of w × h the range of ρ is 0 to √(w² + h²).
Figure 2.4 also shows two points, (x1, y1) and (x2, y2), and their corresponding curves in parameter space. As expected, the parameters of their point of intersection in parameter space correspond to the parameters of the dashed line through the two points in the original image.
In accordance with the preceding paragraph, the goal of the Hough transform is to identify points in parameter space where a high number of curves intersect. Together these curves correspond to an equal number of collinear points in the original image. A simple way to solve this problem is to quantize the parameter space. The resulting rectangular regions are called accumulator cells, and each cell corresponds to a single line in the image.
The algorithm behind the Hough transform is now straightforward to derive. First the accumulator array is cleared to zero. Then, for each point in the edge image, iterate over all possible values of θ and compute ρ using Equation (2.4). Finally, for each computed (ρ, θ) value, increment the corresponding accumulator cell by one. Since the algorithm iterates over all points once, it is clear that it performs in O(n) time.
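A minimal sketch of this accumulator loop in Python. The angle list and the ρ resolution of one cell per pixel are assumptions made for the illustration; in the project only the two angles 0 and π/2 are ever used.

    import numpy as np

    def hough_accumulate(edge_points, thetas, image_shape):
        """Vote for lines; edge_points is an iterable of (x, y) edge pixels."""
        h, w = image_shape
        rho_max = int(np.ceil(np.hypot(w, h)))      # largest possible distance
        acc = np.zeros((rho_max + 1, len(thetas)), dtype=np.int32)
        for x, y in edge_points:
            for j, theta in enumerate(thetas):
                rho = int(round(x * np.cos(theta) + y * np.sin(theta)))
                if 0 <= rho <= rho_max:             # rho >= 0 for theta in [0, pi/2]
                    acc[rho, j] += 1                # one vote per point per angle
        return acc                                  # local maxima correspond to lines

    # As in the report, only horizontal and vertical lines are considered:
    # acc = hough_accumulate(points, thetas=[0.0, np.pi / 2], image_shape=image.shape)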
For a general purpose implementation of the algorithm as outlined above,
choosing a higher resolution for the accumulator array yields more precise
results, but at the cost of increased processing time and higher memory re-
quirements. This is not the case for the implementation used in this project,
since a full sized accumulator array is never needed in practice. The reason

is that only horizontal and vertical lines are considered. This corresponds to
computing the Hough transform only in a small interval around either 0 or
π/2. Since only a single ‘column’ of the accumulator array is ever computed,
this vastly decreases both the memory requirements and processing time of the
algorithm.
After the algorithm has iterated over all points, the accumulator cells con-
tain the number of points which contributed to that particular line. Finding
lines is then a matter of searching the accumulator array for local maxima.

2.2.3 Line segment extraction


The Hough transform in its original form as proposed in [4] supports only line
detection as opposed to line segment detection. A slight modification of the
algorithm makes it possible to extract line segments.
The key is simply to remember which points in the original image con-
tributed to which lines. Instead of only storing the number of contributing
points in the accumulator cells, references to the points are also kept. The
downside to this procedure is that in general for an image with n points and
a resolution in θ of rθ , exactly n × rθ references must be stored. As mentioned
above, the transform is only computed for a single value of θ, so rθ is always 1
in this project, which makes this fact less of a problem.
Finding line segments is now a matter of iterating over each cell and search-
ing for line segments. This is done by first sorting all the points according to
their position along the line in question and then grouping points into line seg-
ments. Since it is known that the points are in order, grouping is most easily
done by iterating over all points and, for each point, checking whether the next point on the line is within some maximum distance. If it is, the line segment is extended to include the new point; otherwise a new line segment is started. In
this project the maximum distance, or maximum gap length, was chosen as the
value which gave the best results for the training images. Also, as some of the
line segments are potentially very short, specifying a minimum line segment
length is advantageous and helps reduce computation in later steps.

Figure 2.5: Example license plate and reconstructed line segments

Figure 2.5 shows an ideal case, where the longest line segments from the

image have been extracted. The short line segments in the letters and in the
top part have been eliminated.
The first step is sorting the points along the line. This is done by first
finding the parameterized equation of the line. A cell with parameters (ρi , θi )
corresponds to a line with Equation (2.5).
" # " # " #
x cos(θi + π/2) ρi cos θi
= t+ (2.5)
y sin(θi + π/2) ρi sin θi
Computing t for each point and sorting by t is now trivial. The result of this step is a list of line segments.
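A sketch of this grouping step, assuming the contributing points of one accumulator cell are given. The maximum gap and minimum length are illustrative placeholders; the report chose these values from the training images.

    import numpy as np

    def extract_segments(points, rho, theta, max_gap=5.0, min_length=20.0):
        """Group the collinear points of one cell into segments along (rho, theta)."""
        if not points:
            return []
        dx, dy = np.cos(theta + np.pi / 2), np.sin(theta + np.pi / 2)  # line direction
        ox, oy = rho * np.cos(theta), rho * np.sin(theta)              # point on the line
        # Parameter t of each point along the line, as in Equation (2.5), then sort.
        ts = sorted((x - ox) * dx + (y - oy) * dy for x, y in points)
        segments, start, prev = [], ts[0], ts[0]
        for t in ts[1:]:
            if t - prev > max_gap:                  # gap too large: close the segment
                if prev - start >= min_length:
                    segments.append((start, prev))
                start = t
            prev = t
        if prev - start >= min_length:              # close the final segment
            segments.append((start, prev))
        return segments                             # (t_start, t_end) intervals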

2.2.4 Candidate region extraction


The final step is to find the candidate regions, that is the regions believed
to contain license plates. The input to this step is a list of horizontal line
segments and a list of vertical line segments.
The procedure used is first to scan the lists for pairs of segments that meet
the following requirements:

1. They should start and end at approximately the same position.

2. The ratio between their average length and distance should equal that
of a standard license plate.

The resulting two lists of regions, one with horizontal and one with vertical pairs, are then compared, and any region not contained in both lists is discarded. The remaining regions are the final candidate regions.
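As an illustration, the pairing test for horizontal segments might look as follows. The tolerances are invented for the sketch; the ratio 504/120 follows from Section 1.4.1. Each segment is given as (x_start, x_end, y).

    PLATE_RATIO = 504 / 120   # width/height of a standard Danish plate (Section 1.4.1)

    def pair_horizontal_segments(segments, end_tol=10.0, ratio_tol=0.25):
        """Find segment pairs that could be the top and bottom edge of a plate."""
        pairs = []
        for i, (s0, e0, y0) in enumerate(segments):
            for s1, e1, y1 in segments[i + 1:]:
                if abs(s0 - s1) > end_tol or abs(e0 - e1) > end_tol:
                    continue                         # must start/end at about the same x
                length = ((e0 - s0) + (e1 - s1)) / 2 # average segment length
                distance = abs(y1 - y0)              # vertical distance between the pair
                if distance and abs(length / distance - PLATE_RATIO) <= ratio_tol * PLATE_RATIO:
                    pairs.append(((s0, e0, y0), (s1, e1, y1)))
        return pairs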

2.2.5 Strengths and weaknesses


This section discusses the advantages and disadvantages of the Hough trans-
form method. The strengths are listed in Table 2.1 and the weaknesses in
Table 2.2.

Scaling invariant: Since the algorithm does not look for regions of a particular size, it is invariant to scaling of the license plate.
Relatively independent of license plate color: As long as the license plate is brighter than its surroundings, the plate is usually correctly extracted.

Table 2.1: Strengths of the Hough transform method

Trouble detecting vertical lines: Vertical lines in the license plate are typically more than a factor of four shorter than the horizontal lines, and thus more susceptible to noise.
Finds more than just the license plate: All rectangular regions with dimensions matching those of a license plate are identified, which is sometimes many. This makes it difficult to choose the correct candidate later.

Table 2.2: Weaknesses of the Hough transform method

2.3 Template matching


The main philosophy behind extraction through template matching is that, by comparing each portion of the investigated image to a template license plate, the actual license plate in the image is found as the region bearing the most resemblance to the template. A common way to perform this task is to use a cross correlation scheme.

2.3.1 Template construction


The way the template is constructed plays a significant role in the success of template matching. There are no rules for template construction, but the template must be constructed in such a way that it has all the characteristics of a license plate as it appears in the source image.

Image preprocessing

When examining the source images in the training set, two major differences between the plates are apparent: the plates vary in size, because the cars are at different distances, and they vary in light intensity, due to the different lighting conditions under which the images were taken. Neither issue can be handled by the template construction itself. The size issue cannot be helped at all; this may well turn out to be critical for the approach, but for now it will be ignored and the attention turned to the lighting issue, which can be helped by proper image preprocessing.
A bit simplified, the goal is to make the plates look similar. This can be done by thresholding the image. The threshold value is not calculated dynamically, because the image lighting is unpredictable and the different locations and colors of the cars make it impossible to predetermine the black (or white) pixel percentage needed for dynamic thresholding. Instead, the value giving the best result on a number of test images is chosen. Figure 2.6 shows two images with very similar license plates but different overall lighting, demonstrating that dynamic thresholding is hard to accomplish.

Figure 2.6: Images with different lighting

Also, a license plate has an edge, which in the processed image will appear solid black and a bit thicker at the bottom. It is, however, not a good idea to add this feature to the template: even a small rotation of the license plate would make the correlation yield a large error.

Designing the template

A subtemplate for the letters can be constructed by adding layer after layer containing the letters of the alphabet, and then merging the layers, giving each layer an opacity equal to the probability of the letter. In the same way a subtemplate can be made for the digits. Ignoring the diversity of the license plates on the road today and combination restrictions, each letter is given the same probability, as is each digit. Assuming that the letters are equally probable does not reduce the efficiency of the system. Doing so, we end up with a gray-scale template as seen in Figure 2.7.
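The opacity merge is effectively a probability-weighted average of the glyph images. A minimal sketch, assuming the glyphs are available as equally sized gray-scale arrays with values in [0, 1]:

    import numpy as np

    def build_subtemplate(glyphs, probabilities=None):
        """Merge glyph layers into one gray-scale template.

        glyphs: list of equally sized 2-D arrays (0.0 = black glyph, 1.0 = background).
        With no probabilities given, all glyphs get equal weight, as the report
        does for both the letter and the digit subtemplate.
        """
        stack = np.stack([np.asarray(g, dtype=float) for g in glyphs])
        if probabilities is None:
            probabilities = np.full(len(glyphs), 1.0 / len(glyphs))
        return np.tensordot(probabilities, stack, axes=1)  # weighted sum of the layers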


Figure 2.7: Template created through opacity-merging of the probabilities. The left
hand side shows a letter template, and the right hand side a digit template

2.3.2 Cross correlation


One of the more common ways to calculate the similarity between two images is to use cross correlation. Cross correlation is based on a squared Euclidean distance measure [5] of the form:

$$d_{f,t}^2(u,v) = \sum_x \sum_y \left[ f(x,y) - t(x-u,\, y-v) \right]^2 \qquad (2.6)$$

Equation (2.6) is an expression for the sum of squared distances between the pixels in the template and the pixels in the image covered by the template. The value calculated represents that of pixel (u, v) in the correlated image, as shown in Figure 2.8. Pixels near the edges are ignored, as the correlation is only carried out where the entire template fits inside the image. A correlation near the edges would require assigning a pseudo-pixel value to the part of the image area under the template that lies outside the image, for instance the average value of the rest of the region. This has not been done, because a license plate exceeding the image is useless in terms of recognition, so there is no point in spending computing power on these regions.


Figure 2.8: The layout of the cross correlation (the template t(x − u, y − v) is shifted across the source image; each offset (u, v) yields one pixel of the output image)

Expanding the expression for the Euclidean distance d² produces:

$$d_{f,t}^2(u,v) = \sum_x \sum_y \left[ f^2(x,y) - 2 f(x,y)\, t(x-u,y-v) + t^2(x-u,y-v) \right] \qquad (2.7)$$

Examining the expansion, it is noted that the term Σ_x Σ_y t²(x − u, y − v) is constant, since it is the sum of the squared pixel values of the entire template.
Assuming that the term Σ_x Σ_y f²(x, y) can be regarded as constant as well amounts to assuming that the light intensity of the image does not vary between regions the size of the template across the entire image.
This is useful because, based on this assumption, the remaining term, shown in Equation (2.8), becomes a measure of the similarity between the image and the template. This measure of similarity is called the cross correlation:

$$c(u,v) = \sum_x \sum_y f(x,y)\, t(x-u,\, y-v) \qquad (2.8)$$

The assumption on which the validity of this measure rests is, however, somewhat frail. The term Σ_x Σ_y f²(x, y) is only approximately constant for images in which the image energy varies only slightly, which is not the case for most images. The effect is that the correlation value may be higher in bright areas than in areas where the template is actually matched. Also, the range of the measure is totally dependent on the size of the template.
These issues are addressed by the normalized cross correlation.


2.3.3 Normalized cross correlation


The expression for calculating the normalized cross correlation coefficient, γ(u, v), is shown in Equation (2.9). It handles the issue of bright areas by normalizing the measure, dividing by the square root of the product of the sums of squared mean deviations [5][6]. Without this normalization the range would depend on the size of the template, which would seriously restrict the areas in which template matching can be used. Also, the mean value of each image region is subtracted, corresponding to the cross covariance of the images [6]. Often the template will need to be scaled to a specific input image, and without normalization the output could take on any size depending on the scaling factor.

$$\gamma(u,v) = \frac{\sum_x \sum_y \left[ f(x,y) - \bar{f}_{u,v} \right] \left[ t(x-u,y-v) - \bar{t}\, \right]}{\sqrt{\sum_x \sum_y \left[ f(x,y) - \bar{f}_{u,v} \right]^2 \; \sum_x \sum_y \left[ t(x-u,y-v) - \bar{t}\, \right]^2}} \qquad (2.9)$$

Here f̄_{u,v} is the mean value of the image pixels in the region covered by the template, and t̄ is the mean value of the template. The value of γ lies between −1 and 1: −1 for a perfectly reversed match, 1 for a perfect match, and values approaching 0 when there is no match.
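Equation (2.9) translates directly into code. The sketch below evaluates γ at a single offset (u, v); a full search would loop over all valid offsets. It is a minimal illustration, not the report's implementation.

    import numpy as np

    def ncc_at(f, t, u, v):
        """Normalized cross correlation of template t over image f at offset (u, v)."""
        th, tw = t.shape
        region = f[v:v + th, u:u + tw].astype(float)  # image pixels under the template
        df = region - region.mean()                   # subtract the local image mean
        dt = t.astype(float) - t.mean()               # subtract the template mean
        denom = np.sqrt((df ** 2).sum() * (dt ** 2).sum())
        return float((df * dt).sum() / denom) if denom else 0.0  # gamma in [-1, 1]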
As can be seen in the expression for the cross correlation coefficient (Equation (2.9)), computing it is an expensive task. For each pixel in the output image, the coefficient has to be calculated. Assuming an image of size M^2, a template of size N^2, and not including the normalization (only the numerator of Equation (2.9)), the calculation involves approximately N^2 (M - N + 1)^2 multiplications and the same number of additions [5].¹
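To make the computation concrete, below is a minimal sketch of Equation (2.9) in Python/NumPy. The function name ncc, the naive nested loops, and the use of NumPy are our own illustrative choices, not part of the project's implementation.

    import numpy as np

    def ncc(image, template):
        # Normalized cross correlation (Equation 2.9), computed naively.
        # Returns a coefficient in [-1, 1] for every position where the
        # template fits entirely inside the image; border positions are
        # skipped, as described in the text.
        M, N = image.shape
        m, n = template.shape
        t = template - template.mean()        # t(x, y) minus its mean
        t_energy = (t ** 2).sum()
        out = np.zeros((M - m + 1, N - n + 1))
        for u in range(M - m + 1):
            for v in range(N - n + 1):
                f = image[u:u + m, v:v + n]
                f = f - f.mean()              # region minus its local mean
                denom = np.sqrt((f ** 2).sum() * t_energy)
                out[u, v] = (f * t).sum() / denom if denom > 0 else 0.0
        return out

Each output pixel costs on the order of N^2 operations, which matches the complexity estimate above; practical implementations would vectorize the loops or use FFT-based correlation instead.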

2.3.4 Strengths and weaknesses


Single and simple similarity measure: Template matching is a strong approach for finding a single similarity measure between two images. Through a simple off-the-page algorithm this measure can be easily calculated with good results, without an investigation of the specifics of the region sought after. Often, using a sample region will be sufficient to identify similar regions.

¹ For all the complexity evaluations it should be noted that these are estimates, which vary depending on the method of implementation.


Table 2.3: Strengths of the template method

Slow algorithm: A large input image and a smaller template make performing the simple calculations in the many nested summations a demanding task.

Not invariant to rotation and perspective distortion: If the region sought after is rotated or distorted in the input image, the region may very well bear little or no resemblance to the template on a pixel by pixel basis. This means the similarity measurement will fail.

Not invariant to scaling: Scaling of input images proves to be an unsurpassable problem. It is an impossible task to examine the input image using all possible sizes for the template, and even the smallest variation in size will often lead to a wrong result.

Static threshold: The images vary a great deal in overall brightness, depending on the surroundings.

Table 2.4: Weaknesses of the template matching method

2.4 Region growing


This section will describe another method for finding uniform regions (such as
a license plate) in an image. The basic idea behind region growing is to identify
one or more criteria that are characteristic of the desired region. Once the
criteria have been established, the image is searched for any pixels that fulfill
the requirements. Whenever such a pixel is encountered, its neighbors are
checked, and if any of the neighbors also match the criteria, both of the pixels
are considered as belonging to the same region. Figure 2.9 visualizes these
steps. The criteria can be static pixel values, or depend on the region that is
being expanded.

2.4.1 License plate features


When looking at a license plate, the most obvious feature that sets it apart from its surroundings is its brightness. Since most Danish license plates


Figure 2.9: The steps of region growing (get next pixel in image; if the pixel fulfils the requirements, grow to include neighbors; repeat while expansion is possible; finally mark the found pixels as a region)

are made from a reflecting material (see Section 1.4), they will usually appear
brighter than the rest of the vehicle. Exceptions occur with white cars, but
in general, looking at the brightness is an effective way of finding possible
candidates for license plates. Of course, it has to be considered that the
characters inside the plate are black. This is helpful when a license plate has
to be distinguished from e.g. other parts of a white vehicle. License plates also
have well defined dimensions.
Combining the two features above, all bright rectangular regions with a
certain height-width ratio should be considered candidates for license plates.

2.4.2 Preprocessing
Before searching through the image for any pixels bright enough to be part
of a license plate, the image is prepared by converting it to binary. This is
done, since color plays no role when looking for bright pixels. In performing
this conversion, the threshold value is critical as to whether or not the license
plate can be distinguished from its surroundings. On one hand, the threshold


should be low enough to convert all pixels from the license plate background
into white, on the other hand it should be chosen high enough to convert as
much of the other parts of the image as possible into black. Figure 2.10 shows
an example of such a binary image. The license plate appears as a bright
rectangle, but there are several other white regions.

Figure 2.10: Binary image

A problem arises, since the overall brightness of the images is not known in
advance, and therefore it is reasonable to select a relatively low threshold to
guarantee that the background of the license plates is always converted into
white. Then other criteria, such as the ratio mentioned above, will have to
help in selecting the most probable candidate for a license plate, from among
the list of white regions.

2.4.3 Extracting the regions


When using the region growing concept to extract regions with similar fea-
tures, the most intuitive approach is to use recursion. As described in Section
2.4, each pixel is compared to the criteria (in this case only brightness), and
whenever a pixel fulfills the requirements, all of its neighbors are also compared
to the criteria. This method requires that all pixels that have been examined
are marked to prevent the same region from being extracted multiple times.
This can be achieved merely by setting the value of all previously found pixels
to black.
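As an illustration of this idea, the sketch below implements the growing step with an explicit stack rather than recursion, which avoids deep call chains on large regions; the function names, the 4-neighborhood, and the use of NumPy are our own assumptions, not the project's code.

    import numpy as np

    def grow_region(binary, seed):
        # Collect all white pixels connected to the seed, marking visited
        # pixels black so the same region is not extracted twice.
        h, w = binary.shape
        stack, region = [seed], []
        while stack:
            y, x = stack.pop()
            if 0 <= y < h and 0 <= x < w and binary[y, x]:
                binary[y, x] = 0   # mark as examined
                region.append((y, x))
                stack += [(y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)]
        return region

    def find_regions(binary):
        # Grow a region from every white pixel not yet consumed by a region.
        return [grow_region(binary, (y, x))
                for y, x in zip(*np.nonzero(binary)) if binary[y, x]]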
The nature of the region growing algorithm does not guarantee that a region
satisfies the condition of being a square with certain dimensions. There are
several solutions to this problem:


- A region is transformed into the largest contained rectangle within its area. This method is susceptible to noise, since a single black pixel along an otherwise white line of pixels could prevent the rectangle from reaching the optimal size (see Figure 2.11).

Figure 2.11: Largest contained rectangle

- A region is transformed into a rectangle, based on maximum and minimum values of the pixels it contains. This method reduces the effect of noise, but introduces the possibility of mistaking arbitrary white forms for a white square. It guarantees, however, that the entire license plate will be contained in a single region, even though the plate is not completely horizontal (see Figure 2.12).

Figure 2.12: Rectangle from maximum and minimum values

Since the method of transforming a region into the largest contained rect-
angle is susceptible to noise, it is reasonable to assume that the second method
will prove to give the best results.
An enhancement to the algorithm can be achieved by setting a dynamic criterion for when a neighbor pixel belongs to the same region. Instead of a


static threshold dividing the picture into black and white pixels, the criterion could be that the neighboring pixel must not differ by more than a given margin in brightness. This would mean that license plates partly covered in shadow could be seen as a coherent region, but it also introduces the risk of letting the license plate region expand beyond the real license plate, if the border is not abrupt.

2.4.4 Strengths and weaknesses


Tables 2.5 and 2.6 summarize the most important characteristics of the region growing method.

Fast algorithm: Each pixel is examined no more than once for each neighbor. This implies an O(n) algorithm.

Invariant to distance between camera and vehicle: The method extracts candidates with the correct shape; it does not depend on the size of regions.

Resistant to noise: The region is expanded to the largest possible rectangle based on maximum and minimum values.

Table 2.5: Strengths of the region growing method

High demands for memory: The recursive nature of the algorithm stores temporary results for each call to the recursive function.

Static threshold: The images vary a great deal in overall brightness, depending on the surroundings.

Table 2.6: Weaknesses of the region growing method

2.5 Combining the methods


In order to select which method to use and how to combine them, each method
must be evaluated against the demands set by the system.


2.5.1 Scale invariance


The size of the license plate in the image varies and therefore the extraction
method must be invariant to those changes. Hough transformation is only
concerned with the horizontal and vertical lines in the image, and the ratio
between them. Region growing finds a bright region and again it is basically
the ratio between width and height that is evaluated. Therefore both the
Hough transformation and the region growing method are, as they should be,
invariant to changes in license plate size.
Template matching is not scale invariant. Since template matching is based
on a pixel by pixel comparison, any change in size makes the result unpre-
dictable.

2.5.2 Rotation invariance


To reduce the number of needed calculations, the Hough transform was de-
signed to find horizontal and vertical lines, based on the assumption that the
license plate is aligned with the axes of the image. Because of this, a slightly rotated license plate will not be extracted.
Region growing finds the bright areas regardless of orientation, and a small
rotation does not change the height-width ratio noticeably. This means that
the method is capable of extracting rotated plates.
Template matching is also seriously affected by rotation. If the plate is
rotated even slightly, the outcome of the template matching is, as was the case
with scaling, unpredictable.

2.5.3 Lighting invariance


The surroundings and weather conditions make it difficult to use any form of dynamic thresholding when converting the image into binary. The success rate of both template matching and region growing suffers because of it. The Hough transform is, however, much less sensitive to changes in the lighting conditions under which the picture was taken. The reason is that while both template matching and region growing threshold the image with a predetermined value, the Hough transformation is applied to an image on which an edge detection has been performed. As long as the lines in the image are detectable by the edge detection algorithm, the Hough transformation does not care how bright or dark the image is.


2.5.4 Summarizing the differences

Area                  Hough transform   Region growing   Template matching
Scale invariance      ✓                 ✓                ✗
Lighting invariance   ✓                 ✗                ✗
Rotation invariance   ✗                 ✓                ✗

Table 2.7: Demands for license plate extraction method

It cannot be guaranteed that the images are taken at the same distance from each vehicle. Thus, the criteria must be invariant with respect to the distance between camera and vehicle; in other terms, it is crucial that the method is scale invariant.
Due to the scaling problem, template matching will not be able to extract license plates single-handedly. It might however be very useful as an extension to the other methods examined in this chapter, because it will be able to aid in the evaluation of the intermediate results.
In real life, the assumption that license plates are aligned with the axes
might not hold. It might not be possible to place the camera under optimal
conditions, resulting in perspective and/or rotation distortion. Therefore a
more robust system, with some invariance to these factors would be desirable.
Here the region growing method seems like a good choice. But, since the system
will be expected to function in all weather conditions, it would be preferable
that the method is as insensitive to changes in lighting conditions as possible.
The Hough transform has that quality to some extent.
Combining the two methods ideally makes sure the system finds the plate
under all conditions. Both methods will produce a set of possible regions,
which serves as input to a method which is to determine the most probable
candidate.

2.5.5 Candidate selection


As mentioned, the output from region growing and Hough transform consists
of a list of regions, of which one hopefully contains the license plate. In the
system, three steps are performed in order to select the best candidate.

Correlation
First, correlation takes its turn to sort out regions that bear no resemblance to license plates. This step sorts out regions such as bright areas of the sky.


Peak-and-valley
This method is designed to sort out any uniform regions, such as pieces of the road. It works by examining a horizontal projection of the candidate region. In this projection, the number of sudden transitions from a high number of black pixels to a low number, and vice versa, is counted. If this number is below an experimentally determined threshold, the region cannot contain a license plate (a sketch of this counting follows the list).

Height-width ratio
Although sometimes a region with the license plate will contain some edges around the actual plate, the ratio provides a means of sorting out many regions that could not possibly contain a license plate. If more than one region passes all three criteria, the one with the best ratio is selected.
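The transition counting referred to above might be sketched as follows; the binary convention (black pixels have value 0) and the column-count threshold high are assumptions made for illustration.

    import numpy as np

    def count_transitions(region, high):
        # Count sudden changes between few and many black pixels per
        # column in the horizontal projection of the candidate region.
        projection = (region == 0).sum(axis=0)  # black pixels per column
        peaks = projection > high               # True on character columns
        return int(np.count_nonzero(peaks[1:] != peaks[:-1]))

A candidate whose transition count falls below the experimentally determined threshold would then be rejected as too uniform.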

Figure 2.13: Selection of the most probable candidate

These methods have been designed to complement each other, so that various types of regions can be sorted out from true license plates. Figure 2.13 illustrates how the three steps each sort out a certain type of region. The

first region from the left will not pass a correlation test, since the black pixels
do not cover the correct areas. When resized, both regions 2 and 3 pass this
test, but region 2 is too uniform to pass the peak-and-valley test. Now only
region 3 and the real license plate are left, and the height-width ratio easily
determines which is the better alternative.
The performance of the combined methods will be examined in Section 5.4, together with a discussion of the order in which the methods should be applied.

2.6 Summary
This chapter introduced three methods for finding candidate regions for the license plate. While region growing and the Hough transform seem to be viable algorithms, the correlation scheme suffers from a scaling problem, which prevents it from being a real alternative. Also, it was demonstrated how the selection of the most probable candidate takes place, with three different methods that complement each other's weaknesses. One of these was correlation, which proved to be a good discriminator when sorting out regions that do not contain a license plate.

Chapter 3
Character isolation

[Processing pipeline: single image → extract license plate → subimage containing only the license plate → isolate characters → 7 images with the characters → identify characters → the seven characters of the license plate]

3.1 Introduction
To ease the process of identifying the characters, it is preferable to divide the
extracted plate into seven images, each containing one isolated character. This
chapter describes several methods for the task of isolating the characters.
Since no color information is relevant, the image is converted to binary
colors before any further processing takes place. Figure 3.1 shows the ideal
process of dividing the plate into seven images, each containing a character.


Figure 3.1: Isolating characters

3.2 Isolating the characters


There are several approaches for finding the character bounds, all with different
advantages and disadvantages. The following sections describe three different
approaches and how these methods are combined to provide a very robust
isolation.

3.2.1 Static bounds


The simplest approach is to use static bounds, assuming that all characters
are placed approximately at the same place on every license plate as shown in
Figure 3.2. The license plate is simply divided into seven parts, based upon
statistical information on the average character positions.

Figure 3.2: Example showing statically set character bounds

The advantage of this method is its simplicity, and that its success does not depend on the image quality (assuming that the license plate extraction is performed satisfactorily). Its weakness is the fairly high risk of choosing wrong bounds and thereby making the identification of the characters difficult. This risk depends directly on the quality of the license plate extraction output.


The first character on the lowest plate in Figure 3.2 shows an example
output from the plate isolation, where a portion of the mounting frame was
included. Another weakness is that the method only separates the characters
instead of finding the exact character bounds.

3.2.2 Pixel count


Instead of using static bounds, a horizontal projection of a thresholded plate
often reveals the exact character positions, as demonstrated in Figure 3.3. The
approach is to search for changes from valleys to peaks, simply by counting the
number of black pixels per column in the projection. A change from a valley
to a peak indicates the beginning of a character, and vice versa.

Figure 3.3: Horizontal projections of the actual characters

Depending on the quality of the license plate extraction, and the success of
removing the frame, this method is very useful, since it is independent of the
character positions. The downside is that this method is very dependent upon
image quality and the result of the license plate extraction.

3.2.3 Connected components


Another method is to search for connected components in the image. The
easiest way to perform this task is on a binary image, using simple morphology
functions.
The method for extracting connected components in a binary image is to
search for a black pixel. When such a pixel is found, it is assumed to be part of a component, and it therefore forms the basis for an iterative process based on Equation (3.1) [4]:
X_k = (X_{k-1} \oplus B) \cap A, \qquad k = 1, 2, 3, \ldots \qquad (3.1)
where Xk represents the extracted component, A is the source image and
B is a structuring element of size 3×3 indicating 8-connectivity neighboring.


X_0 is the first black pixel, from which the iteration starts.

The algorithm starts by creating a new image X_0, containing only the first black pixel, illustrated as the black structure in Figure 3.4. In the first step of the iterative algorithm, X_0 is dilated using the structuring element B. This means that the area of interest is expanded from the first black pixel to all of its neighbors. The outcome X_1 of the first iteration is then the overlap between the dilation of X_0 and the original image A, found by an AND operation. In other words, all the neighbors of the first black pixel that are also black belong to the component searched for. In the next iteration X_1 is dilated, and the result is again the overlap between the dilation and the original image. This iterative process continues until the resulting component equals the previous one, X_k = X_{k-1}, meaning that the component no longer grows. The first two iterations of the process are illustrated in Figure 3.4.

Figure 3.4: An iteration in the process of finding connected components (legend: pixels in image with value '1'; connected pixels found (X_k); result of dilation; structuring element B)
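Equation (3.1) translates almost directly into code. The sketch below assumes SciPy's binary_dilation is available and that A is a boolean image where True marks character (black) pixels; these choices are ours, not the report's implementation.

    import numpy as np
    from scipy.ndimage import binary_dilation

    def extract_component(A, seed):
        # Iterate X_k = (X_{k-1} dilated by B) AND A until X_k = X_{k-1}.
        B = np.ones((3, 3), dtype=bool)    # 8-connectivity structuring element
        X = np.zeros_like(A, dtype=bool)
        X[seed] = True                     # X_0: the first black pixel found
        while True:
            X_next = binary_dilation(X, structure=B) & A
            if np.array_equal(X_next, X):  # converged: X_k equals X_{k-1}
                return X
            X = X_next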

When a connected component has been found, it is tested whether it fulfills the size specifications for a character, thereby sorting out unwanted parts, for example bolts or dirt.
The advantages of this method are that it is independent of the image rotation and that the bounds found are very precise. Its disadvantages are that it requires a good image quality and a good conversion from the original image to the binary image, to avoid making two or more characters appear as one connected region. Small gaps in the characters can also cause the method


to fail.

3.2.4 Improving image quality


Variations in image quality and license plate conditions can affect all of the
methods mentioned previously. The following sections describe how the image
quality can be improved.

Isolating the plate

The image received from the extraction often contains more than just the
license plate, for example the mounting frame as in Figure 3.1. This frame can
be removed by further isolation of the actual license plate.
The isolation of the license plate from any superfluous background is per-
formed using a histogram that reflects the number of black pixels in each row
and in each column. In most cases, projecting the amount of black pixels both
vertically and horizontally reveals the actual position of the plate. An example
of the projections is shown in Figure 3.5.
A simple but effective method for removing the frame is based on the
assumption, that the vertical projection has exactly one wide peak created
by the rows of the characters. Therefore the start of the widest peak on the
vertical projection, must be the top of the characters, and the end of the peak
the bottom of the characters. It is expected that the horizontal projection has
exactly seven wide peaks and eight valleys (one before each character, and one
after the last character).

Figure 3.5: Horizontal and vertical projections

The success of this method depends on the assumption that the plate is horizontal. In some cases, for example if the angle of the plate is too large, the method will leave a small part of the frame behind in parts of the image; depending on the thickness of the remaining frame, it may still be possible to separate the characters.


Dynamic threshold

If the original image is dark or the license plate is dirty, the binary image
created from the standard threshold value can be very dark and filled with
unwanted noise. There are various methods to eliminate this problem; the
threshold used when the original image is converted from color to binary, is
calculated dynamically based on the assumption that an ideal license plate
image on average contains approximately 69% white/yellow pixels and 31%
black pixels including the frame¹. The idea is first to make the image binary.
Second, the ratio between black and white pixels is calculated and compared
with the expected value. Then a new threshold value is selected and the
original image is converted again, until a satisfactory ratio has been achieved.
Although this method is affected when the edges of the plate have not been cut off, it is far more reliable than setting a static threshold.
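A sketch of such an iterative threshold search is given below, here by bisection on a grayscale version of the image; the 69% target comes from the text, while the tolerance, the iteration cap, and the bisection strategy are our own assumptions.

    import numpy as np

    def dynamic_threshold(gray, target_white=0.69, tolerance=0.02):
        # Adjust the binarization threshold until the white-pixel ratio
        # approaches the expected ratio for an ideal plate image.
        lo, hi = float(gray.min()), float(gray.max())
        t = (lo + hi) / 2
        for _ in range(20):
            t = (lo + hi) / 2
            white_ratio = (gray >= t).mean()
            if abs(white_ratio - target_white) <= tolerance:
                break
            if white_ratio > target_white:
                lo = t   # too much white: raise the threshold
            else:
                hi = t   # too little white: lower the threshold
        return gray >= t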

Removing small objects

A different approach is to remove the unwanted pixels from the binary image. Knowing that we are looking for seven fairly large regions of black pixels, it can be useful to perform a number of erosions on the image before searching for the bounds, thereby removing unwanted objects, for example bolts.

3.2.5 Combined strategy


                Static bounds   Pixel count   Connected components
Extra edges     ✗               ✓             ✓
Bad threshold   ✓               ✗             ✗
Noise           ✓               ✓             ✗

Table 3.1: Strengths of different methods
To achieve the best possible isolation, a combined strategy can be used to
improve the probability of success. Table 3.1 shows how the three methods
complement each other. While static bounds are immune to threshold values and image noise, they are very dependent upon whether all edges have been eliminated. On the other hand, when using pixel count the edges can easily be
distinguished from characters, but the threshold value is crucial to its success.
Finally, finding the connected components is susceptible to noise as well as a
bad threshold. However, the connected components are less likely to be dis-
turbed, since the weaknesses towards threshold and noise are rather small. The
¹ Based on the mean value of black pixels, calculated from the training set images.


table demonstrates that it is possible to combine the three methods, so that none of the three potential difficulties remains a problem.
Figure 3.6 shows how the different methods are combined, and the method
is summarized below:

- First, convert the image into binary colors using a dynamic threshold to achieve the best result.
- Try the connected-components method to see if seven good components can be found.
- If the first attempt is unsuccessful, try to isolate the plate by cutting away edges and try the connected-components method again.
- If still unsuccessful, try the pixel-count method to search for the character bounds on the horizontal projection.
- If this method also fails, use static bounds as the final option.

Figure 3.6: Combination of character isolation procedures (input from license plate isolation → dynamic threshold → isolate using connected components; on failure, cut away edges and segment using connected components; on failure, segment using the pixel count method; on failure, segment using static bounds → output from character isolation)

Connected components is always performed as the first attempt, since it has a very good probability of success if the input image does not contain too much noise in the form of edges or similar black regions. If the connected components method fails, even after an attempt to cut away the edges, pixel count is performed as the first alternative. The static bounds are not utilized until all other options have been exhausted. The reason is that the static bounds rely heavily on the fact that there are no vertical edges, and that the plate is totally aligned with the axes.


3.3 Summary
In this chapter, several approaches to isolating the characters from an input image containing the entire license plate were described. None of the methods
were capable of providing reliable results on their own, due to the varying
input image quality, but a combination of the methods ensures a very robust
isolation scheme. The results of the combination can be seen in Section 6.8.

Chapter 4
Character identification

4.1 Introduction

[Processing pipeline: single image → extract license plate → subimage containing only the license plate → isolate characters → 7 images with the characters → identify characters → the seven characters of the license plate]

After splitting the extracted license plate into seven images, the character
in each image can be identified. Identifying the character can be done in a
number of ways. In this chapter, methods for this will be addressed.
First, a solution based on the previously discussed template matching (Sec-
tion 2.3), will be presented. Thereafter, a method based on statistical pattern
recognition will be introduced, and a number of features used by this method
will be described. Then, an algorithm for choosing the best features called
SEPCOR, will be presented. After finding a proper set of features, means of
selecting the most probable class is necessary. For this purpose, Bayes decision


rule will be introduced, along with theory on discriminant functions and pa-
rameter estimation. Finally the methods will be compared against each other
and it will be examined if there is anything to be gained from combining the
two.

4.2 Template matching


This section presents how the isolated characters can be identified by calculating the normalized correlation coefficient. Previously it was investigated
how template matching could help in the extraction of the license plates from
the input images, and therefore the theory behind template matching can be
reused. The conclusion then was that it was unable to perform the task by
itself, but was useful in combination with other methods. Searching through
a large image with a small template could mean running the algorithm hun-
dreds of thousands of times, whereas if a single similarity measurement of two
same-sized images is needed, the algorithm would have to run only once.
The idea behind an implementation of a correlation based identification
scheme is simple. Two template pools, one consisting of all the possible values
for the letters, and one of all the values of the digits, are constructed. Once
the license plate has been cut into the seven characters, each image containing
a single character is evaluated in turn in the following way. The normalized
correlation coefficient between the image of the character and the appropriate
template pool is computed. The template that yields the highest coefficient
indicates which character is depicted in the input image.
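Reduced to a single similarity measurement per template, the identification becomes an argmax over the pool. Below is a sketch under the assumption that each template has been scaled to the size of the character image; names such as gamma, identify and pool are ours.

    import numpy as np

    def gamma(a, b):
        # Normalized correlation coefficient (Equation 2.9) for two
        # same-sized images, i.e. evaluated at a single position.
        fa, fb = a - a.mean(), b - b.mean()
        denom = np.sqrt((fa ** 2).sum() * (fb ** 2).sum())
        return (fa * fb).sum() / denom if denom > 0 else 0.0

    def identify(character, pool):
        # pool maps a label ('A'..'Z' or '0'..'9') to a template image;
        # letters and digits are kept in separate pools as in the text.
        return max(pool, key=lambda label: gamma(character, pool[label]))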

4.3 Statistical pattern recognition


Having looked at identifying characters using template matching, the focus
is now turned to statistical pattern recognition. Several methods for pattern
recognition exist, but one of the more classical approaches is statistical pattern
recognition, and therefore this approach has been chosen for identifying the
characters. For simplicity, the focus will only be on the images containing
digits, but the methods are completely similar for the letters.
When using statistical pattern recognition, the first task is to perform fea-
ture extraction, where the purpose is to find certain features that distinguish
the different digits from each other. Some of the features that are found might
be correlated, and an algorithm called SEPCOR can therefore be used to min-
imize the number of redundant features. When an appropriate number of


uncorrelated features have been found, a way of deciding which class an observation belongs to is needed. For this purpose Bayes decision rule will be
introduced, and finally an expression for the classification will be proposed.

4.4 Features
All of the features are extracted from a binary image because most of the
features require this. The area and the circumference of the digits are the simplest distinguishing features. These features are not sufficient, because
there are different digits with approximately the same area and circumference,
e.g. the digits ‘6’ and ‘9’. To distinguish in such cases, the number of endpoints
in the upper half and lower half of the image are taken into account. Here it
is assumed that the digit ‘6’ has one endpoint in the upper half and zero
endpoints in the lower half, and the endpoints of the digit ‘9’ are the reverse
of a ‘6’. To distinguish between the digits ‘0’ and ‘8’, it is not possible to use
the endpoint feature because they both have zero endpoints. Here the number
of connected compounds is an appropriate feature. The number of compounds
in the digit ‘8’ is three, whereas a ‘0’ has only two compounds.
Furthermore the value of each pixel is chosen as a feature. A final feature
is the area of each row and column, meaning the number of black pixels in
each of the horizontal or vertical lines in the image.
The features are listed below:

- Area
- Area for each row
- Area for each column
- Circumference
- End points in upper/lower half
- Compounds
- The value of each pixel

Most of the features require that the characters have the same size in pixels,
and therefore the images with the characters are initially normalized to the
same height. The height is chosen because there is no difference between the
height of the characters, whereas the widths of the different digits differ, e.g.
a ‘1’ and an ‘8’ have the same height but not the same width. The following
describes the methods for extracting each feature.


4.4.1 Area
When calculating the area of a character, assuming background pixels are white and character pixels are black, the number of black pixels is counted. For the area for each row, the number of black pixels in every horizontal line is counted. As seen in Figure 4.1, these vertical projections are distinct for many of the digits. The horizontal projections are more similar in structure, with either a single wide peak, or a peak at the beginning and end of the digit. Although more similar, they will still be used to distinguish between the digits.

Figure 4.1: Projections of all the digits

4.4.2 End points


The number of end points is also a feature that distinguishes some of the digits
from each other. It is also clear that the locations of these end points vary,
and this can help in distinguishing between the digits. A ‘6’, for instance, has
only a single end point in the upper half, and none in the lower. Although
the positions also vary horizontally, there would be problems when deciding
whether the end points of a ‘1’ are in the left or the right half of the image.
Therefore, only the vertical position of the end points is considered in the
implementation of the system.
The number of end points is found from a skeleton of the character. To
obtain the skeleton a thinning algorithm is used, and the result is a one pixel
wide skeleton. The end points of the character are the pixels that have only
one connected 8-neighbor.
The thinning algorithm [4] iteratively deletes¹ the edges of the character, until a one pixel wide skeleton is reached. The algorithm must ensure that it does not delete end points and that it does not break connections. Breaking connections will result in additional end points, and therefore this is quite an important demand on the algorithm. In order to get the skeleton of the
¹ In this situation, deleting should be understood as setting the pixel to the background value.


character, the algorithm is divided into two parts. The first part deletes edges
in east, south or the northwest corner. The second part deletes edges in west,
north or the southeast corner.
During the processing of the two parts, pixels that satisfy the conditions
listed below are flagged for deletion. The deletion is not applied until the
entire image has been processed, so that it does not affect the analysis of the
remaining pixels.
The first part of the algorithm deletes pixels if all of the following conditions
are satisfied.

(a) 2 ≤ N (p1) ≤ 6

(b) Z(p1) = 1

(c) p2 · p4 · p6 = 0

(d) p4 · p6 · p8 = 0

N(p1) is the number of neighbors of p1 (depicted in Figure 4.2) with the value one. This condition ensures that end points (N(p1) = 1) and non-border pixels (N(p1) > 6) are not deleted. Z(p1) is the number of 0-1 transitions in the ordered sequence p2, p3, p4, . . . , p8, p9, p2. An example of this can be seen in Figure 4.3, where Z(p1) = 2: p1 is part of a one pixel wide connection, and deleting it would break the connection, so condition (b) prevents p1 from being deleted. Conditions (c) and (d) ensure that the pixel is placed in the east, the south, or the northwest corner, see the gray pixels in Figure 4.4b.

p9 p2 p3

p8 p1 p4

p7 p6 p5

Figure 4.2: The neighbors of p1

The second part of the algorithm deletes pixels if conditions (a) and (b) combined with (c') and (d') are satisfied.

(c’) p2 · p4 · p8 = 0

(d’) p2 · p6 · p8 = 0


0 1 0

0 p1 0

0 1 0

Figure 4.3: Example where the number of 0-1 transitions is 2, Z(p1) = 2.

Conditions (c') and (d') ensure that the pixel is placed in the west, the north, or the southeast corner, see the gray pixels in Figure 4.4c.
An example of using the thinning algorithm is depicted in Figure 4.4. The gray pixels are those marked for deletion: a) shows the original image, b) is the result of applying part 1 once, c) is the output of using part 2 once, d) is part 1 once more, and e) is the final result of the thinning algorithm.

Figure 4.4: The steps of the thinning algorithm (panels a-e; legend: pixels in image with value '1'; pixels marked for deletion)

One iteration of the algorithm consists of:

1. Apply part one, flagging pixels for deletion.

2. Delete the flagged pixels.

3. Apply part two, flagging pixels for deletion.

4. Delete the flagged pixels.

The algorithm terminates when no further pixels are flagged for deletion.
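As a sketch, the two sub-iterations might look as follows in Python; the array layout (value 1 for character pixels, with the outermost border assumed clear) and the function names are our own choices, not the project's code.

    import numpy as np

    def thinning_pass(img, part):
        # Flag pixels satisfying (a), (b) and the part-specific
        # conditions; delete them all at once afterwards.
        flagged = []
        for y in range(1, img.shape[0] - 1):
            for x in range(1, img.shape[1] - 1):
                if img[y, x] == 0:
                    continue
                # Neighbors p2..p9, clockwise from north (Figure 4.2).
                p = [img[y-1, x], img[y-1, x+1], img[y, x+1], img[y+1, x+1],
                     img[y+1, x], img[y+1, x-1], img[y, x-1], img[y-1, x-1]]
                N = sum(p)                                        # condition (a)
                Z = sum(p[i] == 0 and p[(i+1) % 8] == 1 for i in range(8))
                p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                if part == 1:
                    ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0  # (c), (d)
                else:
                    ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0  # (c'), (d')
                if 2 <= N <= 6 and Z == 1 and ok:                 # (a), (b)
                    flagged.append((y, x))
        for y, x in flagged:
            img[y, x] = 0
        return len(flagged)

    def thin(img):
        # One iteration is part one plus part two; stop when neither
        # part flags any pixel for deletion.
        while thinning_pass(img, 1) + thinning_pass(img, 2):
            pass
        return img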
Even when using the thinning algorithm, small spurious structures can be present in the final skeleton of the image; an example of this can be seen in Figure 4.5.


Figure 4.5: Result of the thinning algorithm (a connection point is marked)

Normally a '0' does not have any end points, but in this case it gets one end point. The way of finding end points is therefore slightly modified: structures that are less than 3 pixels long before reaching a connection point are not taken into account. With this modification, the digit '0' in Figure 4.5 has no end points, as expected.

4.4.3 Circumference
Another possible feature to distinguish between characters is their circumfer-
ence. As it can be seen in Figure 4.6, a ‘2’ has a somewhat larger circumference
than a ‘1’.

Figure 4.6: Two digits with different circumferences (100 for the '1', 174 for the '2')

To find the circumference of a character, the pixels on the outer edge are
counted. This is done by traversing the outer edge of the character, and count-
ing the number of pixels, until the start position has been reached. The cir-
cumference is very dependent upon the selected threshold value, but relatively
resistant to noise.

4.4.4 Compounds
Each character consists of one or more connected compounds. A compound is
defined as an area that contains pixels with similar values. Since the image is
binary, a compound consists of either ones or zeros. The background is only


considered a compound if it appears as a lake within the digit. Figure 4.7 shows that a '0' has two compounds, while a '1' has only one.

Figure 4.7: (a) shows the two compounds in a '0', and (b) the one in a '1'

To find the number of compounds, a Laplacian filter is applied to the character. The Laplacian operator is defined as [4]:

L[f(x, y)] = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \qquad (4.1)
This leads to the following digital mask:

0 1 0
1 −4 1
0 1 0

Convolving the original binary image of the character with this mask, the
result is an image with black background, and the edges represented by a one
pixel thick white line, see Figure 4.8.

Figure 4.8: Result of applying a Laplacian filter to a '3'. (a) is the original and (b) is the resulting image.

The number of compounds is then found by counting the number of connected white lines. Since the Laplacian filter is very sensitive to noise [4], white lines with lengths below a certain threshold are discarded. As an alternative method for finding connected compounds, the method used when isolating the characters could also be used; it was described in Section 3.2.3.
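A sketch of the counting step, assuming SciPy is available: the digital mask above is applied by convolution, and the resulting edge lines are counted with an 8-connected labeling. The min_length cutoff stands in for the noise threshold mentioned in the text and is a hypothetical value.

    import numpy as np
    from scipy.ndimage import convolve, label

    LAPLACIAN = np.array([[0,  1, 0],
                          [1, -4, 1],
                          [0,  1, 0]])

    def count_compounds(binary_char, min_length=10):
        # Convolve with the Laplacian mask; nonzero responses form the
        # thin edge lines described in the text.
        edges = convolve(binary_char.astype(int), LAPLACIAN,
                         mode='constant') != 0
        labels, _ = label(edges, structure=np.ones((3, 3)))
        sizes = np.bincount(labels.ravel())[1:]   # skip the background label
        return int((sizes >= min_length).sum())   # discard short, noisy lines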


4.4.5 The value of each pixel


The value of each pixel is used as a feature. In this way, the feature based identification resembles correlation based identification, as the Euclidean distance between a sample and a class mean is calculated. The difference is that in the template based identification the correlation measure is normalized, which it is not in the feature based identification. Additionally, the feature based identification uses more features, such as the number of end points. In both cases, the distance is measured from a mean calculated from the training set.

4.5 Feature space dimensionality


Having introduced a number of features, it is interesting to see how they
complement each other. If two features are redundant, one of the features may
be removed without decreasing the efficiency of the system.
For this purpose, the SEPCOR algorithm is presented. SEPCOR is an
abbreviation of SEParability and CORrelation, and has the purpose of reducing
dimensionality of the feature space, with the least possible loss of information.
Before explaining the steps of the SEPCOR algorithm, a measure of vari-
ability is defined:

V = \frac{\text{variance of class mean values}}{\text{mean value of class variances}} \qquad (4.2)
This V-value says something about the different features: a higher V-value means a better and more distinct feature. The variance of the mean values tells how far the mean values of the different classes are from each other; the further apart they are, the easier the classes are to distinguish, and thus the numerator is large. The mean value of the variances measures how spread out a given class is; if this value is small, the class is not spread over a wide area and is therefore easier to distinguish, so the denominator is small. This makes it clear that the bigger the V-value, the better the feature. Knowing the V-value of all the features, the SEPCOR algorithm
can be carried out.
The steps of the SEPCOR algorithm are:

1. Create a list of all features, sorted by their V-value.

2. Repeat:

(a) Remove and save the feature with the largest V-value.


(b) Calculate the correlation coefficient between the removed feature and the remaining features in the list.
(c) Remove all features with a correlation greater than some threshold.

This algorithm runs until a desired number of features have been removed
from the list, or the list is empty.
Calculating the correlation coefficient as mentioned in step 2.b is done as
shown in Equation (4.3).

c = \left| \frac{\sigma_{ij}}{\sqrt{\sigma_{ii} \cdot \sigma_{jj}}} \right| \qquad (4.3)
The correlation coefficient is actually the normalized correlation coefficient
as presented in Section 2.3.2. Here the σij simply corresponds to the (i, j)
entry in the covariance matrix, which is the covariance between feature i and
j, and σii is the variance of feature i (see Section 4.6).
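A compact sketch of the procedure over a feature matrix (one row per training sample), with V-values computed per Equation (4.2); the variable names and the use of NumPy's corrcoef are our own choices, and features with zero variance would need special handling in practice.

    import numpy as np

    def sepcor(X, labels, max_corr):
        # X: (samples, features); labels: the class of each sample.
        # Returns indices of the selected features, best V-value first.
        classes = np.unique(labels)
        means = np.array([X[labels == c].mean(axis=0) for c in classes])
        variances = np.array([X[labels == c].var(axis=0, ddof=1)
                              for c in classes])
        V = means.var(axis=0) / variances.mean(axis=0)   # Equation (4.2)
        C = np.abs(np.corrcoef(X, rowvar=False))         # |corr| per pair
        remaining = list(np.argsort(V)[::-1])            # sorted by V-value
        selected = []
        while remaining:
            best = remaining.pop(0)                      # largest V-value left
            selected.append(best)
            remaining = [f for f in remaining if C[best, f] <= max_corr]
        return selected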

4.5.1 Result using SEPCOR


The number of features picked by the SEPCOR algorithm depends on how the threshold for the correlation coefficient, c, is set. This threshold is hereafter referred to as the maximum correlation. Its range is from 0 to 1,
where a threshold of 1 would yield the maximum number of features, since
even features that correlate completely will be used. Figure 4.9 shows how the
number of selected features rises, as the maximum correlation approaches 1.
When a maximum correlation of 1 is used, 306 features are picked. This is the
maximum number of features that can be picked, and includes all the features
mentioned in Section 4.4. The number of features to be used to identify the
characters will be discussed in Chapter 7.

4.6 Decision theory


After having acquired a sample in the form of a feature vector with values for
all the previously described features, the sample must be classified as belonging
to one of the ten different classes. To do this, a statistical classifier is used. In
this chapter, Bayes classifier will be presented and evaluated as a discriminant
function used to divide the feature space.
The use of the Mahalanobis distance measure as an alternative to the Euclidean distance is based on the assumption that the data set is normally distributed. Why this is a necessary assumption will be discussed, and whether or


Figure 4.9: Number of features selected as a function of the maximum correlation coefficient (number of features versus correlation threshold from 0 to 1, plotted for both the training set and the test set)

not the data set used in this project is normally distributed will be investigated
by making various tests.
When using the Bayes classifier it is necessary to know the class conditional
probability density functions. Since the distribution of the data is assumed
normal, this means that the parameters of this particular distribution have to
be estimated. A method for doing this is also introduced.

4.6.1 Bayes decision rule


First, Bayes classifier is described:

P(\omega_j \mid \bar{x}) = \frac{p(\bar{x} \mid \omega_j) \cdot P(\omega_j)}{p(\bar{x})} \qquad (4.4)
in which

p(\bar{x}) = \sum_{j=1}^{s} p(\bar{x} \mid \omega_j) \cdot P(\omega_j) \qquad (4.5)

x̄ is the feature vector of the sample being investigated. A finite set of s classes is defined as Ω = {ω1, ω2, . . . , ωs}. In this case, the classes represent the different characters that can occur on a license plate. Similarly, a set of actions is declared as A = {α1, α2, . . . , αs}. These actions can be described as the


event that occurs when a character is determined to be e.g. of class ω2 . This


implies assigning the correct character value to the given sample. The loss by
performing the wrong action, meaning performing αi when the correct class is
ωj is then denoted by λ(αi | ωj ). The total loss by taking the wrong action
can then be described as:
R(\alpha_i \mid \bar{x}) = \sum_{j=1}^{s} \lambda(\alpha_i \mid \omega_j) \cdot P(\omega_j \mid \bar{x}) \qquad (4.6)

where P (ωj | x̄) is the probability that ωj is the true class given x̄.
This can be used when having to make decisions. One is always interested in
choosing the action that minimizes the loss. R(αi | x̄) is called the conditional
risk. What is wanted is a decision rule, α(x̄), that determines the action to be
taken on a given data sample. This means, that for a given x̄, α(x̄) evaluates
to one of the actions in A. In doing so, the action that leads to the smallest
risk should be chosen, in accordance with Equation 4.6.

Minimum-error-rate classification

As already stated, it is desirable to choose the action that reduces the risk the
most. It can also be said that if action αi corresponds to the correct class ωj, then the decision is correct for i = j and incorrect for i ≠ j. The symmetrical
loss-function is built on this principle:

\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \ldots, s \qquad (4.7)

If the decision made is correct, there is no loss. If it is not, a unit loss is incurred; thus all errors cost the same. The average probability of error is the same as the risk of the loss function just mentioned, because the conditional risk evaluates to:

R(\alpha_i \mid \bar{x}) = 1 - P(\omega_i \mid \bar{x}) \qquad (4.8)

where P(ωi | x̄) is the conditional probability that αi is the correct action [9]. So, in order to minimize the probability of error, the class i that maximizes P(ωi | x̄) should be selected, meaning that ωi should be selected if P(ωi | x̄) > P(ωj | x̄) when i ≠ j. This is the minimum-error-rate classification.

4.6.2 Discriminant functions


Classifiers can be used to divide the feature space into decision regions. One
way of partitioning is the use of discriminant functions, where a discriminant


function gi (x̄) for each class is defined. Then the classifier assigns vector x̄ to
class ωi if gi(x̄) > gj(x̄) for all j ≠ i. An example of dividing the feature space into decision regions, which accomplishes the minimum error rate, can be seen in Figure 4.10.

Figure 4.10: Decision regions

The feature vector x̄ is assigned to the class corresponding to the region in which the vector is placed.
A Bayes classifier can also be represented in this way, where gi(x̄) = P(ωi | x̄). The maximum discriminant function will then correspond to the a posteriori² probability, see Equation 4.4. When looking at the a posteriori probability, the discriminant function can also be expressed as gi(x̄) = p(x̄ | ωi)P(ωi), because the denominator is a normalizing factor and therefore not important in this context. The Bayes classifier is therefore simply determined by p(x̄ | ωi)P(ωi).
The most commonly used density function is the multivariate normal den-
sity function, because of its analytical qualities [9]. The univariate normal
density function is given by:
" µ ¶2 #
1 1 x−µ
p(x) = √ exp − (4.9)
2π σ 2 σ
The density is specified by two parameters, the mean µ and the variance σ², written p(x) ∼ N(µ, σ²). Normally distributed samples tend to cluster about the mean within 2σ. In the multivariate normal density function, the mean is instead represented as a vector, and the variance is represented as a matrix. The multivariate normal density function is given by:

p(\bar{x}) = \frac{1}{(2\pi)^{d/2} \, |\bar{\bar{\Sigma}}|^{1/2}} \exp\left[-\frac{1}{2} (\bar{x} - \bar{\mu})^t \bar{\bar{\Sigma}}^{-1} (\bar{x} - \bar{\mu})\right] \qquad (4.10)
where x̄ is a column vector with d entries, µ̄ is the mean vector, also with d entries, and Σ̄ is a d × d matrix called the covariance matrix. Equation 4.10
² Knowledge not known in advance; knowledge based on observations.


can be abbreviated as p(x̄) ∼ N(µ̄, Σ̄), where the mean and the covariance are calculated by:

\bar{\mu} = E[\bar{x}] \qquad (4.11)

\bar{\bar{\Sigma}} = E[(\bar{x} - \bar{\mu})(\bar{x} - \bar{\mu})^t] \qquad (4.12)

The covariance is calculated for each component in the vectors: xi corresponds to the i'th feature of x̄, µi is the i'th component of µ̄, and σij is the i-j'th component of Σ̄. The entries in the mean vector and in the covariance matrix are calculated from:

µi = E[xi ] (4.13)

σij = E[(xi − µi )(xj − µj )] (4.14)

With n samples in the observation:

\mu_i = \sum_{k=1}^{n} \frac{x_{ik}}{n} \qquad (4.15)

\sigma_{ij} = \sum_{k=1}^{n} \frac{(x_{ik} - \mu_i)(x_{jk} - \mu_j)}{n - 1} \qquad (4.16)
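Equations (4.15) and (4.16) are the sample mean and the sample covariance, and translate directly into a few lines of code; the vectorized formulation below is our own sketch.

    import numpy as np

    def estimate_parameters(X):
        # X: an (n, d) matrix holding n feature vectors of dimension d.
        n = X.shape[0]
        mu = X.sum(axis=0) / n          # Equation (4.15), per feature
        D = X - mu                      # deviations from the mean
        sigma = D.T @ D / (n - 1)       # Equation (4.16), all pairs i, j
        return mu, sigma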

In the covariance matrix the diagonal entries, σii, are the variances of the features, and the off-diagonal entries, σij, are the covariances of features i and j. Samples drawn from a normally distributed population tend to gather in a single cloud. The center of such a cloud is determined by the mean vector, and the shape depends on the covariance matrix. The interpretation of the covariance matrix and the shape of the clouds can be divided into five different cases. For simplicity the examples have only two features.

Case 1 " # " #


¯= σ11 σ12 1 0
Σ̄ = (4.17)
σ21 σ22 0 1

If the covariance matrix equals the identity, the cloud is circular shaped,
which is depicted in Figure 4.11, where the axes represent the two features.


Figure 4.11: The covariance matrix equals the identity.

Figure 4.12: The features are statistically independent.

Case 2 and 3 " #


¯= σ11 0
Σ̄ (4.18)
0 σ22

If all the off-diagonal entries are 0, then the features are uncorrelated and thus statistically independent, and the cloud is formed as an ellipse where the axes are the eigenvectors of the covariance matrix and the length is the corresponding eigenvalue³. In these cases the ellipse is aligned with the feature axes. If σ11 > σ22 the major axis is parallel with the axis of feature one, and if σ11 < σ22 the minor axis is parallel with feature one. Figure 4.12 shows the two cases.

Case 4 and 5 " #


¯= σ11 σ12
Σ̄ (4.19)
σ21 σ22

If σ12 = σ21 is positive, it is called positive correlation. An example of this could be the features area and circumference: if the area increases, the circumference usually also increases, and therefore there is a positive correlation between the two features. Negative correlation is the opposite, and σ12 = σ21
³ In an n × n matrix there are n eigenvectors and n eigenvalues.


Figure 4.13: Positive and negative correlation (left: positive correlation; right: negative correlation).

is negative. The orientations of the clouds in the two situations are depicted in
Figure 4.13.
In the case of multiple features the clouds are hyper-ellipsoids instead,
where the principal axes again are given by the eigenvectors, and the eigenval-
ues determine the length of these axes. In the hyper-ellipsoids the quadratic
form:

r^2 = (\bar{x} - \bar{\mu})^t \bar{\bar{\Sigma}}^{-1} (\bar{x} - \bar{\mu}) \qquad (4.20)
is constant and sometimes called the squared Mahalanobis distance from x̄
to µ̄ [9].
To achieve the minimum error rate, the classification can be performed using discriminant functions for the normal density. Since the discriminant function equals gi(x̄) = p(x̄ | ωi)P(ωi), it can be rewritten as:

gi (x̄) = log p(x̄ | ωi ) + log P (ωi ) (4.21)


When combining this with the normal density function, under the assumption that p(x̄ | ωi) ∼ N(µ̄i, Σ̄i), then [9]:

g_i(\bar{x}) = -\frac{1}{2} (\bar{x} - \bar{\mu}_i)^t \bar{\bar{\Sigma}}_i^{-1} (\bar{x} - \bar{\mu}_i) - \frac{d}{2} \log 2\pi - \frac{1}{2} \log |\bar{\bar{\Sigma}}_i| + \log P(\omega_i) \qquad (4.22)
where (d/2) log 2π is a class-independent normalizing factor, log P(ωi) is the class probability, and |Σ̄i| is the determinant of the covariance matrix. The digits are assumed to appear on the license plates with the same probability, meaning P(ωi) is the same for all classes, so the term log P(ωi) can be omitted.
In this project, the determinant of the covariance matrix often nearly equaled zero because the matrix was ill-conditioned, meaning that its condition number is too large. This can be seen by calculating the


condition number for the matrices, which for all matrices was greater than 10^{20}. The condition number of a matrix A is defined as the product of the norm of A and the norm of A^{-1}.
Under the assumption that the classes have the same probability, and given that the determinant of the covariance matrix nearly equals zero, the classification is instead done simply on the basis of the squared Mahalanobis distance, as represented in Equation (4.20).
As part of computing the squared Mahalanobis distance, the inverse covariance matrix must be computed. As stated above, this was made difficult by the fact that these matrices were often ill-conditioned, and traditional methods for computing the inverse matrix fail in these special cases.
An alternative to computing the traditional inverse is to compute the pseudo-inverse. The pseudo-inverse is defined in terms of singular value decomposition and extends the notion of inverse matrices to singular matrices [10].
As an alternative distance measure, the Euclidean distance can be used.
The squared Euclidean distance is defined as[4]:

d^2 = (\bar{x} - \bar{\mu})^t (\bar{x} - \bar{\mu}) \qquad (4.23)

When using the Euclidean distance, the inverse covariance matrix is replaced
with the identity matrix. This means, that unlike Mahalanobis distance, the
Euclidean distance is not a weighted distance.
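Classification by the squared Mahalanobis distance with a pseudo-inverse fallback might be sketched as follows; np.linalg.pinv computes the pseudo-inverse via singular value decomposition, while the data layout and function name are our own assumptions.

    import numpy as np

    def classify(x, means, covariances):
        # Assign x to the class with the smallest squared Mahalanobis
        # distance (Equation 4.20); the pseudo-inverse keeps the
        # computation stable for ill-conditioned covariance matrices.
        best, best_r2 = None, np.inf
        for cls, (mu, sigma) in enumerate(zip(means, covariances)):
            inv = np.linalg.pinv(sigma)   # SVD-based pseudo-inverse
            d = x - mu
            r2 = d @ inv @ d              # squared Mahalanobis distance
            if r2 < best_r2:
                best, best_r2 = cls, r2
        return best

Replacing inv with the identity matrix turns r2 into the squared Euclidean distance of Equation (4.23).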

4.6.3 Test of normal distribution


In order to use the discriminant function method presented in the previous section, it is necessary to test whether or not the observations are normally distributed, because the method requires data to be so. Three methods have been used to determine if the data is normally distributed. The first is plotting the observations in a histogram and visually determining whether the distribution is approximately normal. Second, Matlab can be used to display a normal probability plot of the data, again with visual inspection used to determine the similarity with the normal distribution. The third method is a goodness of fit test, where a value is calculated to determine if the data is normally distributed. In this section the circumference of the number '0' is used as an example for the three methods. Ideally all features should be investigated, and those that are not approximately normally distributed should be ignored.
The histogram plot is depicted in Figure 4.14, and it is seen that the graph
approximately resembles a normal distribution.


Figure 4.14: Histogram plot showing approximately a normal distribution

The Matlab plot displays true normal distributions as linear curves, and so it is expected that the data from the circumference is approximately linear. Figure 4.15 shows that this is almost the case.

Figure 4.15: Normal probability plot produced in Matlab (probability versus circumference data)

The goodness of fit test can be based on the χ2 (chi-square) distribution [1].
The observations made in a single class, e.g. the number ‘0’, are then grouped
into k categories. The observed frequencies of sample data in the k categories
are then compared to the expected frequencies of the k categories, under the
assumption that the population has a normal distribution. The measure of the
difference between the observed and the expected in all categories is calculated


by:

\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \qquad (4.24)
where oi is the observed frequency for category i, ei is the expected fre-
quency for category i, and k is the number of categories.
The χ2 -test requires the expected frequencies to be five or more for all
categories.
When using the χ²-test to check whether or not the data is normally distributed, the data is assumed to be defined by the mean µ and the variance σ².
The null and alternative hypotheses are based on these assumptions.
H0 : The population has a normal probability distribution, N (µ, σ 2 ).
Ha : The population does not have a normal probability distribution.
The rejection rule:
Reject H0 if χ² > χ²_α
where α is the level of significance and with k-3 degrees of freedom.
When defining the categories, the starting point is the standard normal
distribution, N(0,1), and the strategy is then to define approximately equally
sized intervals. Here k = 6 categories are created, with the limits for the
intervals defined as ]-∞;-1], ]-1;-0.44], ]-0.44;0], ]0;0.44], ]0.44;1] and ]1;∞[.
The intervals and the area of probability are depicted in Figure 4.16. The
degrees of freedom can then be calculated as D_freedom = k − 3 = 3.

Figure 4.16: Standard normal distribution, with the 6 categories and the corresponding
areas of probability

The observations made from the training set are used to estimate the mean
circumference, µ, and the standard deviation s, of the normal distribution.
    \mu = \frac{\sum x_i}{n} = \frac{11360}{38} = 298.95    (4.25)


    s = \sqrt{\frac{\sum (x_i - \mu)^2}{n - 1}} = \sqrt{\frac{3977.87}{37}} = 10.37    (4.26)
The limits for the categories can then be calculated on the basis of the
defined intervals, the mean and the standard deviation.

    x_1 = 298.95 + (-1) · 10.37 = 288.58
    x_2 = 298.95 + (-0.44) · 10.37 = 294.39
    x_3 = 298.95 + (0) · 10.37 = 298.95
    x_4 = 298.95 + (0.44) · 10.37 = 303.51
    x_5 = 298.95 + (1) · 10.37 = 309.32

The observed frequencies of data, o_i, are counted, and the expected frequencies,
e_i, for each interval are computed by multiplying the sample size by
the corresponding area of probability, in this case (0.1587, 0.1713, 0.17, 0.17,
0.1713, 0.1587). The observed and expected frequencies can be seen in Table
4.1. The χ² test statistic can then be computed as stated in Equation (4.24).

Interval        0→288    289→294   295→298   299→303   304→309   ≥310
Observed (o)      4         6        10         6         6        6
Expected (e)    6.03      6.51      6.46      6.46      6.51     6.03
(o − e)        −2.03     −0.51      3.54     −0.46     −0.51    −0.03

Table 4.1: Class distribution

    \chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = 2.74 < \chi^2_{0.10} = 6.251

The null hypothesis, H_0, is therefore not rejected at a 10 % level of significance
with 3 degrees of freedom.
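The computation is small enough to verify directly. A minimal sketch (Python; reading the critical value from scipy instead of a table is an assumption of this illustration):

    import numpy as np
    from scipy.stats import chi2

    # Frequencies from Table 4.1
    observed = np.array([4, 6, 10, 6, 6, 6], dtype=float)
    expected = np.array([6.03, 6.51, 6.46, 6.46, 6.51, 6.03])

    # Equation (4.24): squared deviations weighted by the expectation
    statistic = np.sum((observed - expected) ** 2 / expected)   # ~2.74

    # Critical value at a 10 % level of significance, k - 3 = 3 degrees
    # of freedom
    critical = chi2.ppf(0.90, df=3)                             # ~6.25

    reject_h0 = statistic > critical                            # False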
The results for the other features were not always as close to a normal
distribution as desired. A feature such as the number of compounds is very
stable and has no variance whatsoever. Other features have seemingly random
distributions, which to a large extent could be caused by the relatively small
amount of training data.
Although some of the features are not normally distributed, the majority are.
Therefore, in this project it is assumed that all features are normally distributed,
and hence Bayes decision rule can be applied. Section 4.6.4 will discuss methods
for estimating the parameters that fit best, under the assumption that the
data is normally distributed.


4.6.4 Parameter estimation


In the general case of using Bayes classification it is necessary to know both
the a priori probabilities and the class conditional probability density functions.
The problem is not the a priori probabilities, since these are usually
easy to obtain, but the class conditional probability density functions.
In practice, exact specifications of these functions are never directly available.
Instead, what is available is a number of design samples which can be used
for training the system, plus evidence that these samples have some known
parametric distribution.
It is assumed that the a priori probabilities of all the classes are equal.
This reduces the problem of finding the class conditional probability density
functions to a matter of estimating the parameters of a multivariate normal
distribution, that is, the mean vector µ̄ and the covariance matrix Σ.
One method for parameter estimation is investigated. This method is called
maximum likelihood estimation and takes a number of preclassified sets of
samples X_0 . . . X_9 as input. These samples are images of each of the numbers
from the ten different classes. Since the sets are preclassified, this procedure
is a kind of supervised learning.

Maximum likelihood estimation

Given the design samples described above, maximum likelihood estimation
works by estimating the ten sets of parameters which maximize the probability
of obtaining the design samples. If the ten sets of parameters are the vectors
θ̄_0 . . . θ̄_9, each with d = f + f² elements for f features, corresponding to a
mean vector and a covariance matrix, then the class conditional probability
density function, which is to be maximized, can be written as being explicitly
dependent on the parameter vectors as p(x̄ | ω_j; θ̄_j).
Considering each class separately reduces the problem to estimating the
parameters of ten independent distributions. If, for instance, the sample set X
consists of the n samples X = {x̄_1, x̄_2, . . . , x̄_n}, then since the samples were
drawn independently, the probability of the sample set given a fixed parameter
vector θ̄ can be expressed as the product [9]:

    p(X \mid \bar{\theta}) = \prod_{k=1}^{n} p(\bar{x}_k \mid \bar{\theta})    (4.27)

Equation (4.27) viewed as a function of θ̄ is called the likelihood of θ̄.
The global maximum of this function, say θ̂, is the value which maximizes
Equation (4.27) and which makes p(x̄ | ω_j; θ̄_j) the best fit for the given sample
set. For convenience, the logarithm of the likelihood is considered instead:

    l(\bar{\theta}) = \log p(X \mid \bar{\theta}) = \sum_{k=1}^{n} \log p(\bar{x}_k \mid \bar{\theta})    (4.28)

If θ̄ = {θ_1, θ_2, . . . , θ_d}^t, then Equation (4.28) is a function of exactly d
variables⁴. The critical points can now be found analytically by locating the
points where all the partial derivatives are zero. One of these points is
guaranteed to be the global maximum. Equation (4.29) shows the gradient
operator for θ̄:

    \nabla_{\bar{\theta}} = \begin{bmatrix} \frac{\partial}{\partial\theta_1} \\ \frac{\partial}{\partial\theta_2} \\ \vdots \\ \frac{\partial}{\partial\theta_d} \end{bmatrix}    (4.29)

All critical points of Equation (4.28), including θ̂, are then solutions to:

    \nabla_{\bar{\theta}} l = \sum_{k=1}^{n} \nabla_{\bar{\theta}} \log p(\bar{x}_k \mid \bar{\theta}) = 0    (4.30)

This is in general a set of d non-linear equations. As an example, Equation
(4.30) is solved in its simplest case below for d = 2, which corresponds to a
single feature. In this case the parameter vector θ̄ has θ_1 = µ and θ_2 = σ².
The first task is to substitute θ_1 for µ and θ_2 for σ² in Equation (4.9) and
take the logarithm as in Equation (4.28):

    l(\bar{\theta}) = \log p(X \mid \bar{\theta})    (4.31)
                    = \sum_{k=1}^{n} \log p(x_k \mid \bar{\theta})    (4.32)
                    = \sum_{k=1}^{n} \log \left[ \frac{1}{\sqrt{2\pi}\sqrt{\theta_2}} \exp\left( -\frac{1}{2} \left( \frac{x_k - \theta_1}{\sqrt{\theta_2}} \right)^2 \right) \right]    (4.33)
                    = \sum_{k=1}^{n} \left[ -\frac{1}{2} \log 2\pi\theta_2 - \frac{1}{2\theta_2} (x_k - \theta_1)^2 \right]    (4.34)

⁴The reason for taking the logarithm is that it eases some algebraic steps later on (this
is clearly legal, since the logarithm is a monotonically increasing function).


The next step is to insert the last equation into Equation (4.30):

    \nabla_{\bar{\theta}} l = \begin{bmatrix} \frac{\partial}{\partial\theta_1} \\ \frac{\partial}{\partial\theta_2} \end{bmatrix} \sum_{k=1}^{n} \left[ -\frac{1}{2} \log 2\pi\theta_2 - \frac{1}{2\theta_2} (x_k - \theta_1)^2 \right]    (4.35)

                          = \sum_{k=1}^{n} \begin{bmatrix} \frac{1}{\theta_2}(x_k - \theta_1) \\ -\frac{1}{2\theta_2} + \frac{(x_k - \theta_1)^2}{2\theta_2^2} \end{bmatrix} = 0    (4.36)

Finally µ and σ 2 are back substituted and solved for. This yields:

    \mu = \frac{1}{n} \sum_{k=1}^{n} x_k    (4.37)

    \sigma^2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \mu)^2    (4.38)

The results for the general multivariate version are similar:

    \bar{\mu} = \frac{1}{n} \sum_{k=1}^{n} \bar{x}_k    (4.39)

    \bar{\bar{\Sigma}} = \frac{1}{n} \sum_{k=1}^{n} (\bar{x}_k - \bar{\mu})(\bar{x}_k - \bar{\mu})^t    (4.40)

These results correspond well to Equations (4.15) and (4.16). Thus the
maximum likelihood estimates are as expected the sample mean and the sample
covariance matrix.
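In code, the estimates therefore reduce to two array operations per class. A minimal sketch (Python, hypothetical names):

    import numpy as np

    def ml_estimates(samples):
        # samples: (n, f) array with one preclassified sample per row.
        # Returns the maximum likelihood estimates of Equations
        # (4.39)-(4.40); note the division by n rather than n - 1.
        n = samples.shape[0]
        mean = samples.mean(axis=0)
        centered = samples - mean
        cov = centered.T @ centered / n
        return mean, cov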

4.7 Comparing the identification strategies


In order to decide which method to use, it must be clear in which situations each
method is preferable. This is done by identifying their individual strengths and
weaknesses.
The strengths and weaknesses of template matching were listed in Tables
2.3 and 2.4, and although directed towards license plate extraction, most of
them also apply in more general terms.
Template matching is a strong similarity measure when the size and rotation
of the object to be compared are known. The algorithm is simple and fast
when it only has to run once on small images such as the isolated characters.
Feature based identification makes certain demands on the system in which
it is to be used. The use of the involved classifiers demands that the a priori
probabilities are accessible, and it has to be possible to estimate the class
conditional probability density functions.
Once these demands are met, the strengths and weaknesses are entirely
dependent on the features selected. If good features are available, the classification
can be performed with only a few features, and depending on the speed
of extracting these features, the method can be very fast.
Feature based identification is in many cases very sensitive to input quality.
Noisy images may hinder the correct extraction of certain features, which can
be very disruptive for systems that base the identification on a small number
of features. The number of features in this system ensures that even if one
or two features fail, in the sense that they are wrongly extracted, the overall
identification is still possible.
Template matching is also sensitive to noise, and of course the accuracy
decreases along with input quality, but not nearly as much as can happen
with feature based identification.

4.8 Summary
In this chapter the two methods for identifying characters, template matching
and statistical pattern recognition, were described. In connection with the
statistical recognition, a method for reducing the number of features, called
SEPCOR, was introduced. The Mahalanobis distance, derived from Bayes
decision rule, was stated as a measure for identifying the characters. As an
alternative measure, the Euclidean distance was used. A means of testing
whether data is normally distributed was presented, and parameter estimation
for the Bayes classifier was described. Finally, the two identification strategies
were compared.

Part III

Test

Chapter 5
Extraction test

5.1 Introduction
This chapter describes the tests performed to verify that the license plate
extraction performs adequately. Since region growing and Hough transform
have the same objective, they will be described together, whereas correlation
is described separately, since it is used for verifying whether or not a region
could potentially contain a license plate. Finally the combined method will be
tested, in order to give an estimate of the actual performance of the subsystem.

5.2 Region growing and Hough transform


Both region growing and Hough transform were designed to go through the
entire input image, searching for regions that correspond to some of the char-
acteristics of a license plate. Although the approaches are very different, they
have the same input, and the same success criteria for the output regions. This
section will describe how the tests were performed, what the results were, and
provide an evaluation of those results.

5.2.1 Criteria of success


When testing the license plate extraction of region growing and Hough transform,
it is important to realize that the goal is not to provide a single image,
but rather a limited list of regions, where at least one of the regions should
contain the license plate. The criterion for when a region is useful is relatively
straightforward. It must encompass the entire license plate, so that all seven
characters are fully visible. Also, it may not contain too much noise in the
form of edges. If the region contains more than just the license plate, it will
be harder to distinguish the correct region from the entire list of regions.
As a secondary goal, the methods should not provide an excessive number
of regions. A very large number of regions makes it harder to select the most
probable license plate candidate.

5.2.2 Test data


Both methods were tested using two sets of images. One consisted of 36 images
that were used as a training set to develop the algorithms. These images were
all taken under the same lighting conditions, but shadows from nearby trees
gave some variance in the brightness of the license plates. Also, the images
were taken at various distances from the vehicles, to ensure that the algorithms
were constructed to be invariant to the size of the license plate.
Apart from the training set, the tests were run on a collection of 72 test
images. These were all taken on the same road, but facing in two different
directions, to provide some challenges regarding the lighting situation. As
with the training images, they were taken at different distances.
The goal is, of course, that the hit percentage of the test images is as high
as that of the training set. If the training set results are much better than those
for the test set, it means that the system has been optimized towards a certain
situation, which is undesirable.

5.2.3 Test description


As mentioned, the test was performed on all images. For each image, a manual
inspection was made to see if the license plate had been successfully extracted
in at least one of the regions. This rather overwhelming task was eased by first
trying to automatically identify the correct license plate, so that in many of
the cases it was not necessary to go through all of the regions.
In the case of region growing, no suitable way of finding a dynamic threshold
was found. Instead, the algorithm was run five times, with different values for
the threshold. This caused a larger number of regions to be found, but also
ensured that the algorithm had a better chance of finding the license plates
under the varying lighting conditions.
The Hough transform test was performed in a very similar way, with manual
inspection of the regions if automatic determination of the correct region did
not succeed.
For both methods, it was also recorded how many candidate regions were
extracted. Generally, a lower number will make the later task of finding the
most probable candidate easier.

5.2.4 Results
The region growing method turned out to be a very robust way of finding the
license plate. As Table 5.1 indicates, it almost always found the plate. In only
2 of the 72 test images was it not capable of locating a satisfactory region. The
total number of regions lies in the lower hundreds, which may at first glance
seem like a lot. This reflects the fact that several regions are duplicated because
of the iterations with different threshold values.
There were some common characteristics of the two plates that were not
found. They were both rather dark images, taken so that the shade was in
front of the vehicle. Also, they were both yellow plates, further decreasing
the brightness of the plates in contrast to the surroundings. A further scaling
of the threshold value might have solved this issue, but for lower thresholds
the total number of regions rises quickly, so this might not provide better
overall results anyway.

Data set       Method            Successful        Regions
Training set   Region growing    36/36 (100.0 %)   200 - 400
               Hough transform   23/36 (63.9 %)    100 - 200
Test set       Region growing    70/72 (97.2 %)    150 - 300
               Hough transform   43/72 (59.7 %)    300 - 600

Table 5.1: Results for region growing and Hough transform.

For the Hough transform, the percentage of found plates was somewhat lower.
Again, the yellow plates were more troublesome, with a hit percentage of 43 %
compared to a rate of 64 % for white plates. Also, the number of candidate
regions was approximately a factor of 2 higher than for region growing in the test
set, but with very large differences from image to image. In the training set the
number was lower, again with large differences between the individual images.
As with region growing, the results were slightly better for the training set,
but the difference is so minuscule that it does not imply that the algorithm
had been artificially designed for the specific set of training images.
As is also seen in Table 5.1, the number of regions found by the Hough
transform is higher for the test set than for the training set. The reason is
that a parking lot in the background of some of the images causes a very high
number of edges to be detected. Region growing actually finds fewer regions in
the test set, but the numbers represent an average, with very high deviations
between the individual images.


The plates that were not found by region growing were not found by the
Hough transform either. Therefore, the conclusion must be that the two methods
do not complement each other.
In conclusion, the region growing method will guarantee a very high success
rate, but also a rather large number of regions to be sorted out before finding
the most probable candidate. A very large percentage of the regions can easily
be discerned as random regions without much similarity to a license plate.
The algorithms were proven to be almost as efficient when applied to images
that were not part of the training set and were taken in different lighting
situations and at different distances.

5.3 Correlation
Correlation was ruled out as an extraction method, due to the fact that it is
not scale invariant. Another use was found for it in the selection among the
candidate regions identified by the other two methods. It is for this use that
the correlation method will be tested.

5.3.1 Criteria of success


If the region containing the license plate is selected each time, the method passes
the test, even if some regions that are not actually license plates slip through. The
reason for this is that in the implementation it will not function alone, but in
combination with other methods, namely the peak-to-valley and height-width
ratio methods. This cooperation of techniques will be tested in Section 5.4.
Therefore the conclusion reached in this section is not a final conclusion on the
selection among regions, but merely a conclusion on the correlation method.

5.3.2 Test data


The output regions from region growing and Hough transform are to be used as
input. All the regions actually containing license plates (106) have been mixed
with 700 random regions, to test whether the plates could be distinguished.

5.3.3 Test description


The correlation coefficient between each candidate region and the template
is calculated, and if the value exceeds an experimentally set threshold, the
region is marked as a possible plate. All regions that passed the test were then
manually examined, to see if all plates were contained, and which types of other
regions passed.
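For reference, a minimal sketch of the acceptance test (Python; it assumes the candidate region has already been scaled to the template's size, and the threshold is the experimentally set value, which is not restated here):

    import numpy as np

    def correlation_coefficient(region, template):
        # Normalized correlation coefficient between two grayscale
        # arrays of identical shape; lies in [-1, 1].
        r = region - region.mean()
        t = template - template.mean()
        return (r * t).sum() / np.sqrt((r ** 2).sum() * (t ** 2).sum())

    def accept_region(region, template, threshold):
        # The region is marked as a possible plate if the coefficient
        # exceeds the experimentally determined threshold.
        return correlation_coefficient(region, template) > threshold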

5.3.4 Results
As can be seen in Table 5.2, all of the plates have been found.

Region type   Regions accepted   Percentage
Plates        106/106            100.0 %
Non-Plates    137/700            19.6 %

Table 5.2: Region selection by correlation.

The table also shows that regions that are not license plates are not always
ruled out. This does not matter, as the method will be used in connection with
other methods. The only demand is that the combination rules them out.
Looking at the non-plate regions that were not ruled out, there are some
common characteristics (see Figure 5.1). Many of the regions are, as in a), bright
regions with dark vertical lines, very similar to a license plate. The same goes
for b), where bright regions with darker ends are mistaken for plates. It
should be noted that the correlation values for these types are much lower
than a typical license plate would be expected to produce. Still, the threshold
value is set relatively low, so that even dark or slightly rotated license plates
are not omitted. Case c) is a different matter altogether. It may very well be
difficult to distinguish between a whole plate and a plate with the end cut off,
so it is at least expected that such a region is not discarded as a plate. Still,
the template has been designed so that the actual plate, where the characters
are placed as on the template, will yield a higher correlation coefficient.

Figure 5.1: Examples of falsely accepted regions

This aside, the test shows that if the method were to be used alone,
more effort would have to be put into determining threshold values and perhaps
into template construction.


5.4 Combined method


To test the actual algorithm for selecting the correct region, a combination of
the three methods described in Section 3.2.5 was tested.

5.4.1 Criteria of success


The algorithm for selecting the most probable candidate region must be capable
of sorting out a lot of different regions that potentially bear close resemblance
to a license plate. Such regions could include bright areas of the sky or the
road. Even traffic signs pose a potential difficulty, since they obviously share
some of the characteristics of a license plate.

5.4.2 Test data


The input to the algorithm is the same as was used to test the correlation
method. The optimal result therefore consists of finding all 106 plates in a
mix with 700 random regions.

5.4.3 Test description


Since correlation was proven to be an effective discriminator when sorting out
random regions, it is the first algorithm to be used. As shown in Section 5.3.4,
137 non-plate regions pass this test. Second, peak-to-valley is used to sort out
any uniform regions. After this step, only 45 non-plates remain, which have
to be sorted out by looking at the height-width ratio of the region. In this last
step, the region with the best ratio is chosen as the final candidate.
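A minimal sketch of the cascade (Python; it reuses the correlation_coefficient function sketched in Section 5.3, and both the simple peak-to-valley measure and the nominal plate ratio used here are illustrative assumptions, not the report's exact definitions):

    import numpy as np

    def peak_to_valley(region):
        # Crude peak-to-valley measure: the spread of gray levels;
        # a uniform region scores low and is sorted out.
        return float(region.max()) - float(region.min())

    def ratio_error(region, nominal=4.7):
        # Deviation from an assumed nominal width/height ratio of a
        # plate (the value 4.7 is illustrative only).
        h, w = region.shape
        return abs(w / h - nominal)

    def select_most_probable(regions, template, corr_t, ptv_t):
        # Step 1: correlation against the template (see Section 5.3)
        cands = [r for r in regions
                 if correlation_coefficient(r, template) > corr_t]
        # Step 2: discard uniform regions by peak-to-valley
        cands = [r for r in cands if peak_to_valley(r) > ptv_t]
        # Step 3: the region with the best height-width ratio wins
        return min(cands, key=ratio_error) if cands else None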

5.4.4 Result of test


The results of the test are summarized in Table 5.3, where it is seen that 105
of the 106 license plates were found, and that none of the random regions were
mistaken for a license plate.

Method applied       Regions sorted out   Remaining
Correlation          563/806              243
Peak-to-valley       92/243               151
Height-width ratio   46/151               105

Total accuracy: 105/106 (99.1 %)

Table 5.3: Test of selecting the most probable region


The license plate that was not found was a blurred, small yellow plate, and by
tweaking the threshold values in the different steps, this plate could also be
recognized as a plate. The number of non-plates accepted as plates in the
intermediate steps could be seriously reduced by tweaking the threshold values.
For instance, a proper selection of the threshold value for the correlation
coefficient makes the correlation validation more accurate. But it is pointless
to do so with the individual threshold values out of context with the other
methods. If the threshold for the coefficient is increased, it is more likely that
an actual plate is discarded.

5.5 Summary
The test showed that the region growing method was capable of finding nearly
all license plates. The Hough transform found a smaller number of the plates,
and it could not be established that a combination would provide better results
than region growing on its own.
The combined method for selecting the most probable license plate candidate
is very effective. The three steps are each capable of sorting out a different
type of region, and the combination makes it possible to find the correct license
plate in a large collection of non-plate regions.
The number of successfully extracted license plates is not as high as the
results above would indicate, however. This is due to the fact that the number
of found regions differs greatly from one image to another. Therefore, some
plates are not found when searching for the best region.

Chapter 6
Isolation test

6.1 Introduction
This chapter describes the test of the preprocessing step of isolating the char-
acters in the license plate. The purpose of the chapter is to verify that the
implementation of the isolation method described in Chapter 3 performs effi-
ciently.

6.2 Criteria of success


The purpose of the method is to divide the extracted license plate into exactly
seven subimages, each containing one of the seven characters. A successful
isolation fulfills all of the following criteria:

The plate must be divided into exactly seven subimages.

None of the seven characters may be diminished in any way.

The sequence of the subimages has to be in the correct order. This means
that the first character of the plate is the first of the subimages.

An example of an unsuccessful and a successful isolation is seen in Figure 6.1.

6.3 Test description


All the methods for isolating the characters in the license plate described in
Chapter 3 have been tested. The test is a simple black box test. A series
of input images is given to the algorithm, and the success of the test simply
depends on the resulting output images. The test has been performed using the
connected component method, the pixel count method and the method using
static bounds. All methods are tested independently. Finally, the combined
method, also described in Chapter 3, is tested as well.

Figure 6.1: Example of unsuccessful and successful isolations

6.4 Test data


The test is performed on images received from a successful extraction process,
meaning that each image does contain a readable license plate. As in the test of
the extraction process described in Chapter 5, the isolation process was tested
using two sets of input images. The first set consisted of the 36 images that
were used as a training set throughout the project. The second set was the
collection of 72 test images, of which 70 contain readable license plates.
The isolation methods will therefore be tested on 106 (36 + 70) images.
All methods are tested both on images with no quality improvement (no
preprocessing) and on images which have undergone a quality improvement,
such as cutting away irrelevant information or use of a dynamic threshold, as
described in Chapter 3.

6.5 Connected components


The first test was performed using the connected component method for the
isolation of the characters, and the result is listed in Table 6.1. As described
in the design chapter, the method finds connected regions in a binary image.
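The idea can be expressed compactly with a standard labeling routine; the following is a minimal sketch (Python/scipy, an illustration rather than the report's implementation):

    import numpy as np
    from scipy import ndimage

    def isolate_characters(binary_plate):
        # binary_plate: 2-D array with character pixels 1, background 0.
        labels, n = ndimage.label(binary_plate)
        boxes = ndimage.find_objects(labels)
        sizes = ndimage.sum(binary_plate, labels, index=range(1, n + 1))
        # Keep the seven largest components, assumed to be the characters,
        # ordered left to right by the left edge of their bounding boxes.
        largest = sorted(np.argsort(sizes)[-7:],
                         key=lambda i: boxes[i][1].start)
        return [binary_plate[boxes[i]] for i in largest]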
First, the method was tested on images without using any form of quality
improvement. The results proved almost identical for the two data sets.
For both sets, approximately 73 % of the images were successfully divided into
acceptable subimages.
Next, the border was removed, which did not affect the results at all.
To improve the image quality further, a method for setting the threshold
value dynamically was applied.


Data set       Method                  Successful   Percentage
Training set   Without preprocessing   26 of 36     72.2 %
               Border removed          26 of 36     72.2 %
               Dynamic threshold       33 of 36     91.7 %
               Both improvements       33 of 36     91.7 %
Test set       Without preprocessing   52 of 70     74.3 %
               Border removed          52 of 70     74.3 %
               Dynamic threshold       68 of 70     97.1 %
               Both improvements       68 of 70     97.1 %

Table 6.1: Result from the connected component test

Thereby the quality of the binary images improved drastically, and this reflects
clearly in the result of the test. Where a total of 78 images succeeded before,
the improved method was able to divide another 23 images. For both sets
combined, a total of 101 of the 106 images resulted in successful isolation.
In general the resulting images from the method are of good quality, meaning
that the bounds found by the method are accurate.
From the test it is also clear that there is no remarkable difference between
the results of the two data sets. The results are slightly better for the test set,
but for both sets a success rate above 90 % is achieved.
Figure 6.2 shows an example of a plate that could not be successfully
divided into 7 subimages using the connected component method. The reason
is that the third and fourth digits are part of the same component after the
image has been thresholded, and therefore they cannot be isolated.

Figure 6.2: Plate that fails, using connected components
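The dynamic threshold used above is the one designed in Chapter 3 and is not restated here. As an illustration of how a threshold can be derived from the image itself, the following sketches Otsu's classic method (an assumption of this rewrite, not the report's algorithm), which picks the gray level that maximizes the between-class variance:

    import numpy as np

    def otsu_threshold(gray):
        # gray: 2-D array of 8-bit gray levels.
        hist, _ = np.histogram(gray, bins=256, range=(0, 256))
        p = hist / hist.sum()
        levels = np.arange(256)
        best_t, best_var = 0, 0.0
        for t in range(1, 256):
            w0, w1 = p[:t].sum(), p[t:].sum()
            if w0 == 0 or w1 == 0:
                continue
            mu0 = (levels[:t] * p[:t]).sum() / w0
            mu1 = (levels[t:] * p[t:]).sum() / w1
            var = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
            if var > best_var:
                best_t, best_var = t, var
        return best_t

    # binary_plate = gray < otsu_threshold(gray)  # dark characters -> 1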

6.6 Pixel count


The next method tested was the pixel count method, which simply searches
for the characters by finding peaks and valleys in a horizontal projection. The
method was tested on the same images as the previous method. The result of
the test is shown in Table 6.2.
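A minimal sketch of the projection idea (Python; the report's own peak and valley rules from Chapter 3 are more elaborate than the simple empty-column test used here):

    import numpy as np

    def character_bounds(binary_plate):
        # Column sums of character pixels: peaks at the characters,
        # valleys (here: empty columns) between them.
        profile = binary_plate.sum(axis=0)
        bounds, start = [], None
        for x, count in enumerate(profile):
            if count > 0 and start is None:
                start = x                    # a character begins
            elif count == 0 and start is not None:
                bounds.append((start, x))    # a character ends
                start = None
        if start is not None:
            bounds.append((start, len(profile)))
        return bounds                        # list of (left, right) pairs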
The test performed on images with no improvement proved insufficient: only
25 of the 106 images succeeded. Removing the border did not have any
effect whatsoever.


Data set       Method                  Successful   Percentage
Training set   Without preprocessing   8 of 36      22.2 %
               Border removed          8 of 36      22.2 %
               Dynamic threshold       26 of 36     72.2 %
               Both improvements       27 of 36     75.0 %
Test set       Without preprocessing   17 of 70     24.3 %
               Border removed          17 of 70     24.3 %
               Dynamic threshold       56 of 70     80.0 %
               Both improvements       56 of 70     80.0 %

Table 6.2: Result from the pixel count test

The use of a dynamic threshold gave a significantly better result. One image
which succeeded before the dynamic threshold was applied now failed, but a
total of 82 images succeeded. This means that using a combination of the output
without improvement and the output when using dynamic thresholding, a total
of 83 of the 106 images resulted in a successful isolation.
As in the previous test, there was no significant difference between the
success percentages of the two data sets when isolating without preprocessing.
As with the connected components, the results were slightly better for the test
set when using dynamic thresholding. However, the 5 % difference is small
enough to be caused by statistical uncertainty.

6.7 Static Bounds


Next, the method for setting the character bounds statically was tested. Since
the method is based on always dividing the plate at the same coordinates,
it is independent of the quality of the image received from the plate
extraction process. Therefore only one test was performed, using images with
no quality improvement. The result of the test can be seen in Table 6.3.

Data set       Successful   Percentage
Training set   32 of 36     88.9 %
Test set       66 of 70     94.3 %

Table 6.3: Result from the static bounds test

Naturally, the method always succeeds in dividing the image into seven
subimages, and thereby the first and the third criteria of success are always
fulfilled. The success of the method thus depends merely on the quality of the
isolated characters, that is, on whether the second criterion is fulfilled.


The method succeeds in 98 of 106 images. The quality of the subimages is
not as good as that of subimages from the connected component method, although
in most cases they are useful. The image on the left-hand side of Figure 6.1 is an
example of an unsuccessful subimage produced by the static bounds method.
Once again, the test set provides the best results. This implies that the
training set was very diverse, so that the development of the algorithms was
performed using ‘worse’ images than those of the test set.

6.8 Combined method


Finally, the combined method described in Section 3.2.5 was tested. The
method is a combination of all the methods that were tested above. As seen in
Table 6.4, the result of the test is that all 106 images succeed, which indicates
that the methods complement each other.

Data set       Successful   Percentage
Training set   36 of 36     100.0 %
Test set       70 of 70     100.0 %

Table 6.4: Result from the combined method

The images that failed both the connected component and the pixel count
method are all images that can be adequately divided by using static bounds.
The connected component method produces the best output images, since
the bounds are very accurate, but when the method fails, both the pixel count
method and static bounds proved useful.

6.8.1 Summary
The success criterion for the isolation of the characters in the license plate was
that all seven characters should be isolated, in the correct order and without
being diminished in any way.
The test shows that although the individual methods for isolating the characters
described in Chapter 3 cannot perform the task by themselves, the combination
was proven to be able to do this in all of the images used for testing.

Chapter 7
Identification test

7.1 Introduction
The final step in recognizing a license plate is identifying the single characters
extracted in the previous steps. Two methods for doing this were presented
earlier: the first tested is the method based on statistical pattern recognition,
and the second uses the normalized cross correlation coefficient.

7.2 Criteria of success


For this test to be a success, as high a percentage as possible of the characters
must be identified correctly. A few errors are acceptable, since a full implementation
should be able to notify the operators when it is unable to identify
a character, allowing for manual identification.
Since as many characters as possible should be correctly identified, the
success criterion for the character identification test is a hit rate approaching
100 %.

7.3 Test data


The test data originates from the 106 test pictures that were extracted and
isolated during the previous tests. The actual test data are the single characters
extracted from the license plates, as mentioned in Section 6.2.
No attempt will be made to identify the letters in the plates using
statistical pattern recognition, since this method has only been designed with
the numbers in mind. The letters are identified using template based identification,
because here only a single template is needed, in contrast to statistical
pattern recognition, where several observations are required.

7.4 Feature based identification


In the testing of the statistical pattern recognition, the numbers will be identi-
fied using both Euclidean and Mahalanobis distance. Otherwise the tests are
performed in the exact same way, as described in the following section.

7.4.1 Test description


The test of feature based identification is performed by looking at what the
different characters are identified as, and what they actually are. Furthermore,
different settings for the maximum correlation threshold are used when the
SEPCOR algorithm is performed, thereby altering the number of features used.


Figure 7.1: Identification percentage plotted against number of features.

7.4.2 Result of test using Euclidean distance

The result of the test with the Euclidean distance is summarized in Table 7.1,
where different values of the maximum correlation have been tested. Table 7.2
shows the result of the test on the training set, with a maximum correlation
of 1. When testing on the training set, an identification of very close to 100 %
of the digits is expected, since this is also the data the features have been
derived from. The test was performed using a maximum correlation of 1, and
this yields an identification rate of 100 %. This was expected, since the system
should be able to identify the characters it was trained with.

Max. corr.   Features used   Successful   Percentage
0.1          5               130 of 350   37.1 %
0.2          13              247 of 350   70.6 %
0.3          25              267 of 350   76.3 %
0.4          42              316 of 350   90.3 %
0.5          69              337 of 350   96.3 %
0.6          96              338 of 350   96.6 %
0.7          142             344 of 350   98.3 %
0.8          210             344 of 350   98.3 %
0.9          262             345 of 350   98.6 %
1.0          306             346 of 350   98.9 %

Table 7.1: Result of the test on the test set

The best result obtained on the test set is an identification percentage of
98.9 %. As the maximum correlation rises, the recognition percentage rises
too. This is plotted in Figure 7.1.

Results
Correct             180
Training set size   180
Percentage (%)      100.0

Table 7.2: Result of the test on the training set

As can be seen in both Table 7.1 and Figure 7.1, the recognition percentage
rises quickly while the maximum correlation is below 0.5, and then settles
at about 98 %. The gain achieved from using a maximum correlation of 1
compared to 0.7 is only 0.57 %. But since there are no hard timing restrictions,
the higher identification rate, although the gain is small, is preferable in this
system.

Figure 7.2: (a) shows the unidentifiable character, and (b) the optimal result.


The errors made using the value 0.7 are spread over three different license
plates, and over two when using 1 as the maximum correlation. It should be
noted that, using the value 0.7, one license plate was responsible for four of the
six errors. Using a maximum correlation of 1, the same plate accounts for three
of the four errors. An example of an error originating from this plate can be
seen in Figure 7.2. This image actually produces some very nice subimages of
the digits, but the digits in these images are a bit small. The digit in the image
should fill the image horizontally, but fails to do so, probably because of the
use of static bounds.
This is believed to be the cause of the inability to identify the digits in this
plate.

7.4.3 Result of test using Mahalanobis distance


The test using the Mahalanobis distance is similar to the test described in the
previous section. The only difference is that two tests are performed: in the
first test the training set was used for training, and in the second the test set
was used for training. Both sets were classified in each test.
Table 7.3 shows the results of the tests when run on the same set that was
used for training, using all features. As expected, the identification percentage
is very high.

Results             Test 1   Test 2
Correct             175      333
Training set size   180      350
Percentage (%)      97.2     95.1

Table 7.3: Result of the tests on the training sets

Table 7.4 shows the results of identifying the set not used for training. Not
surprisingly, the identification percentage is lower when identifying the set not
used for training. Also, contrary to the results with the Euclidean distance,
adding more features does not always increase the identification percentage.
The results of the tests are also shown in Figure 7.3 and Figure 7.4 as a
function of the number of features used. Here it becomes even more apparent
that a higher number of features does not always produce a better result.
The reason seems to be that the SEPCOR algorithm sometimes removes
good features which, although correlated with other selected features, had
a positive impact on the identification percentage. These features are then
replaced with other features, which provide an inferior result. To put it in
other words: when removing features, the SEPCOR algorithm does not look

             Test 1                            Test 2
Max. corr.   Features used   Correct (%)          Features used   Successful (%)
0.1          5               165 of 350 (47.1)    7               75 of 180 (41.7)
0.2          12              133 of 350 (38.0)    13              163 of 180 (90.6)
0.3          27              181 of 350 (51.7)    31              116 of 180 (64.4)
0.4          50              229 of 350 (65.4)    47              146 of 180 (81.1)
0.5          85              236 of 350 (67.4)    73              115 of 180 (63.9)
0.6          123             249 of 350 (71.1)    106             125 of 180 (69.4)
0.7          173             268 of 350 (76.6)    160             150 of 180 (83.3)
0.8          233             270 of 350 (77.1)    208             166 of 180 (92.2)
0.9          275             299 of 350 (85.4)    258             161 of 180 (89.4)
1.0          306             301 of 350 (86.0)    306             162 of 180 (90.0)

Table 7.4: Result of the tests


Figure 7.3: Result of identification using training set as training

at how good a feature is, that is, how high its V-value is, before removing it.
This means that any correlated feature, no matter how good, can potentially be
removed.
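To make the criticized behavior concrete, here is a small sketch of such a correlation-based pruning pass (an illustration only; the report's exact SEPCOR variant is defined in Chapter 4):

    import numpy as np

    def correlation_prune(features, max_corr):
        # features: (n_samples, n_features) observation matrix.
        # Walk the features in their given order and drop any feature
        # whose correlation with an already kept feature exceeds
        # max_corr -- without consulting the V-value of the feature
        # being dropped, which is exactly the weakness discussed above.
        corr = np.abs(np.corrcoef(features, rowvar=False))
        kept = []
        for f in range(features.shape[1]):
            if all(corr[f, k] <= max_corr for k in kept):
                kept.append(f)
        return kept

Ranking the features by V-value before such a pass, so that of two correlated features the better one is always kept, would mitigate the problem.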
Comparing identification percentages with those from the Euclidean distance,
it is clear that identification using the Euclidean distance produces better
results. There can be several reasons why this is the case. First, the
amount of data used for training the system in the two tests might have been
too small, although the tests actually show that identification using the smaller
training set performs better. This might be a coincidence, though. Secondly,
the assumption that the distribution of the data is normal might be wrong;
as mentioned in Section 4.6.3, not all features can be considered normally
distributed.


Figure 7.4: Result of identification using test set as training

7.5 Identification through correlation


Unlike the previous tests, this time the identification is performed on both the
digits and letters in the license plates.

7.5.1 Test description


Each character is checked against the appropriate template pool, one containing
the templates for the letters and the other for the digits, and by looking
at the calculated coefficients it is determined which character is in the image.
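A minimal sketch of this decision rule (Python, hypothetical names; the template pool is assumed to map a character label to a template of the same size as the input image):

    import numpy as np

    def identify_character(char_image, template_pool):
        def coefficient(a, b):
            # normalized correlation coefficient, as in Chapter 4
            a = a - a.mean()
            b = b - b.mean()
            return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())
        # The character is taken to be the template with the highest
        # correlation coefficient against the input image.
        return max(template_pool,
                   key=lambda c: coefficient(char_image, template_pool[c]))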

7.5.2 Result of test


The result of the test can be seen in Table 7.5. In the case of the letters, a
success rate lower than that of the digits was expected, as more work has been
put into the construction of the digit templates. Also, there are more different
letters than digits.

Data set       Characters   Successful    Percentage
Test set       Letters      126 of 140    90.0 %
               Digits       344 of 350    98.3 %
               Total        470 of 490    95.9 %
Training set   Letters      54 of 72      75.0 %
               Digits       170 of 180    94.4 %
               Total        224 of 252    88.9 %

Table 7.5: Identifying the characters using templates.

Many of the errors were due to poor quality of the input images. The
major problems are connected to the size of the input images and to their
overall brightness.
Using the plate in Figure 7.5 as an example, it is noted that in spite of the
small size of the plate in the image, the plate is extracted and the characters
are isolated without problems. The character images are, however, of such
poor quality that the correlation identification fails.

Figure 7.5: The size problem illustrated.

Many of the extracted license plates were so dark that no proper threshold
value could be found to remove the background of the plate and make the
characters stand out in a satisfactory way. The plate in Figure 7.6 serves as
an example of such a plate.
The problem is exaggerated in that image, as a better isolation of the
characters can be achieved using dynamic thresholding, but it illustrates the
problem with this type of plate.
Figure 7.6: The brightness problem exaggerated.

Almost half of the errors in the training set occur in plates with more
than one error, indicating that the input images from those plates are not of
sufficient quality. Many of the other errors can also be attributed to these two
factors, as the plates are borderline examples of the situations. Also, there is
no pattern in the types of errors: no letter or digit is represented noticeably
more often than others, and all characters involved in errors have at some point
been correctly identified. This also indicates that the input plates are not of
sufficient quality. In the test set, the errors were distributed so that only one
error occurred in each plate.
It can be concluded that template identification of the license plate letters
cannot fulfill the purpose it was intended for without a serious improvement
of the templates. The identification of the digits is satisfactory.

7.6 Summary
Although the hit rates are almost the same in two of the tests, Euclidean
distance and template matching, three out of four errors in feature based
identification are found in the same plate, whereas the errors in the template
method are found in different plates. This supports what was stated
in Section 4.7, where the sensitivity to noise of the feature based method was
discussed.
In Section 4.7, it was also stated that template matching was less susceptible
to noise prone errors than feature based identification. This is hard to see
from these results, as the template method hit rate is lower. This indicates
that it is in fact the chosen features that are less susceptible to noise than the
template strategy.
All in all, the conclusion of the tests must be that the feature based identification
suits our purpose best, but that identification using the Mahalanobis
distance either needs a larger training set, or that the assumption that the data
is normally distributed is not entirely correct.

Chapter 8
System test

8.1 Introduction
In the previous test chapters, the components of the system were examined
separately in terms of performance. It is also relevant to test the combination
of the components. When testing the individual components, only useful input
was used. In real life, an error made in the license plate isolation will ripple
through the system, and affect the overall performance. This chapter will
describe the tests performed on the final system.

8.2 Criteria of success


The ultimate goal of the program is to take an image containing a vehicle as
input, and output the registration number of the license plate of the vehicle.
The identification of numbers was implemented using both feature based
identification and template matching. The identification of letters was only
implemented in the template matching, and therefore the letters are disregarded
when deciding whether or not a license plate has been correctly identified.
A successful processing of an image is defined as one that extracts the
license plate, isolates all seven characters, and identifies all the digits.

8.3 Test data


The test was performed using all of the test images. The training images were
not included in the test, since it is more interesting to see how well the system
performs when confronted with previously unknown input images.


8.4 Test description


The test was run using region growing for license plate extraction, since it
proved to be the most robust method (Section 5.2.4).
For isolating the characters, the combined method described in Section 3.2.5
was used, which should prove sufficient, since the results demonstrated in
Section 6.8 were quite convincing.
For identifying the characters, all three possibilities were tested, since their
results were much closer than those of region growing and Hough transform.

8.5 Results
The results obtained for this test are seen in Table 8.1.
Region growing, combined with the method for finding the most probable
region, found a total of 57 license plates. When using both region growing
and Hough transform, the number of license plates found was actually slightly
lower, due to the extra number of random regions to be sorted out. The
remaining algorithms therefore automatically got these 57 license plates, plus
15 random regions, as input. Of course, it is not very interesting to see what
the further results are for the 15 random regions.
The isolation of the individual characters worked in all of the 57 license
plate regions.

Data set       Method                     Successful   Percentage
Test set       License plate extraction   57 of 72     79.2 %
               Character isolation        57 of 57     100.0 %
               Character identification
               - Features - Euclidean     55 of 57     96.5 %
               - Features - Mahalanobis   33 of 57     57.9 %
               - Correlation              50 of 57     87.7 %
Training set   Not performed

Maximum overall successful plates: 55 of 72 (76.4 %)

Table 8.1: Overall result for the system

As mentioned, all three methods were tested for the identification of the
characters. Here, Mahalanobis turned out to provide the least impressive
results. It turned out that in many of the plates only a single character was
misidentified, but in this test such a plate is considered a failure.


The correlation scheme failed in seven of the successfully extracted plates,
thus correctly identifying a total of 50 plates.
The feature based method using the Euclidean distance proved to be the
superior method, given the features that were selected in the system. Here only
two plates failed, raising the overall success rate to 76.4 %. The two plates
that were wrongly identified were not identified by the other two methods
either, and the reason is probably that they contain black spots due to the
bolts (see Figure 8.1).

Figure 8.1: Good quality license plates wrongly identified

On the other hand, several poor license plate images were correctly identi-
fied. Two examples of such plates are shown in Figure 8.2. These also contain
marks from the bolts, but the classifier is still capable of providing the correct
number for each character.

Figure 8.2: Poor quality license plates correctly identified

8.6 Summary
The test of the overall system established that by using region growing for
license plate extraction and a Euclidean distance measure for the feature based
identification, a total of 76.4 % of the test images were successfully identified.
The main cause of the failures was the license plate extraction, which caused 15
of the 17 errors. Since it was shown in Section 5.2.4 that 70 license plates were
actually found, the algorithm for extracting the most probable region throws
away 13 license plates in favor of regions containing random information.
The results could be further enhanced if some restrictions were imposed
on the input images. If, for instance, the images were taken at approximately
the same distance from the vehicle (as in the currently used system), the
extraction of the license plate would be significantly easier to accomplish.

Part IV

Conclusion

Chapter 9
Conclusion

The purpose of this project has been to investigate the possibility of making
a system for automatic recognition of license plates, to be used by the police
force to catch speed violators. In the current system, manual labor is needed
to register the license plate of the vehicle violating the speed limit. The ma-
jority of the involved tasks are trivial, and the extensive and expensive police
training is not used in any way. Therefore an automatic traffic control system,
eliminating or at least reducing these tasks has been proposed.
We wanted to investigate the possibility of making such an automatic sys-
tem, namely the part dealing with automatically recognizing the license plates.
Given an input image, it should be able to first extract the license plate, then
isolate the characters contained in the plate, and finally identify the characters
in the license plate. For each task, a set of methods were developed and tested.
For the extraction part, the Hough transform and, in particular, the region
growing method proved capable of extracting the plate from an image. Both of
them locate a large number of candidate regions, and to select between them,
template matching was utilized. Template matching, combined with height-
width ratio and peak-valley methods, provided a successful method of selecting
the correct region.
The methods developed for isolating the characters proved very reliable.
The method for finding character bounds using an algorithm that searches for
connected components proved to be the most useful, and combined with pixel
count and static bounds, the method proved to be extremely successful.
For the actual identification process, two related methods were developed:
one based on statistical pattern recognition and the other on template matching.
Both methods proved to be highly successful, with feature based identification
slightly better than template matching. This is because more features
are included in the statistical pattern recognition, and therefore this method
was chosen to identify the digits on the plate. No attempt at identifying
the letters was made using feature based identification, and this is the only
reason why template matching was used to identify the letters.
In order for the system to be useful, it should be able to combine the three
different tasks and to recognize a high percentage of the license plates, so that
the use of manual labor is reduced as much as possible. This implies that the
success rate for the individual parts should be close to 100 %. The results
obtained are summarized in Table 9.1.

Mission                    Success rate   Successful
Plate extraction           98.1 %         ✓
Character isolation        100 %          ✓
Character identification   98.9 %         ✓
Overall performance        76.4 %         ✓

Table 9.1: Main results obtained.

As can be seen, the individual parts perform very satisfactorily, all with a
success rate close to 100 %. The plate extraction succeeds in 98.1 % of the
images, and this is a very high success rate; the extraction fails in only two
of the images. This is acceptable, since the extraction works in more than
98 % of the images, thereby fulfilling the criteria of this task.
The part isolating the characters contained in the license plate succeeds
in 100 % of the cases and is thus very successful, achieving the goal set for
this task.
Of the isolated digits, 98.9 % were correctly identified. This is also a
very high success rate, and it must be taken into account that the wrongly
identified digits originate from only two plates.
The overall performance is not as high as for the individual tasks, but still
a large proportion of the license plates, namely 76.4 %, is correctly identified.
In general, the conclusion of this report is that a system for automatic
license plate recognition can be constructed. We have successfully designed
and implemented the key element of the system: the actual recognition of the
license plate.
The main theme of the semester was gathering and description of information,
and the goal was to collect physical data, represent these symbolically, and
demonstrate processing techniques for this data. This goal has been accomplished
in this project.

Part V

Appendix

Appendix A
Videocamera

As described in Section 1.2, a standard video camera is to be used for obtaining
the data required to recognize the license plates of the vehicles. There
are certain requirements for this camera regarding resolution, color, and light
sensitivity.

Resolution
A certain resolution is required in order to distinguish between the indi-
vidual characters of a license plate. Generally, raising the resolution will
provide better images. This improvement in quality comes at the cost of
larger amounts of data, and more expensive equipment.

Color
Since Danish license plates are either yellow or white with black char-
acters, no color is needed to separate characters from the background.
However, most modern equipment defaults to color images. This does
not pose a problem, as a software conversion to gray scale images is pos-
sible. As with the resolution, an improvement in color depth will result
in more data to be processed.

Light sensitivity
Since the equipment will be placed outdoors, weather will have a large
impact on the images. In particular, the camera should be able to automatically
adjust the light sensitivity to the conditions. If the light sensitivity
is either too high or too low, the images will tend to become blurry, so
that even high-resolution images are useless. A camera with a good ability
to obtain sharp images under varying lighting conditions is said to have a
large dynamic range.


A.1 Physical components


All of the factors described in the previous section depend upon the physical
components of the camera. These components include lens, shutter, and CCD-
chip (Charge-Coupled Device).

Lens
The camera lens sharpens images by focusing the light from the source at
a particular point, where the CCD-chip converts the light to an electrical
signal. Usually a camera contains several different lenses, to compensate
for, e.g., the fact that light at different wavelengths refracts differently.
Lenses are also used for zooming.

Shutter
The shutter is simply a device which lets different amounts of light pass
into the camera, depending upon the overall brightness of the surroundings.
Usually the shutter consists of six ‘plates’ that synchronously move
away from, or closer to, each other.

CCD-chip
The CCD-chip converts an amount of light to an electrical signal. In its
simplest design, no concern is given to the color of the absorbed light.
The more sophisticated color CCD-chips work in one of two ways: either
the light is split into its three base colors by lenses and absorbed by
a CCD-chip for each color, or the CCD-chip uses a color filter array
such as the Bayer pattern (see Figure A.1) to distinguish between colors.

Figure A.1: Color aliasing using the Bayer pattern

Chips using such techniques will experience a phenomenon called color
aliasing, also seen in Figure A.1.
The image is read out using shifting registers, as seen in Figure A.2.
The bottom line is ‘dropped’ into a buffering register, and the value
of each pixel in this line is sequentially written to an output channel.


When an entire line has been written to output, a new line is shifted
downwards and so forth.

Figure A.2: CCD-chip with registers

Appendix B
Visiting the police force

In order to get an understanding of the case, we visited the local police
department, so that we could see first hand how the currently used system
operates.
The presentation of the system was split into two parts. First we saw how
the pictures are processed at the local police office, and the different stages
were explained. Secondly, we saw the recording of the images, as we drove out
to one of the surveillance vans. Here we got a feel for the circumstances under
which the images are recorded.

B.1 The system at the precinct


The first part of the visit took place at the police office, where we saw a
presentation of how the pictures are processed.
In the office of the division responsible for processing the pictures, two
people worked with registering the license plates. When the system is expanded to
the whole of northern Jutland, Aalborg will still be the only place where the
registration takes place. The job of the people working in Aalborg begins when
they receive a disk containing the pictures taken during a surveillance session.
Each picture is processed in turn in the same way. At the top of the picture,
information concerning the location of the surveillance is stored, and this
information is registered alongside the license plate. This is information about
the specific location, the type of road monitored, and any out-of-the-ordinary
speed limits. Also, the speed of the recorded vehicle is checked. If the speed of
the vehicle is below the speed limit, the picture is naturally discarded.
After this information has been logged, the license plate is registered. The images are often blurred, and some image processing is necessary for the plate to be readable. The license plate is selected by setting a rectangular region of interest; the selection is enlarged, and the enlarged plate is sharpened through histogram equalization. A default setting is applied automatically, but if the result is unsatisfactory, the settings can be changed.
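Histogram equalization spreads the gray levels of an image over the full intensity range, which typically increases the contrast of a dark or washed-out plate. The following sketch shows the standard gray scale version of the technique; it is an illustration only, not the police system's implementation.

    // Histogram equalization of an 8-bit gray scale image: each gray
    // level is mapped through the scaled cumulative histogram.
    void equalizeHistogram(unsigned char* img, int numPixels)
    {
        int hist[256] = {0};
        for (int i = 0; i < numPixels; ++i)
            ++hist[img[i]];

        unsigned char lut[256];
        long cumulative = 0;
        for (int v = 0; v < 256; ++v) {
            cumulative += hist[v];
            lut[v] = static_cast<unsigned char>(255.0 * cumulative / numPixels);
        }
        for (int i = 0; i < numPixels; ++i)
            img[i] = lut[img[i]];
    }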
The same processing is done for the operator of the vehicle, because the driver has to be clearly identifiable according to Danish traffic legislation. In addition, the faces of any passengers must be concealed. If the driver of the vehicle does not have a driver's license, he will be fined accordingly, but any other misdemeanors cannot be addressed. This means that an offense that might be finable had the car been pulled over is ignored, including violations often easily observed in the pictures, such as talking on a cell phone or driving without a seat belt.
What happens to foreign traffic offenders depends on where they are from. If the offender is from one of the Nordic countries, or any other country the Danish police cooperates with, the case is turned over to the authorities there. Many other countries cannot be bothered with a Danish traffic violation, and if no such international arrangement exists, the offender is not fined.
Once the information needed to issue the fine has been extracted from the image, it is sent on to a second registrant, where the information is double-checked and the plate is looked up in the Danish Motorized Vehicles registry. If, for obvious reasons, the driver can be ruled out as the owner of the vehicle, the owner must inform the police of who drove the vehicle at the time of the offense, or be fined himself.

B.2 The system in the field


The system is fitted into a van, which can be parked alongside the road. Since the speed estimator is radar-based, it needs to overlook a certain length of straight road in order to determine the speed of a passing vehicle; behind the van, the road needs to be straight for approximately 40 to 60 meters.
Parking the van is no trivial matter: the van has to be positioned very accurately, parallel to the road and at most 3.5 meters from the curb. To achieve this, a ring hangs from the ceiling of the van and a small marker is painted on the windshield; through the ring, the marker is aimed at a marking stick held by another officer. Parked this way, the van can monitor and capture four lanes at the same time, though of course only a single vehicle at a time.
Once the van has been parked, the system must be calibrated. The camera is adjusted to give the best images under the given conditions. The speed limit that triggers the camera is set, and the address of the position is registered.
When setting the “trigger speed limit”, a 3 km/h margin of error in the radar is taken into account, and only offenders driving more than 10% faster than the speed limit of the road are fined. For a section of road with a speed limit of 50 km/h, this gives 50 · 1.1 = 55 km/h plus the 3 km/h margin, i.e. 58 km/h, so the trigger speed limit is set to 59 km/h.
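The rule can be written out as a small helper function. This is only an illustration of the calculation as it was described to us, not code from the actual system; rounding up to the next whole km/h is an assumption here.

    #include <cmath>

    // Offenders must exceed the limit by more than 10%, and the radar's
    // 3 km/h margin of error is added on top. For a 50 km/h road:
    // 50 * 1.1 = 55, + 3 = 58, so the camera triggers at 59 km/h.
    int triggerSpeed(int speedLimitKmh)
    {
        const double tolerated = speedLimitKmh * 1.10 + 3.0;
        return static_cast<int>(std::floor(tolerated)) + 1;
    }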
Once the system has been initialized, no further human involvement is needed for ordinary passenger cars. An officer must still be present, however, because different vehicle types have different speed limits: if buses and lorries are to be fined, he has to trigger the camera manually when they pass by. In practice, this means switching the system to a different trigger speed limit.
The radar is a sensitive piece of equipment. In addition to the 3 km/h margin of error, it is also affected by the acceleration and deceleration of the vehicles. The use of radar for measuring speed also restricts the locations available for surveillance, since large solid objects such as billboards can confuse the radar.

B.3 Evaluation of the current system


The system was introduced quite recently; in fact, it has only been implemented in a few districts, and for testing purposes only. So far, the system has proved to be a success. With over 100 fines issued a day in Aalborg and a national average fine of 798 DKR, the system is quite profitable.
The equipment needed is very expensive. A fully equipped van costs about 1,000,000 DKR, of which the van itself accounts for only about 30%.
As mentioned, the system is still in the testing phase, and only a limited number of vans have been equipped. This, combined with the limited number of places suited for surveillance, means that people are starting to recognize the vans.
Also, the system is not usable on freeways. The only thing preventing its use there is that the type of road cannot be set to a freeway, because the system has no option for setting the speed limit to 110 km/h; it therefore cannot issue the appropriate fine.
The effect of the system is unquestionable. After it has been used in an area for a while, the number of speeding offenders drops, and not only because the van becomes recognizable: cables buried under the asphalt in the cities to monitor traffic flow show that the overall number of speeding vehicles drops.

Appendix C
Contents of CD

The purpose of this appendix is to give an overview of the attached CD. The contents are listed below:
Directory: CD
    Acrobat Reader 5.05
    Referenced web pages
    Images
        Test
        Training
    Source code
        Final
        Image
        Misc
    Installation
        Required DLL’s
All of the instructions required to install and use the program are included
on the CD. If a browser window does not appear automatically when inserting
the CD, manually start your browser and select the file index.html in the root
of the CD. The report is also available for viewing via a link in the index.html
file.

C.1 The program


The program was written in a Windows environment, since several well-documented image processing libraries are available for Windows. Windows was also chosen to ease the process of creating a graphical user interface for testing purposes.
The code was written in Microsoft Visual C++, and the source code folder on the CD also contains the project files used when building the executable.
The libraries imported were the Intel Image Processing Library (IPL) and the Intel JPEG Library (IJL). In addition, the Open Source Computer Vision library (OpenCV) was utilized. All of these libraries contain routines for manipulating images, including morphology operations, per-pixel operations, and color manipulation.
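As a small example of the kind of routines these libraries provide, the following sketch uses the classic OpenCV C interface to load an image and apply a morphological erosion. It is illustrative only; the file names are made up, and the exact functions available in the library versions from 2002 may differ.

    #include <cv.h>
    #include <highgui.h>

    // Load an image as gray scale, erode it with the default 3x3
    // structuring element, and save the result.
    int main()
    {
        IplImage* img = cvLoadImage("plate.jpg", 0);   // 0 = load as gray scale
        if (!img)
            return 1;

        IplImage* eroded = cvCreateImage(cvGetSize(img), img->depth,
                                         img->nChannels);
        cvErode(img, eroded, NULL, 1);                 // NULL = 3x3 rectangle

        cvSaveImage("plate_eroded.jpg", eroded);
        cvReleaseImage(&img);
        cvReleaseImage(&eroded);
        return 0;
    }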

