
A Novel System for Nighttime Vehicle Detection Based on Foveal Classifiers With Real-Time Performance

Andrés Bell, Tomás Mantecón, César Díaz, Carlos R. del-Blanco, Fernando Jaureguizar, and Narciso García, Member, IEEE

Abstract—Vehicle monitoring using camera networks is an important task for traffic applications. Moreover, it becomes critical in nighttime, when the probability of an accident considerably increases as visibility conditions worsen. Typical approaches are mostly based on the assumption that regions delimiting vehicle lights are well defined, so that they can be segmented and then associated to vehicle entities. However, this assumption fails in images acquired by existing traffic camera networks, where vehicle lights appear as flashes and other complex light patterns, occupying large and even disconnected image regions. In this work, a real-time vehicle detection algorithm for nighttime situations is presented, which is able to locate vehicles in the image by analyzing these complex light patterns. For this purpose, a novel machine learning framework based on a grid of foveal classifiers has been designed. Every classifier in the grid processes the same global image descriptor (only one descriptor is computed per image). However, each of them is trained to predict a different output depending on the classifier position in the grid and the vehicle ground-truth location. Additionally, only point-based annotations are required to train the grid of foveal classifiers, reducing the cost of creating the required databases. Experimental results prove the effectiveness of the proposed method on a newly created nighttime database with point-based annotations.

Index Terms—Foveal classifiers, nighttime, point-based annotations, real-time, vehicle recognition.

Manuscript received October 28, 2019; revised April 23, 2020, July 15, 2020, and October 20, 2020; accepted January 15, 2021. This work was supported in part by the Ministerio de Ciencia, Innovación y Universidades (AEI/FEDER) of the Spanish Government under Project TEC2016-75981 (IVME) and in part by the NVIDIA Corporation with the donation of a Titan Xp GPU. The Associate Editor for this article was K. Wang. (Corresponding author: Tomás Mantecón.)

The authors are with the Grupo de Tratamiento de Imágenes, Information Processing and Telecommunications Center, and the ETSI Telecomunicación, Universidad Politécnica de Madrid, 28040 Madrid, Spain (e-mail: abn@gti.ssr.upm.es; tmv@gti.ssr.upm.es; cdm@gti.ssr.upm.es; cda@gti.ssr.upm.es; fjn@gti.ssr.upm.es; narciso@gti.ssr.upm.es).

Digital Object Identifier 10.1109/TITS.2021.3053863

I. INTRODUCTION

THE field of Intelligent Transportation Systems (ITS) has been a subject of growing interest in recent years, coinciding with a considerable increase in the number of vehicles on the road worldwide [1]. This concurrence is due to the fact that more vehicles mean more problematic situations, such as traffic congestion and circulation accidents [2], which can be totally or partially avoided through the use of automatic systems for traffic management. Therefore, there exists extensive literature addressing different particular problems closely related to those situations, like vehicle detection [3], collision avoidance [4], pedestrian and cyclist recognition [5], license plate identification [6], traffic monitoring [7], driver assistance [8], or self-guided vehicles [9].

In the state of the art, works relying on very different types of sensors can be found, such as LIDAR [10], RADAR [11], and cameras [12]. The latter are of special interest, given the staggering number of video cameras that can be found in all kinds of scenarios, from static cameras oriented to traffic monitoring and/or surveillance up to vehicle onboard cameras. The work presented in this paper belongs to this group.

Most works based on visual imagery are restricted to daytime situations with favorable weather conditions [13], [14]. In comparison, there are few works dealing with images captured under poor illumination conditions, that is, basically, in bad weather (e.g. rain, fog, snow…) or during nighttime [15]. However, these cases are key to properly tackle traffic problems, as the probability of an accident considerably increases under reduced visibility conditions. Moreover, most existing methods working with this type of imagery are focused on unfavorable weather conditions [4], [16], [17], [18] rather than on nighttime situations, which, however, are more frequent and even more challenging. Indeed, in nighttime, vehicles are often represented in the image by just one or several distorted lights (see Fig. 1 for an example). Thus, their shape is usually virtually lost. This is especially common in real traffic camera networks that are subject to practical and economical restrictions (no high dynamic range capability, bandwidth limitations, etc.). In summary, there is a lack of strategies that satisfactorily face this key but challenging situation.

Most works addressing the task of nighttime vehicle detection are based on the detection of vehicle front or rear lights [19]. In particular, in [20], the authors aim at finding vehicle rear lights. Haar-like features over gray-level images are computed, which are then delivered to a modified version of the Adaboost classifier [21] to allow active learning. This is also the basis of the work by Dave et al. [22], focused on recognizing just one of the vehicle rear lamps. For this purpose, a color segmentation algorithm is applied to detect bright blobs in the HSV color space. Then, the Color Inherited Optical Flow (CIOF) algorithm [23] is used to track the blobs. In [24], vehicle detection is performed by detecting the brake light. To that end, the authors propose a three-step algorithm. First, a contrast enhancement technique is applied over the whole RGB image. Second, the brake light is detected using a Nakagami distribution model. Finally, an adaptive threshold is computed to confirm or reject the previous selection of brake light candidates. The work by Zhang et al. [25] addresses the detection of vehicle headlights by means of a combination of a reflection intensity map and a reflection-suppressed map, which are obtained from the analysis of a light attenuation model. Then, a bidirectional reasoning algorithm is used to track the lights, and the resulting trajectories are used to estimate the speed of the vehicle. Shan et al. [26] also propose to detect headlights. A segmentation algorithm over image intensity values is used to extract candidates. Then, lights belonging to the same car are matched using a distance criterion. In [27], both rear and headlights are detected by using a bright object segmentation process based on a multilevel histogram thresholding algorithm. The segmentation output is refined by a connected-component filtering of the brightest objects. The main problem with all the previous works is that they use sequences captured under not-so-severe illumination conditions, in which it can be assumed that vehicle lights are always well defined. However, this is actually not the case for many existing traffic camera networks. On most occasions, vehicle lights are heavily distorted, blinding the camera, or producing large bright regions and artifacts that occupy substantial portions of the image (see Fig. 1 and Fig. 8). These effects make it very hard, or even impossible, to individually identify the lights, and so to carry out the proposed methods successfully.

Fig. 1. Examples of nighttime images with flashes and complex illumination patterns. Observe that the shape of the vehicles in the input image in (a) is lost, and only distorted lights can be appreciated. Additionally, in (b), the bright regions produced by the vehicle lights (spotlights and the projected light on the road) are even disconnected.

Other works are focused on detecting not only the lights, but also the entire vehicle. In [28], vehicles are detected using the classifier of Deformable Part Models (DPM) with infrared imagery. In [29], the positions of the vehicles are predicted using Local Binary Patterns (LBP) along with an Adaboost detector. The result is refined by a KAZE-based bag-of-words strategy. The work in [30] investigates local patterns in depth imagery to improve the performance of state-of-the-art region proposal networks. In [31], the detection is based on the Active Basis Model (ABM), which uses edge information. Then, shape symmetry is used to discard false positives, followed by a random forest algorithm that classifies the different types of vehicles. However, as before, these approaches cannot be used indiscriminately, as they either use specialized hardware to 'see' in the dark to emphasize the outline of vehicles, such as infrared or thermal cameras [32], or they assume that the vehicles' shape is sufficiently distinguishable without using this kind of hardware. With respect to the first assumption, traffic camera networks typically do not include specialized cameras because of their high cost. Regarding the second, in the same way as for lights, vehicle shapes cannot usually be sufficiently well distinguished if illumination conditions are hard enough, as glinting and blinding lights produce arbitrary bright regions in the images. This is the scenario considered in this paper.

Therefore, in this paper, a real-time algorithm for vehicle detection in nighttime scenarios that successfully faces the previous challenges without making unrealistic assumptions is presented. The proposed method uses existing traffic camera networks that lack specialized camera sensors (such as infrared or thermal ones) to deal with complex and poorly illuminated images with heavily distorted lights, where neither the vehicle lights nor the vehicle shape can be identified most of the time. Since the light beams produced by the vehicles typically cover large image areas, much larger than the space occupied by the actual vehicle shape, a global image descriptor per frame is used, which is delivered to a new grid of foveal classifiers that cooperatively determine the location of the vehicles in the scene. On top of that, the foveal classifiers use point-based annotations for the training process, speeding up the creation of the required databases, and in turn reducing the costs and deployment time of the proposed system. Although a general framework is proposed, the system can be almost directly applied to different purposes (e.g. instantaneous or over-time vehicle counting to indicate traffic occupation, simple tracking of detections to predict vehicle failures or potential collisions…). In addition, other specific problematic situations could be addressed by performing an application-oriented post-processing of the results.

The organization of the paper is as follows. Section II describes the proposed vehicle detection algorithm. Section III introduces the created point-based database. Section IV summarizes the results obtained with different variants of the proposed framework and makes a comparison with other state-of-the-art works. Finally, conclusions and future work are drawn in Section V.

II. ALGORITHM DESCRIPTION

The framework proposed in this work for the detection of vehicles under nighttime conditions is based on the use of a grid of foveal classifiers, which is able to successfully manage unshaped light patterns using a global image descriptor and point-based annotations.


Foveal classifiers are focused on a specific zone of the image (the fovea), and so their output is limited to that area. However, their inputs are global, that is, they refer to the whole image. Furthermore, the size and shape of the output zone or fovea are not manually specified, but intrinsically learned in the training process, since the inputs, despite being global, contain the information related to this fovea. This allows an individual classifier to accurately locate vehicles in the image, despite their visual appearance being just bright regions covering large image areas, much larger than the actual space occupied by them.

Additionally, point-based annotations are used instead of bounding boxes, as they present two important advantages. First, under poor illumination conditions, the shape of the vehicles is not clear at all, and therefore bounding boxes are extremely difficult to place in the scene (even for generating ground-truth annotations). Second, by using points, the system deployment is sped up and its cost reduced. The reason is that the creation of the required databases with point-based annotations is significantly quicker than with bounding-box annotations. In this regard, note that the annotation time not only increases with the number of annotation points (two for bounding boxes and one for points), but also due to the fact that, when working with bounding-box annotations, the operator has to deal with changes in the object size, which is not the case with point-based annotations.

The proposed framework can be implemented using different global image descriptors and types of classifiers. Below, a detailed description of the framework is offered using an evolution of the Histogram of Oriented Gradients (HOG) as the image descriptor, and Support Vector Machines (SVM) as the type of classifier used in the grid. This combination is especially appealing due to its computational efficiency using only a standard PC. However, the framework has also been implemented using other global image descriptors (based on Haar wavelets, LBP, and pretrained VGG16 and ResNet features) and classifiers (neural networks, NN). These are briefly described in Section IV along with a comparison of the results obtained by the different implementations of the proposed framework.

The block diagram of the proposed machine learning framework, shown in Fig. 2, is divided into two phases: the prediction system and the training system.

1) Prediction System: The prediction system phase predicts point-based vehicle locations in input images by using a set of learned models obtained in the training system phase. The prediction system can be further divided into three stages (see left part of Fig. 2): 1) Computation of the global image descriptor, 2) Prediction of a score map using the grid of classifiers, and 3) Vehicle location determination.

In the first stage, Computation of the global image descriptor, a method that computes a feature vector representing a whole image is required. For this purpose, multiple alternatives can be used from the literature, although in the current implementation, an evolution of HOG [33] has been used, as this descriptor has been proven to obtain good results in solving pattern classification problems, and allows real-time performance. Nonetheless, additional global image descriptors have been implemented and tested, as described in Section IV. Subsection II-A describes in more detail the proposed global image descriptor, called Global HOG (GHOG). Note that a unique descriptor per image is computed, unlike existing works that generate a myriad of descriptors per image, corresponding to different image regions. Thus, the computational cost of the feature extraction task is heavily reduced, boosting real-time operation without the need of specialized hardware.

The second stage, Prediction of a score map using the grid of classifiers, receives as input the GHOG descriptor to be processed by the grid of foveal classifiers. Again, any type of classifier can be employed in this stage, if it suits the requirements of the system. In the current implementation, SVMs are used (although Section IV describes additional results obtained using other types of classifiers), each with an associated 2D reference point, in such a way that all of them form a 2D grid. The output is a score map (arranged as the previous grid) in which a high positive value indicates that a vehicle is close to the reference point of the corresponding foveal classifier. As each classifier has all the image information, it can deal effectively with nighttime scenarios, where the light patterns that vehicles produce can cover large image areas, even disconnected ones. Regarding the SVMs, a linear kernel has been chosen to allow real-time performance in the recognition stage. On the other hand, the specific configuration of the grid of foveal classifiers is defined by the geometric arrangement of their reference points and the total number of classifiers. A regular hexagonal geometric pattern is used in this work, as in this way the closest neighbors of any element of the grid are all at the same distance from it [34]. Fig. 3 shows an example of this pattern, where each green point represents the reference point of a classifier. The number of classifiers in the grid can be adapted according to the image resolution and the average size of the vehicles. Note that the presence of a vehicle in an image can activate several foveal classifiers (high positive score values in the corresponding score map), as shown in Fig. 2, where the active foveal classifiers are marked in magenta and the inactive ones in green. This situation is managed by the third and last stage, Vehicle location determination, which uses the score values of the classifiers to accurately determine all the vehicle locations (blue points in Fig. 2). To that end, a non-maxima suppression process with the following steps is performed. First, it loops through all the active classifiers and, for each of them, the scores of all the active classifiers in its neighborhood (the seven closest classifiers) are summed up. If the result is not high enough, they are deactivated and set to zero. Later on, a grayscale morphological dilation is performed using the same neighborhood expressed before. To end the non-maxima suppression process, the original map is subtracted from the dilated map obtained in the previous step. As a result, only maxima survive, and there will be as many vehicle detections as maxima. The final location of the vehicles is obtained by bilinear interpolation of the locations of the active classifiers belonging to the neighborhood of a maximum value, weighted by their scores. Fig. 4 shows an example where magenta points represent classifiers with positive scores (in green numbers) and blue points represent the final estimation of each vehicle point-based location.
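For concreteness, this third stage can be sketched as follows. This is a minimal NumPy/SciPy sketch under stated assumptions: a square 3 × 3 neighborhood approximates the seven-classifier hexagonal one, and the names `predict_locations` and `sum_threshold` are ours, not the paper's.

```python
import numpy as np
from scipy.ndimage import grey_dilation, uniform_filter

def predict_locations(scores, ref_points, sum_threshold):
    """Sketch of the 'Vehicle location determination' stage.

    scores     : (rows, cols) score map, one SVM score per foveal classifier.
    ref_points : (rows, cols, 2) pixel coordinates of the reference points.
    Returns an (n_vehicles, 2) array of point-based detections.
    """
    s = np.where(scores > 0.0, scores, 0.0)        # keep active classifiers only

    # 1) Sum the scores of the active classifiers in each neighborhood and
    #    deactivate those whose neighborhood support is not high enough.
    neigh_sum = uniform_filter(s, size=3) * 9.0    # sum over the 3x3 neighborhood
    s[neigh_sum < sum_threshold] = 0.0

    # 2) Grayscale morphological dilation over the same neighborhood; comparing
    #    the map with its dilation (the paper phrases this as a subtraction)
    #    leaves only the local maxima.
    is_max = (s > 0.0) & (s >= grey_dilation(s, size=(3, 3)))

    # 3) One detection per maximum: interpolate the reference points of the
    #    active classifiers around it, weighted by their scores.
    detections = []
    for r, c in zip(*np.nonzero(is_max)):
        r0, r1 = max(r - 1, 0), min(r + 2, s.shape[0])
        c0, c1 = max(c - 1, 0), min(c + 2, s.shape[1])
        w = s[r0:r1, c0:c1].ravel()                # neighborhood scores
        pts = ref_points[r0:r1, c0:c1].reshape(-1, 2)
        detections.append((w @ pts) / w.sum())     # score-weighted location
    return np.asarray(detections)
```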


Fig. 2. Block diagram of the proposed solution.

Fig. 3. Hexagonal grid pattern representing the set of reference points of the foveal classifiers.

Note that the output of every foveal classifier is potentially different, despite the fact that they all share the same input GHOG. This is the result of the customized (but automatic) training framework, illustrated in the right part of Fig. 2 and described in the following subsection.

2) Training System: The training phase can be divided into three stages: 1) Computation of the global image descriptor, 2) Positive and negative sample association, and 3) Training of the grid of classifiers.

The first stage is the same as the one described in the prediction phase. It computes a unique GHOG vector per image in the training database. The second stage adaptively associates the GHOG of an image in the training database with a positive or negative label (indicating either the presence or not of a vehicle) for every foveal classifier. As a hexagonal grid is used, a positive label is assigned to the seven foveal classifiers whose reference points are closest to the point-based annotation. A negative label is assigned to the rest of the classifiers in the grid. As mentioned above, the GHOG vector for each image can be at the same time a positive sample for a foveal classifier and a negative one for another.


Fig. 4. Illustration of the prediction process. In magenta, active classifiers belonging to a neighborhood of a local maximum score value (values in green), and, in blue, final vehicle locations.

Fig. 5. Example of the dynamic label association for a GHOG vector and the grid of foveal classifiers in the presence of a vehicle.

Fig. 6. Example of the distribution of trained and non-trained classifiers. Non-trained classifiers are represented using red points, and trained classifiers using green crosses.

The result is a map of positive and negative labels that indicates, for every classifier, whether the single GHOG descriptor should be considered as a positive or a negative training sample. This association is illustrated in Fig. 5, where the vehicle ground-truth location is represented by the blue triangle, the reference points of the foveal classifiers that are closest to the ground-truth point are marked using black crosses, the distance between these reference points and the ground-truth point is marked with a light blue line, and the reference points of the rest of the foveal classifiers are marked with green points. Obviously, the total number of positive and negative samples varies from classifier to classifier. This can result in classifiers with very few or no positive samples at all, because their reference points are far from any vehicle annotation in the considered database. Many times, this is due to the absence of vehicles in some image regions, very likely because they show pieces of land or other elements where vehicles do not pass (e.g. outside roads, countryside, shoulders, walls, or the median strip). This situation is settled by finally training only those foveal SVM classifiers with enough positive samples. Fig. 6 shows an example of trained and non-trained classifiers, using green crosses and red points, respectively. As a result, the output of this stage consists of a set of trained models for the grid of foveal SVM classifiers, which will be used in the prediction stage. Regarding the third stage, Training of the grid of classifiers, the SVM classifiers are trained using a grid search strategy, where several regularization values are evaluated per classifier [35].
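The label association and the per-classifier training can be sketched as follows. The hexagonal lattice construction, the helper names (`build_hex_grid`, `train_foveal_grid`), the candidate C values, and the `min_positives` cutoff are our assumptions for illustration, with scikit-learn's LinearSVC standing in for the linear-kernel SVMs:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def build_hex_grid(height, width, spacing):
    """Reference points on a regular hexagonal lattice: every other row
    is shifted by half the horizontal spacing (cf. Fig. 3)."""
    points = []
    dy = spacing * np.sqrt(3) / 2            # vertical distance between rows
    for i, y in enumerate(np.arange(0, height, dy)):
        offset = (spacing / 2) * (i % 2)
        points += [(x + offset, y) for x in np.arange(0, width, spacing)]
    return np.array(points)                  # (n_classifiers, 2)

def train_foveal_grid(ghogs, annotations, ref_points, min_positives=10):
    """ghogs: (n_images, d), one global descriptor per image.
    annotations: list of (n_vehicles_i, 2) point annotations per image.
    Returns one trained SVM (or None) per reference point."""
    n = len(ref_points)
    labels = np.zeros((len(ghogs), n), dtype=int)
    for i, pts in enumerate(annotations):
        for p in pts:
            # positive label for the 7 classifiers closest to the annotation
            d = np.linalg.norm(ref_points - p, axis=1)
            labels[i, np.argsort(d)[:7]] = 1
    models = []
    for j in range(n):
        if labels[:, j].sum() < min_positives:
            models.append(None)              # non-trained classifier (red in Fig. 6)
            continue
        # grid search over the regularization parameter, per classifier [35]
        search = GridSearchCV(LinearSVC(), {"C": [0.01, 0.1, 1, 10]})
        models.append(search.fit(ghogs, labels[:, j]).best_estimator_)
    return models
```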
Note that every foveal classifier has all the image information available through the GHOG descriptor, instead of just a limited image area. However, each of them will focus on specific components of this descriptor to detect vehicles close to its corresponding reference point. More specifically, the training procedure via the dynamic sample association induces the following twofold behavior for every foveal classifier: 1) to filter out the components of the GHOG descriptor that are useless to detect vehicles around an image region vaguely defined by the classifier reference point, and 2) to check if the descriptor components of interest present a pattern that can be compatible with the presence of a vehicle.

A. Computation of the Global Image Descriptor

The feature extraction stage computes a unique global descriptor for each image, instead of obtaining a number of local descriptors (typical of sliding window and region proposal approaches). Thus, a great reduction in the computational cost is achieved.


Fig. 7. Block diagram of the global image descriptor GHOG.

Fig. 8. Sample images from the different scenarios of the NVD database. The regions of interest of each scenario are delimited by the green boundaries.

As mentioned, in the current implementation, a variation of HOG is used to obtain a global descriptor per image, called Global HOG (GHOG). It is computed in four steps (see Fig. 7). First, a multiresolution pyramid strategy over the image is carried out, where the biggest scale corresponds to the original image, and the others are subsampled versions. This strategy makes the system invariant to the size of the vehicles and of their distorted light beams, which highly depends on the position of the camera with respect to the road and on the camera zoom. The second step divides every pyramid level into non-overlapping blocks (instead of overlapping ones) to speed up the process and reduce computational and memory costs. The third step computes the HOG descriptor for every block. Finally, in the fourth step, all the HOG descriptors (from all the pyramid levels and corresponding blocks) are concatenated to form the GHOG descriptor that represents the whole image. Note that the resulting GHOG descriptor preserves, to some degree, the spatial information of the original image, since there is a direct correspondence between every image block and the subset of consecutive components in GHOG containing the HOG vector representation of that block.
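As an illustration, the four steps can be sketched with scikit-image's hog on a grayscale image; the pyramid scales and the cell/block/bin values below follow the configuration given in Section IV, while the function and parameter names are ours:

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import rescale

def ghog(image, block=16, scales=(1.0, 0.5, 0.25)):
    """Global HOG (GHOG) sketch: multiresolution pyramid, non-overlapping
    blocks, one HOG vector per block, and a single concatenated descriptor.
    'image' is assumed grayscale; 16x16 blocks of 4x4-pixel cells with
    9 orientation bins, as in Section IV."""
    features = []
    for s in scales:
        level = rescale(image, s) if s != 1.0 else image   # pyramid level
        h = (level.shape[0] // block) * block              # crop to a whole
        w = (level.shape[1] // block) * block              # number of blocks
        for y in range(0, h, block):                       # non-overlapping
            for x in range(0, w, block):                   # block division
                features.append(hog(level[y:y + block, x:x + block],
                                    orientations=9,
                                    pixels_per_cell=(4, 4),
                                    cells_per_block=(4, 4)))
    return np.concatenate(features)        # one global descriptor per image
```

Since the blocks are visited in a fixed order, each block always maps to the same range of components in the output vector, which is the spatial correspondence that the foveal classifiers exploit.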


III. NVD DATABASE

To the best of the authors' knowledge, there is no database large enough for vehicle detection in nighttime. Therefore, a new database, called the Nighttime Vehicle Detection (NVD) database, has been created and made publicly available [36]. It is composed of sequences from five different highways. The acquisition was performed using static cameras belonging to actual traffic camera networks. These cameras provide low quality imagery (highly compressed and with limited frame rate), which boosts the complexity of the vehicle detection task. A sample image from every scenario is shown in Fig. 8. As can be observed, vehicles are not easily distinguished, and their lights usually produce flashes and complex illumination patterns.

The ground-truth information was generated by manually selecting and annotating a point for each vehicle depending on its appearance in the image. When the vehicle is represented as a host of heavily distorted lights (which is the usual case, as stated in the introduction), that point is approximately situated between the two front or rear lights. And when the vehicle's shape can be better distinguished (for example, in the case of the vehicles closest to the camera in the Richmond scenario, illustrated in Fig. 8), the centroid of the vehicle is approximately selected. The reason for this divergence is that the selected point was considered to be the most representative one in the underlying circumstances. In any case, estimating such a representative point for each vehicle is quite difficult and somewhat subjective in many situations. Nevertheless, the main goal of this process is to produce consistent annotations along time, so as to reliably follow the vehicle's trajectory.

In Fig. 8, each blue triangle placed over a vehicle represents the corresponding ground truth. Additionally, the ground truth is restricted to the regions of interest (ROIs) delimited by the boundaries in green. Thus, outside the boundaries, there are no ground-truth annotations, because either the size of the vehicles is too small to allow the annotation by a human operator, or no vehicles are expected in those regions (e.g. outside roads).

A summary of the main characteristics of the sequences of each scenario of the NVD database is presented in Table I. The differences among them allow vehicle detection algorithms to be trained and tested in a wide range of situations. For example, the image resolution produces considerable changes in the size of vehicles measured in number of pixels. In addition, the frame rate determines the perceived movement of vehicles, and so the number of samples obtained of each one.

TABLE I. Summary of the main characteristics of the different sequences of the NVD database.

IV. RESULTS

The developed algorithm has been tested with the NVD database using different image descriptors (including pretrained neural networks such as ResNet [37] and VGG16 [38]) and classifiers, leading to different implementations. Additionally, it has been compared with the state-of-the-art algorithms Faster R-CNN [39] and YOLOv3 [40].

The following metrics have been utilized to measure the performance of each algorithm in terms of classification accuracy and computational cost: precision (P), recall (R), F1-Score (F), average processing time per image (t̄), and the mean (μ_D) and standard deviation (σ_D) of the error distance in pixels (D). The precision is the percentage of correct detections, i.e. the percentage of vehicles correctly detected (true positives) with respect to all the detected vehicles (true positives and false positives), expressed as follows:

P = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \qquad (1)

The recall is the percentage of correct detections (true positives) with respect to the total number of existing vehicles (true positives and false negatives), expressed by:

R = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \qquad (2)

The F1-Score is the harmonic mean of precision and recall, given by:

F = 2 \cdot \frac{P \cdot R}{P + R} \qquad (3)

Finally, the error distance D determines the localization error in pixels for the positives. It is defined as the Euclidean distance in pixels between the point-based ground-truth position of a vehicle and the predicted position:

D = \sqrt{(x_{gt} - x_{\hat{d}})^2 + (y_{gt} - y_{\hat{d}})^2} \qquad (4)

where x_{gt} and y_{gt} are the pixel coordinates of the vehicle position according to the ground-truth information, and x_{\hat{d}} and y_{\hat{d}} represent the pixel coordinates of the corresponding true positive detection, d̂, which is the one that satisfies D ≤ T_d, where T_d is a distance threshold.
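A sketch of how these metrics can be computed from the point-based ground truth and detections of one image follows; the greedy nearest-first matching used to decide true positives under the threshold T_d is our assumption, as the paper does not detail the matching procedure:

```python
import numpy as np

def evaluate(gt_points, det_points, td):
    """Point-based P, R, F1 and error-distance statistics, Eqs. (1)-(4).
    gt_points, det_points: (n, 2) arrays; td: distance threshold in pixels.
    A detection is a true positive if it lies within td of a still-unmatched
    ground-truth point (greedy nearest-first matching, our assumption)."""
    gt_free = list(range(len(gt_points)))
    distances = []                        # D for every true positive, Eq. (4)
    for d in det_points:
        if not gt_free:
            break
        dists = np.linalg.norm(gt_points[gt_free] - d, axis=1)
        k = int(np.argmin(dists))
        if dists[k] <= td:
            distances.append(dists[k])
            gt_free.pop(k)                # each vehicle matched at most once
    tp = len(distances)
    fp = len(det_points) - tp
    fn = len(gt_points) - tp
    p = tp / (tp + fp) if tp + fp else 0.0        # Eq. (1)
    r = tp / (tp + fn) if tp + fn else 0.0        # Eq. (2)
    f1 = 2 * p * r / (p + r) if p + r else 0.0    # Eq. (3)
    mu_d = float(np.mean(distances)) if distances else float("nan")
    sigma_d = float(np.std(distances)) if distances else float("nan")
    return p, r, f1, mu_d, sigma_d
```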
A. Analysis of the Primary Implementation

First, a study to evaluate the performance, in terms of accuracy and operation time, of the primary variant is presented, that is, the original implementation with the GHOG+SVM combination. To that end, the configuration of the global descriptor and the grid of classifiers is initially introduced. Later on, the influence of the grid resolution on the overall performance of the system is studied. Lastly, some insights about the performance of the system in the presence of a high number of vehicles are given.

1) Configuration of the Descriptor and the Grid of Classifiers: The parameter values of the HOG descriptors have been set as follows: window size of 16 × 16 pixels, block normalization of N_bl = 16 × 16 pixels, cells of N_c = 4 × 4 pixels, and N_bi = 9 bins per cell histogram, which are a reasonable trade-off between computational cost and classification performance. Regarding the multiresolution pyramid, two scaled versions have been used: half and one-quarter resolutions.

A linear kernel has been used for all the SVMs, along with an individually tuned regularization parameter. In addition, with the aim of using equivalently dense grids of classifiers in all the scenarios, the grid resolution has been set in such a way that each classifier 'controls' a similar portion of the image, regardless of the sequence, and that the number of pixels assigned to each classifier is approximately the same in both dimensions.


TABLE II. Comparison of results with different grid resolutions using the proposed framework with GHOG descriptors and SVM classifiers.

Thus, the grid resolution has been determined considering the image resolution (width W and height H) and the percentage of pixels assigned to each classifier in the smaller dimension, denoted as p_p. So, given that H is lower than W, the number of rows in the grid of classifiers is computed as ⌈100/p_p⌉ and the number of columns as ⌈(W/H) × 100/p_p⌉. Furthermore, the scenario determines the number of finally trained classifiers, which depends on the portion of the image where the ground truth is annotated, which is different for each scenario.
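As a worked example, under the assumption of a 1280 × 720 image with p_p = 2 (values consistent with the 50 × 89 grid referenced in the next subsection), the grid size follows directly:

```python
from math import ceil

W, H, pp = 1280, 720, 2            # assumed example values
rows = ceil(100 / pp)              # ceil(100 / p_p)          -> 50
cols = ceil((W / H) * 100 / pp)    # ceil((W/H) * 100 / p_p)  -> 89
```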
Finally, T_d (the distance threshold to consider a detection as a true positive) has been set to be proportional to the image size: 8% of the image height, as this has been found to be a reasonable trade-off between accuracy and applicability in all scenarios.

2) Performance: Table II reveals the impact of the grid resolution on the performance of the proposed foveal system using GHOG descriptors and SVM classifiers in the five scenarios included in the database. The results show that, if too few classifiers are used, the system yields mediocre results, as vehicles moving close together might be detected as just one. This is particularly pronounced in the furthest sections of the road, since vehicles appear smaller and closer to each other due to the camera perspective. The performance improves as denser grids are used, at the cost of increasing the number of classifiers to be trained, and also the prediction time (see t̄). Regarding the error distance D, it tends to decrease (as shown by the μ_D and σ_D values) as the grid resolution grows, as expected. This also implies that vehicles that are relatively close to each other can be better distinguished with a denser grid resolution (provided that the classifiers remain properly trained with enough positive samples). In this case, an increase in the recall (R) can be perceived. Nevertheless, using too many classifiers can lead to an increase of false positives, also resulting in poor performance. In this last case, however, the more damaging effect might be that some of the classifiers of the grid can be badly trained, due to a lack of samples in the areas assigned to them. This can result in a drop in the recall.

Therefore, as a general rule of thumb, the system is able to detect very close vehicles, provided that a sufficiently dense grid of classifiers is used, and the database has a sufficient number of samples to properly train every classifier. It is then key to select a resolution of the grid of classifiers that is in line with the expected change in size of the vehicles in the image, which, in turn, as said, depends on the resolution of the image.

Fig. 9 illustrates the performance of the proposed system in crowded scenarios of the NVD database. As a reference, a grid resolution of 50 × 89 classifiers was used for the examples. It can be seen that the system produces good results, although it has not been specifically oriented to deal with crowded scenarios (these situations are special cases usually addressed by explicitly dedicated approaches in the object detection literature). Fig. 9 also shows that lights from the very end of the road are not classified as vehicles, because those areas were not included in the human-based annotation process. Although this is not strictly a limitation of the developed algorithm, if ground-truth annotations were available in these areas, the classification performance could probably decrease there.

With respect to the computational cost of the prediction phase presented in Table II, it has been obtained by using a single-thread-based implementation, executed on an Intel Core i9-7900X @ 3.30 GHz, and therefore improvements are expected if a multi-thread implementation were used instead, as several classifiers could be evaluated in parallel on different threads and cores. Computational cost values are shown for several parameter configurations: image resolution (different for each scenario) and grid resolution. As expected, the greater the image and grid resolutions, the greater the processing time, although all the values are compatible with real-time requirements.


Fig. 9. Examples of nighttime vehicle detection under dense traffic situations, where the final point-based detections are marked as blue dots.

B. Comparison With Alternative Implementations and State-of-the-Art Algorithms

In this second part of the experiments, the performance of the primary implementation has been compared with other alternatives, still using the same framework, where the original GHOG and SVM algorithms have been replaced with other methods. Also, the comparison includes the state-of-the-art works Faster R-CNN and YOLOv3. First, the configuration of the alternative descriptors and classifiers is introduced. Next, the configuration of the competing algorithms and the necessary adaptations that were carried out to enable a proper comparison are described. Finally, an overall performance comparison including the primary and alternative implementations and the state-of-the-art algorithms is presented.

1) Description and Configuration of the Alternative Descriptors and Classifiers: The following four descriptors have been implemented as alternatives to GHOG: Global Haar (GHaar), Global LBP (GLBP), VGG16, and ResNet. The descriptors GHaar and GLBP use the local image descriptors Haar and LBP, respectively, to compute a feature vector per image block, according to the considered image division. Similarly to GHOG, the global image descriptors have been created by concatenating all the obtained local feature vectors (see Section II-A). The parameters that have led to the best experimental results are listed next. For Haar: a window size of 72 × 96 pixels, a stride of 64 pixels in both the horizontal and vertical directions, and the compression of windows to a resolution of 24 × 32 pixels have been selected, considering the memory limitations to train the classifiers. In the case of LBP: a window size of 32 × 32 pixels and neighborhoods of 3 × 3 pixels have been selected. Lastly, for the VGG16-based and ResNet-based descriptors, pretrained ImageNet challenge [41] weights have been used to extract a global image descriptor directly from the layer 'block5_pool' (18th layer) of the VGG16 backbone [38] and from the layer previous to the final block of dense layers of the ResNet backbone [37]. This process has implied re-scaling all images to 224 × 224 pixels. These pretrained backbones were chosen for two main reasons. First, they both are oriented to boost the recognition performance, achieving state-of-the-art results in many different applications. In addition, they can work in real time using the appropriate (high-performance) hardware. Second, these backbones are frequently used in the literature as benchmarks for the evaluation of the performance of new systems or frameworks. As a consequence, other neural network architectures more oriented to real-time applications using hardware with restricted computational capabilities, such as MobileNet [42], were discarded, since they are less complex, and hence, worse results are expected in general (independently of the application).
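For illustration, extracting the VGG16-based global descriptor described above can be sketched with the Keras pretrained models; the 'block5_pool' layer name and the 224 × 224 resizing follow the text, while flattening the resulting feature map into a single vector is our assumption:

```python
import tensorflow as tf

# VGG16 backbone with ImageNet weights, cut at the 'block5_pool' layer
# (layer name as in the Keras implementation of the backbone).
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
extractor = tf.keras.Model(base.input, base.get_layer("block5_pool").output)

def vgg16_descriptor(image):
    """One global descriptor per image from the pretrained backbone (sketch)."""
    x = tf.image.resize(image, (224, 224))            # re-scale, as in the text
    x = tf.keras.applications.vgg16.preprocess_input(x[tf.newaxis, ...])
    return extractor(x).numpy().ravel()               # flattened feature map
```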
Additionally, NNs have been implemented as an alternative to SVMs. These NNs are composed of three fully connected (or dense) layers: the first one with 128 elements, the second one with 64 elements, and the final one with 2 elements. After each of the first two fully connected layers, a rectified linear unit (ReLU) is used as the activation function, followed by a dropout layer with a factor of 50%. The last layer uses a sigmoid activation.
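A sketch of one such per-classifier network in Keras follows; the layer sizes, activations, and dropout factor are taken from the description above, while the framework choice and the input dimension handling are our assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

def foveal_nn(descriptor_dim):
    """One foveal classifier as the NN described above: Dense(128) and
    Dense(64) with ReLU and 50% dropout, plus a 2-unit sigmoid output."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(descriptor_dim,)),    # global image descriptor
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(2, activation="sigmoid"),      # vehicle / no-vehicle scores
    ])
```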
Finally, all the considered variations of the primary system have been implemented to be executed on a single-thread CPU (hence, no specialized hardware is used). In addition, the variation consisting of pretrained VGG16 and ResNet features with NN classifiers has been implemented on a GPU, too.

2) Adaptation and Configuration of the State-of-the-Art Algorithms: Before presenting the parameters selected for Faster R-CNN and YOLOv3, some considerations must be taken into account. Most vehicle detection techniques (and, more generally, object detection ones) require bounding-box-based annotations for the training, instead of points. This is the case of Faster R-CNN and YOLOv3. Therefore, an adaptation was required to enable them to work with point-based annotations (instead of bounding boxes), as the ones provided by the proposed NVD database, and also to make them deliver point-based predictions to unify the metrics criteria for evaluation. For the training part, bounding boxes of several sizes (width and height) are generated around every ground-truth point-based location, and then fed to the algorithms.


TABLE III. Comparison of the detection performance among three variations of the proposed system, Faster R-CNN, and YOLOv3 for each scenario.

Hence, the previous algorithms can be properly trained with the required bounding-box annotations. Depending on their size, the bounding boxes encompass larger or smaller vehicle and background regions, affecting detection performance. The size that offers better results is finally selected. With respect to the predictions, the bounding boxes delivered by Faster R-CNN and YOLOv3 are transformed into point-based detections by just computing their centroids.
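This two-way adaptation is straightforward to express in code; the sketch below assumes (x1, y1, x2, y2) boxes, and the specific box sizes are placeholders for the several sizes the authors sweep over:

```python
import numpy as np

def points_to_boxes(points, box_w=48, box_h=32):
    """Training adaptation: a fixed-size box (assumed sizes) centered on
    every point annotation, returned as (x1, y1, x2, y2) rows."""
    p = np.asarray(points, dtype=float)
    half = np.array([box_w / 2, box_h / 2])
    return np.hstack([p - half, p + half])

def boxes_to_points(boxes):
    """Prediction adaptation: the centroid of every predicted box."""
    b = np.asarray(boxes, dtype=float)
    return (b[:, :2] + b[:, 2:4]) / 2
```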
Considering the parameters of the adapted Faster R-CNN, different sets of anchor scales for the Region Proposal Network (RPN) and thresholds of the non-maxima suppression (NMS) for filtering RPN proposals have been tested (see Table III). The number of ROIs has been set to 2000 for training, and 1000 for prediction. For the YOLOv3 algorithm, different configurations of parameters have been used for the anchor scales and the score threshold. To improve the training convergence and the quality of the finally trained models, pretrained weights from the ImageNet challenge [41] have been used to initialize both algorithms, instead of random ones. The use of pretrained models also implies resizing the images to 640 × 640 and 416 × 416 pixels, for Faster R-CNN and YOLOv3, respectively.


This image resizing is also mandatory due to the memory limitations of the GPU devices used for training. Finally, the prediction stage of the algorithms has been tested under two different configurations: 1) using exclusively a CPU and 2) using a high-end GPU (Nvidia Titan Xp). In the case of the CPU, a multi-threading implementation has been used, since it is natively supported by the deep learning library used for the implementation.

3) Performance Comparison: Table III presents a detection performance comparison, using the NVD database, among Faster R-CNN, YOLOv3, and the three best variants within the proposed framework: GHOG+SVM, VGG16+SVM, and VGG16+NN. It is important to mention that the detection performance of the alternatives based on GHaar and GLBP has been significantly worse than the rest, and thus they have not been included in the table. Specifically, the best F1-Score that results from using GHaar is 0.402, obtained in the Walnut sequences, and the best one using GLBP is 0.366, obtained for the California scenario. These very poor results are attributed to different reasons. In the case of GHaar, the results were due to the significantly lower robustness of this descriptor to illumination changes and image noise, which are very common in the NVD database scenarios, in comparison with GHOG. Concerning GLBP, the results respond to the fact that LBP was originally conceived for texture recognition, and not exactly for object recognition with a global structure. Similarly, ResNet+SVM and ResNet+NN were finally not included in the table since their F1-Score figures were clearly lower (around 10%) than the ones obtained using the VGG16 network.

As can be observed in the table, the two best results in terms of detection performance have been obtained by two different implementations of the proposed framework, GHOG+SVM and VGG16+NN, according to the mean values of the F1-Score. It is also noticeable that these two variants, along with VGG16+SVM, are much faster than Faster R-CNN and YOLOv3 using a CPU. Even more, the variation GHOG+SVM using a single-thread CPU is faster than Faster R-CNN and YOLOv3 using a GPU. This encourages the deployment of the proposed system in a network of traffic cameras, where the cost of the processing hardware is a factor of paramount relevance.

Regarding the accuracy of the detections, Faster R-CNN and YOLOv3 achieve the best results according to the error distance metrics μ_D and σ_D. Nonetheless, GHOG+SVM performs very closely to them. The main explanation for this fact is that Faster R-CNN and YOLOv3 have been pretrained with the external ImageNet database, which alleviates the fact that the size of the NVD database is relatively small in comparison with other databases used by deep learning techniques. In this sense, a larger nighttime vehicle database could significantly improve the different implementations of the proposed framework (which are trained solely on the NVD database), since reliable denser grids would be possible (there would be more samples to properly train a higher number of classifiers), leading in turn to a better localization accuracy.

Finally, the primary variant (GHOG+SVM) achieves the best overall performance considering both detection accuracy and operation time, closely followed by the VGG16+NN variant. Moreover, this is accomplished using just a standard CPU. Also, its localization accuracy is very close to the best results achieved by YOLOv3. Lastly, better results in detection and localization accuracy are expected using larger databases, since no pretrained models have been used, unlike with Faster R-CNN and YOLOv3.

V. CONCLUSION AND FUTURE WORK

Visual vehicle surveillance is an important task for traffic applications, and it becomes critical in the nighttime, when the probability of an accident peaks. In this work, a real-time vehicle detection algorithm for nighttime situations has been presented to face this challenging task, where images basically include flashes that occupy large image regions, which can even be disconnected from the light sources, and the actual shape of vehicles is not well defined. By using a global image descriptor along with a grid of foveal classifiers, vehicle positions are accurately and efficiently estimated. Every classifier shares the same global image descriptor (unlike most methods, which use the traditional sliding window approach), but it is trained to detect vehicles in specific image regions by mainly analyzing the complex light patterns. Additionally, only point-based annotations are required to train the system, speeding up the costly task of creating databases. The output of the proposed system could be used for specific applications such as traffic congestion management or collision avoidance. Finally, a new nighttime database with point-based annotations has been created and made publicly available to assess the effectiveness of the proposed method and allow for comparisons under realistic conditions.

Additionally, even though the proposed system is focused on nighttime situations, upgrading it to also work in daytime scenarios is highly desirable, so as to have a complete solution. In this sense, albeit not tested in these situations, the system could have the potential to be used in daytime scenarios with minor modifications by training it with suitable databases. Hence, this extension is proposed as future work.

REFERENCES

[1] International Organization of Motor Vehicle Manufacturers (OICA). (2019). World Motor Vehicle Sales 2005-2018. [Online]. Available: http://www.oica.net/wp-content/uploads/total_sales_2018.pdf
[2] World Health Organization (WHO). (2019). Global Status Report on Road Safety 2018. [Online]. Available: https://www.who.int/violence_injury_prevention/road_safety_status/2018/English-Summary-GSRRS2018.pdf
[3] J.-W. Park and B. C. Song, "Night-time vehicle detection using low exposure video enhancement and lamp detection," in Proc. Int. Conf. Electron., Inf., Commun. (ICEIC), Jan. 2016, pp. 3–4.
[4] S. E. Shladover, J. Vanderwerf, and D. R. Ragland, "Robust vehicle detection and distance estimation under challenging lighting conditions," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 5, pp. 2723–2743, Oct. 2015.
[5] X. Li et al., "A unified framework for concurrent pedestrian and cyclist detection," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 2, pp. 269–281, Feb. 2017.


[6] S. Zain Masood, G. Shu, A. Dehghan, and E. G. Ortiz, "License plate detection and recognition using deeply learned convolutional neural networks," 2017, arXiv:1703.07330. [Online]. Available: http://arxiv.org/abs/1703.07330
[7] A. Mhalla, T. Chateau, S. Gazzah, and N. Essoukri Ben Amara, "An embedded computer-vision system for multi-object detection in traffic surveillance," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 11, pp. 4006–4018, Nov. 2019.
[8] F. Biondi, D. L. Strayer, R. Rossi, M. Gastaldi, and C. Mulatti, "Advanced driver assistance systems: Using multimodal redundant warnings to enhance road safety," Appl. Ergonom., vol. 58, pp. 238–244, Jan. 2017.
[9] B. Yang and C. Monterola, "Efficient intersection control for minimally guided vehicles: A self-organised and decentralised approach," Transp. Res. C, Emerg. Technol., vol. 72, pp. 283–305, Nov. 2016.
[10] S. Lange, F. Ulbrich, and D. Goehring, "Online vehicle detection using deep neural networks and lidar based preselected image patches," in Proc. IEEE Intell. Vehicles Symp. (IV), Jun. 2016, pp. 954–959.
[11] X. Wang, L. Xu, H. Sun, J. Xin, and N. Zheng, "On-road vehicle detection and tracking using MMW radar and monovision fusion," IEEE Trans. Intell. Transp. Syst., vol. 17, no. 7, pp. 2075–2084, Jul. 2016.
[12] B. Tian et al., "Hierarchical and networked vehicle surveillance in ITS: A survey," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 1, pp. 25–48, Jan. 2017.
[13] A. I. Maqueda, C. R. del Blanco, F. Jaureguizar, and N. García, "Structured learning via convolutional neural networks for vehicle detection," Proc. SPIE, vol. 10223, May 2017, Art. no. 1022302.
[14] X. Hu et al., "SINet: A scale-insensitive convolutional neural network for fast vehicle detection," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 3, pp. 1010–1019, Mar. 2019.
[15] W. Tian, L. Chen, K. Zou, and M. Lauer, "Vehicle tracking at nighttime by kernelized experts with channel-wise and temporal reliability estimation," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 10, pp. 3159–3169, Oct. 2018.
[16] C.-C. Yu, H.-Y. Cheng, and Y.-F. Jian, "Raindrop-tampered scene detection and traffic flow estimation for nighttime traffic surveillance," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 3, pp. 1518–1527, Jun. 2015.
[17] R. Gallen, A. Cord, N. Hautiere, E. Dumont, and D. Aubert, "Nighttime visibility analysis and estimation method in the presence of dense fog," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 1, pp. 310–320, Feb. 2015.
[18] X. Dai, X. Yuan, J. Zhang, and L. Zhang, "Improving the performance of vehicle detection system in bad weathers," in Proc. IEEE Adv. Inf. Manage., Communicates, Electron. Autom. Control Conf. (IMCEC), Oct. 2016, pp. 812–816.
[19] H. Kuang, X. Zhang, Y.-J. Li, L. L. H. Chan, and H. Yan, "Nighttime vehicle detection based on bio-inspired image enhancement and weighted score-level feature fusion," IEEE Trans. Intell. Transp. Syst., vol. 18, no. 4, pp. 927–936, Apr. 2017.
[20] R. K. Satzoda and M. M. Trivedi, "Looking at vehicles in the night: Detection and dynamics of rear lights," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 12, pp. 4297–4307, Dec. 2019.
[21] G. Rätsch, T. Onoda, and K.-R. Müller, "Soft margins for AdaBoost," Mach. Learn., vol. 42, no. 3, pp. 287–320, 2001.
[22] P. Dave, N. M. Gella, N. Saboo, and A. Das, "A novel algorithm for night time vehicle detection even with one non-functional taillight by CIOF (Color inherited optical Flow)," in Proc. Int. Conf. Pattern Recognit. Syst. (ICPRS), 2016, pp. 2–7.
[23] P. Dave, N. M. Gella, N. Saboo, and A. Das, "A novel algorithm for night time vehicle detection even with one non-functional taillight by CIOF (Color inherited optical Flow)," in Proc. Int. Conf. Pattern Recognit. Syst. (ICPRS), 2016, pp. 1–6.
[24] D.-Y. Chen, Y.-H. Lin, and Y.-J. Peng, "Nighttime brake-light detection by Nakagami imaging," IEEE Trans. Intell. Transp. Syst., vol. 13, no. 4, pp. 1627–1637, Dec. 2012.
[25] W. Zhang, Q. M. J. Wu, G. Wang, and X. You, "Tracking and pairing vehicle headlight in night scenes," IEEE Trans. Intell. Transp. Syst., vol. 13, no. 1, pp. 140–153, Mar. 2012.
[26] X. Shan, Z. Hong, Y. Shunyuan, W. Dong, and Z. Bingyan, "An algorithm for headlights region detection in nighttime," in Proc. IEEE Int. Conf. Inf. Autom. (ICIA), Aug. 2016, pp. 1578–1582.
[27] Y.-L. Chen, B.-F. Wu, H.-Y. Huang, and C.-J. Fan, "A real-time vision system for nighttime vehicle detection and traffic surveillance," IEEE Trans. Ind. Electron., vol. 58, no. 5, pp. 2030–2044, May 2011.
[28] H. Tehrani, T. Kawano, and S. Mita, "Car detection at night using latent filters," in Proc. IEEE Intell. Vehicles Symp., Jun. 2014, pp. 839–844.
[29] Y.-S. Chen, J.-C. Chien, and J.-D. Lee, "KAZE-BOF-based large vehicles detection at night," in Proc. Int. Conf. Commun. Problem-Solving (ICCP), Sep. 2016, pp. 2–3.
[30] V. D. Nguyen, D. T. Tran, J. Y. Byun, and J. W. Jeon, "Real-time vehicle detection using an effective region proposal-based depth and 3-channel pattern," IEEE Trans. Intell. Transp. Syst., vol. 20, no. 10, pp. 3634–3646, Oct. 2019.
[31] S. Kamkar and R. Safabakhsh, "Vehicle detection, counting and classification in various conditions," IET Intell. Transp. Syst., vol. 10, no. 6, pp. 406–413, Aug. 2016.
[32] Y. Choi et al., "KAIST multi-spectral day/night data set for autonomous and assisted driving," IEEE Trans. Intell. Transp. Syst., vol. 19, no. 3, pp. 934–948, Mar. 2018.
[33] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2005, pp. 886–893.
[34] E. Dubois, "The sampling and reconstruction of time-varying imagery with application in video systems," Proc. IEEE, vol. 73, no. 4, pp. 502–522, Apr. 1985.
[35] I. Syarif, A. Prugel-Bennett, and G. Wills, "SVM parameter optimization using grid search and genetic algorithm to improve classification performance," TELKOMNIKA (Telecommun. Comput. Electron. Control), vol. 14, no. 4, p. 1502, Dec. 2016.
[36] Nighttime Vehicle Database (NVD). Accessed: Jan. 15, 2021. [Online]. Available: http://www.gti.ssr.upm.es/data/NVD_database
[37] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[38] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556. [Online]. Available: http://arxiv.org/abs/1409.1556
[39] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.
[40] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767. [Online]. Available: http://arxiv.org/abs/1804.02767
[41] O. Russakovsky et al., "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, Dec. 2015.
[42] A. G. Howard et al., "MobileNets: Efficient convolutional neural networks for mobile vision applications," 2017, arXiv:1704.04861. [Online]. Available: http://arxiv.org/abs/1704.04861

Andrés Bell received the B.Eng. degree in telecommunication technologies and services and the master's degree in telecommunication engineering from the Universidad Politécnica de Madrid (UPM), Madrid, Spain, in 2015 and 2017, respectively, where he is currently pursuing the Ph.D. degree. Since 2014, he has been a member of the Grupo de Tratamiento de Imágenes (Image Processing Group), UPM, where he has been actively involved in Spanish and European projects. His research interests are in the areas of computer vision, machine learning, and video and image analysis and processing.

Tomás Mantecón received the joint B.Sc. and M.S. degree in telecommunication engineering and the Ph.D. degree in telecommunication engineering from the Universidad Politécnica de Madrid (UPM), Madrid, Spain, in 2012 and 2018, respectively. Since 2013, he has been a member of the Grupo de Tratamiento de Imágenes (Image Processing Group), UPM, where he has been actively involved in Spanish and European projects. His research interests are in the areas of image processing, computer vision, pattern recognition, and machine learning.

César Díaz received the joint B.Sc. and M.S. degree in telecommunication engineering and the Ph.D. degree in telecommunication engineering from the Universidad Politécnica de Madrid (UPM), Madrid, Spain, in 2007 and 2017, respectively. Since 2008, he has been a member of the Grupo de Tratamiento de Imágenes (Image Processing Group), UPM, where he has been actively involved in Spanish and European projects. His research interests lie in the area of multimedia delivery and immersive communications.

Carlos R. del-Blanco received the degree in telecommunication engineering and the Ph.D. degree in telecommunication from the Universidad Politécnica de Madrid (UPM), in 2005 and 2011, respectively. Since 2005, he has been a member of the Image Processing Group, UPM. Since 2011, he has also been a member of the Faculty of the E. T. S. Ingenieros de Telecomunicación, as an Assistant Professor of signal theory and communications with the Department of Signals, Systems, and Communications. His professional interests include signal and image processing, computer vision, pattern recognition, machine learning, and stochastic dynamic models.

Fernando Jaureguizar received the degree in telecommunication engineering and the Ph.D. degree in telecommunication from the Universidad Politécnica de Madrid (UPM), in 1987 and 1994, respectively. Since 1987, he has been a member of the Image Processing Group, UPM. Since 1991, he has been a member of the faculty of UPM, and since 1995, he has been an Associate Professor of signal theory and communications. His professional interests include digital image processing, video coding, 3-DTV, computer vision, and the design and development of multimedia communications systems. He has been actively involved in European projects (Eureka, ACTS, ITEA, and EIT-RM) and national projects in Spain.

Narciso García (Member, IEEE) received the degree in ingeniero de telecomunicación (five-year engineering program) and the Ph.D. degree in ingeniero de telecomunicación (in communications) from the Universidad Politécnica de Madrid (UPM), Madrid, Spain, in 1976 and 1983, respectively. Since 1977, he has been a member of the faculty of UPM, where he is currently a Professor of Signal Theory and Communications. He also leads the Grupo de Tratamiento de Imágenes (Image Processing Group), UPM. He has been actively involved in Spanish and European research projects, also serving as an Evaluator, a Reviewer, an Auditor, and an Observer of several research and development programs of the European Union. He was a Co-Writer of the EBU proposal, base of the ITU standard for digital transmission of TV at 34–45 Mb/s (ITU-T J.81). He was an Area Coordinator of the Spanish Evaluation Agency (ANEP) from 1990 to 1992. He was the General Coordinator of the Spanish Commission for the Evaluation of the Research Activity (CNEAI) from 2011 to 2014. He was the Vice-Rector of International Relations, UPM, from 2014 to 2016. His current research interests include digital video compression, computer vision, and quality of experience. He was a recipient of the Junior and Senior Research Awards of UPM, in 1987 and 1994, respectively. He received the Spanish National Graduation Award and the Ph.D. Graduation Award.