Transportation Research Record 1746, Paper No. 01-2453

Automated Accident Detection System


Charles Harlow and Yu Wang

Department of Electrical and Computer Engineering, Louisiana State University, Baton Rouge, LA 70803.

The development of a system for automatically detecting and reporting traffic accidents at intersections was considered. A system with these properties would be beneficial in determining the cause of accidents and could also be useful in determining the features of the intersection that have an impact on safety. A complete system would automatically detect and record traffic conditions associated with accidents such as time of the accident, video of the accident, and the traffic light signal controller parameters. The basic research required to develop the system is considered. This involves developing methods for processing acoustic signals and recognizing accident events from the background traffic events. A database of vehicle crash sounds, car braking sounds, construction sounds, and traffic sounds was created. The mel-frequency cepstral coefficients were computed as a feature vector for input to the classification system. A neural network was used to classify these features into categories of crash and noncrash events. The classification testing results achieved 99 percent accuracy.

In this research the authors consider the development of a system for automatically detecting and reporting traffic accidents. Such a device could be useful at high-traffic intersections where accidents are likely to occur. These intersections will usually have signal lights for traffic control. A complete system would automatically detect and record traffic conditions associated with accidents such as time of the accident, video of the accident, and the traffic light signal controller parameters. A system with these properties would be beneficial in determining the cause of accidents. The information could also be useful in determining features of the intersection that have an impact on safety.

In this project the basic research required to develop a system to detect accidents was considered. This involved developing methods for processing acoustic signals and recognizing accident events from the background traffic events that occur at intersections. If methods can be developed for reliably recognizing accident events at intersections, then a system can be developed with complete functionality. An acoustic system was chosen for its simplicity and cost-effectiveness; it is also the case that accidents create distinctive sound events.

At the present time a device to detect accidents and report traffic signal conditions does not exist. Work has been conducted on video systems to monitor traffic conditions and provide input to signal controllers for traffic control (1, 2). Video systems have also been used for incident detection (3). Incidents are detected when a vehicle stops or dramatically reduces its speed. The system proposed in this work seeks to determine accidents directly from the acoustic signal of the accident. The system is relatively simple because an acoustic signal rather than a video signal is being processed. The accident detection system can operate in a stand-alone configuration or, alternately, provide complementary information when other sensors are available.

Consider the problem of determining vehicle accidents from acoustic signals. The problem of identifying vehicle accidents from acoustic records may be stated generically as an object identification problem. The objects to be recognized in this situation are vehicle accidents, which must be distinguished from the background traffic and other environmental noise. The 1-D signal in this case would be audio measurements of vehicular traffic. An important step in object identification is to determine measurements that can form the basis for a classification system. These measurements form a feature vector that is the input to a classification system. It is necessary for each object to be identified to have distinguishable feature vectors. In this application, accidents must be detected from the feature vector extracted from the acoustic signal.

There are several factors that complicate the object recognition process. One problem is that the communication channel will vary even if the object is the same. That is, sensors placed in different environments will behave somewhat differently; they will produce different signals for the same vehicle crash. Different traffic intersections and different weather conditions will affect the signals and distort the feature vectors for the objects. Another complication is that objects in the same class, for example, collisions of the same type, will produce somewhat different signals. Finally, noise and multiple signals from different sources will distort the signals.

There are many different sources of acoustic signals that may occur at different intersections. These include normal traffic sounds, large trucks, and other vehicles that produce loud sounds in normal operation. There may also be construction, maintenance, and industrial activity in the vicinity. In addition, various types of accidents may occur. Vehicles may crash into other vehicles or stationary objects. Vehicles may have head-on, rear-end, or side collisions. There may be braking sounds, tires skidding, and horns.

BACKGROUND

Most of the work in classifying acoustic signals has been done in association with speech recognition (4, 5). A number of methods have been explored for extracting measurements from the acoustic signal. A common method is the mel-warped cepstra transform. The mel scale is based on the nonlinear human perception of sound frequencies (6, 7). Cepstral coefficients are often the preferred measurements extracted from an acoustic signal for speech recognition (4).

Acoustic data have been used in traffic monitoring (1). The goal was to count the vehicles in traffic patterns on roadways. The system was found to perform better when located closer to the traffic and away from echo-producing structures such as bridges. It was affected somewhat by cold weather. High traffic volumes caused undercounting. Other technologies such as passive infrared, radar, and Doppler microwave had better performance at vehicle counting. Although acoustic technology is not the best for counting vehicles, it has a role in detecting accidents. Accidents are more likely to have a distinctive acoustic signal than individual vehicles in a traffic stream.
METHODOLOGY

This section provides descriptions of the data used in the study and the manner in which they were acquired, the features that were extracted from the acoustic samples, and the neural network classification approach.

Vehicle Accident Database Formation

The authors developed an acoustic database of traffic sounds, construction sounds, and accident sounds. The information contained in the database consists of the accident location, acoustic records, a description of the event (type of accident) and vehicles involved, and environmental information. The equipment used in the field for recording vehicle acoustic signals consisted of sound-recording equipment and a form for manually entering information related to the event. The acoustic equipment consisted of a Larson-Davis Model 712 sound level meter, Electro-Voice RE55 microphones, and a Sony TCD-D8 DAT Walkman. The sound data were recorded on the DAT recorder at 44.1 kHz.

Acoustic data of accidents are difficult to obtain. Arrangements were made with the Texas Transportation Institute (TTI) Crash Test Facility located in College Station, Texas, to collect data of vehicle crashes. The authors recorded traffic sounds at intersections and construction sounds from construction sites in Baton Rouge to complete the database. The total amount of data available for analysis and testing consisted of 28 crash sounds from vehicle crashes, 2 car braking sounds, 8 sounds of a pile driver at a construction site, and 200 sound records of routine traffic at intersections. The traffic sounds were recorded at an intersection with a high volume of traffic, including a substantial amount of large truck traffic. The crash sounds consisted of small cars, regular-size cars, pickups, and large delivery trucks hitting crash barriers of different types and, in some cases, a rear-end collision. Because the accident data were recorded at a crash test facility, the word "crash" is often used to refer to an accident.

Sound data are often processed in terms of the root mean square (rms) of the sound pressure signal p(t):

p_{\mathrm{rms}} = \sqrt{ \frac{1}{T_2 - T_1} \int_{T_1}^{T_2} p^2(t) \, dt }

The term L_p refers to the logarithmic form

L_p = 20 \log_{10}\left( \frac{p}{p_0} \right)

with units of decibels. The equivalent continuous sound level over a specified time interval is defined as (8)

L_{eq} = 20 \log_{10}\left( \frac{p_{\mathrm{rms}}}{p_0} \right)

It is important to relate these basic units to the data obtained from the recording system (9).
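As an illustration of these definitions, the following is a minimal sketch (not taken from the paper) that computes p_rms and L_eq for a block of calibrated pressure samples. The reference pressure p0 = 20 µPa and the synthetic example signal are assumptions made here purely for demonstration.

```python
import numpy as np

def rms_and_leq(p, p0=20e-6):
    """Return (p_rms, L_eq in dB) for a block of sound-pressure samples.

    p  : 1-D array of pressure samples in pascals (assumed already calibrated)
    p0 : reference pressure; 20 micropascals is the conventional reference in air
    """
    p = np.asarray(p, dtype=float)
    p_rms = np.sqrt(np.mean(p ** 2))       # discrete form of the rms integral
    l_eq = 20.0 * np.log10(p_rms / p0)     # equivalent continuous sound level
    return p_rms, l_eq

# Example with a placeholder 3-s record at 8 kHz (synthetic noise, not real data)
fs = 8000
p = 0.05 * np.random.randn(3 * fs)
print(rms_and_leq(p))
```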
Feature Extraction

Consider the problem of extracting features from the acoustic signal. An audio signal can be characterized by the short-time Fourier spectrum of the signal (6, 10). The short-time Fourier transform (STFT) is defined as

\hat{s}(w) = \int_{-\infty}^{\infty} s(t) \, g(t) \, \exp(-j 2 \pi t w) \, dt

where g(t) is a windowing function. The function g should have support over a finite time interval so that it localizes the response in time. One common form for g is the Gaussian function

g(t) = \exp\left( -\frac{(t - \mu)^2}{2 \sigma^2} \right)

The Hamming window (11), where

g(n) = 0.54 - 0.46 \cos\left( \frac{2 \pi n}{N - 1} \right), \qquad 0 \le n < N

is often used in acoustic processing.

A common method used for feature extraction is the mel-warped cepstra transform. Let FFT be the fast Fourier transform and FFT^{-1} be its inverse. Let s be the signal over a frame; then the cepstra transform is cep(s) = FFT^{-1}(log|FFT(s)|). Since log|FFT(s)| is real and symmetric, the FFT and FFT^{-1} are equal up to a multiplicative constant. The cepstrum is, therefore, the spectrum of the log of the spectrum. The mel-warped spectrum is obtained by attenuating high frequencies in log|FFT(s)| before taking the inverse transform. The mel scale is based on the nonlinear human perception of sound frequencies (6, 7). Generally, about 10 to 14 of the low cepstral coefficients are required for recognition. The use of the cepstra has often proved an effective approach in audio object recognition tasks (5).
Figure 1 shows the sound p_rms signal from a vehicle crash. The data here have been significantly reduced from the raw sound data collected for plotting purposes. Figure 2 is a plot of a sound signal from normal traffic at an intersection. All the sound signals were saved in "wav" format at intervals 3 s in length each and were sampled at 22.05 kHz. To reduce the calculations during signal processing, the data were resampled to 8 kHz, which proved adequate. A Hamming window function is applied to every 100-ms frame of the data with a 50-ms frame overlap. The FFT is applied to the signal, and data are retained corresponding to the largest frequency allowed for the sampling rate of 8,000 Hz. The inverse FFT of this signal is taken to get the cepstral coefficients. The first few cepstral coefficients capture the envelope (smoothed spectral shape) of the signal, while the remaining coefficients carry the finer spectral detail. The first 12 cepstral coefficients are used as the input for the neural network. An improvement over the mel-frequency cepstral coefficients (MFCCs) is accomplished by applying linear filtering to the signal after taking the FFT and before taking the logarithm; nonlinear filter banks are then applied to the cepstrum.

FIGURE 1 Vehicle crash sound signal.

FIGURE 2 Normal traffic sound signal.
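The following is a minimal sketch, not the authors' Matlab code, of the frame-based feature extraction just described: 100-ms Hamming-windowed frames with 50-ms overlap at 8 kHz, a real cepstrum per frame via cep(s) = FFT^{-1}(log|FFT(s)|), and the first 12 coefficients retained. The mel-warping/filter-bank step and the resampling from 22.05 kHz are omitted here for brevity and would need to be added to match the full pipeline.

```python
import numpy as np

def frame_cepstra(x, fs=8000, frame_ms=100, hop_ms=50, n_coeffs=12, eps=1e-10):
    """Per-frame cepstral features, roughly following the pipeline in the text.

    x : 1-D signal, assumed already resampled to fs (8 kHz in the paper)
    Returns an array of shape (n_frames, n_coeffs).
    Mel-warping is omitted; this is the plain real cepstrum.
    """
    frame_len = int(fs * frame_ms / 1000)   # 800 samples per 100-ms frame
    hop = int(fs * hop_ms / 1000)           # 50-ms hop -> 50 percent overlap
    window = np.hamming(frame_len)          # Hamming window, as in the paper
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)                 # FFT of the windowed frame
        log_mag = np.log(np.abs(spectrum) + eps)      # log|FFT(s)|
        cepstrum = np.fft.irfft(log_mag)              # inverse FFT of the log spectrum
        feats.append(cepstrum[:n_coeffs])             # keep the low-order coefficients
    return np.array(feats)

# Example on a placeholder 3-s record (synthetic noise, not real crash data)
x = np.random.randn(3 * 8000)
print(frame_cepstra(x).shape)   # -> (59, 12)
```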
Overview of Feature Extraction and Classification

The following are the basic steps in the processing of the sound signals. First, the sound events to be classified are digitized. Second, the first 12 MFCCs that represent the spectral-domain content of the sound are computed. These features are computed over small time intervals of the signal called frames. Third, a feed-forward neural network is used to classify these features into the categories of crash and noncrash at each frame.

Classification with Neural Networks

Consider signal classification, which is the process of identifying the object associated with a given input signal. Once the features have been extracted from the input data, there is an n-dimensional feature vector that is the input to a classifier. Neural networks were selected as the classifier for their power and convenience, since a number of implementations are readily available. The Matlab Neural Network Toolbox was used for the implementation (12).

Artificial neural networks (ANNs) attempt to solve complex problems with an architecture that mimics the nervous system. ANNs are mathematical systems composed of a number of processing units that are linked via weighted interconnections (13). A processing unit is an equation that is often referred to as a transfer function. A processing unit takes weighted signals from other neurons, combines them, transforms them, and outputs a numeric result. An individual computational element sums its input values and produces a nonlinear output. Large assemblies of these simple elements can solve problems requiring massive constraint satisfaction. ANNs have the additional advantage of learning the optimal connection weights between processing elements. This learning process eliminates the tedious programming that often accompanies complex problems.

ANNs can be grouped into single-layer and multiple-layer nets. They may have a number of hidden layers between the input and the output layer. Backpropagation (BP) neural networks are probably the most common neural architecture implemented today. BP networks are multilayer feed-forward networks: data flow in one direction, thus "feed-forward," and there are multiple layers, thus "multilayer." BP itself is an iterative gradient descent learning algorithm used to set the weights.

One reason BP neural networks are so common is that they perform relatively well on a wide variety of applications (13). They can be used for functional modeling, classification, diagnostic, and time-series types of applications. Probabilistic neural networks (PNNs) can also be used as classifiers, but they require large amounts of data. Because of the limited data available, the authors chose a BP network with one hidden layer as their classifier.

Classifier Implementation

The sound database was composed of 28 vehicle crash samples, 200 normal traffic samples, and 10 samples classified as "other." TTI collected the crash samples, and the normal traffic sounds were recorded from a local intersection. These traffic samples include various types of vehicles such as passenger cars, pickups, trucks, buses, and motorcycles. The authors also collected some car braking sounds as well as pile-driving sounds from a construction site and placed these samples in the classification named "other."

The classification stage consisted of two steps. The first step was the training of the neural network. Part of the available samples detailed above were used as training samples. The authors trained the neural network with the features extracted from each training sample, for which the network knows the desired output. The neural network adjusted the weight matrix during training according to the input and output pairs to reach the minimum error. The second step was testing; once training was completed, the system was tested with the remaining data as test samples.

After the features were extracted, they were rearranged into the format required by the Matlab Neural Network Toolbox as the inputs to a feed-forward neural network. A threshold was set for the minimum signal strength so that weak sound signals would not be processed, thus avoiding the expense of unnecessary computations. Only two classification results were considered: crash or noncrash.
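The sketch below shows how the per-frame classification step could be reproduced outside Matlab; it is a hedged stand-in for the authors' Matlab Neural Network Toolbox setup, not their actual code. The single hidden layer matches the paper's description, but the hidden-layer size (16), activation, and the random placeholder data are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Placeholder training data: 12 cepstral coefficients per frame.
# In practice X_train/y_train would come from the labeled crash, traffic,
# and "other" records described in the text (1 = crash frame, 0 = noncrash frame).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 12))
y_train = rng.integers(0, 2, size=400)

# Feed-forward network with one hidden layer, trained by gradient descent,
# loosely analogous to the BP network described in the paper.
net = MLPClassifier(hidden_layer_sizes=(16,), activation="logistic",
                    max_iter=1000, random_state=0)
net.fit(X_train, y_train)

# Per-frame crash / noncrash decisions for new feature vectors
X_test = rng.normal(size=(50, 12))
frame_labels = net.predict(X_test)
```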
Classification Experiments

The neural network was trained and tested with frame lengths of 20, 32, 100, 200, 300, 400, and 500 ms. Shorter frame lengths gave better classification results, and the results suggested using a 20-, 32-, or 100-ms frame length. The 100-ms frame length was chosen because the 20- and 32-ms frame lengths take longer to process.

The features were extracted from the frames in the order in which the frames appear in the sound record. Each set of features extracted from a frame was fed into the network for classification. Weak sound signals were detected and ignored. Some of the frames of a crash signal may be mistakenly classified because the training samples do not represent all possible patterns of a crash. Thus, a method was employed to remove the misclassified parts of the signal and give a correct final decision. The total number of frames labeled a crash and the total number of frames labeled a noncrash were computed. If the total number of frames labeled a crash sound was greater than the number of frames labeled noncrash, then the signal was considered a crash sound for classification purposes. Otherwise, it was considered a noncrash sound.
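A minimal sketch of this record-level decision rule follows: weak frames are discarded by an energy threshold and the remaining frame labels are combined by majority vote. The energy measure, the threshold value, and the function names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def classify_record(frame_feats, frame_energies, net, energy_floor=1e-4):
    """Record-level crash / noncrash decision by majority vote over frames.

    frame_feats    : (n_frames, 12) cepstral features for one 3-s record
    frame_energies : mean-square energy of each frame (same length as frame_feats)
    net            : trained frame classifier exposing .predict() (e.g., the MLP above)
    energy_floor   : illustrative threshold; frames weaker than this are ignored
    """
    strong = np.asarray(frame_energies) >= energy_floor   # drop weak sound signals
    if not np.any(strong):
        return "noncrash"                                  # nothing strong enough to classify
    labels = np.asarray(net.predict(np.asarray(frame_feats)[strong]))  # 1 = crash, 0 = noncrash
    n_crash = int(np.sum(labels == 1))
    n_noncrash = len(labels) - n_crash
    return "crash" if n_crash > n_noncrash else "noncrash"
```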
In the final experiment, the 100-ms frame length was used. The training samples were selected such that the numbers of crash and noncrash training sounds were approximately the same: the experiment used 16 crash sounds, 16 traffic sounds, 1 braking sound, and 1 pile-driving sound for training. Once the neural network was trained, its performance was tested by using the test data. The test data had 184 traffic samples, 12 crash samples, and 8 other sounds. The classification errors are shown below (rows give the actual category; columns give the category to which samples were erroneously assigned):

Actual Category    Crash    Traffic    Other
Crash sounds       0        0          0
Traffic sounds     2        0          0
Other sounds       0        0          0
The detection rate (true positives) was 100 percent: all the crash sounds were correctly classified. The test of the traffic samples produced only two mistakenly classified samples. The other sounds, including car braking and pile-driving sounds, were classified without any error. The false alarm rate (false positives) was 1 percent; two of the 184 traffic sounds were classified as crash sounds.

The authors processed data in 3-s intervals and classified the data as crash or noncrash events. The false alarm rate was low, but there was nevertheless potential for generating a substantial number of false alarms. For example, if one assumes that 30 percent of the signals have sufficient acoustic power to be classified, then one would process 360 acoustic records in 1 h. With a false alarm rate of 1 percent, this could generate about four false alarms per hour. The classification of a sound as an accident would imply that relevant information about the accident, such as traffic signal settings, would be recorded. If an actual accident had been reported, then the data would be available for review. If no accident had been reported, this might still mean that an accident had occurred. For safety studies, one would need to review the data to determine the actual conditions that caused the sound event. It is possible that the false alarm rate may need to be further reduced. Experience with the system would be required to determine the required rate.
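The false-alarm figure above follows directly from the stated assumptions (3-s records, 30 percent of records strong enough to classify, 1 percent false alarm rate); the worked arithmetic, added here for clarity, is:

```latex
\frac{3600~\mathrm{s/h}}{3~\mathrm{s/record}} = 1200~\mathrm{records/h}, \qquad
0.30 \times 1200 = 360~\mathrm{records~classified/h}, \qquad
0.01 \times 360 = 3.6 \approx 4~\mathrm{false~alarms/h}
```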
CONCLUSIONS

This project has demonstrated a promising approach for developing an acoustic system for automatically detecting and reporting traffic accidents at intersections. The system needs to be further evaluated in situations with normal traffic flow and accident occurrences. The data were not obtained under the wide variety of environmental conditions that would be encountered in practice, because crash tests are not normally conducted under adverse weather conditions. It has previously been observed in tests of equipment used for counting vehicles that the equipment is affected by environmental conditions, traffic conditions, lighting conditions, and intersection geometry. Acoustic detectors were affected by snow, extreme cold, and heavy traffic and may be affected by location (1). The acoustic sensor must be located near the intersection; in this work it was placed near the signal controller box. It would be useful to obtain more data under a wide variety of conditions. Building a prototype system and installing it at intersections for further data collection and evaluation could accomplish this. The classification system has been constructed so that it can be readily retrained as new data are obtained. Larger training sets would make the classification more reliable.

The system should be cost-effective. The cost of a system depends very much on the vendor and the size of the market. Any modern PC or digital signal processing unit could meet the processing requirements. The necessary computing and sound equipment can be purchased for under $1,000, with the price decreasing every year.

ACKNOWLEDGMENTS

This work was supported by the Louisiana Transportation Research Center. The Texas Transportation Institute Crash Test Facility located in College Station, Texas, assisted the authors in obtaining acoustic data of vehicle crashes. Dick Zimmer allowed the authors to collect data at the facility, and Richard Badillo helped with the data collection. James Manning, who works with the city of Durham, North Carolina, suggested the value of an acoustic system for detecting traffic accidents.

REFERENCES

1. Kranig, J., E. Minge, and C. Jones. Field Testing of Urban Vehicle Operations Using Non-Intrusive Technologies. Report FHWA-PL-97-018. Minnesota Department of Transportation, St. Paul, Minn., 1997.
2. Coulombe, R. F. Trends and Applications in CCTV, Part One: The USA. Traffic Technology International, Oct./Nov. 1999, pp. 29-33.
3. Ritchie, S. G., B. Abdulhai, A. E. Parkany, J.-B. Sheu, R. L. Cheu, and S. I. Khan. A Comprehensive System for Incident Detection on Freeways and Arterials. In Intelligent Transportation: Serving the User Through Deployment. Proc., 1995 Annual Meeting of ITS America, Washington, D.C., 1995, pp. 617-622.
4. Mammone, R. J., X. Zhang, and R. Ramachandran. Robust Speaker Recognition. IEEE Signal Processing Magazine, Sept. 1996, pp. 58-71.
5. Gish, H., and M. Schmidt. Text-Independent Speaker Identification. IEEE Signal Processing Magazine, Oct. 1994, pp. 18-32.
6. Rabiner, L. R., and B. H. Juang. Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs, N.J., 1993.
7. Stevens, S. S., and J. Volkmann. The Relation of Pitch to Frequency: A Revised Scale. American Journal of Psychology, Vol. 53, 1940, pp. 329-353.
8. Couvreur, C. Environmental Sound Recognition: A Statistical Approach. Ph.D. dissertation. Faculte Polytechnique de Mons, Mons, Belgium, 1997.
9. Harlow, C. A. Automated Accident Detection System. Final Report. Louisiana Transportation Research Center, Baton Rouge, La., July 1999.
10. Deller, J. R., J. G. Proakis, and J. H. L. Hansen. Discrete-Time Processing of Speech Signals. Macmillan, New York, 1993.
11. Pratt, W. K. Digital Image Processing. John Wiley & Sons, New York, 1991.
12. MathWorks. User's Guide to Neural Network Toolbox. MathWorks Inc., Natick, Mass., 1991.
13. Fausett, L. Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice-Hall, Englewood Cliffs, N.J., 1994.

The contents of this paper reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented. The contents do not necessarily reflect the official views or policies of the Louisiana Department of Transportation and Development or the Louisiana Transportation Research Center. This report does not constitute a standard, specification, or regulation.

Publication of this paper sponsored by Committee on Safety Data, Analysis, and Evaluation.
