
2011 14th International IEEE Conference on Intelligent Transportation Systems
Washington, DC, USA, October 5-7, 2011

Agent-based Evaluation of Driver Heterogeneous Behavior during Safety-Critical Events

Montasir Abbas, Member, IEEE, Linsen Chong, Bryan Higgs, Alejandra Medina, and C. Y. David Yang

Abstract—Heterogeneous driver behavior during safety-critical events is more complicated than in normal driving situations and is difficult to capture with statistical models. This paper applies an agent-based reinforcement learning method to represent the heterogeneous driving behavior of different drivers during safety-critical events. Naturalistic driving data of different drivers during safety-critical events are used in agent training. As an output of the Neuro-Fuzzy Actor Critic Reinforcement Learning (NFACRL) training technique, behavior rules are embedded in different agents to represent heterogeneous actions between drivers. The results show that the NFACRL is able to simulate naturalistic driver behavior and to represent heterogeneity.

I. INTRODUCTION

A. Agent-based Modeling of Driver Behavior

Agent-based modeling (ABM) is a relatively new paradigm used to explore the behavior of complex systems [1]. Within the transportation domain, ABM is particularly appropriate for modeling systems in which human decision making and action are critical components. Studies of ABM of driver behavior include driver response to incidents, interaction between cars and trucks, driver behavior approaching a work zone, etc. Bonabeau [1] suggests that ABM is best applied to simulations when the interactions of agents are complex, nonlinear, or discontinuous; the agents are heterogeneous; and the agents exhibit complicated behavior, including learning and adaptation. In this study, ABM is used to model individual human driving behavior. Agents learn driver-dependent driving rules and characteristics through training and can react as a clone of the driver when training is completed. By using safety-critical events of different drivers to train different agents, heterogeneous behavior can be represented by the actions of the different agents.

B. Agent Training: Reinforcement Learning

Reinforcement learning (RL) is a relatively new methodology designed to develop artificial agents. RL can determine the actions an intelligent agent should take in an environment to maximize some concept of long-term goals [2]. The objective of the RL algorithm in this study is to find a policy that maps traffic states to their naturalistic actions. RL reinforces actions when the agent performs close to the naturalistic actions and penalizes actions that are far away. The only information available for learning is the system feedback, which describes in terms of reward and punishment the task the agent has to realize [3]. The procedure involves optimizing not only the direct reinforcement but the total amount of reinforcement the agent can receive in the future.
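As a concrete illustration of this reward idea, one could score each agent action by its closeness to the recorded naturalistic action. The following minimal sketch is illustrative only; the exponential form and the scale parameter are assumptions, not the reinforcement signal used in this paper.

```python
import numpy as np

def closeness_reward(agent_action: float, naturalistic_action: float,
                     scale: float = 0.1) -> float:
    """Illustrative reward: close to 1 when the agent reproduces the
    naturalistic action, decaying toward 0 (a penalty) as the gap grows.
    The exponential form and scale are assumptions for illustration."""
    error = abs(agent_action - naturalistic_action)
    return float(np.exp(-error / scale))

# Example: an agent deceleration of -0.18 g versus a recorded -0.20 g
print(closeness_reward(-0.18, -0.20))   # high reward, close to naturalistic action
print(closeness_reward(+0.10, -0.20))   # low reward, action is far away
```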
C. The Proposed Reinforcement Learning Methodology

In this research, we propose a revised RL methodology that addresses the traffic-state dimensionality problem and the continuous-action generation problem, building on recent RL algorithms used in traffic research [4, 5]. This revised Neuro-Fuzzy Actor Critic Reinforcement Learning (NFACRL) approach is capable of dealing with multi-dimensional traffic state problems and can generate continuous actions, which suits driver behavior simulation problems in traffic.

In terms of the paper layout, the methodology for testing heterogeneous driver behavior (i.e., the NFACRL methodology) is briefly presented first (for more detail, please see [5]). Subsequently, the naturalistic driving database and the safety-critical event extraction process are described. Finally, cross validation between different agents is performed, driver heterogeneity is illustrated, and the idea of a "Mega Agent" is proposed.

Manuscript received April 10th, 2010. Montasir Abbas is an Assistant Professor at Virginia Tech. He received his Ph.D. in Civil Engineering from Purdue University in 2001. He has previously worked as an Assistant Research Engineer at the Texas Transportation Institute and as a Visiting Assistant Professor at Texas A&M (e-mail: abbas@vt.edu). Linsen Chong is a Graduate Research Assistant with the Virginia Tech Signal Control & Operation Research and Education Systems Lab at Virginia Tech (e-mail: linsenc@vt.edu). Bryan Higgs is a Graduate Research Assistant with the Virginia Tech Signal Control & Operation Research and Education Systems Lab at Virginia Tech (e-mail: bryan.higgs@vt.edu). Alejandra Medina is a Senior Research Associate at the Virginia Tech Transportation Institute (e-mail: ale@vtti.vt.edu). C. Y. David Yang is a Senior Research Engineer at the Office of Operations R&D at the Turner-Fairbank Highway Research Center. He received his Ph.D. from Purdue University in 1997. He has previously worked for the U.S. Department of Transportation Volpe National Transportation Systems Center (e-mail: david.yang@dot.gov).

II. THE PROPOSED NFACRL METHOD

A. NFACRL Structure

Fig. 1. NFACRL structure.

In the NFACRL structure (Fig. 1):
$x_i$ = the $i$th input variable (state)
$n$ = the number of input variables
$m_i$ = the number of fuzzy sets (membership functions) for the $i$th input variable
$A_{ij}$ = the $j$th fuzzy set (membership function) for the $i$th input variable
$R_k$ = the $k$th fuzzy rule
$K$ = the number of fuzzy rules
$v_k$ = the weight between the $k$th fuzzy rule and the critic
$w_{kj}$ = the weight between the $k$th fuzzy rule and action $j$
$V$ = the critic output
$u_j$ = the output of the $j$th action
where $i = 1, \dots, n$, $j = 1, \dots, m_i$, and $k = 1, \dots, K$.

As Fig. 1 shows, the NFACRL uses a neural network framework with four layers of neurons and associated weights between consecutive layers. The first layer is the input layer. Each node represents a state variable $x_i$. Vehicle speed, spatial distance, relative speed to the leading vehicle, acceleration and yaw angle of the previous state, and lane offset were defined as state variables $x_1$ to $x_6$.

The second layer is the fuzzy set layer. States are "fuzzified" into linguistic terms such as "Speed is High" and "Speed is Low." As a fuzzy set, each node is associated with a membership function. The output of the membership function is the input of the third layer. The third layer is the fuzzy rule layer. Each rule is connected with a number of antecedents from the second layer. Fuzzy rules in the third layer provide a state-action mapping policy to determine which action to select for the current state. A firing strength is calculated for each rule as the product of the six membership function values of its antecedent fuzzy sets.

The firing strength of the $k$th fuzzy rule is

$\Phi_k = \prod_{i=1}^{n} \mu_{A_i^k}(x_i)$   (1)

where $A_i^k$ is the linguistic term of the fuzzy set (either "Low" or "High") that rule $k$ uses for the $i$th input state variable.

A fuzzy rule can be represented as: when "$x_1$ is low," "$x_2$ is low," "$x_3$ is high," "$x_4$ is high," "$x_5$ is high," and "$x_6$ is high," then the action is "Deceleration $-0.2g$."

The fourth layer is the actor-critic layer, which includes Actor and Critic nodes. Critic nodes are associated with the value of the next state under the current policy; Actor nodes represent actions to be selected by the fuzzy rules. In this study, five discrete longitudinal acceleration values and five lateral yaw angle values are considered as action candidates. Two sets are used to store the actions: one for acceleration, $A$, and one for yaw angle, $Y$. The ten action values are treated as constant parameters during training:

$A = \{a_1, a_2, a_3, a_4, a_5\}$   (2)

$Y = \{y_1, y_2, y_3, y_4, y_5\}$   (3)

B. Weights and Action Selection

Weights $v_k$ connect the $k$th fuzzy rule with the critic output, and weights $w_{kj}$ connect the $k$th fuzzy rule with the $j$th action output. The action weight values $w_{kj}$ express the competition between actions. The reinforcement learning algorithm updates $v_k$ and $w_{kj}$ by comparing the agent's actions to the naturalistic actions. Eventually, when training is finished, each fuzzy rule selects the action with the maximum weight $w_{kj}$. For more detail, please refer to [5].
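The exact NFACRL update equations are given in [5]. Purely as an illustration of how an actor-critic scheme could adjust $v_k$ and $w_{kj}$ from a scalar reinforcement, a generic temporal-difference style sketch follows; the learning rates, discount factor, and the update form itself are assumptions, not the rule from [5].

```python
import numpy as np

def actor_critic_update(phi, v, w, chosen_idx, reward, phi_next,
                        alpha=0.05, beta=0.05, gamma=0.95):
    """Generic actor-critic update for one time step (illustrative only;
    the NFACRL update in [5] may differ).

    phi, phi_next : firing strengths of the K fuzzy rules at t and t+1
    v             : critic weights, one per rule, shape (K,)
    w             : actor weights, rules x discrete actions, shape (K, J)
    chosen_idx    : index of the discrete action each rule selected, shape (K,)
    reward        : scalar reinforcement (high when close to the naturalistic action)
    """
    value, next_value = phi @ v, phi_next @ v        # state values from the critic
    td_error = reward + gamma * next_value - value   # temporal-difference error
    v = v + alpha * td_error * phi                   # strengthen/weaken critic weights
    for k, j in enumerate(chosen_idx):               # reinforce the action each rule chose
        w[k, j] += beta * td_error * phi[k]
    return v, w
```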
C. Action Output

For each fuzzy rule, one acceleration and one yaw angle are selected. Accordingly, the NFACRL output actions, acceleration $a$ and yaw angle $y$, are the weighted averages of all the discrete actions selected by all the rules. The firing strength is used as the weight for each fuzzy rule, so the output actions are generated as

$a = \dfrac{\sum_{k=1}^{K} \Phi_k \, a_k}{\sum_{k=1}^{K} \Phi_k}$   (4)

$y = \dfrac{\sum_{k=1}^{K} \Phi_k \, y_k}{\sum_{k=1}^{K} \Phi_k}$   (5)

where $a$ is the continuous acceleration, $y$ is the continuous yaw angle, and $a_k$ and $y_k$ are the discrete acceleration and yaw angle selected by the $k$th rule. In this study, acceleration is taken as the longitudinal action and yaw angle as the lateral action.
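Putting Eqs. (1), (4), and (5) together, a minimal sketch of this inference step is given below; the membership functions, trained weights, and discrete action sets are placeholders that would come from training, and the two-term (Low/High) rule enumeration is an assumption based on the description above.

```python
import numpy as np
from itertools import product

def nfacrl_action(x, memberships, w_acc, w_yaw, acc_set, yaw_set):
    """Map a state x to a continuous acceleration and yaw angle.

    x           : the six state variables (speed, spacing, relative speed,
                  previous acceleration, previous yaw angle, lane offset)
    memberships : six (mu_low, mu_high) pairs of membership functions
    w_acc, w_yaw: actor weights, one row per rule (2**6 = 64 rules here),
                  one column per discrete action
    acc_set     : five candidate accelerations (set A, Eq. 2)
    yaw_set     : five candidate yaw angles (set Y, Eq. 3)
    """
    # Eq. (1): firing strength of each rule = product of its antecedent memberships
    rules = list(product([0, 1], repeat=len(x)))   # each rule picks Low/High per input
    phi = np.array([np.prod([memberships[i][t](x[i]) for i, t in enumerate(rule)])
                    for rule in rules])
    # After training, each rule selects the discrete action with the largest weight
    acc_choice = np.array(acc_set)[np.argmax(w_acc, axis=1)]
    yaw_choice = np.array(yaw_set)[np.argmax(w_yaw, axis=1)]
    # Eqs. (4)-(5): firing-strength weighted average of the selected discrete actions
    a = float(phi @ acc_choice / phi.sum())
    y = float(phi @ yaw_choice / phi.sum())
    return a, y
```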
III. NATURALISTIC DRIVING DATABASE

A. Brief Description of the Database

We used safety-critical events in agent training from the 8-Truck Database of the Naturalistic Truck Driving Study (NTDS) collected by the Virginia Tech Transportation Institute (VTTI). The NTDS uses drivers who operate vehicles that have been equipped with specialized sensor, processing, and recording equipment. The drivers operate and interact with these vehicles while the data collection equipment continuously records numerous items of interest during the entire driving period. This system
provides a diverse collection of both on-road driving and driver (participant, non-driving) data, including measures such as driver input and performance (e.g., lane position, headway), four camera video views, and driver activity data.

The safety-critical events were identified and analyzed during previous work conducted by VTTI [6]. The method used to identify the safety-critical events was triggers, or thresholds, on individual variables that were collected. For an event to be flagged, only one of the triggers had to be met. Those triggers are as follows (a sketch of this any-trigger flagging logic is given after the list):
• Longitudinal Acceleration greater than or equal to -0.2 g
• Forward Time-to-Collision of less than or equal to 2 s
• Swerve greater than or equal to 2 rad/s²
• Lane Tracker Status equals abort (lane deviation)
• Critical Incident Button
• Analyst Identified
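As a sketch of that flagging logic, the snippet below checks whether any trigger fires for one time step. The field names, sign conventions, and the interpretation of the acceleration trigger as hard braking are illustrative assumptions, not definitions from the VTTI database.

```python
def is_safety_critical(sample: dict) -> bool:
    """Flag a time step as part of a safety-critical event if ANY trigger fires.
    Field names and thresholds encodings are assumptions for illustration."""
    triggers = [
        sample["longitudinal_accel_g"] <= -0.2,     # hard braking, interpreted as <= -0.2 g
        sample["time_to_collision_s"] <= 2.0,       # forward TTC of 2 s or less
        sample["swerve_rad_s2"] >= 2.0,             # swerve at or above 2 rad/s^2
        sample["lane_tracker_status"] == "abort",   # lane deviation
        sample["critical_incident_button"],         # pressed by the driver
        sample["analyst_identified"],               # flagged during data reduction
    ]
    return any(triggers)

print(is_safety_critical({"longitudinal_accel_g": -0.25, "time_to_collision_s": 5.0,
                          "swerve_rad_s2": 0.3, "lane_tracker_status": "ok",
                          "critical_incident_button": False, "analyst_identified": False}))
```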
In our test, the following vehicle was the instrumented vehicle. The measured vehicle trajectory data included speedometer output, longitudinal acceleration, yaw angle, braking, and acceleration actions. Range and range rate were collected by the instrumented forward-viewing radar on the following vehicle. Speed was collected from the speedometer. Yaw angle and lane offset were extracted from the video recording. Acceleration from the accelerometer was used as the longitudinal traffic action, and the yaw angle was used as the lateral action.

B. State and Action Variables Selection

Based on our preliminary efforts, two drivers were selected. We used all of their available safety-critical events, which is sufficient for training purposes and avoids bias effects. Although the conditions and casualties of different events can vary substantially, the NFACRL can theoretically still capture the rules of individual drivers effectively, as different events are located in different regions of the state space dominated by different fuzzy rules.

Before training, the upper and lower bound parameters of the fuzzy sets and the discrete action set values are extracted from the database. First, all the event data from one driver were aggregated together (Driver A had two events, and Driver B had three). Secondly, the upper and lower bounds of the six state variables were extracted directly. Subsequently, the five quartiles of the two action variables were extracted with MATLAB functions to assign the values of the discrete actions in sets $A$ and $Y$. To exclude measurement errors, we set additional constraints to ensure that the state data were plausible. The training parameters were held constant during the training process.
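A sketch of this pre-processing step is shown below. The paper used MATLAB functions for the quartiles; NumPy equivalents are shown here, and the array names are placeholders.

```python
import numpy as np

def extract_training_parameters(states, accelerations, yaw_angles):
    """Derive fuzzy-set bounds and discrete action sets from one driver's
    aggregated event data (a NumPy stand-in for the MATLAB step in the paper).

    states        : array of shape (T, 6) with the six state variables
    accelerations : array of shape (T,) of longitudinal accelerations
    yaw_angles    : array of shape (T,) of yaw angles
    """
    # Upper and lower bounds of the six state variables define the fuzzy set parameters
    state_bounds = np.column_stack([states.min(axis=0), states.max(axis=0)])
    # Five quartiles (min, Q1, median, Q3, max) become the discrete action sets A and Y
    acc_set = np.percentile(accelerations, [0, 25, 50, 75, 100])
    yaw_set = np.percentile(yaw_angles, [0, 25, 50, 75, 100])
    return state_bounds, acc_set, yaw_set
```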
During our designed training process, at each time step of an event the fuzzy rules scanned their associated weights, selected the optimal actions, and updated the weights. The NFACRL updated the weights from the beginning of the event until the end; because the data have 10 Hz resolution, the weights were trained and updated ten times per second of event length. The agent then switched to another event and received training from the beginning of that event until the end. One iteration was completed after all the events of an individual agent had been used for training.

A large number of iterations are needed in training. Theoretically, when the differences in the Critic/Actor weights between two consecutive iterations become very small, the training process is assumed to be complete. However, the acceptance threshold and other learning parameters make the revised NFACRL a heuristic methodology, so no globally optimal agent behavior is guaranteed; even convergence of the weights may correspond to locally optimal behavior. One way to avoid such premature convergence is to give the agent sufficient training iterations. In this study, we ran 400 training iterations for each agent. As each agent has approximately 1,500 time steps, the driver behavioral rules were trained 1,500 × 400 = 600,000 times. The agent should therefore produce a near-optimal approximation of the driving behavior.
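Structurally, this training schedule might be sketched as follows. The per-step update is abstracted behind a placeholder function, and the convergence tolerance is an assumption.

```python
import numpy as np

def train_agent(events, update_step, n_iterations=400, tol=1e-4):
    """Sketch of the NFACRL training schedule: sweep every 0.1 s time step of
    every event of one driver per iteration, for up to n_iterations iterations
    (400 iterations x ~1,500 steps is roughly 600,000 weight updates per agent).

    events      : list of arrays, one per safety-critical event, at 10 Hz
    update_step : callable(weights, sample) -> weights; initializes weights when
                  passed None on the first call (placeholder for the NFACRL update)
    """
    weights = None
    for _ in range(n_iterations):
        previous = None if weights is None else weights.copy()
        for event in events:              # one pass over all events = one iteration
            for sample in event:          # from the start of the event to its end
                weights = update_step(weights, sample)
        # Stop early if Critic/Actor weights barely change between iterations
        if previous is not None and np.max(np.abs(weights - previous)) < tol:
            break
    return weights
```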

IV. AGENT TRAINING RESULTS

A. Preliminary Training Results

Fig. 2(a) and Fig. 2(b) show the longitudinal agent action (acceleration) and the lateral agent action (yaw angle) during one event of Driver Agent A. The blue scatter plots represent the naturalistic driving actions, and the green curves show the agent actions. The NFACRL effectively captured the naturalistic driver behavior during this event, with an R squared degree of accuracy of 0.981 for acceleration and 0.967 for yaw angle. We also tested the NFACRL performance on the other event of the same driver (Fig. 3[a] and Fig. 3[b]).

Fig. 2(a). Acceleration of Agent A, Event A1.

Not surprisingly, Agent A almost captures Driver A's naturalistic behavior during Event 2 as well. This supports our aforementioned statement that the NFACRL methodology will not result in an average driving behavior even when more events are used as training inputs. By comparison, a statistical analysis would produce an average behavior for the agent and deteriorate performance on both events.

Fig. 2(b). Yaw angle of Agent A, Event A1.

Fig. 3(a). Acceleration of Agent A, Event A2.

Fig. 3(b). Yaw angle of Agent A, Event A2.

Fig. 2 and Fig. 3 show that during some parts of the events, such as one short interval in Fig. 2(a), the action of Agent A diverges from the naturalistic data by a small amount. Two reasons may explain the differences: data collection errors in the traffic state variables and unstable driver behavior within the events. Occasionally, the leading vehicle does not fall within the range of the radar detection zone, so the agent assumes there is no vehicle in front. Consequently, the wrong traffic state leads to the wrong action. Also, human behavior is difficult to maintain as constant; within one event, a driver may react differently even during similar traffic states.

B. Cross Validation

Cross validation shows the different behavior of different drivers when they experience the same state. To achieve this goal, the driving behavior of one driver is trained first and then validated using the events experienced by another driver. During our test, we trained Agent A using Driver A's event states and then simulated Agent A's actions using the event states from Driver B. Similarly, we simulated Agent B's actions using Driver A's events. For comparison, Fig. 4(a) and Fig. 4(b) show an event example (Event B1) of Agent B trained using all of its safety-critical events.

Fig. 4(a). Acceleration of Agent B, Event B1.

Fig. 4(b). Yaw angle of Agent B, Event B1.

Fig. 5(a) and Fig. 5(b) show the longitudinal and lateral actions of Agent B during one event from Driver A. Fig. 6(a) and Fig. 6(b) show the longitudinal and lateral actions of Agent A during one event from Driver B. It is clear that Driver A and Driver B exhibit heterogeneous behavior. Table 1 shows the R squared values as a statistical representation of the degree of accuracy of agent performance. The upper left and lower right parts of Table 1 represent the degree of approximation when events of the same driver are used for training and validation. These values are greater than those in the cross-validation parts (upper right and lower left), which means that the heterogeneity of the different drivers is clear.
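The R squared values reported in Table 1 compare each simulated action trace against the naturalistic one. A sketch of that comparison and of the cross-validation pairing is shown below; the function names and data structures are placeholders.

```python
import numpy as np

def r_squared(naturalistic: np.ndarray, simulated: np.ndarray) -> float:
    """Coefficient of determination between recorded and agent action traces."""
    ss_res = np.sum((naturalistic - simulated) ** 2)
    ss_tot = np.sum((naturalistic - np.mean(naturalistic)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def cross_validate(agents, drivers_events, simulate):
    """Fill a Table-1 style grid: each trained agent replays every driver's event
    states; high same-driver and lower cross-driver R^2 indicate heterogeneity.

    agents         : dict name -> trained agent (e.g., {"A": agent_a, "B": agent_b})
    drivers_events : dict name -> list of (states, naturalistic_actions) pairs
    simulate       : callable(agent, states) -> simulated actions
    """
    table = {}
    for agent_name, agent in agents.items():
        for driver_name, events in drivers_events.items():
            scores = [r_squared(actions, simulate(agent, states))
                      for states, actions in events]
            table[(agent_name, driver_name)] = float(np.mean(scores))
    return table
```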

Fig. 5(a). Acceleration of Agent B, Event A1.

Fig. 5(b). Yaw angle of Agent B, Event A1.

Fig. 6(a). Acceleration of Agent A, Event B1.

Table 1. R squared values used for cross validation.

             Agent A          Agent B
Event        long    lat      long    lat
Driver A     0.98    0.97     0.81    0.83
Driver B     0.82    0.60     0.97    0.92

Fig. 6(b). Yaw angle of Agent A, Event B1.

C. The "Mega Agent" Idea

We want to design an imaginary agent that can capture the behaviors of both Driver A and Driver B without averaging them. Our revised NFACRL methodology can meet this challenge. Given the nature of the traffic state variables, the state space in this problem has six dimensions, and a difference in even one dimension (e.g., low speed versus high speed) can separate two states in the state space. Since we found that the state variables vary substantially between events, it is unlikely that two events from the naturalistic data overlap in the state space. Note that different NFACRL fuzzy rules dominate different regimes of the state space, as divided by the fuzzy sets of the state variables. Therefore, training the imaginary "Mega Agent" actually involves adjusting the fuzzy rules that are in charge of the relevant state-space regimes where the safety-critical events are located.

We used all the events of Driver A and Driver B to train the "Mega Agent." The performance of the "Mega Agent" (on Events A1 and B1) is presented in Fig. 7 and Fig. 8.

Fig. 7(a). Acceleration of "Mega Agent," Event A1.

Compared to Fig. 5 and Fig. 6, the "Mega Agent" is capable of differentiating between Agent A and Agent B. In reality, a conservative driver may never experience a safety-critical event, while an aggressive driver may experience several. In such a case, the conservative driver would not know which actions to take should he or she suddenly experience a safety-critical event. Through this

"Mega Agent" training, the conservative driver will "learn" the crash-avoidance actions from the aggressive driver, which will help him or her evade upcoming crashes.

Fig. 7(b). Yaw angle of "Mega Agent," Event A1.

Fig. 8(a). Acceleration of "Mega Agent," Event B1.

Fig. 8(b). Yaw angle of "Mega Agent," Event B1.

Table 2. R squared values of the "Mega Agent."

             Agent A          Agent B          Mega Agent
Event        long    lat      long    lat      long    lat
Driver A     0.98    0.97     0.81    0.83     0.98    0.95
Driver B     0.82    0.60     0.97    0.92     0.97    0.91

Table 2 shows the R squared values of the "Mega Agent," which are high. This means that the "Mega Agent" is capable of mimicking the behaviors of Driver A and Driver B at the same time without losing driver specificity.

V. CONCLUSIONS AND FUTURE RESEARCH

This paper applied an agent-based artificial intelligence learning technique, the NFACRL methodology, to test heterogeneous driver behavior during safety-critical events, with some interesting results. The next step of this research is to extend the capability of the NFACRL to simulate driver behavior in other traffic regimes, such as lane-changing behavior, merging behavior at the upstream and downstream ends of ramps, and evacuation behavior. It would be interesting to model individual driver behavior and the decision-making process under these traffic conditions.

ACKNOWLEDGMENT

This material is based upon work supported by the Federal Highway Administration under Agreement No. DTFH61-09-H-00007. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the view of the Federal Highway Administration.

The authors would like to thank individuals at Virginia Tech and VTTI who contributed to the study in various ways: Greg Fitch, Shane McLaughlin, Brian Daily, and Rebecca Olson.

REFERENCES

[1] E. Bonabeau, "Agent-based modeling: Methods and techniques for simulating human systems," Proceedings of the National Academy of Sciences, vol. 99, 2002.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[3] L. Jouffe, "Fuzzy inference system learning by reinforcement methods," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 28, pp. 338-355, 1998.
[4] Y. Zhang et al., "Development and Evaluation of a Multi-Agent Based Neuro-Fuzzy Arterial Traffic Signal Control System," 2007.
[5] L. Chong et al., "A Revised Reinforcement Learning Algorithm to Model Vehicle Continuous Actions in Traffic," submitted to IEEE ITSC 2011, 2011.
[6] R. Olson et al., "Driver Distraction in Commercial Vehicle Operations," Center for Truck and Bus Safety, Virginia Tech Transportation Institute, Blacksburg, VA, Rep. FMCSA-RRR-09-042, 2009.