Agent-Based Evaluation of Driver Heterogeneous Behavior During Safety-Critical Events
Abstract—Heterogeneous driver behavior during safety-critical events is more complicated than in normal driving situations and is difficult to capture with statistical models. This paper applies an agent-based reinforcement learning method to represent the heterogeneous driving behavior of different drivers during safety-critical events. The naturalistic driving data of different drivers during safety-critical events are used in agent training. As an output of the Neuro-Fuzzy Actor Critic Reinforcement Learning (NFACRL) training technique, behavior rules are embedded in different agents to represent heterogeneous actions between drivers. The results show that the NFACRL is able to simulate naturalistic driver behavior and capture heterogeneity.

I. INTRODUCTION

A. Agent-based Modeling of Driver Behavior
Agent-based modeling (ABM) is a relatively new paradigm used to explore the behavior of complex systems [1]. Within the transportation domain, ABM is particularly appropriate for modeling systems in which human decision making and action are critical components. Studies on ABM of driver behavior include driver response to incidents, interaction between cars and trucks, and driver behavior approaching a work zone. Bonabeau [1] suggests that ABM is best applied to simulations when the interactions of agents are complex, nonlinear, or discontinuous; when the agents are heterogeneous; and when the agents exhibit complicated behavior, including learning and adaptation. In this study, ABM is used to model individual human driving behavior. Agents learn driver-dependent driving rules and characteristics through training and can react as a clone of the driver once training is completed. By using the safety-critical events of different drivers to train different agents, heterogeneous behavior can be represented by the actions of the different agents.

B. Agent Training: Reinforcement Learning
Reinforcement learning (RL) is a relatively new methodology designed to develop artificial agents. RL determines the actions an intelligent agent should take in an environment to maximize some notion of a long-term goal [2]. The objective of the RL algorithm in this study is to find a policy that maps traffic states to naturalistic actions. RL reinforces actions when the agent performs close to the naturalistic actions and penalizes actions that are far away. The only information available for learning is the system feedback, which describes, in terms of reward and punishment, the task the agent has to realize [3]. The procedure optimizes not only the immediate reinforcement but also the total amount of reinforcement the agent can receive in the future.

C. The Proposed Reinforcement Learning Methodology
In this research, we propose a revised RL methodology that addresses the traffic-state dimensionality problem and the continuous-action generation problem, building on recent RL algorithms used in traffic research [4, 5]. This revised Neuro-Fuzzy Actor Critic Reinforcement Learning (NFACRL) approach can handle multi-dimensional traffic states and can generate continuous actions, which suits driver behavior simulation problems in traffic.

In terms of the paper layout, the methodology for testing heterogeneous driver behavior (i.e., the NFACRL methodology) is briefly presented (for more detail, please see [5]). Subsequently, the naturalistic driving database and the safety-critical event extraction process are described. Finally, cross validation between different agents is performed, driver heterogeneity is illustrated, and the idea of a "Mega Agent" is proposed.

Manuscript received April 10, 2010.
Montasir Abbas is an Assistant Professor at Virginia Tech. He received his Ph.D. in Civil Engineering from Purdue University in 2001. He has previously worked as an Assistant Research Engineer at Texas Transportation Institute and as a Visiting Assistant Professor at Texas A&M (e-mail: abbas@vt.edu).
Linsen Chong is a Graduate Research Assistant with the Virginia Tech Signal Control & Operation Research and Education Systems Lab at Virginia Tech (e-mail: linsenc@vt.edu).
Bryan Higgs is a Graduate Research Assistant with the Virginia Tech Signal Control & Operation Research and Education Systems Lab at Virginia Tech (e-mail: bryan.higgs@vt.edu).
Alejandra Medina is a Senior Research Associate at the Virginia Tech Transportation Institute (e-mail: ale@vtti.vt.edu).
C. Y. David Yang is a Senior Research Engineer at the Office of Operations R&D at Turner-Fairbank Highway Research Center. He received his Ph.D. from Purdue University in 1997. He has previously worked for the U.S. Department of Transportation Volpe National Transportation Systems Center (e-mail: david.yang@dot.gov).
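The fuzzy actor-critic idea behind the NFACRL can be sketched in a few lines. The sketch below is not the authors' implementation: the two normalized state variables, the triangular fuzzy partitions, the three-level action set, the learning rates, and the one-step reward are all simplified assumptions made for illustration.

```python
import numpy as np

def tri(x, center, width):
    """Triangular fuzzy membership of a scalar state value."""
    return max(0.0, 1.0 - abs(x - center) / width)

# Illustrative fuzzy partitions for two normalized state variables; the paper's
# six state variables and learned parameters are not reproduced here.
CENTERS = [0.0, 0.5, 1.0]
WIDTH = 0.5
ACTIONS = np.array([-0.2, 0.0, 0.2])  # hypothetical discrete acceleration levels (g)

def rule_activations(state):
    """One strength per fuzzy rule: product of per-variable memberships, normalized."""
    m1 = [tri(state[0], c, WIDTH) for c in CENTERS]
    m2 = [tri(state[1], c, WIDTH) for c in CENTERS]
    phi = np.array([a * b for a in m1 for b in m2])
    s = phi.sum()
    return phi / s if s > 0 else phi

class FuzzyActorCritic:
    """Greedy actor-critic over fuzzy rule activations (no exploration, for brevity)."""
    def __init__(self, n_rules=9, n_actions=3, alpha=0.1, beta=0.1, gamma=0.9):
        self.critic = np.zeros(n_rules)              # per-rule state value
        self.actor = np.zeros((n_rules, n_actions))  # per-rule action preferences
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def act(self, phi):
        return int(np.argmax(phi @ self.actor))

    def update(self, phi, action, r, phi_next):
        """Temporal-difference update of critic values and actor preferences."""
        td = r + self.gamma * (phi_next @ self.critic) - phi @ self.critic
        self.critic += self.alpha * td * phi
        self.actor[:, action] += self.beta * td * phi
        return td

def reward(agent_action, naturalistic_action):
    """Reinforce actions close to the naturalistic action, penalize distant ones."""
    return -abs(agent_action - naturalistic_action)

# Tiny synthetic "event": (state, naturalistic action) pairs.
event = [((0.0, 0.0), 0.2), ((1.0, 1.0), 0.0)]
ac = FuzzyActorCritic()
for _ in range(200):  # iterate over the event many times, as in agent training
    for s, a_nat in event:
        phi = rule_activations(np.array(s))
        a = ac.act(phi)
        # each time step is treated as a one-step episode to keep the demo simple
        ac.update(phi, a, reward(ACTIONS[a], a_nat), np.zeros_like(phi))
```

Because the reward is the negative distance to the naturalistic action, the agent is reinforced exactly when it acts like the recorded driver. In the full method, training repeats over all of a driver's events until the change in the critic/actor weights between iterations becomes small.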
Authorized licensed use limited to: TU Delft Library. Downloaded on September 10,2023 at 17:01:27 UTC from IEEE Xplore. Restrictions apply.
provides a diverse collection of both on-road driving and driver (participant, non-driving) data, including measures such as driver input and performance (e.g., lane position, headway), four camera video views, and driver activity data.

The safety-critical events were identified and analyzed during previous work conducted by VTTI [6]. The events were identified by triggers, i.e., thresholds on individual collected variables. For an event to be flagged, only one of the triggers had to be met. The triggers are as follows:
• Longitudinal acceleration greater than or equal to -0.2 g
• Forward time-to-collision less than or equal to 2 s
• Swerve greater than or equal to 2 rad/s²
• Lane tracker status equals abort (lane deviation)
• Critical incident button pressed
• Analyst identified

In our test, the following vehicle was the instrumented vehicle. The measured vehicle trajectory data included speedometer output, longitudinal acceleration, yaw angle, braking, and acceleration actions. Range and range rate were collected by the instrumented forward-looking radar of the following vehicle. Speed was collected from the speedometer. Yaw angle and lane offset were extracted from the video recordings. Acceleration from the accelerometer was used as the longitudinal action, and the yaw angle was used as the lateral action.

B. State and Action Variables Selection
State and action variables were selected from the database as follows. First, all the event data from one driver were aggregated together (Driver A had two events, and Driver B had three). Second, the upper and lower bounds of the six state variables were extracted directly. Subsequently, the five quartile points of the two action variables were extracted by MATLAB functions to assign the values of the discrete action sets. To exclude measurement errors, we additionally set constraints to ensure that the state data were plausible. The training parameters were held constant during the training process.

During our designed training process, at each time step of an event the fuzzy rules scanned their associated weights, selected the optimal actions, and updated the weights. The NFACRL updated the weights from the beginning of an event until its end; the weights were trained and updated 10 times the length of the events (10 Hz resolution data). The agent then switched to another event and received training from the beginning of that event until the end. One iteration was completed after all the events of an individual agent were trained. A large number of iterations is needed in training. Theoretically, when the differences in the critic/actor weights between two consecutive iterations become very small, the training process is assumed to be complete. However, the acceptance threshold and other learning parameters make the revised NFACRL a heuristic methodology, so no globally optimal agent behavior is guaranteed; even convergence of the weights may yield only locally optimal behavior. One way to mitigate this premature convergence is to give the agent sufficient training iterations. In this study, we ran 400 training iterations for each agent. As each agent has approximately 1,500 time steps, the driver behavioral rules were trained 1,500 × 400 = 600,000 times. The agent should therefore produce a near-optimal approximation of the driving behavior.

IV. AGENT TRAINING RESULTS

A. Preliminary Training Results
Fig. 2(a) and Fig. 2(b) show the longitudinal agent action (acceleration) and the lateral agent action (yaw angle) during one event of Agent A. The blue scatter plots represent the naturalistic driving actions, and the green curves show the agent actions. The NFACRL effectively captured the naturalistic driver behavior during this event, with R squared accuracies of 0.981 and 0.967 for acceleration and yaw angle, respectively. We also tested the NFACRL performance on the other event of the same driver (Fig. 3(a) and Fig. 3(b)).

[Figure: longitudinal action estimation, acceleration (g) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 2(a). Acceleration of Agent A, Event A1.

Not surprisingly, Agent A almost captures Driver A's naturalistic behavior during Event 2 as well. This supports the aforementioned point that the NFACRL methodology does not collapse to an average driving behavior even when more events are used as training inputs. By comparison, with statistical analysis the agent would exhibit an averaged behavior, and its performance would deteriorate on both events.
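The one-trigger-suffices event-flagging rule listed earlier can be sketched as a simple predicate. The field names and defaults below are hypothetical; note that the longitudinal trigger is implemented here as a deceleration of at least 0.2 g in magnitude (acceleration at or below -0.2 g), which is one plausible reading of the listed threshold.

```python
def is_safety_critical(sample):
    """Flag a data sample if ANY single trigger threshold is met."""
    return (
        # hard braking: deceleration magnitude of at least 0.2 g (assumed reading)
        sample.get("longitudinal_accel_g", 0.0) <= -0.2
        or sample.get("time_to_collision_s", float("inf")) <= 2.0
        or sample.get("swerve_rad_s2", 0.0) >= 2.0
        or sample.get("lane_tracker_status", "") == "abort"  # lane deviation
        or sample.get("critical_incident_button", False)
        or sample.get("analyst_identified", False)
    )

# Hypothetical samples: a hard-braking instant and an uneventful one.
hard_brake = {"longitudinal_accel_g": -0.35, "time_to_collision_s": 6.0}
normal = {"longitudinal_accel_g": -0.05, "time_to_collision_s": 8.0, "swerve_rad_s2": 0.3}
```

Scanning a recorded trace with such a predicate yields the flagged timestamps around which the safety-critical events are extracted.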
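The variable-extraction steps of the State and Action Variables Selection subsection can be sketched with NumPy in place of the MATLAB functions the authors used. The event values below are made up; only the procedure (aggregate a driver's events, take the bounds of a state variable, take five quantile points of an action variable) follows the text.

```python
import numpy as np

# Made-up samples standing in for one driver's event data.
events = {
    "A1": {"speed_mps": np.array([20.0, 22.0, 25.0, 24.0]),
           "accel_g":   np.array([-0.30, -0.10, 0.00, 0.05])},
    "A2": {"speed_mps": np.array([18.0, 21.0, 23.0, 26.0]),
           "accel_g":   np.array([-0.25, -0.05, 0.02, 0.10])},
}

# Step 1: aggregate all events of the driver.
speed = np.concatenate([e["speed_mps"] for e in events.values()])
accel = np.concatenate([e["accel_g"] for e in events.values()])

# Step 2: upper and lower bounds of each state variable (here, speed).
speed_bounds = (speed.min(), speed.max())

# Step 3: five quantile points (min, Q1, median, Q3, max) of each action
# variable define the discrete candidate action levels.
accel_levels = np.quantile(accel, [0.0, 0.25, 0.5, 0.75, 1.0])
```

The bounds fix the range over which the fuzzy sets of each state variable are spread, and the quantile points give the discrete action values among which the agent chooses.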
[Figure: lateral action estimation, yaw angle (rad) vs. time (0.1 s), naturalistic data vs. agent output; presumably Fig. 2(b), the yaw angle of Agent A during Event A1.]

[Figure: longitudinal action estimation, acceleration (g) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 3(a). Acceleration of Agent A, Event A2.

[Figure: lateral action estimation, yaw angle (rad) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 3(b). Yaw angle of Agent A, Event A2.

Fig. 2 and Fig. 3 show that during some parts of the events, such as one interval in Fig. 2(a), the action of Agent A diverges from the naturalistic data by a small amount. Two reasons may explain the differences: data collection errors in the traffic state variables, and unstable driver behavior within the events. Occasionally, the leading vehicle falls outside the radar detection zone, so the agent assumes there is no vehicle in front; the wrong traffic state then leads to the wrong action. Also, human behavior is difficult to maintain as constant: within one event, a driver may react differently even under similar traffic states.

B. Cross Validation
Cross validation shows the different behaviors of different drivers when they experience the same state. To achieve this, the driving behavior of one driver is trained first and then validated using the events experienced by another driver. In our test, we trained Agent A using Driver A's event states and then simulated Agent A's actions using the event states from Driver B. Similarly, we simulated Agent B's actions using Driver A's events. For comparison, Fig. 4(a) and Fig. 4(b) show an event example (Event B1) of Agent B trained using all of its own safety-critical events.

[Figure: longitudinal action estimation, acceleration (g) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 4(a). Acceleration of Agent B, Event B1.

[Figure: lateral action estimation, yaw angle (rad) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 4(b). Yaw Angle of Agent B, Event B1.

Fig. 5(a) and Fig. 5(b) show the longitudinal and lateral actions of Agent B during one event from Driver A. Fig. 6(a) and Fig. 6(b) show the longitudinal and lateral actions of Agent A during one event from Driver B. It is clear that Driver A and Driver B exhibit heterogeneous behavior. Table 1 shows the R squared values as a statistical representation of the degree of accuracy of agent performance. The upper-left and lower-right parts of Table 1 give the degree of approximation when events of the same driver are used for both training and validation; these values are greater than those of the cross-validation parts (upper right and lower left), which shows that the heterogeneity between the drivers is clear.
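The R squared values reported for these comparisons can be computed as the coefficient of determination between the naturalistic trace and the simulated trace. The paper does not state the exact formula used, so the version below is one standard choice:

```python
import numpy as np

def r_squared(naturalistic, simulated):
    """1 - SS_residual / SS_total of the simulated trace against the naturalistic one."""
    y = np.asarray(naturalistic, dtype=float)
    yhat = np.asarray(simulated, dtype=float)
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical yaw-angle traces: naturalistic data and a slightly offset agent output.
yaw_nat = np.array([0.00, 0.01, 0.02, 0.01, 0.00])
yaw_sim = yaw_nat + 0.002
score = r_squared(yaw_nat, yaw_sim)
```

Cross validation then amounts to scoring each agent's simulated actions against each driver's naturalistic actions, which fills a matrix of values like Table 1: same-driver cells on the diagonal blocks, cross-driver cells off the diagonal.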
[Figure: longitudinal action estimation, acceleration (g) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 5(a). Acceleration of Agent B, Event A1.

[Figure: lateral action estimation, yaw angle (rad) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 5(b). Yaw Angle of Agent B, Event A1.

[Figure: longitudinal action estimation, acceleration (g) vs. time (0.1 s), naturalistic data vs. agent output; presumably Fig. 6(a), the acceleration of Agent A during Event B1.]

[Figure: lateral action estimation, yaw angle (rad) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 6(b). Yaw angle of Agent A, Event B1.

Table 1. R squared values used for cross validation.

Event      Agent A          Agent B
           long     lat     long     lat
Driver A   0.98     0.97    0.81     0.83
Driver B   0.82     0.60    0.97     0.92

Differences in one dimension (e.g., low speed versus high speed) can separate two states in the state space. As we found that the state variables of the events vary substantially, it is unlikely that two events from the naturalistic data overlap in the state space. Note that different NFACRL fuzzy rules dominate different regimes of the state space, as divided by the fuzzy sets of the state variables. Therefore, training the imaginary "Mega Agent" actually involves adjusting the fuzzy rules in charge of the state-space regimes in which the safety-critical events are located.

We used all the events of Driver A and Driver B to train the "Mega Agent." The performance of the "Mega Agent" (on Events A1 and B1) is presented in Fig. 7 and Fig. 8.

[Figure: longitudinal action estimation, acceleration (g) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 7(a). Acceleration of "Mega Agent," Event A1.

Compared to Fig. 5 and Fig. 6, the "Mega Agent" is capable of differentiating between Agent A and Agent B. In reality, a conservative driver may never experience a safety-critical event, while an aggressive driver may experience many. In such a case, the conservative driver would not know what actions to undertake should he or she suddenly experience a safety-critical event. Through this
"Mega Agent" training, the conservative driver will "learn" the crash avoidance actions from the aggressive driver, which will help him or her evade upcoming crashes.

[Figure: "Mega Agent" action estimation vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 8(a). Acceleration of "Mega Agent," Event B1.

[Figure: lateral action estimation, yaw angle (rad) vs. time (0.1 s), naturalistic data vs. agent output.]
Fig. 8(b). Yaw Angle of "Mega Agent," Event B1.

Table 2. R squared values of the "Mega Agent."

Event      Agent A    Agent B    Mega Agent

Table 2 shows the R squared values of the "Mega Agent," which are high. This means that the "Mega Agent" is capable of mimicking the behaviors of Driver A and Driver B at the same time without losing driver specificity.

V. CONCLUSIONS AND FUTURE RESEARCH
This paper applied an agent-based artificial intelligence learning machine, the NFACRL methodology, to test heterogeneous driver behavior during safety-critical events.

REFERENCES
[1] E. Bonabeau, "Agent-based modeling: Methods and techniques for simulating human systems," Proceedings of the National Academy of Sciences, vol. 99, 2002.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
[3] L. Jouffe, "Fuzzy inference system learning by reinforcement methods," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 28, pp. 338-355, 1998.
[4] Y. Zhang et al., "Development and Evaluation of a Multi-Agent Based Neuro-Fuzzy Arterial Traffic Signal Control System," 2007.
[5] L. Chong et al., "A Revised Reinforcement Learning Algorithm to Model Vehicle Continuous Actions in Traffic," submitted to IEEE ITSC 2011.
[6] R. Olson et al., "Driver Distraction in Commercial Vehicle Operations," Center for Truck and Bus Safety, Virginia Tech Transportation Institute, Blacksburg, VA, Report FMCSA-RRR-09-042, 2009.