Visual Detection and Deep Reinforcement Learning-Based Car Following and Energy Management For Hybrid Electric Vehicles

IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, VOL. 8, NO.
2, JUNE 2022 2501
Visual Detection and Deep Reinforcement

Learning-Based Car Following and Energy
Management for Hybrid Electric Vehicles
Xiaolin Tang , Member, IEEE, Jiaxin Chen , Kai Yang, Mitsuru Toyoda , Member, IEEE,
Teng Liu , Member, IEEE, and Xiaosong Hu , Senior Member, IEEE
Abstract— Practical vision-based technology is essential for DL Deep learning.

the autonomous driving of intelligent hybrid electric vehicles. DP Dynamic programming.
In this article, a hierarchical control structure is proposed, which
combines you only look once-based object detection and learning-
DQN Deep Q-network.
based intelligent control by deep reinforcement learning. After DRL Deep reinforcement learning.
modeling a typical car-following scene, the leading car is detected ECMS Equivalent consumption minimization strategy.
in the driving image, and the real-time distance between two cars EMS Energy management strategy.
is evaluated by vision-based distance measurement. Then, a deep EV Electric vehicle.
Q-network is adopted to learn the car-following control strategy
and energy management strategy, which achieves multiobjective
FCN Fully connected network.
control of the hybrid powertrain while maintaining a reasonable FCV Fuel cell vehicle.
distance for the following car. When completing off-line training, HCU Hybrid control unit.
the online processor-in-the-loop test by the edge computing HEV Hybrid electric vehicle.
device NVIDIA Jetson AGX Xavier is performed. Results show HIL Hardware-in-the-loop.
that the hierarchical control strategy gets a fuel economy of
5.76 L/100 km while keeping a safe following distance. Moreover,
ICE Internal combustion engine.
the time consumed to run a driving cycle of 1797 s in the embed- LB Learning-based.
ded device is 476.87 s, which means that a control loop, including MG Motor/generator.
target recognition, distance measurement, car-following control, ML Machine learning.
and energy management, takes 0.26 s. This study proves that vehi- MPC Model predictive control.
cle vision can lay the technical foundation for intelligent driving,
and the results illustrate that the hierarchical control structure
OB Optimization-based.
is capable of achieving considerable computing efficiency on PIL Processor-in-the-loop.
embedded devices and has the potential for real vehicle control. PMP Pontryagin’s minimum principle.
Index Terms— Car following, deep reinforcement learning RB Rule-based.
(DRL), distance measurement, energy management, you only look R-CNN Region-convolutional neural network.
once (YOLO). RL Reinforcement learning.
SOC State of charge.
N OMENCLATURE SSD Single-shot detector.
A3C Asynchronous advantage actor-critic. TD Temporal-difference.
AI Artificial intelligence. TL Transfer learning.
CVT Continuously variable transmission. V2V Vehicle to vehicle.
DDPG Deep deterministic policy gradient. V2X Vehicle to everything.
V2I Vehicle to Internet.
Manuscript received October 4, 2021; revised December 3, 2021; accepted YOLO You only look once.
January 2, 2022. Date of publication January 20, 2022; date of current version
April 20, 2022. This work was supported in part by the National Natural
Science Foundation of China under Grant 52072051 and in part by the Natural
Science Foundation of Chongqing under Grant cstc2020jcyj-msxmX0956. I. I NTRODUCTION
(Corresponding authors: Xiaolin Tang; Xiaosong Hu.)
Xiaolin Tang, Jiaxin Chen, Kai Yang, and Xiaosong Hu are with the
State Key Laboratory of Mechanical Transmissions and the College of
Mechanical and Vehicle Engineering, Chongqing University, Chongqing
400044, China (e-mail: tangxl0923@cqu.edu.cn; 201932132050@cqu.edu.cn;
T HE intelligence and new energy of automobile products
have become the main trend of development, and related
technologies are actively researched and developed in auto-
kai-yang0401@gmail.com; xiaosonghu@ieee.org). mobile manufacturers and scientific research institutes, which
Mitsuru Toyoda is with the Department of Mechanical Systems Engi-
neering, Tokyo Metropolitan University, Tokyo 191-0065, Japan (e-mail: can drive a significant change [1]. The technological route of
toyoda@tmu.ac.jp). intelligent vehicles generally includes environment perception,
Teng Liu is with the Department of Mechanical and Mechatronics Engi- driving decisions, and vehicle control [2]. First, the vehi-
neering, University of Waterloo, Waterloo, ON N2L 3G1, Canada (e-mail:
tengliu17@gmail.com). cle detects the surrounding environment by nonvision-based
Digital Object Identifier 10.1109/TTE.2022.3141780 radar or vision-based camera, which achieves the purpose of
2332-7782 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Univ of Science and Tech Beijing. Downloaded on December 13,2022 at 07:14:14 UTC from IEEE Xplore. Restrictions apply.
2502 IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, VOL. 8, NO. 2, JUNE 2022
grasping real-time driving conditions and possible obstacles. as DP, PMP, ECMS, and MPC. However, the above two types
Second, a driving decision influenced by the environment is of EMSs always exist certain inherent flaws that are inevitable
employed to determine the behavior, such as cruising, overtak- or even unavoidable in terms of efficiency, adaptability, and
ing, changing lanes, and car-following. Finally, by controlling robustness, making it difficult to get breakthroughs. With the
the command of driving, braking, and steering, autonomous deepening of research about ML, especially the adoption of
driving getting rid of the human driver can be realized while RL [22], some AI programs have gotten outstanding achieve-
ensuring a higher level of security [3]. ments, such as Alpha Go [23], AlphaGo Zero [24], and Alpha
For new energy vehicles, EVs, HEVs, and FCVs are Star [25] designed by the British AI team Deep Mind. Relying
regarded as the main targets [4]–[6]. By equipping batteries as on combining the knowledge of DL [26], Deep Mind succes-
the energy source and an electric motor as the power source, sively proposed various DRL algorithms, such as DQN [27],
EVs realize zero emission, considerable comfort, and lower DDPG [28], and A3C [29]. The series of algorithms have
noise. FCVs will also play an important role in future driving, significantly improved the scope of application and optimized
especially for long-distance commercial vehicles, which relies the learning effect.
on the chemical reaction of hydrogen and oxygen to generate Research scholars in the field of HEV EMSs have also
electrical energy. However, the defects of the above two achieved many remarkable results [30]–[32]. Liu et al. [33]
vehicles are obvious, related to insufficient driving mileage, proposed an RL-based EMS combined with velocity predic-
the safety of batteries, and hydrogen storage [7]. In contrast, tion and verified the optimization and control efficiency by
HEVs have more mature technology and reliable performance HIL experiment. Li et al. [34] established a hybrid action
at present. A hybrid powertrain equipped with an ICE and space integrating discrete and continuous variables, designed
an MG is popular. The technical route includes powertrain a pretraining stage based on DP-based results, and proposed
selection, parameter matching, and energy management. After a DDPG-based EMS considering terrain information for a
a suitable topological configuration is determined based on hybrid electric bus. Lian et al. [35] first tried TL to propose
the actual requirements, the optimal fuel economy and lower an EMS suitable for different types of HEVs, which will play
emissions can be achieved by reasonably distributing the a significant role in the future. Wang et al. [36] proposed a
power flow between the ICE and the MG and selecting the DDPG-based EMS that masters traffic lights and surrounding
driving mode [8]. cars by target detection to analyze the traffic flow. Li et al. [37]
Once a novel direction will be created in combination with proposed a DDPG-based EMS for EVs equipped with a hybrid
the abovementioned, the intelligent HEV is a potential object. battery system (high-power and high-energy batteries). While
The concept of intelligent HEVs has been proposed [9], but optimizing the electrical and thermal safety, energy loss, and
there are not many achievements, and the studies on smart cars aging cost, it combines cloud computing and edge computing
and new energy vehicles are relatively independent. Moreover, as a reliable way for DRL.
advanced communication technologies, such as V2X, are With the joint efforts of the above scholars, LB EMSs
currently only available on a small scale, and the installation of have achieved considerable results. Meanwhile, current studies
communication facilities is expensive, which cannot guarantee indicate that integrating more information about the exter-
suitability in any driving environment. After all, not every nal environment obtained by sensors, radars, cameras, and
street and every car will be equipped with V2X equipment in communications can promote the rationalization and safety of
daily life. By combining vehicle vision with intelligent control, decision-making and control. Dedicated to the development of
a pair of eyes is added to the virtual driver of intelligent HEVs. intelligent HEVs with visualization control technology through
For object detection, there are two basic types of algorithms, the integration of computer vision, this article proposes a
classified as one stage and two stages [10]. The standard two- Vision&DRL-based hierarchical control structure in the car-
stage algorithm searches the candidate region and classifies following scene. After modeling the environment by a 3-D
the object in the region. R-CNN algorithms [11]–[13] are modeling software CATIA and collecting some driving images
representative. The one-stage algorithm can directly detect at different following distances, the details of the hierarchical
targets, including YOLO [14]–[16] and SSD [17]. R-CNN control structure can be described as follows:
has better accuracy, but the computational efficiency is not
ideal. SSD can fulfill the needs of accuracy and efficiency, but A. Upper Layer (Vision-Based Distance Measurement)
hyperparameters that depend on experience are an obstacle. The YOLO v3 algorithm is adopted to detect the leading
For YOLO, considerable detection effects for tiny targets and car in the driving images, and the following distance between
outstanding efficiency are treated as the key features and two cars is visually evaluated after precalibrated mapping.
advantages. Therefore, the subsequent work will also treat
YOLO as one of the base algorithms. B. Middle Layer (Car-Following Control)
Furthermore, the driving performance of HEVs mainly After grasping the real-time distance, the acceleration of the
depends on EMSs loaded into the onboard controller. Accord- following car is controlled by DQN to maintain the following
ing to the current research progress, EMSs are divided into distance within a safe and reasonable range.
three categories [18]: RB, OB, and LB. Before the advent
of LB EMSs, a series of predefined rules based on expert C. Lower Layer (Energy Management Strategy)
experience or various optimization algorithms oriented to the The DQN algorithm is used to synchronously control the
global or instantaneous is mainstream methods [19]–[21], such amount of change in the power of the engine and the gear ratio
TANG et al.: VISUAL DETECTION AND DRL-BASED CAR FOLLOWING AND ENERGY MANAGEMENT FOR HEVs 2503
resistance Fslope , and air resistance Fair should be overcome,

and the above four resistances are defined as follows:
⎧
⎨
⎨ Froll = m vehicle g cos θslope f
⎨
⎨ F = 1/2ρ A C v 2
air area D veh
(1)
⎨
⎨ Fslope = m vehicle g sin θslope
⎨
⎩
Facc = δ · m vehicle · acc
where ρ is the air density, Aaera is the fronted area of the
vehicle, C D is the coefficient of the aerodynamic resistance,
Fig. 1. Powertrain of a parallel HEV with the P2 structure.
f is the coefficient of the rolling resistance, g is the gravity
TABLE I acceleration, θslope is the road slope angle, m vehicle is the vehicle
D ETAILED PARAMETERS OF THE PARALLEL HEV mass, v veh is the longitudinal velocity, acc is the longitudinal
acceleration, and δ is the mass factor caused by the moment
of inertia of wheels and powertrain rotating parts.
The simplified power balance equation in (2) requires
the power output by multiple power sources to be equal
to the demand power consumed by the transmission loss and
the driving resistance
1
Peng + Pmg = Froll + Fair + Fslope + Facc v veh (2)
ηT
where ηT is the mechanical transmission efficiency, Peng is the
engine power, and Pmg is the MG power.
of the CVT. At this time, the former achieves the distribution For simplifying the complexity of the model, the efficiency
of the power flow, while the latter maintains the economic of each component is regardless, and the total efficiency from
speed of the ICE and MG, which enables multiobjective the wheel to the CVT input shaft is determined as 85%.
control of the hybrid powertrain.
For completing a comprehensive comparison, seven kinds
of EMSs are designed according to the different control targets B. Component Modeling
of the ICE and CVT shift strategies. Moreover, an embedded Both the engine and the MG adopt quasi-static maps to
edge device, named NVIDIA Jetson AGX Xavier, is employed obtain real-time data, such as fuel consumption and efficiency.
to complete a PIL test, which verifies the real-time perfor- After a long period of driving, the total fuel consumption is
mance and optimization of the convolutional neural network- calculated cumulatively as follows:
based object detection and the FCN-based intelligent control. T

The remainder of this article is organized as follows. Mfuel = ṁ Teng , n eng dt (3)
Section II introduces the dynamic model of vehicles and the 0
modeling of the hybrid power system. Section III illustrates where Teng and n eng are the engine torque and speed, T is the
YOLO and DRL and proposes the hierarchical control struc- total driving time, ṁ is the instantaneous fuel consumption,
ture. Section IV carries on the off-line training and analyzes and Mfuel is the total fuel consumption. According to the map
the training results. Section V discusses the results of the of fuel consumption obtained by AUTONOMIE, the instan-
online PIL test. Conclusion and future work are described in taneous fuel consumption can be obtained by interpolating
Section VI. the speed and torque, and the total fuel consumption is the
cumulative fuel consumption of the powertrain after driving a
II. H YBRID P OWERTRAIN M ODELING time T .
A parallel HEV with the P2 structure is treated as the Lithium-ion batteries take the zero-order equivalent circuit
research object. The powertrain, as shown in Fig. 1, contains based on the internal resistance to express the dynamic change
an ICE, clutch, MG, lithium-ion battery, hydraulic torque in the SOC, which is defined as follows [39]:
converter, CVT, final drive, and so on. According to the power
Voc − Voc2 − 4Rint Pbatt
flow, the driving modes can be divided into pure electric, SOC· = − (4)
pure engine, hybrid drive, driving charging, and regenerative 2Rint Q batt
braking modes. Detailed parameters about the car are listed where Pbatt is the power of the battery, Voc is the open-circuit
in Table I, which are derived from the simulation software voltage, Q batt is the nominal capacity of the battery pack,
AUTONOMIE [38] developed by the U.S. Argonne National Rint is the internal resistance, and SOC ˙ is the amount of
Laboratory. instantaneous change in the SOC.
Furthermore, the safety of the powertrain model requires the
A. Longitudinal Dynamics real-time state to satisfy constraints of energy sources, such
Based on the characteristics of longitudinal dynamics, as batteries, and power sources, such as the engine and the
rolling resistance Froll , acceleration resistance Facc , slope motor in (5) and (6). The speed, torque, and SOC are limited
Fig. 2. Car-following environment model built by CATIA. (a) Car-following

environment. (b) Location of the camera. (c) Following distance = 5 m.
(d) Following distance = 10 m. (e) Following distance = 15 m. (f) Following
distance = 20 m.
within the upper and lower thresholds. Moreover, due to the

complexity of the real environment, some preassumptions need
to be proposed.
1) Ignoring the energy consumption of other loads on the
vehicle, such as air conditioning.
2) The vehicle drives on a flat and well road in an envi-
ronment, resulting in no slope resistance. Fig. 3. Network structure of YOLO v3.
3) The torque converter is not treated as a controlled
component, and the speed ratio and torque ratio are
constant to 1. 2) determining the real-time distance between two cars by
4) Ignoring the transient response of clutch and CVT. completing the image calibration.
5) Ignoring the influence of temperature on the powertrain For the target recognition stage, YOLO can identify and
locate the position of the leading car in the images captured
by the onboard camera. As a kind of popular object detection
socmin ≤ soc(t) ≤ socmax algorithm, the structure of YOLO v3 is drawn in Fig. 3.
Battery (5)
Pbatt_ min ≤ Pbatt (t) ≤ Pbatt_ max Images of any size are compressed to the predefined size
⎧
⎨ of “416 × 416” and input to the main network, Darknet-53.
⎨ ωmin ≤ ω(t) ≤ ωmax
With the computation of the different convolutional layers
Engine and Motor Tmin ≤ T (t) ≤ Tmax (6)
⎨
⎩ and residual blocks, three feature layers of the images are
Pmin ≤ P(t) ≤ Pmax . extracted. Meanwhile, relying on the residual structure [40],
the depth of networks can be increased, while the accuracy
III. V ISION &DRL-BASED H IERARCHICAL C ONTROL can be improved. The result of YOLO includes the location
S TRUCTURE of the predicted bounding box in (7), the confidence in (8),
and the class of the object
A. Vision-Based Distance Measurement
The car-following scene as a base environment is modeled bx = σ (tx ) + cx , b y = σ (t y ) + c y , bw = pw etw , bh = ph eth
in Fig. 2(a) by the 3-D software CATIA, and the installation (7)
position of the camera is shown in Fig. 2(b). Following the
where cx and c y are the position of the center point of the
different distances between the two cars, 190 images collected
anchor box, σ (.) is sigmoid, (tx , t y ) is the center offset, (tw , th )
from a distance of 1–20 m are used as the material for
is the aspect ratio, pw and ph are the width and height of
object recognition. Subsequently, the vision-based distance
anchor boxes, and bx , b y , bw , and bh are the sizes of the
measurement mainly consists of two stages:
predicted box
1) recognizing the leading car from the image based on the
YOLO v3 algorithm; confidence = Pr(object) × IOUtruth
pred (8)
maximum action-value corresponding to each state

TD error = r + γ max Q s , a − Q(s, a) (10)
a
Q(s, a) ← Q(s, a)

+α r + γ max Q s , a − Q(s, a) (11)
a
where π is the control strategy, s is the state, a is the action,

r is the reward, α is the learning rate, s is the next state, a
is the next action, and γ is the discount rate that represents
the importance of future reward
−T

N
Q π (s, a) = E r (s(t), a(t))γ |s(0) = s, a(0) = a
t
(12)
t=0
where T is the total time step and Q(s, a) means the expected
value of the discounted accumulated rewards.
However, the Q-learning suffers from the “curse of dimen-
sionality” and “discretization error” [42] by using a tabular to
record the action-value function. Hence, neural networks with
Fig. 4. Algorithm framework of DQN. powerful fitting capabilities are introduced to overcome these
inherent defects in (13), The original update of the action value
is also changed to the way of updating the neural network
where Pr(object) is the probability of object existence and parameters after calculating the loss in (14) and the gradient
IOUtruth in (15). Meanwhile, the loss in (14) can be understood as a
pred is the ratio of the intersection and union of the true
box and the predicted box. quantitative way of evaluating how it is beneficial to execute
Parts of the recognition results at different distances are action a in state s. Without grasping the next action a , the
shown in Fig. 2(c)–(f). Once the distance between two cars difference between the predicted value and the target value
exceeds 20 m, the pixel area occupied by the leading car in could be an important criterion. The smaller the loss, the closer
the image is too small to detect; thus, the urban driving cycles to the optimal action
are selected due to the flaw. Then, the predicted bounding
Q(s, a|θ ) ≈ Q(s, a) (13)
box of YOLO is treated as the base information to complete
the distance measurement. However, during the recognition where θ is the parameters of neural networks, including the
of YOLO, the size of the bounding box is often jittery, and weights w and the biases b of all neurons
the center point is relatively more stable. The center point 2
of the predicted box is used to calibrate the visual distance L(θ ) = E r + γ max Q(s , a |θ ) − Q(s, a|θ ) (14)
on the v-axis in the pixel coordinate system, and a quartic
∇θ L(θ ) = E r + γ max Q s , a |θ
polynomial is fitted to estimate the real-time distance
− Q(s, a|θ ) ∇θ Q(s, a|θ ) (15)
D = −6.257 × 10−6 · v 4 − 4.4 × 10−3 · v 3 + 1.195 · v 2
where r + γ maxQ(s , a |θ ) is the target value and Q(s, a|θ )
−142.47 · v + 6387.46 (9) is the predicted value.
Moreover, the experience replay and the target network
where v is the coordinate of the center point of the predicted are designed to optimize the convergence during the learning
bounding box on the v-axis and D is the visual distance. period. For the former, by storing the samples generated by
the iterative process of RL in an experience pool of limited
capacity and randomly selecting minibatches of samples when
B. Deep Reinforcement Learning
updating neural networks, the correlation between the training
As the first DRL algorithm, the structure of DQN, as shown samples is broken to a certain extent, thus ensuring the
in Fig. 4, contains four modules: the Q-learning algorithm, completeness of the exploration of strategies [43]. For the
the neural network, the experience replay, and the target latter, the target network avoids the drawback of abnormal
network [41]. Based on the process of data interaction fluctuation of the loss caused by using only an online network
between the environment and the agent in the Q-learning to output the target value r + γ maxQ(s , a |θ ) with the
algorithm, the reward r (t) is generated after the Environment predicted value Q(s, a|θ ). Although the target network takes
executes the action a(t) in the state s(t), and the TD error the same structure as the online network, the update method
in (10) is determined to update the action value Q(s, a) is to directly copy the parameters of the online network after
in (11), which represents the expected value of the discounted a time interval. With the help of this time-delay effect, the
accumulated rewards in (12). In this way, the optimal strategy advantages of the different actions in each state can be more
π ∗ obtained after iterative learning can be represented by the clearly highlighted.
3) The lower layer is the energy management module. Once

obtaining driving data, such as speed and acceleration,
the driving resistance and the demand power can be
calculated. DQN is used to synchronously control the
amount of change in the engine power and the gear ratio
of CVT. In this way, the former achieves the reasonable
distribution of the power flow, while the latter maintains
the economic speed of the engine and MG.
During the learning process of control strategies by
RL/DRL, off-line training and online testing are two neces-
sary steps. By running the global environment iteratively, the
optimal action corresponding to each state can be explored in
the agent, and the mark of the completion of the training lies
in the stable convergence of the loss or the total accumulative
reward. At that time, the optimal control strategy under the
current environment fitted by neural network parameters has
been formed. In a new environment, the online test verifies
the mapping relationship of the current parameterized control
strategy. The new driving cycle is run only once during the
test, and the online neural network is prohibited from updating,
which leads to the neural network fitting a deterministic high-
dimensional strategy function.
For the definition of variable space of the proposed
Vision&DRL-based hierarchical control structure, the state
space S, the action space A, and the reward function R are
defined separately, which can directly affect the final learning
effect
Scar-follow = {dis12 , vel2 } (17)
SEMS = {soc,vel2 , acc2 , RatioCVT } (18)
Fig. 5. Vision&DRL-based hierarchical control structure.
where dis12 is the visual distance between the two cars, vel2
and acc2 are the speed and acceleration of the following car,
soc is the state of charge, and RatioCVT is the CVT gear ratio
C. Hierarchical Control Structure
of the following car:
Aiming at forming vehicle vision from images captured
by onboard cameras to achieve the autonomous control of Acar-follow = acc2
HEVs, a Vision&DRL-based hierarchical control structure = [−2, −1.5, −1, −0.5, 0, 0.5, 1, 1.5, 2](m/s2 )
in Fig. 5 is proposed, which includes YOLO-based target (19)
recognition, vision-based distance measurement, DQN-based
Peng = [−5, −3, −1, 0, 1, 3, 5](kW)
acceleration control, and DQN-based EMS. AEMS = (20)
RatioCVT = [−0.1, −0.05, 0, 0.05, 0.1]
1) The upper layer belongs to the environment perception
module. Driving images in front of the car are collected where P eng is the amount of change in engine power and
by the onboard camera, and YOLO v3 is adopted to RatioCVT is the amount of change of CVT gear ratio
detect and locate the leading car. Furthermore, the real-
time distance can be evaluated by precalibrated mapping. Rcar-follow = −1 × α · abs(dis12 − dissafe ) + β · punishdis
2) The middle layer is designed for maintaining the dis-
tance. The safe distance as a reference is defined as (21)
follows [44]:
REMS = −1 × χ · abs(RatioCVT − Ratioref )

S0 = Th v f + d0 (16) + ψ · ṁ + τ · abs SOC-SOCtarget
+ γ · ηeng (22)
where Th is the linear coefficient, v f is the speed of the
following car, and d0 is the safe distance after stopping. where α, β, χ, ψ, τ , and γ are weights, dissafe is the safe
The linear coefficient Th that relates to the driving style distance to follow the car, punishdis is a penalty item, Ratioref
equals 1 s, and the safe distance after stopping d0 equals is the reference of CVT gear ratio, ṁ is the fuel consumption,
2.35 m. After determining the real-time safe distance, the ηeng is the engine efficiency, and abs(.) means the absolute
DQN algorithm is employed to control the acceleration value. After several adjustments, the above six weights are
of the following car to keep the optimal distance. defined as α = 0.01, β = 1, χ = 1, ψ = 3, τ = 1, and γ = 8.
For all of the above definitions, something related should TABLE II

be clearly explained. H YPERPARAMETERS AND THE N ETWORK S TRUCTURE
1) The distance dis12 between two cars in the state space
Scar-follow belongs to the visual distance obtained by the
method of the vision-based distance measurement, while
the real distance can be calculated by the velocity of two
cars, and the image input to the YOLO v3 network is
also selected based on the real distance.
2) Simulations have demonstrated that defining the action
space as the amount of change to the target can achieve
better control effects for RL or DRL algorithms, which
is recommended.
3) In the early stages of the training process, there will TABLE III
exist some abnormal phenomena due to trial-and-error T RAINING P SEUDOCODE OF THE DRL-BASED EMS
actions, such as rear-end collision or overtaking, result-
ing in the penalty term punishdis added to the reward
in (21)
1, if dis12 ≤ 0
punishdis = (23)
0, else
where the penalty of 1 is given when a collision occurs.
4) For the DQN-based CVT shift strategy, an evaluation
criterion is served to guide the selection. Therefore,
aiming at making the engine working point concentrate
in the high-efficiency speed range, the economic speed,
which is the speed of the CVT input shaft, is defined
as 3000 rpm based on the engine efficiency, and the
reference of CVT gear ratio is determined by inverse cal-
culation. Experiments prove that DRL has good tracking
for reference values.
5) If a reward function contains multiple optimization terms
and the corresponding weights, the method of adjusting
the weights is mainly based on personal experience.
Generally, it is sufficient to ensure that the actual data for
each term are maintained at the same order of magnitude resulted in a total distance of 10.409 km and a total time
and then fine-tune them appropriately according to the of 1680 s. It should be noted that DRL reflects the mapping
importance. However, for the reward function about the relationship between random states and optimal actions, which
EMS, it is necessary to note that the optimization term is independent of the global characteristics of the driving cycle.
for the CVT shift strategy is independent, while there is Hence, the completeness of the state space in the training
a tradeoff between the other three terms regarding power environment will directly affect the applicability of the learned
distribution. In some cases, the term on SOC deviation control strategy. The structure of YOLO and DQN is program-
cannot maintain the trajectory well, and increasing the ming by Pytorch and TensorFlow, respectively, and a high-
weight of fuel consumption significantly reduces the performance computer equipped with an NVIDIA GeForce
level of final SOC, which does not satisfy the per- RTX 3080 and an INTEL i7-10700K CPU is employed for
formance requirements of HEVs. Therefore, by adding completing image processing and iterative training. Because
an optimization term about the engine efficiency, it is the training of the YOLO network requires a lot of images to
possible to guide the engine power and maintain the ensure the accuracy of detection, the official trained parameter
SOC trajectory while improving work efficiency. file downloaded from GitHub is used. Meanwhile, seven types
of EMSs are used for comprehensive comparison, and detailed
Finally, details of the hyperparameters and the network
settings of seven types of EMSs are listed in Table IV.
structure are listed in Table II, while the training pseudocode
According to the difference between the control object of
of the DRL-based EMS is listed in Table III.
the engine and the CVT shift strategy, four types of DP-based
EMSs are adopted. In addition, DP and QL, as the comparison
IV. O FF -L INE T RAINING
algorithms, both complete the energy management at the
A. Off-Line Training Settings same demand power without controlling the velocity of the
Inrets Urban Cycle as a standard driving cycle is selected as following car. Furthermore, as mentioned before, DP, QL, and
the training environment, and the speed trajectory is running DQN are only available for discrete control tasks, and the
three times repeatedly to extend the driving mileage, which first two algorithms need discretization of the state space as
TABLE IV
D ETAILED S ETTINGS OF S EVEN T YPES OF EMS S FOR C OMPARISON
well, which is an advantage of DQN. As shown in Table IV

and the action space AEMS , when the target is the throttle, the
range of throttle [0, 1] is divided into 101-step grids; when the
change amount of engine power is controlled, the range can
be determined, as shown in (20). For CVT shift strategies,
the ratios are equally divided into multiple gears according
to 0.1 spacing, and the command includes upshift, downshift,
or maintaining the current gear. The discretization of the state
space is described in Table IV.
The specific training process can be briefly described as
follows: after modeling the network space of YOLO and Fig. 6. Trajectories of total cumulative reward.
DQN algorithms in the agent and the environment of the
car following scene and the powertrain, as mentioned in
Section II and III, the two main functional modules, car-
following and energy management, are trained in turn, and the
parameters of the neural network are saved, which facilitates
adjustment of hyperparameters and weights based on the actual
results. When it is time to prepare for the online test, the two
previous functional modules are combined to form a complete
hierarchical control structure; all of the network parameters
fitting the learned strategies are loaded simultaneously under
a new environment to verify uniformly the optimality and
adaptability for random driving environments.
B. Analysis of Training Results Fig. 7. Training results of speed trajectory and car-following distance. (a) Red
First, DQN-based car-following and DQN-based EMSs and blue lines are the speed trajectories of the leading car and the following
car, respectively. (b) Red line is the defined as safe following distance, the
included in the Vision&DRL-based hierarchical control strat- blue line is the actual following distance, and the green line represents the
egy are trained. After 100 rounds of iterations, the total following distance by YOLO-based visual estimation.
cumulative reward of the above two control strategies is drawn
in Fig. 6. Basically, the reward keeps in a stable convergence Second, the results of the DQN-based car-following strategy
after about 60 rounds, which also marks the completion of the are shown in Fig. 7, including the speed tracking in Fig. 7(a)
training. The parameters of the online network in the agent and distance maintenance in Fig. 7(b), and the speed trajectory
will be saved and reloaded in the new environment for the of the leading car is the same as that of Inrets Urban Cycle.
online test. The results prove that the calibrated YOLO-based visual
TABLE V
T RAINING R ESULTS OF E IGHT EMS S ( 0.0001 = 0.1783 g)
TABLE VI
D EVELOPER K IT T ECHNICAL S PECIFICATIONS
Fig. 8. SOC trajectories of eight kinds of EMSs in the training driving cycle.
distance has good accuracy compared with the real distance,

and the DQN-based car-following control can maintain the
ideal distance near the expected distance. When parking, the
real distance is slightly larger than the shortest safe distance,
and when driving, it can follow the safe distance.
Third, training results of the DQN-based EMS include SOC
trajectories in Fig. 8, working points of the engine in Fig. 9,
and sequences of the CVT gear ratio in Fig. 10. Because of the
parallel HEV treated as the target, maintaining the initial SOC
close to the final SOC is a basic requirement. The eight SOC Finally, detailed results are listed in Table V. Due to the
trajectories in Fig. 8 can maintain good maintainability, and difference between the engine control object and the CVT shift
all trajectories also reveal a high degree of correlation with the strategy, four DP-based EMSs have achieved different fuel
characteristics of the driving cycle. Combining working points consumption and working time. Treating the DP(T)/DP-based
of the engine with sequences of the CVT gear ratio, it can EMS as the benchmark, SOC deviation can be equivalent
be found four types of CVT shift strategies can maintain the to fuel consumption based on the relationship ( 0.0001 =
working point within a predefined range. Aiming at keeping 0.1783 g), ensuring a fair comparison of fuel economy.
the engine to work in a high-efficiency speed range for a longer After completing that, the unit of fuel economy is unified
period according to the map, the RB CVT shift strategy allows to L/100 km, where the density of gasoline is defined to
the engine speed to be more maintained at around 3000 rpm. 0.727 g/ml. Results show that, when the throttle is the control
Moreover, the suboptimal learning effect of the DQN on the object, the fuel economy of4.66 L/100 km and 4.73 L/100 km,
RB CVT shift strategy can be reflected in Fig. 9(h). Although and the times consumed of 147.81 and 10.75 s are obtained
the working range is not strictly restricted, it is centered due to the different CVT shift strategies. When the object is
at 3000 rpm, showing a symmetrical distribution. Generally, changed to the amount of change in engine power, the fuel
when the DP-based CVT shift strategy is adopted, the working economy of 5.52 L/100 km and 6.73 L/100 km and the times
points are maintained at [1000 rpm, 2000 rpm]. When the consumed of 3071.24 and 1068.36 s are achieved, respectively.
RB-based CVT shift strategy is employed, the engine speed is The analysis shows that the difference in shift strategy has a
concentrated below 3000 rpm, and the working points reveal significant impact on fuel economy, and the control strategy
the same characteristics, while the working points based on the will produce a larger amount of calculation when the amount
Vision&DRL-based EMS show the distribution characteristics of change is the object. Although the throttle used as the object
of high speed and low torque. can get the theoretically optimal fuel economy, the result
Fig. 9. Working points of the engine in the training driving cycle. (a) DP(T)/DP-based EMS. (c) DP(T)/RB-based EMS. (e) QL( )/DP-based EMS.
(g) DQN( )/DP-based EMS. (b) DP( )/DP-based EMS. (d) DP( )/RB-based EMS. (f) QL( )/RB-based EMS. (h) Vision&DRL-based EMS.
Fig. 10. Sequences of the CVT gear ratio in the training driving cycle.
proves that this kind of control strategy has a serious torque under the new environment NYC Traffic Cycle that is run
mutation phenomenon, which should never exist in reality. three times repeatedly, the trained network parameter file is
Therefore, taking the change of power as the control tar- reloaded into the online network of the agent. The total time
get can significantly optimize comfort. For the four types is 1797 s, and the total mileage is 5.662 km. It should be
of LB EMSs, QL-based EMSs get the fuel economy of noted that the continued update of the network is prohibited
6.68 L/100 km and 7.08 L/100 km, and both consume about during testing, and the network only represents a deterministic
1.86 s to run a cycle, while the Vision&DRL-based EMS gets control strategy. Whether the driving cycle is run once or more
a fuel economy of 7.62 L/100 km and consumes 1.92 s. times, the final result will not change too much.
V. O NLINE P ROCESSOR - IN - THE -L OOP E XPERIMENT A. Processor in the Loop Experimental Equipment
The Vision&DRL-based hierarchical control strategy The Vision&DRL-based hierarchical control strategy is
involves the FCN of DRL and the convolutional network verified by the online PIL experiment. The main equipment
of YOLO. As key performance indicators, optimization and NVIDIA Jetson AGX Xavier is connected to the HP monitor
real-time efficiency are tested on the embedded device, and by an HDMI cable and is controlled by a wireless mouse
Fig. 13. SOC trajectories of eight kinds of EMSs in the testing driving cycle.
Fig. 11. PIL equipment.

relationship with that of off-line training. One reason is that
both two-speed trajectories belong to low-speed urban driving
cycles, and on the other hand, the learned control strategy
obtained after training is a deterministic strategy function,
reflecting the relationship between the random state and the
optimal action.
When the DP-based CVT shift strategy is adopted, the
working points in Fig. 14(a) and (b) are maintained within the
speed range [1000 rpm, 3000 rpm], while the working points in
Fig. 14(c)–(e) are more concentrated below 3000 rpm. When
the amount of change in the engine power is employed, the
working points in Fig. 14(d)–(f) have formed several clear
equal power curves.
Finally, detailed results are listed in Table VII. With the
same as the training results, the DP-based EMS with the
Fig. 12. Results of distance estimation and car-following.
throttle as the control achieves minimum fuel consumption of
about 3.2 L/100 km, and due to the difference in the variable
and keyboard. The physical connection is revealed in Fig. 11. space dimensions, two DP(T)-based EMSs spend 167.96 and
11.12 s to run a driving cycle for testing, respectively. When
As the AI-powered autonomous machines, NVIDIA Jetson
the control is changed to the amount of change in engine
AGX Xavier modules provide AI performance up to 32 TOPS,
power, the fuel economy also shows a big difference due
and a comprehensive set of tools and workflows is helpful to
to different CVT shift strategies. Two types of DP( )-based
quickly train and deploy neural networks [45]. The detailed
specifications are listed in Table VI. EMSs strategies achieve 3.49-L/100-km and 4.48-L/100-km
fuel consumptions after consuming 3385.41 and 1154.06 s,
respectively. Although the fuel economy and computational
B. Analysis of Test Results efficiency are not good in comparison, it is a reliable and stable
First, test results of YOLO-based distance estimation and method to take the amount of power change as the object.
DQN-based car-following control are shown in Fig. 12. The For the four types of LB EMSs, computational efficiency is
visual distance in (9) of the DQN-based car-following con- always a significant feature. When QL-/RB-based EMS gets
trol is the estimated value from the environment perception fuel consumptions of 4.89 L/100 km and 5.12 L/100 km, the
module, instead of the real-time distance. Without the help of consumed times are about 1.96 s. The Vision&DRL-based
communication technology, such as V2V, the true distance is EMS obtains a fuel consumption of 5.76 L/100 km. Such fuel
only a value that exists objectively, and the estimated distance economy is determined by the CVT shift strategy. When the
measured by radar or camera is a known subjective. Test DP-based shift strategy is adopted, the fuel economy of the
results show that both the YOLO-based visual distance and DQN( )/DP-based EMS can reach 4.9 L/100 km. Therefore,
the real following distance can still satisfy the performance the RB-based CVT strategy is more suitable for high-speed
requirements and show a good tracking effect. While accu- driving cycle to a certain extent, and the adaptability of the
rately measuring the real-time distance, it guarantees the safe shifting strategy needs to be improved.
driving of the following car. It is worth noting that the Vision&DRL-based hierarchical
Second, online test results of the Vision&DRL-based EMS control strategy in the embedded device NVIDIA Jetson AGX
are shown in Figs. 13–15. Generally, facing the new test Xavier takes a total of 476.87 s, which indicates that, under
driving cycle, whether it is SOC trajectories, the distribution the driving cycle of 1797 s, a loop of tasks can be completed
characteristics of working points, sequences of the CVT gear within 0.26 s, including object detection, distance estimation,
ratio, and the engine speed, the results still maintain a similar car-following in the 2-D state space, and multiobjective energy
TABLE VII
T EST R ESULTS OF E IGHT EMS S ( 0.0001 = 0.043 g)
Fig. 14. Working points of the engine in the testing driving cycle. (a) DP(T)/DP-based EMS. (c) DP(T)/RB-based EMS. (e) QL( )/DP-based EMS.
(g) DQN( )/DP-based EMS. (b) DP( )/DP-based EMS. (d) DP( )/RB-based EMS. (f) QL( )/RB-based EMS. (h) Vision&DRL-based EMS.
Fig. 15. Sequences of the CVT gear ratio in the testing driving cycle.
management in the 4-D state space. This is the biggest EMSs can achieve near-optimal optimization, good adaptabil-
performance advantage of DRL compared to traditional RL. ity for the new environment, and the potential for real-time
Meanwhile, a conclusion can be drawn that DRL-based applications.
C. Discussion of Existing Problems regarding the throttle as a control target has obtained the lowest
As a preliminary attempt integrating object detection with fuel consumption, but there is an obvious phenomenon of a
LB strategy, the Vision&DRL-based hierarchical control strat- sudden change in torque. Therefore, it is more reasonable to
egy still exhibits defects to be resolved later. control the amount of change in the power of the engine, which
1) Experiment Relies on the Hybrid Control Unit Instead of is beneficial to improve comfort.
Simulation or PIL: Whether it is a computer simulation or PIL Future work will keep treating vehicle vision as a key
experiments, high-performance hardware, such as NVIDIA research direction. The Visual&DRL-based hierarchical con-
GeForce RTX 3080, INTEL i7-10700K, and NVIDIA Jet- trol structure proposed in this article is a preliminary attempt to
son AGX Xavier, is adopted to train and verify CNN-based integrate computer vision with intelligent control algorithms.
object detection algorithms and FCN-based control strategies. In complex driving environments, the information acquired in
However, HCU does not have sufficient computing power, real time by onboard cameras allows vehicle vision technology
and the related tests carried out by HIL or real car are more to be one of the main sources for future autonomous driving.
convincing [46], resulting in that an appropriate simplification Although only the position of the leading car is grasped in
of the neural network model is essential. the car-following environment, the driving images contain a
2) More Realistic Training Environment: The training envi- variety of information, such as road type, weather, pedes-
ronment for the Vision&DRL-based hierarchical control strat- trians, obstacles, and lane lines. Mastering more dynamics
egy is simple and ideal, and the method of evaluating the of the environment is beneficial to improving the safety
following distance is rough. Many potential influencing factors of autonomous driving. Moreover, according to the hybrid
are ignored, such as slope, pavement quality, and camera structure designed by Deep Mind for the Atari 2600, it can
shake in a dynamic environment, and the leading car must also be treated as a reference for the hierarchical control
be a special model that has been precalibrated. Therefore, structure, which can adopt a single hybrid network to control
increasing the dimension of the state space and describing the the vehicle. However, the unsolved problems include how
environment in detail by more state variables could improve to ensure the stability and robustness of this network with
the adaptability facing random environments. black box attributes, and whether the network can be regularly
3) Stability and Adaptability of Neural Network: As we all updated through cloud server computing to achieve gradual
know, the neural network belongs to a black box with many optimization.
parameters. Although DL has the excellent capability to mine R EFERENCES
the data and extract high-dimensional features, only the loss [1] X. Hu, H. Wang, and X. Tang, “Cyber-physical control for energy-saving
or the reward can be observed. For continuously improving vehicle following with connectivity,” IEEE Trans. Ind. Electron., vol. 64,
stability and adaptability, the essential properties of the neural no. 11, pp. 8578–8587, Nov. 2017.
[2] J. Liao, T. Liu, X. Tang, X. Mu, B. Huang, and D. Cao, “Decision-
network need to be further explored. Besides, when the actual making strategy on highway for autonomous vehicles using deep
application is not affected, it is feasible to combine cloud reinforcement learning,” IEEE Access, vol. 8, pp. 177804–177814,
computing and edge computing. 2020.
[3] K. Yang, X. Tang, Y. Qin, Y. Huang, H. Wang, and H. Pu, “Comparative
study of trajectory tracking control for automated vehicles via model
VI. C ONCLUSION AND F UTURE W ORK predictive control and robust H-infinity state feedback control,” Chin.
J. Mech. Eng., vol. 34, no. 1, pp. 1–14, Dec. 2021.
This article proposed a Vision&DRL-based hierarchical [4] W. Li et al., “Deep reinforcement learning-based energy management of
control strategy under a typical car-following environment, hybrid battery systems in electric vehicles,” J. Energy Storage, vol. 36,
Apr. 2021, Art. no. 102355.
which completes vision-based recognition and distance mea- [5] T. Liu, X. Tang, H. Wang, H. Yu, and X. Hu, “Adaptive hierar-
surement, car-following control, and energy management. chical energy management design for a plug-in hybrid electric vehi-
First, the position of the leading car and the distance between cle,” IEEE Trans. Veh. Technol., vol. 68, no. 12, pp. 11513–11522,
Jul. 2019.
two cars are obtained by YOLO and vision-based distance [6] X. Tang, H. Zhou, F. Wang, W. Wang, and X. Lin, “Longevity-
measurement. Second, DQN is employed to control the accel- conscious energy management strategy of fuel cell hybrid electric vehi-
eration of the following car, aiming at maintaining a reasonable cle based on deep reinforcement learning,” Energy, vol. 238, Jan. 2022,
Art. no. 121593.
following distance. Finally, the power of the engine and the [7] M. F. M. Sabri, K. A. Danapalasingam, and M. F. Rahmat, “A review on
gear ratio of the CVT are synchronously controlled by LB hybrid electric vehicles architecture and energy management strategies,”
EMS. After completing off-line training, an online PIL exper- Renew. Sustain. Energy Rev., vol. 53, pp. 1433–1442, Jan. 2016.
[8] F. Zhang, X. Hu, R. Langari, and D. Cao, “Energy management
iment is performed. The results on the edge computing device strategies of connected HEVs and PHEVs: Recent progress and outlook,”
NVIDIA Jetson AGX Xavier show that the vision-based recog- Prog. Energy Combustion Sci., vol. 73, pp. 235–256, Jul. 2019.
nition and distance measurements have considerable accuracy, [9] Y. Luo, T. Chen, S. Zhang, and K. Li, “Intelligent hybrid electric
vehicle ACC with coordinated control of tracking ability, fuel economy,
which can guide the DQN-based car-following strategy to and ride comfort,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 4,
maintain the real-time distance within a safe range. Compared pp. 2303–2308, Aug. 2015.
with the other seven types of EMSs, the Vision&DRL-based [10] X. Tang, Z. Zhang, and Y. Qin, “On-road object detection and tracking
based on radar and vision fusion: A review,” IEEE Intell. Transp. Syst.
hierarchical control strategy achieves fuel consumption of Mag., early access, Aug. 4, 2021, doi: 10.1109/MITS.2021.3093379.
5.76 L/100 km, and the time consumed to run a driving [11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierar-
cycle is 476.87 s, which means that, even if the environment chies for accurate object detection and semantic segmentation,” in Proc.
IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
and control tasks are more complex, a control loop can be [12] R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis.
completed within 0.26 s. Furthermore, the DP-based EMS (ICCV), Dec. 2015, pp. 1440–1448.
[13] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards [36] Y. Wang, H. Tan, Y. Wu, and J. Peng, “Hybrid electric vehicle
real-time object detection with region proposal networks,” IEEE Trans. energy management with computer vision and deep reinforcement
Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017. learning,” IEEE Trans. Ind. Informat., vol. 17, no. 6, pp. 3857–3868,
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look Jun. 2021.
once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. [37] W. Li et al., “Cloud-based health-conscious energy management
Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788. of hybrid battery systems in electric vehicles with deep
[15] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” in reinforcement learning,” Appl. Energy, vol. 293, Jul. 2021,
Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, Art. no. 116977.
pp. 6517–6525. [38] A. Lajunen and T. Lipman, “Lifecycle cost assessment and carbon
[16] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” dioxide emissions of diesel, natural gas, hybrid electric, fuel cell
2018, arXiv:1804.02767. hybrid and electric transit buses,” Energy, vol. 106, pp. 329–342,
[17] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, “SSD: Single Jul. 2016.
shot multibox detector,” in Proc. Eur. Conf. Comput. Vis. (ECCV), [39] J. Chen, H. Shu, X. Tang, T. Liu, and W. Wang, “Deep rein-
Oct. 2016, pp. 21–37. forcement learning-based multi-objective control of hybrid power
system combined with road recognition under time-varying envi-
[18] D.-D. Tran, M. Vafaeipour, M. El Baghdadi, R. Barrero, J. Van Mierlo,
ronment,” Energy, vol. 239, Jan. 2022, Art. no. 122123, doi:
and O. Hegazy, “Thorough state-of-the-art analysis of electric and
10.1016/j.energy.2021.122123.
hybrid vehicle powertrains: Topologies and integrated energy manage-
[40] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for
ment strategies,” Renew. Sustain. Energy Rev., vol. 119, Mar. 2020,
image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit.
Art. no. 109596.
(CVPR), Jun. 2016, pp. 770–778.
[19] J. Peng, H. He, and R. Xiong, “Rule based energy management
[41] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath,
strategy for a series–parallel plug-in hybrid electric bus optimized
“Deep reinforcement learning: A brief survey,” IEEE Signal Process.
by dynamic programming,” Appl. Energy, vol. 185, pp. 1633–1643,
Mag., vol. 34, no. 6, pp. 26–38, Nov. 2017.
Jan. 2017.
[42] H. Tan, H. Zhang, J. Peng, Z. Jiang, and Y. Wu, “Energy management
[20] S. B. Xie, X. Hu, S. Qi, X. Tang, K. Lang, Z. Xin, and J. Brighton, of hybrid electric bus based on deep reinforcement learning in con-
“Model predictive energy management for plug-in hybrid electric vehi- tinuous state and action space,” Energy Convers. Manage., vol. 195,
cles considering optimal battery depth of discharge,” Energy, vol. 173, pp. 548–560, Sep. 2019.
pp. 667–678, Apr. 2019. [43] B. Hu and J. Li, “An edge computing framework for powertrain control
[21] X. Tang, T. Jia, X. Hu, Y. Huang, Z. Deng, and H. Pu, “Naturalistic system optimization of intelligent and connected vehicles based on
data-driven predictive energy management for plug-in hybrid electric curiosity-driven deep reinforcement learning,” IEEE Trans. Ind. Elec-
vehicles,” IEEE Trans. Transport. Electrific., vol. 7, no. 2, pp. 497–508, tron., vol. 68, no. 8, pp. 7652–7661, Aug. 2021.
Jun. 2021. [44] S. Moon and K. Yi, “Human driving data-based design of a vehicle
[22] X. Hu, T. Liu, X. Qi, and M. Barth, “Reinforcement learning for hybrid adaptive cruise control algorithm,” Veh. Syst. Dyn., vol. 46, no. 8,
and plug-in hybrid electric vehicle energy management: Recent advances pp. 661–690, 2008.
and prospects,” IEEE Ind. Electron. Mag., vol. 13, no. 3, pp. 16–25, [45] J. Jeon, S. Jung, E. Lee, D. Choi, and H. Myung, “Run your visual-
Sep. 2019. inertial odometry on NVIDIA Jetson: Benchmark tests on a micro aerial
[23] D. Silver et al., “Mastering the game of go with deep neural networks vehicle,” IEEE Robot. Automat. Lett., vol. 6, no. 3, pp. 5332–5339,
and tree search,” Nature, vol. 529, pp. 484–489, Jan. 2016. Jul. 2021.
[24] D. Silver et al., “Mastering the game of go without human knowledge,” [46] R. Zou, L. Fan, Y. Dong, S. Zheng, and C. Hu, “DQL energy
Nature, vol. 550, no. 7676, pp. 354–359, 2017. management: An online-updated algorithm and its application in
[25] O. Vinyals et al., “Grandmaster level in StarCraft II using multi-agent fix-line hybrid electric vehicle,” Energy, vol. 225, Jun. 2021,
reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019. Art. no. 120174.
[26] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521,
pp. 436–444, Feb. 2015.
[27] V. Mnih et al., “Human-level control through deep reinforcement learn-
ing,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[28] X. Tang, J. Chen, H. Pu, T. Liu, and A. Khajepour, “Double deep
reinforcement learning-based energy management for a parallel hybrid
electric vehicle with engine start-stop strategy,” IEEE Trans. Transport.
Electrific., early access, Jul. 30, 2021, doi: 10.1109/TTE.2021.3101470.
[29] X. Tang, J. Chen, T. Liu, Y. Qin, and D. Cao, “Distributed deep
reinforcement learning-based energy and emission management strategy
for hybrid electric vehicles,” IEEE Trans. Veh. Technol., vol. 70, no. 10,
pp. 9922–9934, Oct. 2021.
[30] J. Wu, Z. Wei, W. Li, Y. Wang, Y. Li, and D. U. Sauer, “Battery thermal-
Xiaolin Tang (Member, IEEE) received the B.S.
and health-constrained energy management for hybrid electric bus based degree in mechanics engineering and the M.S.
on soft actor-critic DRL algorithm,” IEEE Trans. Ind. Informat., vol. 17,
degree in vehicle engineering from Chongqing
no. 6, pp. 3751–3761, Jun. 2021.
University, Chongqing, China, in 2006 and 2009,
[31] Y. Wu, H. Tan, J. Peng, H. Zhang, and H. He, “Deep reinforcement respectively, and the Ph.D. degree in mechanical
learning of energy management with continuous control strategy and engineering from Shanghai Jiao Tong University,
traffic information for a series-parallel plug-in hybrid electric bus,” Appl. Shanghai, China, in 2015.
Energy, vol. 247, pp. 454–466, Aug. 2019. From August 2017 to August 2018, he was a
[32] X. Han, H. He, J. Wu, J. Peng, and Y. Li, “Energy management based on Visiting Professor with the Department of Mechan-
reinforcement learning with double deep Q-learning for a hybrid electric ical and Mechatronics Engineering, University of
tracked vehicle,” Appl. Energy, vol. 254, Nov. 2019, Art. no. 113708. Waterloo, Waterloo, ON, Canada. He is currently an
[33] T. Liu, X. Hu, S. E. Li, and D. Cao, “Reinforcement learning optimized Associate Professor with the College of Mechanical and Vehicle Engineering,
look-ahead energy management of a parallel hybrid electric vehi- Chongqing University. He has led and has been involved in more than ten
cle,” IEEE/ASME Trans. Mechatronics, vol. 22, no. 4, pp. 1497–1507, research projects, such as the National Natural Science Foundation of China.
Aug. 2017. He has published more than 50 articles. His research interests include hybrid
[34] Y. Li, H. He, A. Khajepour, H. Wang, and J. Peng, “Energy management electric vehicles, vehicle dynamics, and transmission control.
for a power-split hybrid electric bus via deep reinforcement learning with Dr. Tang is also a Committee Member of the Technical Committee on
terrain information,” Appl. Energy, vol. 255, Dec. 2019, Art. no. 113762. Vehicle Control and Intelligence of the Chinese Association of Automation
[35] R. Lian, H. Tan, J. Peng, Q. Li, and Y. Wu, “Cross-type transfer (CAA). He was a recipient of several prestigious awards/honors, including
for deep reinforcement learning based hybrid electric vehicle energy the Bayu Scholar and First Prize of Chongqing Natural Science. He is also
management,” IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8367–8380, an Editor of International Journal of Vehicle Performance and Journal of
Aug. 2020. Chongqing University of Technology.
Jiaxin Chen received the B.S. degree in vehicle Teng Liu (Member, IEEE) received the B.S. degree
engineering from the Xi’an University of Technol- in mathematics and the Ph.D. degree in automotive
ogy, Xi’an, China, in 2019. He is currently pursuing engineering from the Beijing Institute of Technology
the M.S. degree in automotive engineering with (BIT), Beijing, China, in 2011 and 2017, respec-
Chongqing University, Chongqing, China. tively. His Ph.D. dissertation, under the supervision
His current research interests include energy man- of Prof. Fengchun Sun, was entitled Reinforcement
agement and reinforcement learning. Learning-Based Energy Management for Hybrid
Electric Vehicles.
He was a Research Fellow with Vehicle Intel-
ligence Pioneers Ltd., Qingdao, Shandong, China,
from 2017 to 2018. He was a Post-Doctoral Fellow
with the Department of Mechanical and Mechatronics Engineering, University
of Waterloo, Waterloo, ON, Canada, from 2018 to 2020. He is currently
a Professor with the Department of Automotive Engineering, Chongqing
University, Chongqing, China. He has more than eight years of research
and working experience in renewable vehicles and connected autonomous
vehicles. His current research focuses on reinforcement learning (RL)-based
energy management in hybrid electric vehicles, RL-based decision making
for autonomous vehicles, and cyber-physical-social systems (CPSS)-based
parallel driving. He has published over 40 SCI papers and 15 conference
papers in these areas.
Dr. Liu is also a member of IEEE Vehicular Technology Society (VTS),
IEEE Intelligent Transportation Systems (ITS), IEEE Industrial Electronics
Society (IES), IEEE Transportation Electrification Community (TEC), and
IEEE/Chinese Association of Automation (CAA). He received the Merit
Student of Beijing in 2011, the Teli Xu Scholarship (Highest Honor) of BIT
Kai Yang received the B.E. degree in vehicle engi- in 2015, the “Top 10” in the 2018 IEEE VTS Motor Vehicle Challenge, and
neering from the Wuhan University of Technology, the Sole Outstanding Winner in the 2018 ABB Intelligent Technology Com-
Wuhan, China, in 2018. He is currently pursuing the petition. He is the Workshop Co-Chair of the 2018 IEEE Intelligent Vehicles
Ph.D. degree with the Vehicle Engineering Depart- Symposium (IV 2018). He has been a reviewer in multiple SCI journals,
ment, Chongqing University, Chongqing, China. selectively including IEEE T RANSACTIONS ON I NDUSTRIAL E LECTRONICS ,
His current research interests include vehicle IEEE T RANSACTIONS ON I NTELLIGENT V EHICLES , IEEE T RANSACTIONS
dynamics and autonomous vehicles. ON I NTELLIGENT T RANSPORTATION S YSTEMS , IEEE T RANSACTIONS ON
S YSTEMS , M AN , AND C YBERNETICS : S YSTEMS , IEEE T RANSACTIONS ON
I NDUSTRIAL I NFORMATICS , and Advances in Mechanical Engineering.
Xiaosong Hu (Senior Member, IEEE) received

the Ph.D. degree in automotive engineering from
the Beijing Institute of Technology, Beijing, China,
in 2012. He did scientific research and completed
the Ph.D. dissertation at the Automotive Research
Center, University of Michigan, Ann Arbor, MI,
USA, from 2010 to 2012.
He was a Post-Doctoral Researcher with the
Department of Civil and Environmental Engineering,
University of California at Berkeley, Berkeley, CA,
USA, from 2014 to 2015, and the Swedish Hybrid
Vehicle Center and the Department of Signals and Systems, Chalmers Uni-
versity of Technology, Gothenburg, Sweden, from 2012 to 2014. He was a
Visiting Post-Doctoral Researcher with the Institute for Dynamic Systems and
Control, Swiss Federal Institute of Technology (ETH), Zürich, Switzerland,
Mitsuru Toyoda (Member, IEEE) received the in 2014. He is currently a Professor with the Department of Mechanical and
Ph.D. degree in mechanical engineering from Sophia Vehicle Engineering, Chongqing University, Chongqing, China. His research
University, Tokyo, Japan, in March 2018. interests include modeling and control of alternative powertrains and energy
Since April 2018, he has been a Project Assistant storage systems.
Professor with the Institute of Statistical Mathemat- Dr. Hu is also an IET Fellow. He was a recipient of numerous prestigious
ics, Tokyo. In April 2019, he joined the Depart- awards/honors, including the Web of Science Highly-Cited Researcher by
ment of Mechanical Systems Engineering, Tokyo Clarivate Analytics, the SAE Environmental Excellence in Transportation
Metropolitan University, Tokyo, where he is cur- Award, the IEEE ITSS Young Researcher Award, the SAE Ralph Teetor
rently an Assistant Professor. His research interests Educational Award, the Emerging Sustainability Leaders Award, the EU Marie
are related to optimal control and stochastic system Currie Fellowship, the ASME DSCD Energy Systems Best Paper Award, and
control theory. the Beijing Best Ph.D. Dissertation Award.

Visual Detection and Deep Reinforcement Learning-Based Car Following and Energy Management For Hybrid Electric Vehicles

Uploaded by

Copyright:

Available Formats

You might also like

Visual Detection and Deep Reinforcement Learning-Based Car Following and Energy Management For Hybrid Electric Vehicles

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Visual Detection and Deep Reinforcement Learning-Based Car Following and Energy Management For Hybrid Electric Vehicles

Uploaded by

Copyright:

Available Formats

IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, VOL. 8, NO.

2, JUNE 2022 2501

Visual Detection and Deep Reinforcement

Abstract— Practical vision-based technology is essential for DL Deep learning.

resistance Fslope , and air resistance Fair should be overcome,

Fig. 2. Car-following environment model built by CATIA. (a) Car-following

within the upper and lower thresholds. Moreover, due to the

maximum action-value corresponding to each state

where π is the control strategy, s is the state, a is the action,

3) The lower layer is the energy management module. Once

For all of the above definitions, something related should TABLE II

well, which is an advantage of DQN. As shown in Table IV

distance has good accuracy compared with the real distance,

Fig. 11. PIL equipment.

Xiaosong Hu (Senior Member, IEEE) received

You might also like