Visual Detection and Deep Reinforcement Learning-Based Car Following and Energy Management For Hybrid Electric Vehicles
First, environment perception is responsible for grasping real-time driving conditions and possible obstacles. Second, a driving decision influenced by the environment is employed to determine the behavior, such as cruising, overtaking, changing lanes, and car-following. Finally, by controlling the commands of driving, braking, and steering, autonomous driving without a human driver can be realized while ensuring a higher level of security [3].

For new energy vehicles, EVs, HEVs, and FCVs are regarded as the main targets [4]–[6]. By equipping batteries as the energy source and an electric motor as the power source, EVs realize zero emission, considerable comfort, and lower noise. FCVs, which rely on the chemical reaction of hydrogen and oxygen to generate electrical energy, will also play an important role in future driving, especially for long-distance commercial vehicles. However, the defects of these two types of vehicles are obvious, related to insufficient driving mileage, the safety of batteries, and hydrogen storage [7]. In contrast, HEVs have more mature technology and reliable performance at present. A hybrid powertrain equipped with an ICE and an MG is popular. The technical route includes powertrain selection, parameter matching, and energy management. After a suitable topological configuration is determined based on the actual requirements, optimal fuel economy and lower emissions can be achieved by reasonably distributing the power flow between the ICE and the MG and selecting the driving mode [8].

If a novel direction is created by combining the above two fields, the intelligent HEV is a promising object. The concept of intelligent HEVs has been proposed [9], but there are not many achievements, and the studies on smart cars and new energy vehicles remain relatively independent. Moreover, advanced communication technologies, such as V2X, are currently only available on a small scale, and the installation of communication facilities is expensive, so suitability cannot be guaranteed in every driving environment. After all, not every street and every car will be equipped with V2X equipment in daily life. By combining vehicle vision with intelligent control, a pair of eyes is added to the virtual driver of intelligent HEVs.

For object detection, there are two basic types of algorithms, classified as one-stage and two-stage [10]. The standard two-stage algorithm searches the candidate regions and classifies the objects in them; R-CNN algorithms [11]–[13] are representative. The one-stage algorithm can directly detect targets, including YOLO [14]–[16] and SSD [17]. R-CNN has better accuracy, but its computational efficiency is not ideal. SSD can fulfill the needs of accuracy and efficiency, but hyperparameters that depend on experience are an obstacle. For YOLO, considerable detection performance for tiny targets and outstanding efficiency are treated as the key features and advantages. Therefore, the subsequent work treats YOLO as one of the base algorithms.

Furthermore, the driving performance of HEVs mainly depends on the EMS loaded into the onboard controller. According to the current research progress, EMSs are divided into three categories [18]: RB, OB, and LB. Before the advent of LB EMSs, the mainstream methods were a series of predefined rules based on expert experience or various optimization algorithms oriented to the global or instantaneous optimum [19]–[21], such as DP, PMP, ECMS, and MPC. However, these two types of EMSs always have certain inherent flaws in terms of efficiency, adaptability, and robustness, making it difficult to achieve breakthroughs. With the deepening of research on ML, especially the adoption of RL [22], some AI programs have obtained outstanding achievements, such as AlphaGo [23], AlphaGo Zero [24], and AlphaStar [25] designed by the British AI team DeepMind. By combining the knowledge of DL [26], DeepMind successively proposed various DRL algorithms, such as DQN [27], DDPG [28], and A3C [29]. This series of algorithms has significantly broadened the scope of application and improved the learning effect.

Research scholars in the field of HEV EMSs have also achieved many remarkable results [30]–[32]. Liu et al. [33] proposed an RL-based EMS combined with velocity prediction and verified the optimization and control efficiency by an HIL experiment. Li et al. [34] established a hybrid action space integrating discrete and continuous variables, designed a pretraining stage based on DP results, and proposed a DDPG-based EMS considering terrain information for a hybrid electric bus. Lian et al. [35] first applied TL to propose an EMS suitable for different types of HEVs, which will play a significant role in the future. Wang et al. [36] proposed a DDPG-based EMS that masters traffic lights and surrounding cars by target detection to analyze the traffic flow. Li et al. [37] proposed a DDPG-based EMS for EVs equipped with a hybrid battery system (high-power and high-energy batteries); while optimizing electrical and thermal safety, energy loss, and aging cost, it combines cloud computing and edge computing as a reliable way to deploy DRL.

With the joint efforts of the above scholars, LB EMSs have achieved considerable results. Meanwhile, current studies indicate that integrating more information about the external environment obtained by sensors, radars, cameras, and communications can promote the rationalization and safety of decision-making and control. Dedicated to the development of intelligent HEVs with visual control technology through the integration of computer vision, this article proposes a Vision&DRL-based hierarchical control structure in the car-following scene. After modeling the environment with the 3-D modeling software CATIA and collecting driving images at different following distances, the details of the hierarchical control structure can be described as follows.

A. Upper Layer (Vision-Based Distance Measurement)

The YOLO v3 algorithm is adopted to detect the leading car in the driving images, and the following distance between the two cars is visually evaluated after precalibrated mapping.

B. Middle Layer (Car-Following Control)

After grasping the real-time distance, the acceleration of the following car is controlled by DQN to maintain the following distance within a safe and reasonable range.

C. Lower Layer (Energy Management Strategy)

The DQN algorithm is used to synchronously control the amount of change in the power of the engine and the gear ratio of the CVT.
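To make the interaction of the three layers concrete, the following is a minimal Python sketch of one control step; the `upper`, `middle`, and `lower` objects and their methods are hypothetical placeholders for the YOLO-based distance measurement and the two DQN agents, not the authors' implementation.

```python
# Minimal sketch of the three-layer loop described above (hypothetical helper
# objects; a sketch of the structure, not the authors' code). At each step the
# upper layer turns a camera frame into a following distance, the middle layer
# picks an acceleration with a DQN, and the lower layer splits power with a
# second DQN.

def control_step(frame, ego_state, upper, middle, lower):
    # Upper layer: YOLO v3 detection + precalibrated pixel-to-distance mapping.
    bbox = upper.detect_leading_car(frame)        # bounding box of the leading car
    distance = upper.estimate_distance(bbox)      # visual following distance [m]

    # Middle layer: DQN maps the distance and vehicle states to a discrete
    # acceleration command that keeps the distance in a safe, reasonable range.
    acceleration = middle.select_acceleration(distance, ego_state)

    # Lower layer: DQN synchronously adjusts the change in engine power and the
    # CVT gear ratio to meet the demanded power economically.
    d_engine_power, d_gear_ratio = lower.select_powertrain_action(ego_state, acceleration)

    return acceleration, d_engine_power, d_gear_ratio
```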
The rest of this article is organized as follows. Section II introduces the modeling of the hybrid power system. Section III illustrates YOLO and DRL and proposes the hierarchical control structure. Section IV carries out the off-line training and analyzes the training results. Section V discusses the results of the online PIL test. The conclusion and future work are described in Section VI.

II. HYBRID POWERTRAIN MODELING

A parallel HEV with the P2 structure is treated as the research object. The powertrain, as shown in Fig. 1, contains an ICE, a clutch, an MG, a lithium-ion battery, a hydraulic torque converter, a CVT, a final drive, and so on. According to the power flow, the driving modes can be divided into pure electric, pure engine, hybrid drive, driving charging, and regenerative braking modes. Detailed parameters of the car are listed in Table I, which are derived from the simulation software AUTONOMIE [38] developed by the U.S. Argonne National Laboratory.

A. Longitudinal Dynamics

Based on the characteristics of longitudinal dynamics, the rolling resistance F_roll, the acceleration resistance F_acc, the slope resistance, and the air resistance determine the driving force demanded from the powertrain.

The fuel consumption model is characterized by the engine torque T_eng and the engine speed n_eng. According to the map of fuel consumption obtained by AUTONOMIE, the instantaneous fuel consumption m_dot can be obtained by interpolating the speed and torque, and the total fuel consumption M_fuel is the cumulative fuel consumption of the powertrain after driving for the total driving time T.

The lithium-ion battery takes the zero-order equivalent circuit based on the internal resistance to express the dynamic change in the SOC, which is defined as follows [39]:

\dot{\mathrm{SOC}} = -\frac{V_{oc} - \sqrt{V_{oc}^{2} - 4 R_{int} P_{batt}}}{2 R_{int} Q_{batt}}    (4)

where P_batt is the power of the battery, V_oc is the open-circuit voltage, Q_batt is the nominal capacity of the battery pack, R_int is the internal resistance, and \dot{\mathrm{SOC}} is the amount of instantaneous change in the SOC.

Furthermore, the safety of the powertrain model requires the real-time state to satisfy the constraints of energy sources, such as the battery, and power sources, such as the engine and the motor, in (5) and (6). The speed, torque, and SOC are limited within their allowable ranges.
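As an illustration of how these two pieces of the model can be evaluated, here is a short sketch assuming a gridded fuel map and a battery capacity expressed in ampere-seconds; the grid and its values are random placeholders rather than the AUTONOMIE data.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical engine fuel map (speed grid [rpm] x torque grid [Nm] -> g/s),
# standing in for the AUTONOMIE map; the values below are placeholders.
speed_grid = np.linspace(800.0, 5000.0, 22)
torque_grid = np.linspace(0.0, 160.0, 17)
fuel_map = np.random.rand(speed_grid.size, torque_grid.size) * 3.0

fuel_interp = RegularGridInterpolator((speed_grid, torque_grid), fuel_map)

def instantaneous_fuel(n_eng, T_eng):
    """Instantaneous fuel rate m_dot [g/s] by interpolating the fuel map."""
    return float(fuel_interp([[n_eng, T_eng]])[0])

def soc_derivative(p_batt, v_oc, r_int, q_batt):
    """SOC rate of change from the zero-order equivalent circuit, Eq. (4)."""
    i_batt = (v_oc - np.sqrt(v_oc**2 - 4.0 * r_int * p_batt)) / (2.0 * r_int)
    return -i_batt / q_batt  # q_batt assumed in ampere-seconds
```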
The confidence of each predicted box is defined as the product Pr(object) x IOU_pred^truth, where Pr(object) is the probability of object existence and IOU_pred^truth is the ratio of the intersection and union of the true box and the predicted box.

Parts of the recognition results at different distances are shown in Fig. 2(c)–(f). Once the distance between the two cars exceeds 20 m, the pixel area occupied by the leading car in the image is too small to detect; thus, urban driving cycles are selected because of this limitation. Then, the predicted bounding box of YOLO is treated as the base information to complete the distance measurement. However, during the recognition of YOLO, the size of the bounding box is often jittery, while the center point is relatively more stable. Therefore, the center point of the predicted box is used to calibrate the visual distance against its coordinate on the v-axis of the pixel coordinate system, and a quartic polynomial is fitted to estimate the real-time distance

D = -6.257 \times 10^{-6} \cdot v^{4} - 4.4 \times 10^{-3} \cdot v^{3} + 1.195 \cdot v^{2} - 142.47 \cdot v + 6387.46    (9)

where v is the coordinate of the center point of the predicted bounding box on the v-axis and D is the visual distance.
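A small sketch of this calibration step follows; the (v, D) pairs are invented placeholders for the precalibrated images collected at known distances, so the fitted coefficients will not reproduce Eq. (9).

```python
import numpy as np

# Hypothetical calibration data: v-axis pixel coordinate of the bounding-box
# center vs. measured following distance [m].
v_pixels = np.array([640.0, 610.0, 585.0, 566.0, 552.0, 541.0, 532.0, 525.0])
distances = np.array([5.0, 7.0, 9.0, 11.0, 13.0, 15.0, 17.0, 19.0])

# Fit the quartic polynomial of Eq. (9): D = c4*v^4 + c3*v^3 + c2*v^2 + c1*v + c0.
coeffs = np.polyfit(v_pixels, distances, deg=4)

def visual_distance(v):
    """Estimate the following distance D [m] from the v coordinate of the
    center point of the predicted bounding box."""
    return float(np.polyval(coeffs, v))
```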
Fig. 4. Algorithm framework of DQN.

B. Deep Reinforcement Learning

As the first DRL algorithm, DQN, whose structure is shown in Fig. 4, contains four modules: the Q-learning algorithm, the neural network, the experience replay, and the target network [41]. Based on the process of data interaction between the environment and the agent in the Q-learning algorithm, the reward r(t) is generated after the environment executes the action a(t) in the state s(t), and the TD error in (10) is determined to update the action value Q(s, a) in (11), which represents the expected value of the discounted accumulated rewards in (12), where T is the total time step. In this way, the optimal strategy \pi^{*} obtained after iterative learning can be represented by the greedy action with respect to Q(s, a).

However, Q-learning suffers from the "curse of dimensionality" and the "discretization error" [42] because it uses a table to record the action-value function. Hence, neural networks with powerful fitting capabilities are introduced to overcome these inherent defects in (13). The original update of the action value is also changed to updating the neural network parameters after calculating the loss in (14) and the gradient in (15). Meanwhile, the loss in (14) can be understood as a quantitative way of evaluating how beneficial it is to execute action a in state s. Without grasping the next action a', the difference between the predicted value and the target value could be an important criterion: the smaller the loss, the closer to the optimal action

Q(s, a \mid \theta) \approx Q(s, a)    (13)

where \theta denotes the parameters of the neural network, including the weights w and the biases b of all neurons,

L(\theta) = \mathbb{E}\big[\big(r + \gamma \max_{a'} Q(s', a' \mid \theta) - Q(s, a \mid \theta)\big)^{2}\big]    (14)

\nabla_{\theta} L(\theta) = \mathbb{E}\big[\big(r + \gamma \max_{a'} Q(s', a' \mid \theta) - Q(s, a \mid \theta)\big)\, \nabla_{\theta} Q(s, a \mid \theta)\big]    (15)

where r + \gamma \max_{a'} Q(s', a' \mid \theta) is the target value and Q(s, a \mid \theta) is the predicted value.

Moreover, the experience replay and the target network are designed to improve convergence during the learning period. For the former, by storing the samples generated in the iterative RL process in an experience pool of limited capacity and randomly selecting minibatches of samples when updating the neural network, the correlation between the training samples is broken to a certain extent, thus ensuring the completeness of the exploration of strategies [43]. For the latter, the target network avoids the drawback of abnormal fluctuation of the loss caused by using only one online network to output both the target value r + \gamma \max_{a'} Q(s', a' \mid \theta) and the predicted value Q(s, a \mid \theta). Although the target network has the same structure as the online network, its update method is to directly copy the parameters of the online network after a fixed time interval. With the help of this time-delay effect, the advantages of the different actions in each state can be more clearly highlighted.
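The following is a compact PyTorch sketch of the update described by (13)–(15) together with the experience replay and the copied target network; here the target value is produced by the target network, as the passage recommends. The network width, state and action dimensions, and hyperparameters are illustrative assumptions, not the settings used in this article.

```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 4, 7, 0.95   # illustrative sizes only

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online_net, target_net = make_net(), make_net()
target_net.load_state_dict(online_net.state_dict())   # same structure, copied weights
optimizer = torch.optim.Adam(online_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                          # experience pool of limited capacity

def store(s, a, r, s_next):
    # s and s_next are lists of STATE_DIM floats, a is an int, r is a float.
    replay.append((s, a, r, s_next))

def update(batch_size=32):
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)          # random minibatch breaks correlation
    s, a, r, s_next = map(torch.tensor, zip(*batch))
    s, s_next, r = s.float(), s_next.float(), r.float()

    q_pred = online_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)   # Q(s, a | theta)
    with torch.no_grad():
        q_target = r + GAMMA * target_net(s_next).max(dim=1).values      # target value
    loss = nn.functional.mse_loss(q_pred, q_target)     # squared TD error, as in Eq. (14)

    optimizer.zero_grad()
    loss.backward()                                     # gradient step, as in Eq. (15)
    optimizer.step()

def sync_target():
    # Copy the online parameters into the target network after a fixed interval.
    target_net.load_state_dict(online_net.state_dict())
```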
TABLE IV
Detailed Settings of Seven Types of EMSs for Comparison
B. Analysis of Training Results

First, the DQN-based car-following strategy and the DQN-based EMS included in the Vision&DRL-based hierarchical control strategy are trained. After 100 rounds of iterations, the total cumulative rewards of the above two control strategies are drawn in Fig. 6. Basically, the reward stays in a stable convergence after about 60 rounds, which also marks the completion of the training. The parameters of the online network in the agent will be saved and reloaded in the new environment for the online test.
cumulative reward of the above two control strategies is drawn
in Fig. 6. Basically, the reward keeps in a stable convergence Second, the results of the DQN-based car-following strategy
after about 60 rounds, which also marks the completion of the are shown in Fig. 7, including the speed tracking in Fig. 7(a)
training. The parameters of the online network in the agent and distance maintenance in Fig. 7(b), and the speed trajectory
will be saved and reloaded in the new environment for the of the leading car is the same as that of Inrets Urban Cycle.
online test. The results prove that the calibrated YOLO-based visual
TABLE V
Training Results of Eight EMSs (0.0001 = 0.1783 g)

TABLE VI
Developer Kit Technical Specifications

Fig. 8. SOC trajectories of eight kinds of EMSs in the training driving cycle.
Fig. 9. Working points of the engine in the training driving cycle. (a) DP(T)/DP-based EMS. (b) DP( )/DP-based EMS. (c) DP(T)/RB-based EMS. (d) DP( )/RB-based EMS. (e) QL( )/DP-based EMS. (f) QL( )/RB-based EMS. (g) DQN( )/DP-based EMS. (h) Vision&DRL-based EMS.
Fig. 10. Sequences of the CVT gear ratio in the training driving cycle.
The result proves that this kind of control strategy has a serious torque mutation phenomenon, which should never exist in reality. Therefore, taking the change of power as the control target can significantly improve comfort. For the four types of LB EMSs, the QL-based EMSs obtain fuel economies of 6.68 L/100 km and 7.08 L/100 km, and both consume about 1.86 s to run a cycle, while the Vision&DRL-based EMS obtains a fuel economy of 7.62 L/100 km and consumes 1.92 s.

V. ONLINE PROCESSOR-IN-THE-LOOP EXPERIMENT

The Vision&DRL-based hierarchical control strategy involves the FCN of DRL and the convolutional network of YOLO. As key performance indicators, optimization and real-time efficiency are tested on the embedded device. Under the new environment, the NYC Traffic Cycle, which is run three times repeatedly, the trained network parameter file is reloaded into the online network of the agent. The total time is 1797 s, and the total mileage is 5.662 km. It should be noted that continued updating of the network is prohibited during testing, and the network only represents a deterministic control strategy; whether the driving cycle is run once or more times, the final result will not change much.
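A short sketch of this deterministic online test setup: the trained parameter file is reloaded into the online network, updates are disabled, and actions are chosen greedily. The file name and network dimensions reuse the illustrative assumptions of the earlier sketches.

```python
import torch
import torch.nn as nn

# Rebuild a network with the same structure as the trained online network
# (the dimensions are the illustrative assumptions used earlier) and freeze it.
online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(),
                           nn.Linear(64, 64), nn.ReLU(),
                           nn.Linear(64, 7))
online_net.load_state_dict(torch.load("vision_drl_agent.pth"))
online_net.eval()                       # no further updates during the PIL test

@torch.no_grad()
def greedy_action(state):
    """Deterministic (greedy) control strategy used for the online test."""
    q_values = online_net(torch.tensor(state, dtype=torch.float32))
    return int(torch.argmax(q_values))
```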
A. Processor-in-the-Loop Experimental Equipment

The Vision&DRL-based hierarchical control strategy is verified by the online PIL experiment. The main equipment, an NVIDIA Jetson AGX Xavier, is connected to an HP monitor by an HDMI cable and is controlled by a wireless mouse.
Fig. 13. SOC trajectories of eight kinds of EMSs in the testing driving cycle.
TABLE VII
Test Results of Eight EMSs (0.0001 = 0.043 g)

Fig. 14. Working points of the engine in the testing driving cycle. (a) DP(T)/DP-based EMS. (b) DP( )/DP-based EMS. (c) DP(T)/RB-based EMS. (d) DP( )/RB-based EMS. (e) QL( )/DP-based EMS. (f) QL( )/RB-based EMS. (g) DQN( )/DP-based EMS. (h) Vision&DRL-based EMS.

Fig. 15. Sequences of the CVT gear ratio in the testing driving cycle.
management in the 4-D state space. This is the biggest performance advantage of DRL compared with traditional RL. Meanwhile, a conclusion can be drawn that DRL-based EMSs can achieve near-optimal performance, good adaptability to the new environment, and the potential for real-time applications.
C. Discussion of Existing Problems

As a preliminary attempt to integrate object detection with an LB strategy, the Vision&DRL-based hierarchical control strategy still exhibits defects to be resolved later.

1) The Experiment Relies on Simulation and PIL Instead of the Hybrid Control Unit: Whether it is a computer simulation or a PIL experiment, high-performance hardware, such as an NVIDIA GeForce RTX 3080, an Intel i7-10700K, and an NVIDIA Jetson AGX Xavier, is adopted to train and verify the CNN-based object detection algorithm and the FCN-based control strategies. However, the HCU does not have sufficient computing power, and related tests carried out by HIL or a real car are more convincing [46], so an appropriate simplification of the neural network model is essential.

2) More Realistic Training Environment: The training environment for the Vision&DRL-based hierarchical control strategy is simple and ideal, and the method of evaluating the following distance is rough. Many potential influencing factors are ignored, such as slope, pavement quality, and camera shake in a dynamic environment, and the leading car must be a specific model that has been precalibrated. Therefore, increasing the dimension of the state space and describing the environment in detail with more state variables could improve the adaptability to random environments.

3) Stability and Adaptability of the Neural Network: As we all know, the neural network is a black box with many parameters. Although DL has an excellent capability to mine data and extract high-dimensional features, only the loss or the reward can be observed. To continuously improve stability and adaptability, the essential properties of the neural network need to be further explored. Besides, when the actual application is not affected, it is feasible to combine cloud computing and edge computing.

VI. CONCLUSION AND FUTURE WORK

This article proposed a Vision&DRL-based hierarchical control strategy for a typical car-following environment, which completes vision-based recognition and distance measurement, car-following control, and energy management. First, the position of the leading car and the distance between the two cars are obtained by YOLO and vision-based distance measurement. Second, DQN is employed to control the acceleration of the following car, aiming at maintaining a reasonable following distance. Finally, the power of the engine and the gear ratio of the CVT are synchronously controlled by the LB EMS. After completing off-line training, an online PIL experiment is performed. The results on the edge computing device NVIDIA Jetson AGX Xavier show that the vision-based recognition and distance measurements have considerable accuracy, which can guide the DQN-based car-following strategy to maintain the real-time distance within a safe range. Compared with the other seven types of EMSs, the Vision&DRL-based hierarchical control strategy achieves a fuel consumption of 5.76 L/100 km, and the time consumed to run a driving cycle is 476.87 s, which means that, even if the environment and control tasks are more complex, a control loop can be completed within 0.26 s. Furthermore, the DP-based EMS regarding the throttle as a control target has obtained the lowest fuel consumption, but there is an obvious phenomenon of a sudden change in torque. Therefore, it is more reasonable to control the amount of change in the power of the engine, which is beneficial to improving comfort.

Future work will keep treating vehicle vision as a key research direction. The Vision&DRL-based hierarchical control structure proposed in this article is a preliminary attempt to integrate computer vision with intelligent control algorithms. In complex driving environments, the information acquired in real time by onboard cameras allows vehicle vision technology to be one of the main sources for future autonomous driving. Although only the position of the leading car is grasped in the car-following environment, the driving images contain a variety of information, such as road type, weather, pedestrians, obstacles, and lane lines. Mastering more dynamics of the environment is beneficial to improving the safety of autonomous driving. Moreover, the hybrid structure designed by DeepMind for the Atari 2600 can also be treated as a reference for the hierarchical control structure, which could adopt a single hybrid network to control the vehicle. However, the unsolved problems include how to ensure the stability and robustness of this network with black-box attributes, and whether the network can be regularly updated through cloud server computing to achieve gradual optimization.

REFERENCES

[1] X. Hu, H. Wang, and X. Tang, "Cyber-physical control for energy-saving vehicle following with connectivity," IEEE Trans. Ind. Electron., vol. 64, no. 11, pp. 8578–8587, Nov. 2017.
[2] J. Liao, T. Liu, X. Tang, X. Mu, B. Huang, and D. Cao, "Decision-making strategy on highway for autonomous vehicles using deep reinforcement learning," IEEE Access, vol. 8, pp. 177804–177814, 2020.
[3] K. Yang, X. Tang, Y. Qin, Y. Huang, H. Wang, and H. Pu, "Comparative study of trajectory tracking control for automated vehicles via model predictive control and robust H-infinity state feedback control," Chin. J. Mech. Eng., vol. 34, no. 1, pp. 1–14, Dec. 2021.
[4] W. Li et al., "Deep reinforcement learning-based energy management of hybrid battery systems in electric vehicles," J. Energy Storage, vol. 36, Apr. 2021, Art. no. 102355.
[5] T. Liu, X. Tang, H. Wang, H. Yu, and X. Hu, "Adaptive hierarchical energy management design for a plug-in hybrid electric vehicle," IEEE Trans. Veh. Technol., vol. 68, no. 12, pp. 11513–11522, Jul. 2019.
[6] X. Tang, H. Zhou, F. Wang, W. Wang, and X. Lin, "Longevity-conscious energy management strategy of fuel cell hybrid electric vehicle based on deep reinforcement learning," Energy, vol. 238, Jan. 2022, Art. no. 121593.
[7] M. F. M. Sabri, K. A. Danapalasingam, and M. F. Rahmat, "A review on hybrid electric vehicles architecture and energy management strategies," Renew. Sustain. Energy Rev., vol. 53, pp. 1433–1442, Jan. 2016.
[8] F. Zhang, X. Hu, R. Langari, and D. Cao, "Energy management strategies of connected HEVs and PHEVs: Recent progress and outlook," Prog. Energy Combustion Sci., vol. 73, pp. 235–256, Jul. 2019.
[9] Y. Luo, T. Chen, S. Zhang, and K. Li, "Intelligent hybrid electric vehicle ACC with coordinated control of tracking ability, fuel economy, and ride comfort," IEEE Trans. Intell. Transp. Syst., vol. 16, no. 4, pp. 2303–2308, Aug. 2015.
[10] X. Tang, Z. Zhang, and Y. Qin, "On-road object detection and tracking based on radar and vision fusion: A review," IEEE Intell. Transp. Syst. Mag., early access, Aug. 4, 2021, doi: 10.1109/MITS.2021.3093379.
[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 580–587.
[12] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[13] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[15] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 6517–6525.
[16] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[17] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed, "SSD: Single shot multibox detector," in Proc. Eur. Conf. Comput. Vis. (ECCV), Oct. 2016, pp. 21–37.
[18] D.-D. Tran, M. Vafaeipour, M. El Baghdadi, R. Barrero, J. Van Mierlo, and O. Hegazy, "Thorough state-of-the-art analysis of electric and hybrid vehicle powertrains: Topologies and integrated energy management strategies," Renew. Sustain. Energy Rev., vol. 119, Mar. 2020, Art. no. 109596.
[19] J. Peng, H. He, and R. Xiong, "Rule based energy management strategy for a series–parallel plug-in hybrid electric bus optimized by dynamic programming," Appl. Energy, vol. 185, pp. 1633–1643, Jan. 2017.
[20] S. B. Xie, X. Hu, S. Qi, X. Tang, K. Lang, Z. Xin, and J. Brighton, "Model predictive energy management for plug-in hybrid electric vehicles considering optimal battery depth of discharge," Energy, vol. 173, pp. 667–678, Apr. 2019.
[21] X. Tang, T. Jia, X. Hu, Y. Huang, Z. Deng, and H. Pu, "Naturalistic data-driven predictive energy management for plug-in hybrid electric vehicles," IEEE Trans. Transport. Electrific., vol. 7, no. 2, pp. 497–508, Jun. 2021.
[22] X. Hu, T. Liu, X. Qi, and M. Barth, "Reinforcement learning for hybrid and plug-in hybrid electric vehicle energy management: Recent advances and prospects," IEEE Ind. Electron. Mag., vol. 13, no. 3, pp. 16–25, Sep. 2019.
[23] D. Silver et al., "Mastering the game of go with deep neural networks and tree search," Nature, vol. 529, pp. 484–489, Jan. 2016.
[24] D. Silver et al., "Mastering the game of go without human knowledge," Nature, vol. 550, no. 7676, pp. 354–359, 2017.
[25] O. Vinyals et al., "Grandmaster level in StarCraft II using multi-agent reinforcement learning," Nature, vol. 575, no. 7782, pp. 350–354, 2019.
[26] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436–444, Feb. 2015.
[27] V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[28] X. Tang, J. Chen, H. Pu, T. Liu, and A. Khajepour, "Double deep reinforcement learning-based energy management for a parallel hybrid electric vehicle with engine start-stop strategy," IEEE Trans. Transport. Electrific., early access, Jul. 30, 2021, doi: 10.1109/TTE.2021.3101470.
[29] X. Tang, J. Chen, T. Liu, Y. Qin, and D. Cao, "Distributed deep reinforcement learning-based energy and emission management strategy for hybrid electric vehicles," IEEE Trans. Veh. Technol., vol. 70, no. 10, pp. 9922–9934, Oct. 2021.
[30] J. Wu, Z. Wei, W. Li, Y. Wang, Y. Li, and D. U. Sauer, "Battery thermal- and health-constrained energy management for hybrid electric bus based on soft actor-critic DRL algorithm," IEEE Trans. Ind. Informat., vol. 17, no. 6, pp. 3751–3761, Jun. 2021.
[31] Y. Wu, H. Tan, J. Peng, H. Zhang, and H. He, "Deep reinforcement learning of energy management with continuous control strategy and traffic information for a series-parallel plug-in hybrid electric bus," Appl. Energy, vol. 247, pp. 454–466, Aug. 2019.
[32] X. Han, H. He, J. Wu, J. Peng, and Y. Li, "Energy management based on reinforcement learning with double deep Q-learning for a hybrid electric tracked vehicle," Appl. Energy, vol. 254, Nov. 2019, Art. no. 113708.
[33] T. Liu, X. Hu, S. E. Li, and D. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle," IEEE/ASME Trans. Mechatronics, vol. 22, no. 4, pp. 1497–1507, Aug. 2017.
[34] Y. Li, H. He, A. Khajepour, H. Wang, and J. Peng, "Energy management for a power-split hybrid electric bus via deep reinforcement learning with terrain information," Appl. Energy, vol. 255, Dec. 2019, Art. no. 113762.
[35] R. Lian, H. Tan, J. Peng, Q. Li, and Y. Wu, "Cross-type transfer for deep reinforcement learning based hybrid electric vehicle energy management," IEEE Trans. Veh. Technol., vol. 69, no. 8, pp. 8367–8380, Aug. 2020.
[36] Y. Wang, H. Tan, Y. Wu, and J. Peng, "Hybrid electric vehicle energy management with computer vision and deep reinforcement learning," IEEE Trans. Ind. Informat., vol. 17, no. 6, pp. 3857–3868, Jun. 2021.
[37] W. Li et al., "Cloud-based health-conscious energy management of hybrid battery systems in electric vehicles with deep reinforcement learning," Appl. Energy, vol. 293, Jul. 2021, Art. no. 116977.
[38] A. Lajunen and T. Lipman, "Lifecycle cost assessment and carbon dioxide emissions of diesel, natural gas, hybrid electric, fuel cell hybrid and electric transit buses," Energy, vol. 106, pp. 329–342, Jul. 2016.
[39] J. Chen, H. Shu, X. Tang, T. Liu, and W. Wang, "Deep reinforcement learning-based multi-objective control of hybrid power system combined with road recognition under time-varying environment," Energy, vol. 239, Jan. 2022, Art. no. 122123, doi: 10.1016/j.energy.2021.122123.
[40] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[41] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, "Deep reinforcement learning: A brief survey," IEEE Signal Process. Mag., vol. 34, no. 6, pp. 26–38, Nov. 2017.
[42] H. Tan, H. Zhang, J. Peng, Z. Jiang, and Y. Wu, "Energy management of hybrid electric bus based on deep reinforcement learning in continuous state and action space," Energy Convers. Manage., vol. 195, pp. 548–560, Sep. 2019.
[43] B. Hu and J. Li, "An edge computing framework for powertrain control system optimization of intelligent and connected vehicles based on curiosity-driven deep reinforcement learning," IEEE Trans. Ind. Electron., vol. 68, no. 8, pp. 7652–7661, Aug. 2021.
[44] S. Moon and K. Yi, "Human driving data-based design of a vehicle adaptive cruise control algorithm," Veh. Syst. Dyn., vol. 46, no. 8, pp. 661–690, 2008.
[45] J. Jeon, S. Jung, E. Lee, D. Choi, and H. Myung, "Run your visual-inertial odometry on NVIDIA Jetson: Benchmark tests on a micro aerial vehicle," IEEE Robot. Automat. Lett., vol. 6, no. 3, pp. 5332–5339, Jul. 2021.
[46] R. Zou, L. Fan, Y. Dong, S. Zheng, and C. Hu, "DQL energy management: An online-updated algorithm and its application in fix-line hybrid electric vehicle," Energy, vol. 225, Jun. 2021, Art. no. 120174.

Xiaolin Tang (Member, IEEE) received the B.S. degree in mechanics engineering and the M.S. degree in vehicle engineering from Chongqing University, Chongqing, China, in 2006 and 2009, respectively, and the Ph.D. degree in mechanical engineering from Shanghai Jiao Tong University, Shanghai, China, in 2015.
From August 2017 to August 2018, he was a Visiting Professor with the Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, ON, Canada. He is currently an Associate Professor with the College of Mechanical and Vehicle Engineering, Chongqing University. He has led and has been involved in more than ten research projects, such as the National Natural Science Foundation of China. He has published more than 50 articles. His research interests include hybrid electric vehicles, vehicle dynamics, and transmission control.
Dr. Tang is also a Committee Member of the Technical Committee on Vehicle Control and Intelligence of the Chinese Association of Automation (CAA). He was a recipient of several prestigious awards/honors, including the Bayu Scholar and the First Prize of Chongqing Natural Science. He is also an Editor of the International Journal of Vehicle Performance and the Journal of Chongqing University of Technology.
Jiaxin Chen received the B.S. degree in vehicle engineering from the Xi'an University of Technology, Xi'an, China, in 2019. He is currently pursuing the M.S. degree in automotive engineering with Chongqing University, Chongqing, China.
His current research interests include energy management and reinforcement learning.

Kai Yang received the B.E. degree in vehicle engineering from the Wuhan University of Technology, Wuhan, China, in 2018. He is currently pursuing the Ph.D. degree with the Vehicle Engineering Department, Chongqing University, Chongqing, China.
His current research interests include vehicle dynamics and autonomous vehicles.

Teng Liu (Member, IEEE) received the B.S. degree in mathematics and the Ph.D. degree in automotive engineering from the Beijing Institute of Technology (BIT), Beijing, China, in 2011 and 2017, respectively. His Ph.D. dissertation, under the supervision of Prof. Fengchun Sun, was entitled Reinforcement Learning-Based Energy Management for Hybrid Electric Vehicles.
He was a Research Fellow with Vehicle Intelligence Pioneers Ltd., Qingdao, Shandong, China, from 2017 to 2018. He was a Post-Doctoral Fellow with the Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo, ON, Canada, from 2018 to 2020. He is currently a Professor with the Department of Automotive Engineering, Chongqing University, Chongqing, China. He has more than eight years of research and working experience in renewable vehicles and connected autonomous vehicles. His current research focuses on reinforcement learning (RL)-based energy management in hybrid electric vehicles, RL-based decision making for autonomous vehicles, and cyber-physical-social systems (CPSS)-based parallel driving. He has published over 40 SCI papers and 15 conference papers in these areas.
Dr. Liu is also a member of the IEEE Vehicular Technology Society (VTS), IEEE Intelligent Transportation Systems (ITS), IEEE Industrial Electronics Society (IES), IEEE Transportation Electrification Community (TEC), and IEEE/Chinese Association of Automation (CAA). He received the Merit Student of Beijing in 2011, the Teli Xu Scholarship (Highest Honor) of BIT in 2015, the "Top 10" in the 2018 IEEE VTS Motor Vehicle Challenge, and the Sole Outstanding Winner in the 2018 ABB Intelligent Technology Competition. He is the Workshop Co-Chair of the 2018 IEEE Intelligent Vehicles Symposium (IV 2018). He has been a reviewer for multiple SCI journals, selectively including IEEE Transactions on Industrial Electronics, IEEE Transactions on Intelligent Vehicles, IEEE Transactions on Intelligent Transportation Systems, IEEE Transactions on Systems, Man, and Cybernetics: Systems, IEEE Transactions on Industrial Informatics, and Advances in Mechanical Engineering.