Chao Yang, Zhexi Lu, Weida Wang, Muyao Wang, Jing Zhao
PII: S0360-5442(23)01512-8
DOI: https://doi.org/10.1016/j.energy.2023.128118
Reference: EGY 128118
Please cite this article as: Yang C, Lu Z, Wang W, Wang M, Zhao J, An efficient intelligent energy
management strategy based on deep reinforcement learning for hybrid electric flying car, Energy (2023),
doi: https://doi.org/10.1016/j.energy.2023.128118.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
Zhexi Lu: Methodology, Validation, Investigation, Software, Writing - Review & Editing.
Weida Wang: Methodology, Validation, Investigation, Supervision, Writing - Review & Editing.
An Efficient Intelligent Energy Management Strategy Based on Deep Reinforcement Learning for Hybrid Electric Flying Car
Chao Yanga,b, Zhexi Lua,b, Weida Wang,a,b,*, Muyao Wanga, Jing Zhaoa,b
Abstract
Hybrid electric flying cars hold clear potential to support high-mobility and environmentally friendly transportation. For hybrid electric flying cars, overall performance and efficiency highly depend on the coordination of the electrical and fuel systems under the ground and air dual mode. However, the huge differences in the scale and fluctuation characteristics of energy demand between ground driving and air flight modes make the efficient control of energy flow more complex. Thus, designing a power coordinated control strategy for hybrid electric flying cars is a challenging technical problem. This paper proposes a deep reinforcement
learning-based energy management strategy (EMS) for a series hybrid electric flying car. A mathematical model
of the series hybrid electric flying car driven by the distributed hybrid electric propulsion system (HEPS) which
mainly consists of battery packs, twin turboshaft engine and generator sets (TGSs), 16 rotor-motors, and 4
wheel-motors is established. Subsequently, a Double Deep Q Network (DDQN)-based EMS considering ground
and air dual driving mode is proposed. A simplified method for the number of control variables is designed to
improve exploration efficiency and accelerate the convergence speed. In addition, the frequent engine on/off
problem is also taken into account. Finally, DDQN-based and dynamic programming (DP)-based EMSs are
applied to investigate the power flow distribution for two completely different hypothetical driving scenarios,
namely search and rescue (SAR) scenarios and urban air mobility (UAM) scenarios. The results demonstrate the
effectiveness of the DDQN-based EMS and its capacity to reduce the computation time.
Keywords: Flying cars; Hybrid electric propulsion system; Energy management strategy; Double Deep Q Network
1. Introduction
Flying cars have received increasing attention in recent years and are becoming a reality. Flying cars can run like a normal car on the road and fly like an aircraft in near-ground space [1]. Hence, they are adaptable to harsh driving environments such as broken bridges, congested roads, hills, and cliffs [2]. This will effectively reduce traffic jams and infrastructure maintenance costs, and start a new urban air mobility (UAM) mode [3]. Due to the demand for high flexibility and mobility, vertical-take-off-and-landing (VTOL) vehicles have become more popular in recent years. Many companies such as Joby Aviation, Volocopter and Lilium have developed prototypes to explore new technologies of electric VTOL vehicles (eVTOLs). However, the current energy density of batteries still limits the application of eVTOLs to heavy-load and long-endurance scenarios. Thus, hybrid electric flying cars combining the advantages of different energy sources, such as the Terrafugia TF-X [4], are more applicable for heavy applications or missions designed for sustained climbing and long cruising at the current stage. For hybrid electric flying cars, the combination of both ground and air driving units with a shared power supply constitutes the integrated hybrid electric propulsion system (HEPS) [5]. The huge difference in energy demand characteristics between ground driving mode and air flight mode makes it more complex to govern the energy flow between multiple energy sources. Improper power distribution will lead to more energy consumption. Thus, an efficient energy management strategy (EMS) considering ground and air dual driving mode plays a necessary role in realizing the efficient use of energy and fulfilling the performance of hybrid electric flying cars.
EMSs for hybrid electric ground vehicles mainly include optimization-based and learning-based methods. Dynamic programming (DP) and
equivalent consumption minimization strategy (ECMS) are the representatives of global optimization and
instantaneous optimization, respectively. The former can make globally optimal control decisions and is usually used as the benchmark to evaluate the performance of other methods [7], but it requires prior knowledge of the driving cycle and therefore cannot be applied in real-time controllers. The latter is easy to implement in real time, but it only works well under conditions similar to the driving cycle used to calibrate the ECMS [8].
EMSs of HEPSs for ground vehicles can also be seen in the aviation field, but the coupled propulsion-aerodynamics effects [9-11], flight safety [12], and the power demand characteristics in the climb and cruise phases [13] are further considered in EMSs of HEPSs for aircraft. Learning-based methods have been widely studied in the HEV energy management field in recent years as a possible solution to derive an instantaneous EMS for online application purposes and guarantee the adaptability of the EMS [14],[15]. Liu and Zou et al. [16],[17] designed a Q-learning based EMS, and verified the adaptability, optimality, and the capacity of the Q-learning algorithm to reduce the computation time. However, the Q-learning algorithm is known for its "curse of dimensionality" when handling high-dimensional problems [18]. Sun et al. [19] used deep reinforcement learning (DRL) to manage the high-dimensional state and action spaces, in which Q-tables were replaced by a long short-term memory (LSTM) network. Tang et al. [20] showed that DRL control strategies based on Deep Q Network (DQN), asynchronous advantage actor-critic (A3C), and distributed proximal policy optimization (DPPO) can all achieve near-optimal fuel economy and excellent computational efficiency. Learning-based methods may therefore provide a solution for the efficient control of the power flow of hybrid electric flying vehicles.
Although the above methods are of great reference significance, the optimal control of the HEPS in flying cars is still a new and largely blank application area that poses a number of distinct challenges. First, there are significant differences in the scale and fluctuation characteristics of demand power between ground driving and air flight modes. The demand power changes frequently in the ground driving phase due to the influence of traffic and terrain conditions, but it is much higher and changes less in the air flight phase. In hybrid electric flying cars, the demand power is provided by both the turboshaft engine and generator sets (TGSs) and the battery pack. If the power supply cannot quickly and accurately respond to such drastic changes in power demand, an imbalance between the power demand side and the supply side will arise, resulting in bus voltage imbalance, load current overload, power output interruption and other faults. Second, the flying car is equipped with high-power batteries to meet the power demand in the flight phase, so the state of charge (SOC) changes only slightly during the ground phase at each step. To ensure the accuracy of the control strategy, the SOC, as a state variable, needs to be discretized into high dimensions, leading to a heavy calculation burden. Finally, the hybrid electric flying car studied in this paper uses the turboshaft engine as the power source, so the constant speed characteristics of the rotor and the frequent start-stop problem of the turboshaft engine need to be considered [21]. Although Wei et al. [22] have carried out research on the EMS for a flying car with a turboshaft engine, the EMS considering the ground and air dual-mode driving cycles needs to be further discussed.
Motivated by the aforementioned distinct challenges in the energy management for hybrid electric flying
cars and considering the above advantages of the learning-based method, this paper proposes a DRL-based EMS
for a hybrid electric flying car with a propulsion system consisting of twin TGSs, battery packs, and electric
motors in a series configuration. The prominent contributions are given as follows: i) The mathematical model
of a series hybrid electric flying car with a HEPS consisting of battery packs, twin TGSs, 16 rotor-motors, and 4
wheel-motors is established. ii) The power split between the two TGSs and battery packs considering ground
and air dual driving mode is formulated as an optimal control problem. iii) A Double Deep Q Network
(DDQN)-based EMS considering the frequent on/off of the turboshaft engine is designed and a simplified
method for the number of control variables is proposed to improve exploration efficiency and accelerate the
convergence speed. Two completely different hypothetical driving scenarios, namely search and rescue (SAR)
scenarios and urban air mobility (UAM) scenarios are applied to demonstrate the effectiveness of the proposed
EMS.
The rest of the paper is organized as follows. In Section 2, the series HEPS of the flying car is modeled and the optimal control problem of energy management is stated. In Section 3, the principle of the DDQN algorithm is introduced and applied to obtain a DDQN-based EMS of the flying car. In Section 4, the DDQN-based and DP-based optimal control results are compared. Finally, some conclusions are summarized in Section 5.
2. Flying car powertrain modeling and the problem formulation
The analysis performed in this paper is based on the power request of a hybrid electric flying car used in various roles, such as UAM, SAR, and military duties. It allows ground-level driving, vertical take-off and landing, and near-ground flight. The series distributed HEPS considered in the present work is shown in Fig. 1, in which the twin TGSs and the battery pack are connected by a high-voltage electrical bus.
Fig. 1 The series distributed HEPS of the hybrid electric flying car.
The turboshaft engine mechanically drives a permanent magnet synchronous generator, which provides
electrical power to the high-voltage bus through the inverter. The battery is connected to the high voltage bus
through a dual active bridge bi-directional DC/DC converter. Turboshaft engines are widely used in vertical/short takeoff and landing aircraft; their rotors generally operate at a constant rotational speed during the load phase.
The propulsion load includes all components from the inverter to the propeller/wheel of the flying car.
AC/DC inverters receive electricity from high-voltage bus and invert direct current to three-phase alternating
current to drive rotor motors or wheel motors. Eight pairs of coaxial rotors are utilized to generate the lift force
in flight mode and four wheel-motors are used to drive in ground mode. The proposed power system can work in five different modes:
Full electric drive mode: the power request of the rotor/wheel motors is satisfied by the battery packs alone.
Hybrid drive mode: the battery packs and only one TGS are used to move the rotor/wheel motors.
Power assist mode: the twin TGSs and the battery packs together move the rotor/wheel motors.
Recharge mode: the TGS provides more power than is required by the rotor/wheel motors and the excess power is used to charge the battery. Only one TGS is used for charging.
Regenerative braking mode: the wheel-motors work as generators to charge the battery packs.
The power supply state of the battery packs and TGSs under the above five modes is shown in Table 1, and the elementary parameters of the flying car model are listed in Table 2. The thrust coefficient, rotation resistance moment coefficient, and propeller radius are all provided by the manufacturer of the propellers.
Table 2. Basic parameters for the hybrid electric flying car model
Vehicle mass M (kg)  Air/Ground
Gravitational acceleration g (m/s²)  9.81  Air/Ground
Rolling resistance coefficient fr  0.018  Ground
According to the ground-driving longitudinal vehicle dynamics, the demand torque at each wheel can be
represented as
Tw = 0.25 (M g fr cos θ + M g sin θ + 0.5 CD Af ρa v² + δ M dv/dt) rw          (1)
where M is the vehicle mass, g is the gravity acceleration, fr is rolling resistance coefficient, θ is the slope of the
road, CD is the air drag coefficient, Af is the frontal area, ρα is the air density, v is the vehicular longitudinal
speed, δ is the conversion ratio of vehicle rolling mass, and rw is the wheel radius. When the flying car is
running on the ground, the demand torque is provided by four wheel motors and mechanical brakes, which is
expressed as
Tw = ηTrans^sign(Tw) Tm_ground + TBrk          (2)
where ηTrans is the transmission efficiency, Tm_ground is the wheel motor torque, and TBrk is the torque of the mechanical brake. To achieve the desired speed of the wheel motor, a proportional-integral controller is used to control the wheel motor.
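As a concrete illustration of Eqs. (1)-(2), the ground-phase demand torque can be sketched in a few lines. All parameter values below (mass, drag coefficient, wheel radius, and so on) are hypothetical placeholders, not the paper's vehicle data:

```python
import math

def wheel_demand_torque(v, dv_dt, theta=0.0,
                        M=2000.0, g=9.81, f_r=0.018,
                        C_D=0.35, A_f=2.5, rho_a=1.225,
                        delta=1.05, r_w=0.35):
    """Eq. (1): demand torque at each of the four wheels (N*m)."""
    F_roll = M * g * f_r * math.cos(theta)      # rolling resistance
    F_grade = M * g * math.sin(theta)           # grade resistance
    F_aero = 0.5 * C_D * A_f * rho_a * v ** 2   # aerodynamic drag
    F_acc = delta * M * dv_dt                   # acceleration resistance
    return 0.25 * (F_roll + F_grade + F_aero + F_acc) * r_w

def motor_torque(T_w, T_brk=0.0, eta_trans=0.95):
    """Eq. (2) solved for Tm: transmission efficiency acts in the driving
    direction when sign(Tw) = +1 and in the regenerative direction when -1."""
    sign = 1 if T_w >= 0 else -1
    return (T_w - T_brk) / (eta_trans ** sign)
```

Note how the sign convention makes the motor supply more torque than the wheel demand when driving, and recover less when braking, which is the usual way a single efficiency constant is applied in both directions.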
The calculation of the driving force required by the flying car in the flight phase includes the force demand calculation in the vertical climb/landing phase and the force demand calculation in the cruise phase. According to flight aerodynamics and force balance, the force demand Fv in the vertical climb/landing phase consists of the gravity force G, acceleration resistance Fa and drag force Fd [23], which is derived as
Fv = G + (Fa + Fd) sign(vv) = M g + (M dvv/dt + 0.5 ρa Av CD vv²) sign(vv)          (3)
sign(vv) = 1 (climb), −1 (land)          (4)
The force demand Fc in the cruise phase, i.e., Eq. (5), is also derived by force balance.
Fc = sqrt(Fh² + Fv²) = sqrt((M dvh/dt + 0.5 ρa Af CD vh²)² + Fv²)          (5)
where Fh is the horizontal force demand and vh is the horizontal flying speed. To facilitate the calculation, the change of instantaneous attitude angle is not considered in this study, which does not affect the steady-state performance analysis.
The force Ft provided by a single rotor and its rotation resistance moment Tr are calculated as
where CT and CQ are the thrust coefficient and rotation resistance moment coefficient, respectively, R is the rotor radius, and ωr is the rotor rotational speed.
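The flight-phase force demand of Eqs. (3)-(5) can be sketched as below; again, the mass, area, and drag values are hypothetical placeholders chosen only for illustration:

```python
import math

def vertical_force(v_v, dv_dt, climbing,
                   M=2000.0, g=9.81, rho_a=1.225, A_v=4.0, C_D=0.8):
    """Eqs. (3)-(4): force demand in the vertical climb/landing phase."""
    sign = 1 if climbing else -1       # Eq. (4): +1 climb, -1 land
    F_a = M * dv_dt                    # acceleration resistance
    F_d = 0.5 * rho_a * A_v * C_D * v_v ** 2   # drag force
    return M * g + (F_a + F_d) * sign

def cruise_force(v_h, dvh_dt, F_v,
                 M=2000.0, rho_a=1.225, A_f=2.5, C_D=0.35):
    """Eq. (5): resultant force demand in the cruise phase."""
    F_h = M * dvh_dt + 0.5 * rho_a * A_f * C_D * v_h ** 2
    return math.sqrt(F_h ** 2 + F_v ** 2)
```

In steady hover (zero vertical speed and acceleration) the vertical force reduces to the weight M g, which is a quick sanity check on the model.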
2.3 Power balance
The demand power of the electric motors to propel the vehicle can be obtained as
Pdem = 4 Tm_ground nm_ground / (9550 ηm_ground) + 16 Tm_air nm_air / (9550 ηm_air) = PTGS1 + PTGS2 + Pb          (8)
where nm_ground and nm_air are the rotational speeds of the wheel motors and rotor motors, Tm_air is the rotor motor torque, and ηm_ground and ηm_air are the efficiencies of the wheel motor and the rotor motor, respectively. PTGS1 and PTGS2 are the output powers of the two TGSs, respectively, which are equal to the generator output power in the TGSs, and Pb is the battery output power.
2.4 Modeling of the TGS
In this paper, the experimental modeling method is adopted. The engine starting test and load test were
carried out. In the working mode of the turboshaft engine, the electronic control unit (ECU) controls the electronic fuel pump to change the fuel supply, and maintains the engine speed constant at 6089 rpm throughout the working envelope.
working envelope. The ECU uses the proportional/integral (PID) mode to control the engine rotational speed.
According to the comparison between the measured speed and the reference value, if the speed is low, increase
the fuel quantity and increase the speed. If the speed is high, reduce the fuel quantity and reduce
(a) The rotational speed-fuel flow curve in the start phase. (b) The torque-fuel flow curve in the load phase.
f
oo
Fig.2 Fuel flow curves of the turboshaft engine
r
-p
the speed, so as to maintain the engine dynamic constant at all times. The relationship between engine speed and
re
fuel flow during the starting process obtained from the experiment is shown in Fig. 2 (a). The relationship
lP
between engine torque and fuel flow under load conditions obtained from the experiment is shown in Fig.2 (b).
na
The total fuel flow fT of the two TGSs at each moment can be calculated as
fT = f1(nturb1, Tturb1)/3600 + f2(nturb2, Tturb2)/3600          (9)
where f1 and f2 represent the fuel flows of TGS1 and TGS2, respectively, and nturb1, nturb2, Tturb1 and Tturb2 are the speeds and torques of the two turboshaft engines.
In the TGS, the turboshaft engine shaft is rigidly connected with the generator shaft, so the rotation speed
of the two is equal. The turboshaft engine and the generator speed and the output power of the TGS can be
derived as follows:
nturb = ngen
Tturb − Tgen = (π/30) (Jturb + Jgen) dngen/dt          (10)
Pgen = PTGS = ηgen Tgen ngen / 9550
where ngen and Tgen are the speed and torque of the generator, Jturb and Jgen are the rotational inertias of the turboshaft engine and generator, respectively, Pgen is the generator power, that is, the output power of the TGS PTGS, and ηgen is the generator efficiency.
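In steady state the inertia term of Eq. (10) vanishes and the TGS electrical output follows directly from generator torque, speed, and efficiency. A minimal sketch, with the efficiency value assumed for illustration:

```python
def tgs_output_power(T_gen, n_gen, eta_gen=0.95):
    """Steady-state part of Eq. (10): P_TGS = eta_gen * T_gen * n_gen / 9550.
    Torque in N*m, speed in rpm, power in kW."""
    return eta_gen * T_gen * n_gen / 9550.0

# At the constant working speed of 6089 rpm stated above, a hypothetical
# 300 N*m of generator torque corresponds to roughly 182 kW of TGS output:
P = tgs_output_power(300.0, 6089.0)
```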
Fig. 3 Efficiency map of the generator of the TGS.
2.5 Modeling of the rotor motor and wheel motor
In the HEPS of the flying car, there are wheel motors used to drive wheels or regenerate energy while braking, and rotor motors used to drive propellers. The motors used at the wheels and rotors are different, so their power demand, torque and speed need to be calculated separately. Efficiency maps of the rotor motor and the wheel motor are shown in Fig. 4.
Fig. 4 Efficiency maps of the rotor motor and the wheel motor.
2.6 Modeling of the battery
The power battery pack is the main energy supply and storage device of the hybrid electric flying car, and
the estimate of the state of charge (SOC) is essential for the formulation of EMSs. The equivalent circuit model
considering the battery cell equivalent internal resistance Rint_cell and the open circuit voltage Voc_cell is employed
here to describe the charging/discharging characteristics of the power battery pack, which can be expressed as
Ibatt = (Voc_cell − sqrt(Voc_cell² − 4 Rint_cell Pb_cell)) / (2 Rint_cell)          (11)
SOC = SOC0 − ∫ Ibatt dt / Qbatt_cell          (12)
where Vb_cell, Ibatt, Pb_cell, and Qbatt_cell are the terminal voltage, current, output power, and capacity of the battery cell, respectively.
The main parameters of the whole battery pack, capacity Qbatt and resistance Rb, are calculated as
Qbatt = np Qbatt_cell,   Rb = ns Rint_cell / np          (13)
where ns is the number of series-connected battery cells and np is the number of parallel-connected battery cell strings. The basic parameters of the battery cell are listed in Table 3.
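The equivalent-circuit model of Eqs. (11)-(13) can be sketched as follows; the cell parameters used here are hypothetical placeholders, not the values of Table 3:

```python
import math

def cell_current(P_b_cell, V_oc_cell=3.7, R_int_cell=0.002):
    """Eq. (11): cell current (A) from the cell output power (W)."""
    disc = V_oc_cell ** 2 - 4.0 * R_int_cell * P_b_cell
    return (V_oc_cell - math.sqrt(disc)) / (2.0 * R_int_cell)

def soc_step(soc, P_b_cell, dt, Q_batt_cell=3600.0 * 10):
    """Eq. (12) over one time step; capacity given in A*s (here 10 Ah)."""
    return soc - cell_current(P_b_cell) * dt / Q_batt_cell

def pack_parameters(n_s, n_p, R_int_cell=0.002, Q_batt_cell=3600.0 * 10):
    """Eq. (13): pack resistance and capacity from the cell counts."""
    R_b = n_s * R_int_cell / n_p
    Q_batt = n_p * Q_batt_cell
    return R_b, Q_batt
```

A positive cell output power (discharging) gives a positive current and the SOC decreases, matching the sign convention of Eq. (12).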
The primary purpose of the EMS is to find a power control strategy that minimizes fuel consumption over the drive cycle while satisfying the performance constraints of the system. The energy flow of the propulsion system of the hybrid electric flying vehicle is more complex: in this study, the task can be described as finding a reasonable way to split the power between TGS1, TGS2 and the battery pack to minimize the total energy consumption cost while ensuring the driving and flight power supply. The energy flow of the propulsion system of the studied hybrid electric flying car is shown in Fig. 5. Note that the HEPS for the flying car studied in this paper is powered by twin turboshaft engines. On the one hand, frequent engine on/off affects the working life of the engine. On the other hand, the air starting of a turboshaft engine is a complex non-equilibrium and nonlinear aerothermodynamic process [24], which is very sensitive to changes in atmospheric conditions and interference from the intake flow field. Turbulence during air starting will cause unstable operation of the compressor and flow mismatch, resulting in an imbalance of the fuel-air ratio during the start process, which will eventually lead to failure of the engine air starting. Thus, the additional cost of turboshaft engine on/off needs to be taken into account.
Fig. 5 Energy flow of the propulsion system of the hybrid electric flying car.
3. DDQN-based energy management strategy
The driving cycle of hybrid electric flying cars includes ground driving and air flight. The huge differences in the scale and fluctuation characteristics of demand power between the two driving modes make the optimal control decisions significantly different. A learning-based method can find the optimal control strategy through trial-and-error learning between the agent and the environment, providing a solution for the energy management of hybrid electric flying cars. Thus, a DDQN-based EMS is proposed in this paper. The good representation ability of the neural network in the DDQN algorithm means that the high-dimensional state-variable problem mentioned in the introduction can be well handled. Furthermore, the generalization ability of the neural network makes the DDQN-based EMS applicable to variable driving cycles. The EMS framework of the hybrid electric flying car based on DDQN is shown in Fig. 6. The driving data is used as the input, and the agent selects the output power of the battery pack as a control action from the current state with the ε-greedy policy, executes it in the environment, and gets back a reward and the next state to gather a sample of training data for the experience replay memory. Then, a random batch of training data is input to both networks: the Q network predicts the Q-value and the target network predicts the target Q-value. The mean squared error loss is computed from the difference between the target Q-value and the predicted Q-value, the loss is backpropagated, and gradient descent is used to update the weights of the Q network. The target network is not trained and remains fixed; the Q network weights are copied to it at a specific frequency.
Fig. 6 EMS framework of hybrid electric flying car based on DDQN
The energy management of the hybrid electric flying car can be regarded as a Markov decision process (MDP). In a finite MDP, the sets of states S, actions A, and rewards R all have a finite number of elements. The random variables rt ∈ R and st ∈ S at each time step t have well-defined discrete probability distributions dependent only on the preceding state and action [25]. Here, we set the state variables as the demand power Pdem and the SOC, expressed as st ∈ S = {Pdem(t), SOC(t)}. According to the power balance Eq. (8), two of the battery power Pb, the output power of TGS1 PTGS1 and the output power of TGS2 PTGS2 should be selected as control variables. However, it is worth noting that the battery and engine rated powers of the flying car are both high. For reinforcement learning, the corresponding action space is also quite large, resulting in low exploration efficiency and a long time to converge to the optimum. In fact, given the total power required from the twin TGSs, the power that each TGS should provide to minimize fuel consumption can be determined. Inspired by this, we calculate the optimal power distribution mode between the two TGSs under a given total required TGS power in advance, as shown in Fig. 7, and save it in a table. When the power demand and the battery power are determined, the power provided by each TGS is then known. Thus, the control variable can be simplified to only the battery power Pb.
Fig. 7 The optimal power distribution mode between the two TGSs.
It can be seen from Fig. 7 that when the total power required from the twin TGSs is lower than 250 kW, only TGS2 is used as the power source. When the total required power is higher than 250 kW, TGS1 starts and operates at an output power higher than 85 kW. Considering the characteristics of the components, the following physical constraints should be applied in the model.
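The control-variable simplification described above can be sketched as follows. The 250 kW and 85 kW thresholds follow Fig. 7, but the fine-grained split inside the rule is a hypothetical placeholder standing in for the precomputed optimal table:

```python
def tgs_split(P_tgs_total):
    """Return (P_TGS1, P_TGS2) in kW for a required total TGS power."""
    if P_tgs_total <= 0.0:
        return 0.0, 0.0
    if P_tgs_total < 250.0:
        return 0.0, P_tgs_total          # only TGS2 supplies power
    # above 250 kW, TGS1 starts and runs at no less than 85 kW
    P1 = max(85.0, P_tgs_total / 2.0)    # placeholder split rule
    return P1, P_tgs_total - P1

def power_split(P_dem, P_b):
    """With the battery power Pb as the single control variable, the two TGS
    powers follow from the power balance Eq. (8): P_TGS1 + P_TGS2 = Pdem - Pb."""
    return tgs_split(P_dem - P_b)
```

The point of the simplification is visible here: once the agent picks Pb, everything else is a table lookup, so the action space shrinks from two dimensions to one.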
Taking the fuel consumption and the turboshaft engine on/off into account, the reward function is written as Eq. (14).
onengine1 = 1 if PTGS1 > 0, 0 otherwise          (15)
onengine2 = 1 if PTGS2 > 0, 0 otherwise          (16)
where S1 and S2 represent the unit price of fuel and electricity respectively, which are set to 7.20 CNY/L and
0.65 CNY/kW∙h. αT1 and αT2 are the additional cost added into the cost function to punish engine on/off.
Turboshaft engine 1 and turboshaft engine 2 are punished by αT1 and αT2 respectively each time they are ignited.
α is a weighting factor greater than 1. SOCp and Pdem_p are preset thresholds equal to 0.35 and 200, respectively.
This means that if the SOC is below 0.35 while flying, the reward is worse and the cost of power consumption increases. The purpose of this setting is to use the SOC range from 0.3 to 0.35 as a buffer zone: once the SOC during flight falls below 0.35, the battery power output is reduced as much as possible to prevent the SOC from approaching the lower limit during flight, which would result in insufficient flight power and hence cause danger.
Tgen_min ≤ Tgen ≤ Tgen_max
Tm_ground_min ≤ Tm_ground ≤ Tm_ground_max          (17)
The goal of energy management is to find the optimal control policy π* to maximize the expected return Gt:
Gt = rt+1 + γ rt+2 + γ² rt+3 + ⋯ = Σ_{t=0}^{T−1} γ^t rt+1          (18)
Here, policy π is a mapping from states to probabilities of selecting each possible action: π(a|s) represents the probability that at = a if st = s at time t. T is the total number of steps, and γ ∈ [0, 1] is the discount factor.
To evaluate the expected return of the policy π, the action value (i.e., Q-value) function Qπ(s, a) for policy π is defined as the expected return starting from s, taking the action a, and thereafter following policy π:
Qπ(s, a) = Eπ[Gt | st = s, at = a] = Eπ[Σ_{t=0}^{T−1} γ^t rt+1 | st = s, at = a]          (19)
As mentioned above, there is always at least one policy that is better than or equal to all other policies, which is called the optimal policy π*. The optimal policy π* corresponds to the optimal Q-value function, denoted Q*(s, a).
For the value-based reinforcement learning method, the key idea is how to calculate the value function for
policy 𝜋. For example, to improve the learning efficiency, temporal difference is introduced in Q-learning to
calculate and update Q-value functions. Temporal difference learning is the main learning method of
reinforcement learning. It updates the Q-value function in each iteration as follows:
Q(s, a) ← Q(s, a) + α [r + γ max_{a'} Q(s', a') − Q(s, a)]          (21)
where α is the learning rate, s' is the next state, and a' is the next action.
Generally, epsilon-greedy algorithm is used to explore the action space. For the current state s, select a
random action a with probability ɛ. Otherwise, select the action for which the Q-value function is greatest.
During the iteration, the ɛ value gradually decays, and finally the optimal control strategy can be obtained.
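The ε-greedy exploration just described can be sketched in a few lines, with the decay rate and floor chosen as illustrative assumptions:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one.
    q_values: list of Q(s, a) over the discrete action set."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay(epsilon, rate=0.995, eps_min=0.01):
    """Gradual decay of epsilon during training, clipped at a floor."""
    return max(eps_min, epsilon * rate)
```

With ε = 0 the policy is purely greedy, which is the limit the training converges toward as ε decays.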
As we mentioned in the introduction, the SOC changes only slightly during the ground phase at each step for the flying car. To ensure the accuracy of the control strategy, the state space is high-dimensional, leading to a heavy calculation burden. To solve this problem, a DRL algorithm is introduced in this paper. The DQN algorithm is a value-based DRL algorithm, which can deal with continuous observation space problems. The DQN architecture has two neural networks, the Q network Q(s, a; θ) and the target network Qt(s, a; θt), and a component called experience replay. The Q network is the agent that is trained to produce the optimal state-action value. Experience replay interacts with the environment to generate data to train the Q network. The target network is identical to the Q network. To deal with the calculation of value functions in high-dimensional state and action spaces, a deep neural network (DNN) is used to approximate the Q-value function:
Q(s, a1; θ) ≈ Qπ(s, a1), …, Q(s, am; θ) ≈ Qπ(s, am)          (22)
where θ is the parameters of the DNN, including the weight w and bias b of all neurons. If the action is one of M finite discrete actions a1, ..., am, then the Q network outputs an M-dimensional vector, where the m-th dimension represents the Q-value of action am.
The updating equation (21) used by Q-learning is correspondingly rewritten as (23) in DQN, and the Q network parameters are updated with a minibatch stochastic gradient descent algorithm. The gradient is calculated as (24):
L(θ) = [r + γ max_{a'} Qt(s', a'; θt) − Q(s, a; θ)]²          (23)
∇θ L(θ) = [r + γ max_{a'} Qt(s', a'; θt) − Q(s, a; θ)] ∇θ Q(s, a; θ)          (24)
where θ is the Q network parameter, θt is the target network parameter, Q(s, a; θ) is the predicted value obtained by the Q network, r + γ max_{a'} Qt(s', a'; θt) − Q(s, a; θ) is the temporal difference error, and r + γ max_{a'} Qt(s', a'; θt) is the target value.
Freezing the target network and experience replay are fundamental to the DQN algorithm. The Q network's weights get updated at each time step. Using only a single Q network would cause synchronous fluctuations of the predicted value and the target value in the update process [26]. Thus, the target value is unstable, making the training unstable. To obtain more stable training, a target network that does not get trained is introduced. The Q network is updated at every step, while the target network is updated only every multiple steps, so the target values remain stable for a short period. After a pre-configured number of time steps, the parameters of the Q network are copied over to the target network. Experience replay refers to building an experience pool to remove data correlation. A large number of samples in the form of Eq. (25) are stored in the experience pool. During training, a batch of samples is randomly selected from the experience pool, and the Q network parameters are updated using Eq. (24). In this way, the correlation with adjacent training samples is reduced.
However, it has been proved that the overestimation of action values is common in the standard DQN
algorithm[27]. Thus, the DDQN algorithm is applied to solve the optimal control problem of the energy
management for flying cars in this paper. The max operator in standard DQN, in Eq. (23), uses the same values
both to select and to evaluate an action. This makes it more likely to select overestimated values, resulting in
overoptimistic value estimates. To prevent this, the selection is decoupled from the evaluation in DDQN as follows:
L(θ) = [r + γ Qt(s', argmax_{a'} Q(s', a'; θ); θt) − Q(s, a; θ)]²          (26)
That is, the selection of the action, in the argmax, is due to the Q network, and the target network is used to evaluate the value of the selected action.
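The DDQN decoupling can be sketched compactly; plain-Python functions stand in for the two neural networks here:

```python
def ddqn_target(r, s_next, q_net, target_net, actions, gamma=0.99, done=False):
    """DDQN target: y = r + gamma * Q_t(s', argmax_a' Q(s', a'))."""
    if done:
        return r
    a_star = max(actions, key=lambda a: q_net(s_next, a))   # selection: Q network
    return r + gamma * target_net(s_next, a_star)           # evaluation: target net

# With a Q network that prefers action 1 but a more conservative target
# network, the DDQN target uses the target network's (lower) estimate of
# action 1, damping the overestimation that plain DQN would propagate:
q = lambda s, a: [0.0, 10.0][a]
qt = lambda s, a: [0.0, 2.0][a]
y = ddqn_target(1.0, None, q, qt, actions=[0, 1], gamma=0.9)  # 1 + 0.9 * 2 = 2.8
```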
4. Results and discussion
This section reports the validation of the DDQN algorithm using two completely different hypothetical driving scenarios, which are later referred to as test case 1 and test case 2. DP-based and DDQN-based EMSs are applied to investigate the power flow distribution mode under ground and air dual driving mode. The state variable of DP is discretized, while the state variables of DDQN are continuous. The total cost, which includes the cost of fuel and electricity consumption, is used to evaluate the developed strategy.
Test case 1 is a SAR mission profile in a hilly area with a total duration of 1535 s. It includes ground driving, take-off/climbing, cruising, and landing phases, and the ground rescue process; the power demand along the SAR mission is evaluated with a time step of 0.02 s, see Fig. 8(a). The altitude and speed profiles that define the mission are shown in Fig. 8(b). The flying car starts 4.3 km from the operation site, see Fig. 8(c), flies over a hill to reach the rescue site, and returns to the base after performing a 10-minute rescue mission. The ground driving speed is higher when leaving for the rescue site and when returning to the base after completing the rescue task. During the rescue mission, the vehicle speed is relatively low, and braking and stopping occur more frequently, as shown in Fig. 8(b).
Fig. 8 Power requirement, altitude and speed profiles, and distance profile for the SAR mission. Panels: (a) power requirement; (b) velocity and altitude profiles; (c) distance profile. Phase labels: a: ground drive; b: mode transform; c: vertical climb; d: cruise; e: adjust and vertical land.
Test case 2 is a UAM mission profile with a total duration of 1920 s. It also includes ground driving, take-off/climbing, cruising, and landing phases, and the power demand along the UAM mission is evaluated with a time step of 0.02 s, see Fig. 9(a). It differs significantly from test case 1 in that the continuous flight time in test case 2 is longer. The altitude and speed profiles that define the mission are shown in Fig. 9(b). The flying car departs from the starting point, takes off after encountering a congested road section, and flies at low altitude at a speed of 85 km/h and an altitude of 50 m. After flying over the congested road, the flying car lands and continues to drive until it reaches the destination.
Fig. 9 Power requirement, altitude and speed profiles, and distance profile for the UAM mission. Panels: (a) power requirement; (b) velocity and altitude profiles; (c) distance profile. Phase labels: a: ground drive; b: mode transform; c: vertical climb; d: cruise; e: adjust and vertical land.
4.2 Test case 1
To evaluate the effectiveness and convergence of the overall training, the SAR driving scenario is used. Fig. 10 shows the convergence of the EMS based on off-line DDQN training. The reward is poor at the beginning of the training process and increases significantly with the number of episodes. After about 40 episodes, the reward tends to converge. In this test case, DP and DDQN are studied and compared to investigate the power flow distribution mode under multiple take-offs and short-term flight conditions. The initial SOC is set to 60%, and the maximum and minimum SOC values are 80% and 30%, respectively. Fig. 11 shows the SOC curves of the DP and DDQN algorithms. The SOC fluctuations of both algorithms remain between the upper and lower limits, with a significant decrease during the two flights. In the first flight, the SOC of both algorithms decreases by about 0.12; the SOC of DP decreases first quickly and then slowly, while the SOC of DDQN decreases approximately linearly. In the second flight, DP consumes more electric energy to achieve lower fuel consumption, so its SOC decreases by about 0.13, while DDQN consumes more fuel and its SOC decreases by about 0.11; hence the final SOC of DDQN is higher than that of DP.
The fuel economy of flying cars is influenced by both generator and engine efficiency. Fig. 12 shows the working points of the generators and turboshaft engines. It can be seen that the generator working points of DDQN are more dispersed than those of DP. The working points of turboshaft engine 1 are all located in the high-efficiency area for both algorithms, while several working points of turboshaft engine 2 are located in the low-efficiency area. Fig. 13 illustrates the power split of the two algorithms. It can be found that in the DP algorithm, the engines generally output higher power than in DDQN and work for a shorter time. The engine is shut down at the end of the flight, and the battery follows the power demand. In the DDQN algorithm, the twin turboshaft engines remain switched on and operate at relatively lower power during both flights.
As can be seen from Fig. 12(a), the working points of generator 1 in the DDQN algorithm are located more in the high-efficiency zone, yet the fuel economy of DDQN is lower than that of the DP algorithm. The reason is that in the DP algorithm, the optimality of some generator working points is sacrificed in exchange for a shorter working time. In addition, because the cost of engine on/off is considered in the reward, frequent engine on/off switching does not occur.
Fig. 10 Convergence of the EMS based on off-line DDQN training
Fig. 11 The SOC profiles in DDQN and DP for test case 1 (with SOC buffer zone annotated)
(a) Generator working points in DDQN and DP (b) Turboshaft engine working points in DDQN and DP
Fig. 12 Generator and turboshaft engine working points in DDQN and DP for test case 1
(a) Battery power in DDQN and DP for test case 1
(b) TGS1 power in DDQN and DP for test case 1
Fig. 13 Power split results in DDQN and DP for test case 1
Detailed results are listed in Table 4, and Table 5 shows the computation time of the two algorithms. Taking the DP algorithm as the benchmark, the total cost based on DDQN is 11.20% higher than that of the benchmark, but the computation time is 63.01% shorter. Thus, the power distribution strategy based on the DDQN algorithm achieves a total cost close to that of the DP algorithm with a much shorter computation time.
Method Final SOC Fuel consumption(kg) Electricity consumption(kW∙h) Cost (CNY) Cost gap (%)
DP 3934 -
4.3 Test case 2
The UAM flight mission, which is assumed to be previously unknown, is implemented in this test case. Different from test case 1, this test case focuses on investigating the power flow distribution mode under continuous long-flight conditions. If the SOC reaches its lower limit prematurely under continuous long-flight conditions, the output power of the battery is limited, which leads to a failure to follow the power demand. Once the power provided by the TGSs and battery packs fails to meet the power demand during the flight phase, flight instability may occur, resulting in unsafe flight. As in test case 1, the initial SOC is set to 60%, and the maximum and minimum SOC values are 80% and 30%, respectively.
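The SOC-dependent battery power limit described above can be illustrated with a simple sketch. The limit values and the linear taper inside the buffer zone below are hypothetical, chosen only to show the mechanism, not the paper's actual battery model.

```python
def battery_discharge_limit(soc, soc_min=0.30, soc_buffer=0.35, p_max=200.0):
    """Maximum battery discharge power (kW) as a function of SOC.
    Above the buffer zone the full power p_max is available; inside
    the buffer zone (soc_min..soc_buffer) the limit tapers linearly
    to zero, so the power demand can no longer be followed if the
    SOC drops too early during a long flight."""
    if soc >= soc_buffer:
        return p_max
    if soc <= soc_min:
        return 0.0
    # Linear taper inside the buffer zone
    return p_max * (soc - soc_min) / (soc_buffer - soc_min)
```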
Fig. 14 shows the SOC curves of the DP and DDQN algorithms. The SOC fluctuations of both algorithms remain between the upper and lower limits. During flight, the SOC of DP decreases by about 0.23, and the SOC of DDQN decreases by about 0.25; the final SOC of DDQN is lower than that of DP. Owing to the SOC buffer zone, the DDQN algorithm is able to meet the high power demand of the flight phase under continuous long-flight conditions.
Fig. 14 The SOC profiles in DDQN and DP for test case 2 (with SOC buffer zone annotated)
(a) Generator working points in DDQN and DP (b) Turboshaft engine working points in DDQN and DP
Fig. 15 Generator and turboshaft engine working points in DDQN and DP for test case 2
(a) Battery power in DDQN and DP for test case 2
Fig. 16 Power split results in DDQN and DP for test case 2
Fig. 15 shows the working points of the generators and turboshaft engines. It can be seen that the generator working points of DDQN are also more dispersed than those of DP in test case 2. The working points of the turboshaft engines of both algorithms are all located in the high-efficiency area. The results of power flow distribution are shown in Fig. 16. As in test case 1, in the DP algorithm the engine generally outputs higher power than in DDQN, and its working time is shorter. At the end of the flight, the engine is switched off and the battery is used to follow the power demand. In the DDQN algorithm, the turboshaft engine works for a longer time to prevent the SOC from dropping too quickly and to ensure the power supply at the end of the flight.
Detailed results are listed in Table 6, and Table 7 shows the computation time of the two algorithms. The total cost based on DDQN is 1.81% higher than that of the benchmark, and the computation time is 69.66% shorter. The power distribution strategy based on the DDQN algorithm again achieves a total cost close to that of the DP algorithm with a much shorter computation time, which verifies the optimality and rapidity of DDQN under continuous long-flight conditions.
Method Final SOC Fuel consumption(kg) Electricity consumption(kW∙h) Cost (CNY) Cost gap (%)
DP 4795 -
5. Conclusion
The optimal EMS of a series hybrid electric flying car considering ground and air dual-mode driving cycles is derived by applying the DDQN algorithm. Simulation studies are conducted to verify the effectiveness of the DDQN-based EMS using two completely different hypothetical driving scenarios. The results based on the SAR and UAM driving cycles prove that the designed EMS achieves a total cost close to that of DP while greatly reducing the computation time. The results of both driving scenarios show that the TGSs do not work in the ground driving mode under the turboshaft engine on/off constraint, which helps to reduce noise when running on the ground. In the flight phase, both TGSs work to ensure the power supply during flight. The DP algorithm makes the TGSs work at higher power by sacrificing the optimality of the generator operating points; thus, one TGS can be turned off at the end of the flight and the battery is used to provide the remaining power, shortening the operating time of the TGS and reducing fuel consumption. The DDQN algorithm makes the TGSs work at lower power with better generator working points. The operating time of the TGSs is longer, and they remain on during the whole flight phase. Although the fuel consumption is higher than that of DP, this helps to stabilize the output power of the TGSs during flight and avoid drastic changes in battery discharge power.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No.51975048,
52275047). The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
References
[1] Rajashekara K, Wang Q, Matsuse K. Flying cars: Challenges and propulsion strategies[J]. IEEE Electrification
[3] Postorino M N, Sarné G M L. Reinventing mobility paradigms: Flying car scenarios and challenges for urban
[4] Biradar A, DeBitetto P, Phan L, Duang L, Sarma S. Hybrid-electric powered aerospace systems and the battery energy density revolution[C]//2018 IEEE Aerospace Conference. 2018: 1-6.
[5] Wang W, Chen Y, Yang C, et al. An enhanced hypotrochoid spiral optimization algorithm based intertwined optimal sizing and control strategy of a hybrid electric air-ground vehicle[J]. Energy, 2022, 257: 124749.
[6] Yang C, Lu Z, Wang W, et al. Energy management of hybrid electric propulsion system: recent progress and a flying car perspective under three-dimensional transportation networks[J]. Green Energy and Intelligent Transportation, 2022: 100061.
[7] Feng Y, Dong Z. Optimal energy management with balanced fuel economy and battery life for large hybrid electric
[8] Enang W, Bannister C. Robust proportional ECMS control of a parallel hybrid electric vehicle[J]. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 2017, 231(1): 99-119.
[9] Zhang J, Roumeliotis I, Zolotas A. Model-based fully coupled propulsion-aerodynamics optimization for hybrid
[10] Doff-Sotta M, Cannon M, Bacic M. Optimal energy management for hybrid electric aircraft[J]. IFAC-PapersOnLine,
[11] Donateo T, Ficarella A. Designing a hybrid electric powertrain for an unmanned aircraft with a commercial
[12] Xie Y, Savvaris A, Tsourdos A. Fuzzy logic based equivalent consumption optimization of a hybrid electric propulsion system for unmanned aerial vehicles[J]. Aerospace Science and Technology, 2019, 85: 13-23.
[13] Pornet C, Gologan C, Vratny P C, et al. Methodology for sizing and performance assessment of hybrid energy
[14] Hu X, Liu T, Qi X, et al. Reinforcement learning for hybrid and plug-in hybrid electric vehicle energy management: Recent advances and prospects[J]. IEEE Industrial Electronics Magazine, 2019, 13(3): 16-25.
[15] Zhang Z, Zhang T, Hong J, et al. Double deep Q-network guided energy management strategy of a novel electric-
[16] Liu T, Zou Y, Liu D, et al. Reinforcement learning of adaptive energy management with transition probability for a hybrid electric tracked vehicle[J]. IEEE Transactions on Industrial Electronics, 2015, 62(12): 7837-7846.
[17] Zou Y, Liu T, Liu D, et al. Reinforcement learning-based real-time energy management for a hybrid tracked vehicle[J].
[18] Zou R, Fan L, Dong Y, et al. DQL energy management: An online-updated algorithm and its application in fix-line
[19] Sun M, Zhao P, Lin X. Power management in hybrid electric vehicles using deep recurrent reinforcement learning[J].
[20] Tang X, Chen J, Liu T, et al. Distributed deep reinforcement learning-based energy and emission management strategy for hybrid electric vehicles[J]. IEEE Transactions on Vehicular Technology, 2021, 70(10): 9922-9934.
[21] Donateo T, De Pascalis C L, Strafella L, et al. Off-line and on-line optimization of the energy management strategy in a hybrid electric helicopter for urban air mobility[J]. Aerospace Science and Technology, 2021, 113: 106677.
[22] Wei Z, Ma Y, Xiang C, et al. Power prediction-based model predictive control for energy management in land and
[23] Donateo T, Carlà A, Avanzini G. Fuel consumption of rotorcrafts and potentiality for hybrid electric power systems[J]. Energy Conversion and Management, 2018, 164: 429-442.
[24] Sheng H, Chen Q, Li J, et al. Research on dynamic modeling and performance analysis of helicopter turboshaft engine's start-up process[J]. Aerospace Science and Technology, 2020, 106: 106097.
[25] Sutton R S, Barto A G. Reinforcement learning: An introduction[M]. 2nd ed. Cambridge, MA: The MIT Press.
[26] Tang X, Chen J, Pu H, et al. Double deep reinforcement learning-based energy management for a parallel hybrid electric vehicle with engine start-stop strategy[J]. IEEE Transactions on Transportation Electrification, 2021, 8(1): 1376-1388.
[27] Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2016, 30(1).
Highlights
⚫ The search and rescue scenario and urban air mobility scenario are applied.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests: