Journal Pre-proof

An efficient intelligent energy management strategy based on deep reinforcement learning for hybrid electric flying car

Chao Yang, Zhexi Lu, Weida Wang, Muyao Wang, Jing Zhao

PII: S0360-5442(23)01512-8
DOI: https://doi.org/10.1016/j.energy.2023.128118
Reference: EGY 128118

To appear in: Energy

Received Date: 13 March 2023


Revised Date: 10 May 2023
Accepted Date: 12 June 2023

Please cite this article as: Yang C, Lu Z, Wang W, Wang M, Zhao J, An efficient intelligent energy
management strategy based on deep reinforcement learning for hybrid electric flying car, Energy (2023),
doi: https://doi.org/10.1016/j.energy.2023.128118.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2023 Published by Elsevier Ltd.


Author contributions

Chao Yang: Conceptualization, Methodology, Software, Investigation, Writing - Original Draft.

Zhexi Lu: Methodology, Validation, Investigation, Software, Writing - Review & Editing.

Weida Wang: Methodology, Validation, Investigation, Supervision, Writing - Review & Editing.

Muyao Wang: Investigation, Resources, Writing - Review & Editing.

Jing Zhao: Investigation, Resources, Writing - Review & Editing.

An Efficient Intelligent Energy Management Strategy Based on Deep Reinforcement Learning for Hybrid Electric Flying Car

Chao Yang a,b, Zhexi Lu a,b, Weida Wang a,b,*, Muyao Wang a, Jing Zhao a,b

a School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100084, China;

b Chongqing Innovation Center, Beijing Institute of Technology, Chongqing 401122, China;

* Corresponding author. Email: wangwd0430@163.com

Abstract

Hybrid electric flying cars hold clear potential to support high-mobility and environmentally friendly transportation. For hybrid electric flying cars, overall performance and efficiency depend strongly on the coordination of the electrical and fuel systems across the ground and air dual driving modes. However, the large differences in the scale and fluctuation characteristics of energy demand between ground driving and air flight make efficient control of the energy flow more complex. Designing a power-coordinated control strategy for hybrid electric flying cars is therefore a challenging technical problem. This paper proposes a deep reinforcement learning-based energy management strategy (EMS) for a series hybrid electric flying car. A mathematical model of the series hybrid electric flying car driven by a distributed hybrid electric propulsion system (HEPS), which mainly consists of battery packs, twin turboshaft engine and generator sets (TGSs), 16 rotor motors, and 4 wheel motors, is established. Subsequently, a Double Deep Q Network (DDQN)-based EMS considering the ground and air dual driving mode is proposed. A simplified treatment of the number of control variables is designed to improve exploration efficiency and accelerate convergence, and the frequent engine on/off problem is also taken into account. Finally, DDQN-based and dynamic programming (DP)-based EMSs are applied to investigate the power flow distribution for two completely different hypothetical driving scenarios, namely a search and rescue (SAR) scenario and an urban air mobility (UAM) scenario. The results demonstrate the effectiveness of the DDQN-based EMS and its capacity to reduce computation time.

Keywords: Flying cars; Hybrid electric propulsion system; Energy management strategy; Double Deep Q Network; Ground and air dual driving mode

1. Introduction

Flying cars have received increasing attention in recent years and are becoming a reality. Flying cars can run like a normal car on the road and fly like an aircraft in near-ground space [1]. Hence, they are adaptive to harsh driving environments such as broken bridges, congested roads, hills, and cliffs [2]. This will effectively reduce traffic jams and infrastructure maintenance costs, and open up a new urban air mobility (UAM) mode [3]. Owing to the demand for high flexibility and mobility, vertical take-off and landing (VTOL) vehicles have become more popular in recent years. Many companies such as Joby Aviation, Volocopter, and Lilium have developed prototypes to explore new technologies for electric VTOL vehicles (eVTOLs). However, the current energy density of batteries still limits the application of eVTOLs in heavy-load and long-endurance scenarios. Thus, hybrid electric flying cars combining the advantages of different energy sources, such as the Terrafugia TF-X [4], are more applicable at the current stage to heavy applications or to missions designed for sustained climbing and long cruising. For hybrid electric flying cars, the combination of ground and air driving units with a shared power supply constitutes the integrated hybrid electric propulsion system (HEPS) [5]. The huge difference in energy demand characteristics between the ground driving mode and the air flight mode makes it more complex to govern the energy flow between multiple energy sources, and improper power distribution leads to higher energy consumption. Thus, an efficient energy management strategy (EMS) considering the ground and air dual driving mode plays a necessary role in realizing the efficient use of energy and fulfilling the performance of hybrid electric flying cars.


Research on EMSs for HEPSs has been widely carried out [6]. Current advanced EMSs of HEPSs for ground vehicles include optimization-based and learning-based methods. Dynamic programming (DP) and the equivalent consumption minimization strategy (ECMS) are representative of global optimization and instantaneous optimization, respectively. The former makes globally optimal control decisions and is usually used as the benchmark to evaluate the performance of other methods [7], but it requires prior knowledge of the driving cycle and cannot be applied in real-time controllers. The latter is easy to implement in real time, but the control only works well under conditions similar to the driving cycle used to calibrate the ECMS [8]. EMSs of HEPSs similar to those for ground vehicles can also be found in the aviation field, but the coupled propulsion-aerodynamics effects [9],[10],[11], flight safety [12], and the power demand characteristics in the climb and cruise phases [13] are further considered in EMSs of HEPSs for aircraft. Learning-based methods have been widely studied in the HEV energy management field in recent years as a possible way to derive an instantaneous EMS for online application and to guarantee the adaptability of the EMS [14],[15]. Liu and Zou et al. [16],[17] designed a Q-learning-based EMS and verified the adaptability, optimality, and computation-time reduction of the Q-learning algorithm. However, the Q-learning algorithm is known for its "curse of dimensionality" when handling high-dimensional problems [18]. Sun et al. [19] used deep reinforcement learning (DRL) to manage high-dimensional state and action spaces, in which Q-tables were replaced by a long short-term memory (LSTM) network. Tang et al. [20] showed that DRL control strategies based on Deep Q Network (DQN), asynchronous advantage actor critic (A3C), and distributed proximal policy optimization (DPPO) can all achieve near-optimal fuel economy and excellent computational efficiency. Learning-based methods may therefore provide a solution for efficient control of the power flow of hybrid electric flying vehicles.

Although the above methods are of great reference significance, the optimal control of the HEPS in a flying car is still a new and largely blank application area that poses a number of distinct challenges. First, there are significant differences in the scale and fluctuation characteristics of the demand power between the ground driving and air flight modes. The demand power changes frequently in the ground driving phase due to the influence of traffic and terrain conditions, but it is much higher and changes less in the air flight phase. In hybrid electric flying cars, the demand power is provided by both the turboshaft engine and generator sets (TGSs) and the battery pack. If the power supply cannot quickly and accurately respond to such drastic changes in power demand, an imbalance between the demand side and the supply side will occur, resulting in bus voltage imbalance, load current overload, power output interruption, and other faults. Second, the flying car is equipped with high-power batteries to meet the power demand in the flight phase, so the SOC changes only slightly at each step during the ground phase. To ensure the accuracy of the control strategy, the SOC, as a state variable, needs to be discretized into high dimensions, leading to a heavy calculation burden. Finally, the hybrid electric flying car studied in this paper uses turboshaft engines as the power source, so the constant-speed characteristics of the rotor and the frequent start-stop problem of the turboshaft engine need to be considered [21]. Although Wei et al. [22] have carried out research on the EMS for a flying car with a turboshaft engine, an EMS considering the ground and air dual-mode driving cycle needs to be further discussed.

Motivated by the aforementioned distinct challenges in the energy management of hybrid electric flying cars and considering the above advantages of learning-based methods, this paper proposes a DRL-based EMS for a hybrid electric flying car with a propulsion system consisting of twin TGSs, battery packs, and electric motors in a series configuration. The prominent contributions are as follows: i) The mathematical model of a series hybrid electric flying car with a HEPS consisting of battery packs, twin TGSs, 16 rotor motors, and 4 wheel motors is established. ii) The power split between the two TGSs and the battery packs considering the ground and air dual driving mode is formulated as an optimal control problem. iii) A Double Deep Q Network (DDQN)-based EMS considering the frequent on/off of the turboshaft engine is designed, and a simplified method for the number of control variables is proposed to improve exploration efficiency and accelerate convergence. Two completely different hypothetical driving scenarios, namely a search and rescue (SAR) scenario and an urban air mobility (UAM) scenario, are applied to demonstrate the effectiveness of the proposed EMS.

The rest of the paper is organized as follows. In Section 2, the series HEPS of the flying car is modeled and the optimal control problem of energy management is stated. In Section 3, the principle of the DDQN algorithm is introduced and applied to obtain a DDQN-based EMS for the flying car. In Section 4, the DDQN-based and DP-based optimal control results are compared. Finally, conclusions are summarized in Section 5.
2. Flying car powertrain modeling and the problem formulation

The analysis performed in this paper is based on the power request of a hybrid electric flying car used in various roles, such as UAM, SAR, and military duties. It allows ground-level driving, vertical take-off and landing, and near-ground flight. The series distributed HEPS considered in the present work is shown in Fig. 1, in which the twin TGSs and the battery pack are connected by a high-voltage electrical bus.

Fig. 1 Series hybrid electric propulsion system (the battery packs and twin TGSs feed a high-voltage DC bus that supplies the 16 rotor motors of the air driving unit and the 4 wheel motors of the ground driving unit)

The turboshaft engine mechanically drives a permanent magnet synchronous generator, which provides electrical power to the high-voltage bus through an inverter. The battery is connected to the high-voltage bus through a dual active bridge bi-directional DC/DC converter. Turboshaft engines are widely used in vertical/short take-off and landing aircraft; their rotors generally operate at a constant rotational speed during the load phase.

The propulsion load includes all components from the inverter to the propeller/wheel of the flying car. AC/DC inverters receive electricity from the high-voltage bus and invert the direct current to three-phase alternating current to drive the rotor motors or wheel motors. Eight pairs of coaxial rotors are utilized to generate the lift force in flight mode, and four wheel motors are used to drive in ground mode. The proposed power system can work in five different modes:

• Full electric drive mode: the power request of the rotor/wheel motors is satisfied by the battery packs only (both TGSs turned off).
• Hybrid drive mode: the battery packs and only one TGS are used to move the rotor/wheel motors.
• Power assist mode: the twin TGSs and the battery packs together move the rotor/wheel motors. This mode generally appears during flight to meet high power requirements.
• Recharge mode: the TGS provides more power than is required by the rotor/wheel motors and the excess power is used to charge the battery. Only one TGS is used for charging.
• Regenerative braking mode: the wheel motors work as generators to charge the battery packs. This mode appears only in ground driving.

The power supply states of the battery packs and TGSs under the above five modes are shown in Table 1, and the elementary parameters of the flying car model are listed in Table 2. The thrust coefficient, rotation resistance moment coefficient, and paddle radius are provided by the manufacturer of the paddles, and the other parameters are provided by the manufacturer of the vehicle.


Table 1. Power supply modes of the flying car

Modes Battery TGS1 TGS2

Full electric drive mode Discharge Not output Not output

Hybrid drive mode Discharge Not output Output

Power assist mode Discharge Output Output

Recharge mode Charge Not output Output

Regenerative braking mode Charge Not output Not output

Table 2. Basic parameters for the hybrid electric flying car model

Parameter    Value    Remark
Vehicle mass M (kg)    2000    Air/Ground
Gravity acceleration g (m/s2)    9.81    Air/Ground
Rolling resistance coefficient fr    0.018    Ground
Drag coefficient CD    0.51    Air/Ground
Frontal area Aa (m2)    7.80    Ground
Wheel radius rw (m)    0.39    Ground
Thrust coefficient CT    0.026    Air
Vertical area while climbing/landing Av (m2)    32.00    Air
Frontal area during level flight Af (m2)    12.40    Air
Rotation resistance moment coefficient CQ    0.0026    Air
Paddle radius R (m)    0.90    Air
Rotor-motor power PR (kW)    50.00    Air
Wheel-motor power PW (kW)    60.00    Ground

2.1 Ground-driving longitudinal dynamics

According to the ground-driving longitudinal vehicle dynamics, the demand torque at each wheel can be represented as

T_w = 0.25\left(Mgf_r\cos\theta + Mg\sin\theta + 0.5C_D A_f \rho_a v^2 + \delta M\frac{dv}{dt}\right)r_w    (1)

where M is the vehicle mass, g is the gravity acceleration, fr is the rolling resistance coefficient, θ is the slope of the road, CD is the air drag coefficient, Af is the frontal area, ρa is the air density, v is the vehicle longitudinal speed, δ is the rotating-mass conversion factor, and rw is the wheel radius. When the flying car is running on the ground, the demand torque is provided by the four wheel motors and the mechanical brakes, which is expressed as

T_w = \eta_{Trans}^{\,\mathrm{sign}(T_w)} T_m^{ground} + T_{Brk}    (2)

where ηTrans is the transmission efficiency, Tm^ground is the wheel motor torque, and TBrk is the torque of the mechanical brake. To achieve the desired speed of the wheel motor, a proportional-integral driving model is used to control the output torque of the four wheel motors.
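As a minimal illustration of Eq. (1) (not the authors' implementation), the per-wheel torque demand can be evaluated as follows; the parameter values are taken from Table 2, while the air density ρa and the rotating-mass factor δ are assumed values not listed there.

```python
import math

def wheel_torque_demand(v, dvdt, theta=0.0, M=2000.0, g=9.81, f_r=0.018,
                        C_D=0.51, A_f=7.80, rho_a=1.225, delta=1.05, r_w=0.39):
    """Per-wheel demand torque from Eq. (1); rho_a and delta are assumed values."""
    rolling = M * g * f_r * math.cos(theta)      # rolling resistance
    grade = M * g * math.sin(theta)              # grade resistance
    aero = 0.5 * C_D * A_f * rho_a * v ** 2      # aerodynamic drag
    inertia = delta * M * dvdt                   # acceleration resistance
    return 0.25 * (rolling + grade + aero + inertia) * r_w

# Example: 60 km/h on a flat road, accelerating at 1 m/s^2
print(wheel_torque_demand(v=60 / 3.6, dvdt=1.0))
```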


2.2 Flying point-mass dynamics

The calculation of the driving force required by the flying car in the flight phase includes the force demand in the vertical climb/landing phase and the force demand in the cruise phase. According to flight aerodynamics and force balance, the force demand Fv in the vertical climb/landing phase consists of the gravity force G, the acceleration resistance Fa, and the drag force Fd [23], and is derived as

F_v = G + (F_a + F_d)\,\mathrm{sign}(v_v) = Mg + \left(M\frac{dv_v}{dt} + 0.5\rho_a A_v C_D v_v^2\right)\mathrm{sign}(v_v)    (3)

\mathrm{sign}(v_v) = \begin{cases} 1, & \text{climb} \\ -1, & \text{land} \end{cases}    (4)

where vv is the vertical climb/landing speed and Av is the vertical area.

The force demand Fc in the cruise phase, i.e., Eq. (5), is also derived by force balance:

F_c = \sqrt{F_h^2 + F_v^2} = \sqrt{\left(M\frac{dv_h}{dt} + 0.5\rho_a A_f C_D v_h^2\right)^2 + F_v^2}    (5)

where Fh is the horizontal force demand and vh is the horizontal flying speed. To facilitate the calculation, the change of the instantaneous attitude angle is not considered in this study, which does not affect the steady-state performance analysis of the flying car.

The force Ft provided by a single rotor and its rotation resistance moment Tr are calculated as

F_t = 0.5 C_T \rho_a (\pi R^2)(\omega_r R)^2    (6)

T_r = 0.5 C_Q \rho_a (\pi R^2)(\omega_r R)^2 R    (7)

where CT and CQ are the thrust coefficient and rotation resistance moment coefficient, respectively, R is the rotor radius, and ωr is the rotor rotational speed.
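Likewise, Eqs. (3), (4), and (6) can be sketched in a few lines; the air density is again an assumed constant, while the other parameters come from Table 2.

```python
import math

def vertical_force_demand(v_v, dvv_dt, climbing, M=2000.0, g=9.81,
                          rho_a=1.225, A_v=32.0, C_D=0.51):
    """Vertical-phase force demand from Eqs. (3)-(4); rho_a is an assumed air density."""
    s = 1.0 if climbing else -1.0
    return M * g + (M * dvv_dt + 0.5 * rho_a * A_v * C_D * v_v ** 2) * s

def single_rotor_thrust(omega_r, R=0.90, C_T=0.026, rho_a=1.225):
    """Thrust of one rotor from Eq. (6); omega_r in rad/s."""
    return 0.5 * C_T * rho_a * (math.pi * R ** 2) * (omega_r * R) ** 2
```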

2.3 Power balance

The demand power of the electric motors to propel the vehicle can be obtained as

P_{dem} = \left(4 T_m^{ground} n_m^{ground} \eta_{m\_ground}^{-\mathrm{sign}(T_m^{ground})} + 16 T_m^{air} n_m^{air} \eta_{m\_air}^{-\mathrm{sign}(T_r)}\right)/9550 = P_{TGS1} + P_{TGS2} + P_b    (8)

where nm^ground and nm^air are the rotational speeds of the wheel motors and rotor motors, Tm^air is the rotor motor torque, and ηm_ground and ηm_air are the efficiencies of the wheel motor and the rotor motor, respectively. PTGS1 and PTGS2 are the output powers of the two TGSs, which are equal to the generator output powers in the TGSs, and Pb is the battery pack power.
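As a hedged illustration of Eq. (8), the sketch below converts motor torques and speeds into the total electrical demand power; the constant motor efficiencies are placeholders for the efficiency maps of Fig. 4.

```python
def demand_power_kw(T_m_ground, n_m_ground, T_m_air, n_m_air,
                    eta_m_ground=0.92, eta_m_air=0.93):
    """Total electrical demand power P_dem (kW) from Eq. (8).
    Constant efficiencies are assumed placeholders for the maps in Fig. 4.
    T*n/9550 converts torque (Nm) x speed (rpm) to kW; the efficiency exponent
    flips sign between motoring and regenerating, as in the paper."""
    sign_g = 1 if T_m_ground >= 0 else -1
    sign_a = 1 if T_m_air >= 0 else -1
    p_ground = 4 * T_m_ground * n_m_ground * eta_m_ground ** (-sign_g)
    p_air = 16 * T_m_air * n_m_air * eta_m_air ** (-sign_a)
    return (p_ground + p_air) / 9550.0
```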

2.4 Modeling of the turboshaft engine and generator

In this paper, an experimental modeling method is adopted: an engine starting test and a load test were carried out. In the working mode of the turboshaft engine, the electronic control unit (ECU) controls the electronic fuel pump to change the fuel supply and maintains the engine speed constant at 6089 rpm throughout the working envelope. The ECU uses a proportional-integral (PI) mode to control the engine rotational speed: according to the comparison between the measured speed and the reference value, if the speed is low, the fuel quantity is increased to raise the speed; if the speed is high, the fuel quantity is reduced to lower the speed, so that the engine speed is kept constant at all times. The relationship between engine speed and fuel flow during the starting process obtained from the experiment is shown in Fig. 2(a), and the relationship between engine torque and fuel flow under load conditions is shown in Fig. 2(b).

Fig. 2 Fuel flow curves of the turboshaft engine: (a) the rotational speed-fuel flow curve in the start phase; (b) the torque-fuel flow curve in the load phase

The total fuel flow fT of the two TGSs at each moment can be calculated as

f_T = \frac{f_1(n_{Turb1}, T_{Turb1})}{3600} + \frac{f_2(n_{Turb2}, T_{Turb2})}{3600}    (9)

where f1 and f2 represent the fuel flows of TGS1 and TGS2, respectively, and nTurb1, nTurb2, TTurb1, and TTurb2 are the speeds and torques of turboshaft engine 1 and turboshaft engine 2.

In the TGS, the turboshaft engine shaft is rigidly connected to the generator shaft, so their rotational speeds are equal. The turboshaft engine and generator speeds and the output power of the TGS can be derived as follows:

\begin{cases} n_{turb} = n_{gen} \\ T_{turb} - T_{gen} = \dfrac{\pi}{30}\left(J_{eng} + J_{gen}\right)\dfrac{dn_{gen}}{dt} \\ P_{gen} = P_{TGS} = \dfrac{T_{gen} n_{gen}}{9550}\eta_{gen} \end{cases}    (10)

where ngen and Tgen are the speed and torque of the generator, Jeng and Jgen are the rotational inertias of the turboshaft engine and the generator, respectively, Pgen is the generator power, i.e., the output power PTGS of the TGS, and ηgen is the generator efficiency, as shown in Fig. 3.

Fig. 3 Efficiency map of the generator of the TGS

2.5 Modeling of the rotor motor and wheel motor

In the HEPS of the flying car, wheel motors are used to drive the wheels or regenerate energy while braking, and rotor motors are used to drive the propellers. The motors used at the wheels and rotors are different, so their power demand, torque, and speed need to be calculated separately. Efficiency maps of the rotor motor and the wheel motor are shown in Fig. 4.

Fig. 4 Efficiency maps of (a) the wheel motor and (b) the rotor motor
2.6 Modeling of the battery

The power battery pack is the main energy supply and storage device of the hybrid electric flying car, and estimating the state of charge (SOC) is essential for the formulation of EMSs. The equivalent circuit model considering the battery cell equivalent internal resistance Rint_cell and the open-circuit voltage Voc_cell is employed here to describe the charging/discharging characteristics of the power battery pack, which can be expressed as

\begin{cases} V_{b\_cell} = V_{oc\_cell} - I_{batt} R_{int\_cell} \\ P_{b\_cell} = V_{oc\_cell} I_{batt} - I_{batt}^2 R_{int\_cell} \\ I_{batt} = \dfrac{V_{oc\_cell} - \sqrt{V_{oc\_cell}^2 - 4 R_{int\_cell} P_{b\_cell}}}{2 R_{int\_cell}} \\ SOC = SOC_0 - \dfrac{\int I_{batt}\,dt}{Q_{batt\_cell}} \end{cases}    (11)

where Vb_cell, Ibatt, Pb_cell, and Qbatt_cell are the terminal voltage, current, output power, and capacity of the battery cell, respectively, and SOC0 is the initial value of the SOC.

The main parameters of the whole battery pack, the capacity Qbatt and the resistance Rb, are calculated as

Q_{batt} = n_s n_p Q_{batt\_cell}    (12)

R_b = \dfrac{n_s R_{int\_cell}}{n_p}    (13)

where ns is the number of series-connected battery cells and np is the number of parallel-connected battery cell strings. The basic parameters of the battery cell are listed in Table 3.

Table 3 Basic parameters of the battery cell

Parameter    Value
Nominal capacity Qbatt_cell (Ah)    3.0
Nominal voltage Vb_cell (V)    3.6
Max voltage Ub_cell^max (V)    4.2
Max charge current Ib_cell_ch^max (A)    5.0
Max discharge current Ib_cell_dis^max (A)    30.0
Weight mb_cell (g)    46.6
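A minimal sketch of one SOC update step following Eq. (11) is given below; the constant open-circuit voltage and internal resistance are assumptions (in practice they vary with SOC), and the cell capacity is taken from Table 3.

```python
import math

def battery_step(soc, p_cell_w, dt, v_oc=3.6, r_int=0.03, q_cell_ah=3.0):
    """One SOC update for a single cell from Eq. (11).
    v_oc and r_int are assumed constants; discharge power is positive."""
    # Current from the quadratic relation P = V_oc*I - I^2*R_int
    i_batt = (v_oc - math.sqrt(v_oc ** 2 - 4 * r_int * p_cell_w)) / (2 * r_int)
    soc_next = soc - i_batt * dt / (q_cell_ah * 3600.0)   # Ah converted to As
    return soc_next, i_batt

# Example: discharge a cell at 10 W for one second starting from SOC = 0.60
print(battery_step(0.60, 10.0, 1.0))
```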


2.7 Problem statement for the hybrid electric flying car

The primary purpose of the EMS is to find a power control strategy that minimizes fuel consumption over the driving cycle while satisfying the performance constraints of the system. The energy flow of the propulsion system of the hybrid electric flying vehicle is more complex: in this study, the task is to find a reasonable way to split the power between TGS1, TGS2, and the battery pack so as to minimize the total energy consumption cost while ensuring the driving and flight power supply. The energy flow of the propulsion system of the studied hybrid electric flying car is shown in Fig. 5. Note that the HEPS of the flying car studied in this paper is powered by twin turboshaft engines. On the one hand, frequent engine on/off affects the working life of the engine. On the other hand, the air starting of a turboshaft engine is a complex non-equilibrium and nonlinear aerothermodynamic process [24] that is very sensitive to changes in atmospheric conditions and to interference from the intake flow field. Turbulence during air starting will cause unstable operation of the compressor and flow mismatch, resulting in an imbalance of the fuel-air ratio during the start process, which can eventually lead to failure of the engine air start. Thus, an additional cost for turboshaft engine on/off needs to be included in the EMS to avoid frequent engine on/off.

Fig. 5 The energy flow direction of the propulsion system (mechanical power flows from the turboshaft engines to the generators; electrical power flows from the generators and the battery packs through the power distribution center to the 16 rotor motors and 4 wheel motors)


3. DDQN-based EMS for hybrid electric flying car

The driving cycle of a hybrid electric flying car includes ground driving and air flight. The huge differences in the scale and fluctuation characteristics of the demand power between the two driving modes make the optimal control decisions significantly different. Learning-based methods can find the optimal control strategy through trial-and-error interaction between the agent and the environment, providing a solution for the energy management of hybrid electric flying cars. Thus, a DDQN-based EMS is proposed in this paper. The good representation ability of the neural network in the DDQN algorithm allows the high-dimensional state-variable problem mentioned in the introduction to be handled well. Furthermore, the generalization ability of the neural network makes the DDQN-based EMS applicable to variable driving cycles. The EMS framework of the hybrid electric flying car based on DDQN is shown in Fig. 6. The driving data is used as the input; the agent selects the output power of the battery pack as a control action from the current state with an ε-greedy policy, executes it in the environment, and receives a reward and the next state, gathering a sample of training data in the experience replay memory. Then, a random batch of training data is fed to both networks: the Q network predicts the Q-value and the target network predicts the target Q-value. The mean squared error loss between the target Q-value and the predicted Q-value is computed, the loss is backpropagated, and gradient descent is used to update the weights of the Q network. The target network is not trained and remains fixed; the Q network weights are copied to the target network at a specific frequency.


Fig. 6 EMS framework of the hybrid electric flying car based on DDQN

The energy management of the hybrid electric flying car can be regarded as a Markov decision process (MDP). In a finite MDP, the sets of states S, actions A, and rewards R all have a finite number of elements, and the random variables r_t ∈ R and s_t ∈ S at each time step t have well-defined discrete probability distributions that depend only on the preceding state and action [25]. Here, the state variables are set as the demand power Pdem and the SOC, expressed as s_t ∈ S = {(P_dem(t), SOC(t))}. According to the power balance in Eq. (8), two of the three quantities, namely the battery power Pb, the output power PTGS1 of TGS1, and the output power PTGS2 of TGS2, should be selected as control variables. However, the rated powers of the battery and the engines of the flying car are both high, so the corresponding action space is quite large, resulting in low exploration efficiency and a long time to converge to the optimum. In fact, given the total power required from the twin TGSs, the power required from each TGS to minimize fuel consumption can be determined. Inspired by this, the optimal power distribution mode between the two TGSs for each total TGS power is calculated in advance, as shown in Fig. 7, and saved in a table. When the power demand and the battery power are determined, the power provided by each TGS is known. Thus, the control variable can be simplified to only the battery power Pb.
Fig. 7 The optimal power distribution mode between the two TGSs.

It can be seen from Fig. 7 that when the total power required from the twin TGSs is lower than 250 kW, only TGS2 is used as the power source. When the total power required from the twin TGSs is higher than 250 kW, TGS1 starts and operates at an output power higher than 85 kW. Considering the characteristics of the components, the physical constraints given in Eq. (17) should be applied in the model.
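The precomputed split described above amounts to a simple table lookup at run time. The sketch below is hypothetical: it uses only the 250 kW and 85 kW thresholds mentioned in the text, whereas the paper's table stores the full offline-optimized split of Fig. 7.

```python
import bisect

# Hypothetical precomputed table: total TGS power (kW) -> optimal TGS1 power (kW).
# In the paper this table comes from an offline search (Fig. 7); the numbers below
# are placeholders that only respect the 250 kW / 85 kW thresholds in the text.
TOTAL_KW = [0.0, 250.0, 300.0, 400.0, 500.0]
TGS1_KW = [0.0, 0.0, 85.0, 150.0, 250.0]

def split_tgs_power(p_tgs_total_kw):
    """Look up the offline-optimized TGS1 power; TGS2 supplies the remainder."""
    i = min(bisect.bisect_right(TOTAL_KW, p_tgs_total_kw) - 1, len(TOTAL_KW) - 1)
    p_tgs1 = TGS1_KW[i]
    return p_tgs1, p_tgs_total_kw - p_tgs1
```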

Taking the fuel consumption and the turboshaft engine on/off into account, the reward function is written as Eq. (14):

r_t = \begin{cases} -S_1 f_T(t) - S_2 P_b(t)\,\alpha - on_{engine1}\,\alpha_{T1} - on_{engine2}\,\alpha_{T2}, & SOC(t) \le SOC_p \ \text{and}\ P_{dem}(t) \ge P_{dem\_p} \\ -S_1 f_T(t) - S_2 P_b(t) - on_{engine1}\,\alpha_{T1} - on_{engine2}\,\alpha_{T2}, & \text{otherwise} \end{cases}    (14)

on_{engine1} = \begin{cases} 1, & P_{TGS1} > 0 \\ 0, & \text{otherwise} \end{cases}    (15)

on_{engine2} = \begin{cases} 1, & P_{TGS2} > 0 \\ 0, & \text{otherwise} \end{cases}    (16)

where S1 and S2 represent the unit prices of fuel and electricity, respectively, set to 7.20 CNY/L and 0.65 CNY/kW·h. αT1 and αT2 are additional costs added to the cost function to punish engine on/off; turboshaft engine 1 and turboshaft engine 2 are punished by αT1 and αT2, respectively, each time they are ignited. α is a weighting factor greater than 1. SOCp and Pdem_p are preset thresholds equal to 0.35 and 200 kW, respectively. This means that if the SOC falls below 0.35 while flying, the reward becomes worse because the cost of battery power consumption is increased. The purpose of this setting is to use the SOC range from 0.3 to 0.35 as a buffer zone: once the SOC during flight is lower than 0.35, the battery power output is reduced as much as possible to prevent the SOC from approaching the lower limit during flight, which would otherwise cause insufficient flight power and hence a dangerous situation.

s.t. \begin{cases} SOC_{min} \le SOC \le SOC_{max} \\ n_{turb\,min} \le n_{turb} \le n_{turb\,max} \\ n_{gen\,min} \le n_{gen} \le n_{gen\,max} \\ n_{m\,min}^{ground} \le n_m^{ground} \le n_{m\,max}^{ground} \\ n_{m\,min}^{air} \le n_m^{air} \le n_{m\,max}^{air} \\ T_{turb\,min} \le T_{turb} \le T_{turb\,max} \\ T_{gen\,min} \le T_{gen} \le T_{gen\,max} \\ T_{m\,min}^{ground} \le T_m^{ground} \le T_{m\,max}^{ground} \\ T_{m\,min}^{air} \le T_m^{air} \le T_{m\,max}^{air} \end{cases}    (17)
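A sketch of the reward in Eqs. (14)-(16) is shown below. The weighting factor α and the on/off penalties αT1 and αT2 are placeholder values, since the paper does not report them, and the on/off penalty in the paper is applied at ignition events rather than at every step.

```python
def reward(f_t, p_b_kw, p_tgs1_kw, p_tgs2_kw, p_dem_kw, soc,
           s1=7.20, s2=0.65, alpha=2.0, a_t1=1.0, a_t2=1.0,
           soc_p=0.35, p_dem_p=200.0):
    """Step reward following Eqs. (14)-(16); alpha, a_t1, a_t2 are assumed weights."""
    on1 = 1.0 if p_tgs1_kw > 0 else 0.0
    on2 = 1.0 if p_tgs2_kw > 0 else 0.0
    electric_cost = s2 * p_b_kw
    if soc <= soc_p and p_dem_kw >= p_dem_p:   # inside the SOC buffer zone during flight
        electric_cost *= alpha
    return -s1 * f_t - electric_cost - on1 * a_t1 - on2 * a_t2
```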
The goal of energy management is to find the optimal control policy π* that maximizes the expected return Gt:

G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots = \sum_{k=0}^{T-1}\gamma^k r_{t+k+1}    (18)

Here, the policy π is a mapping from states to probabilities of selecting each possible action; π(a|s) represents the probability that a_t = a if s_t = s at time t, T is the total number of steps, and γ ∈ [0, 1] is the discount factor.

To evaluate the expected return of the policy π, the action-value (i.e., Q-value) function Qπ(s, a) for policy π is defined as the expected return starting from s, taking the action a, and thereafter following policy π:

Q_\pi(s, a) \doteq \mathbb{E}_\pi\left[G_t \mid s_t = s, a_t = a\right] = \mathbb{E}_\pi\left[\sum_{k=0}^{T-1}\gamma^k r_{t+k+1} \mid s_t = s, a_t = a\right]    (19)

As mentioned above, there is always at least one policy that is better than or equal to all other policies, which is called the optimal policy π*. The optimal policy π* corresponds to the optimal Q-value function, denoted Q*(s, a) and defined as

Q^*(s, a) \doteq \max_\pi Q_\pi(s, a)    (20)


For value-based reinforcement learning methods, the key issue is how to calculate the value function for the policy π. For example, to improve learning efficiency, temporal difference learning is introduced in Q-learning to calculate and update the Q-value function. Temporal difference learning is the main learning method of reinforcement learning; it updates the Q-value function in each iteration as follows:

Q(s, a) \leftarrow Q(s, a) + \alpha\left[r + \gamma\max_{a' \in A} Q(s', a') - Q(s, a)\right]    (21)

where α is the learning rate, s' is the next state, and a' is the next action.

Generally, an epsilon-greedy algorithm is used to explore the action space: for the current state s, a random action a is selected with probability ε; otherwise, the action with the greatest Q-value is selected. During the iterations, the ε value gradually decays, and finally the optimal control strategy is obtained.
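For reference, a minimal tabular Q-learning update with ε-greedy exploration, corresponding to Eq. (21), might look as follows; the state/action discretization, the environment, and the learning parameters are illustrative only.

```python
import random
from collections import defaultdict

Q = defaultdict(float)          # Q[(state, action)] -> value, initialized to zero

def choose_action(state, actions, eps):
    """Epsilon-greedy action selection."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def td_update(state, action, r, next_state, actions, alpha=0.1, gamma=0.99):
    """One temporal-difference update of the Q-table, Eq. (21)."""
    target = r + gamma * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (target - Q[(state, action)])
```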

As mentioned in the introduction, the SOC of the flying car changes only slightly at each step during the ground phase. To ensure the accuracy of the control strategy, the state space is high-dimensional, leading to a heavy calculation burden. To solve this problem, a DRL algorithm is introduced in this paper. The DQN algorithm is a value-based DRL algorithm that can deal with continuous observation spaces. The DQN architecture has two neural networks, the Q network Q(s, a; θ) and the target network Q_t(s, a; θ_t), and a component called experience replay. The Q network is the agent that is trained to produce the optimal state-action value; experience replay interacts with the environment to generate data to train the Q network; and the target network is identical in structure to the Q network. To deal with the calculation of value functions in high-dimensional state and action spaces, DQN uses the Q network to approximate the Q-value function Q_π(s, a):

\begin{bmatrix} Q(s, a_1; \theta) \\ \vdots \\ Q(s, a_M; \theta) \end{bmatrix} \approx \begin{bmatrix} Q_\pi(s, a_1) \\ \vdots \\ Q_\pi(s, a_M) \end{bmatrix}    (22)

where θ denotes the parameters of the deep neural network, including the weights w and biases b of all neurons. If the action set consists of M finite discrete actions a_1, ..., a_M, the Q network outputs an M-dimensional vector whose m-th dimension represents the approximate value Q(s, a_m; θ) of the corresponding Q-value function Q_π(s, a_m).
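A sketch of such a Q network is given below (using PyTorch purely for illustration); the two-dimensional state (Pdem, SOC) follows the paper, while the hidden-layer sizes and the number of discretized battery-power actions are assumptions not reported by the authors.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP approximating Q(s, a; theta) as in Eq. (22)."""
    def __init__(self, state_dim=2, n_actions=21, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),   # one Q-value per discretized P_b action
        )

    def forward(self, state):
        return self.net(state)

# Example: Q-values for a single (P_dem, SOC) state
q_net = QNetwork()
print(q_net(torch.tensor([[300.0, 0.6]])))
```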

The updating equation (21) used by Q-learning is correspondingly rewritten as Eq. (23) in DQN, and the Q network parameters are updated with a minibatch stochastic gradient descent algorithm; the gradient is calculated as Eq. (24):

L(\theta) = \mathbb{E}\left[\left(r + \gamma\max_{a'} Q_t(s', a'; \theta_t) - Q(s, a; \theta)\right)^2\right]    (23)

\nabla_\theta L(\theta) = \mathbb{E}\left[\left(r + \gamma\max_{a'} Q_t(s', a'; \theta_t) - Q(s, a; \theta)\right)\nabla_\theta Q(s, a; \theta)\right]    (24)

where θ is the Q network parameter, θ_t is the target network parameter, Q(s, a; θ) is the predicted value obtained by the Q network, r + γ max_{a'} Q_t(s', a'; θ_t) − Q(s, a; θ) is the temporal difference error, and r + γ max_{a'} Q_t(s', a'; θ_t) is the target value Y.

The freezing-target-network technique and the experience replay technique are fundamental to the DQN algorithm. The Q network's weights are updated at each time step, and using only a single Q network causes synchronous fluctuations of the predicted value and the target value during the update process [26]; the target value is then unstable, making the training unstable. To obtain more stable training, a target network that is not trained is introduced. The Q network is updated at every step, whereas the target network is updated only every several steps, so the target values remain stable over short periods. After a pre-configured number of time steps, the parameters of the Q network are copied to the target network. Experience replay refers to building an experience pool to remove data correlation. A large number of samples of the form of Eq. (25) are stored in the experience pool; during training, a batch of samples is randomly selected from the pool, and the Q network parameters are updated using Eq. (24). In this way, the correlation between adjacent training samples is broken to prevent the model from falling into a local optimum.

sample = (s, a, r, s')    (25)
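A minimal experience pool storing samples of the form of Eq. (25) can be sketched as follows; the capacity and batch size are assumed values.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience pool of (s, a, r, s') tuples as in Eq. (25)."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, batch_size)   # breaks temporal correlation
        s, a, r, s_next = zip(*batch)
        return s, a, r, s_next
```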

However, it has been shown that overestimation of action values is common in the standard DQN algorithm [27]. Thus, the DDQN algorithm is applied in this paper to solve the optimal control problem of energy management for flying cars. The max operator in standard DQN, in Eq. (23), uses the same values both to select and to evaluate an action, which makes it more likely to select overestimated values, resulting in overoptimistic value estimates. To prevent this, the selection is decoupled from the evaluation in DDQN as follows:

\begin{cases} a_{max} = \arg\max_{a'} Q(s', a'; \theta) \\ Y = r + \gamma\, Q_t(s', a_{max}; \theta_t) \end{cases} \quad (\text{Double DQN})    (26)

That is, the selection of the action, in the argmax, is due to the Q network, and the target network is used to fairly evaluate the value of this action.
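Combining Eqs. (23), (24), and (26), one DDQN training step can be sketched as below (again in PyTorch, as an illustration rather than the authors' code); the Q network selects the next action and the target network evaluates it.

```python
import torch
import torch.nn.functional as F

def ddqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One DDQN gradient step; `batch` holds tensors (s, a, r, s_next),
    with `a` a LongTensor of action indices."""
    s, a, r, s_next = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)        # Q(s, a; theta)
    with torch.no_grad():
        a_max = q_net(s_next).argmax(dim=1, keepdim=True)       # selected by Q network
        y = r + gamma * target_net(s_next).gather(1, a_max).squeeze(1)  # evaluated by target network
    loss = F.mse_loss(q_sa, y)                                  # Eq. (23)
    optimizer.zero_grad()
    loss.backward()                                             # Eq. (24)
    optimizer.step()
    return loss.item()
```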

4. Results and discussion

This section reports the validation of the DDQN algorithm using two completely different hypothetical driving scenarios, which are referred to below as test case 1 and test case 2. DP-based and DDQN-based EMSs are applied to investigate the power flow distribution mode under the ground and air dual driving mode. The state variables of DP are discretized, whereas those of DDQN are continuous. The total cost, which includes the cost of fuel and electricity consumption, is used to evaluate the developed strategy.

4.1 Air-ground driving missions

Test case 1 is a SAR mission profile in a hilly area with a total duration of 1535 s. It includes ground driving, take-off/climbing, cruising, and landing phases and a ground rescue process, and the power demand along the SAR mission is evaluated with a time step of 0.02 s, see Fig. 8(a). The altitude and speed profiles that define the mission are shown in Fig. 8(b). The flying car starts 4.3 km from the operation site, see Fig. 8(c), flies over a hill to reach the rescue site, and returns to the base after performing a 10-minute rescue mission. The ground driving speed is higher when leaving for the rescue site and when returning to the base after completing the rescue task. During the rescue mission, the vehicle speed is relatively low, and braking and stopping occur more frequently, as shown in Fig. 8(b).

Fig. 8 Power requirement (a), altitude and speed profiles (b), and distance profile (c) for the SAR mission (phases: a, ground drive; b, mode transform; c, vertical climb; d, cruise; e, adjust and vertical land)

Test case 2 is a UAM mission profile with a total duration of 1920 s. It also includes ground driving, take-off/climbing, cruising, and landing phases, and the power demand along the UAM mission is evaluated with a time step of 0.02 s, see Fig. 9(a). It differs significantly from test case 1 in that the continuous flight time is longer. The altitude and speed profiles that define the mission are shown in Fig. 9(b). The flying car starts from the starting point, takes off after encountering a congested road section, and flies at a speed of 85 km/h at a low altitude of 50 m. After flying over the congested road, the flying car lands and continues to drive until it reaches the destination.

Fig. 9 Power requirement (a), altitude and speed profiles (b), and distance profile (c) for the UAM mission (phases: a, ground drive; b, mode transform; c, vertical climb; d, cruise; e, adjust and vertical land)
4.2 Test case 1

To evaluate the effectiveness and convergence of the overall training, the SAR driving scenario is used. Fig. 10 shows the convergence of the EMS based on offline DDQN training. The reward is poor at the beginning of the training process and increases significantly with the number of episodes; after about 40 episodes, the rewards tend to converge. In this test case, DP and DDQN are studied and compared to investigate the power flow distribution mode with multiple take-offs and short-term flight conditions. The initial SOC is set to 60%, and the maximum and minimum SOC values are 80% and 30%, respectively. Fig. 11 shows the SOC curves of the DP and DDQN algorithms. The SOC of both algorithms fluctuates between the upper and lower limits, with a significant decrease during the two flights. In the first flight, the SOC of both algorithms decreases by about 0.12; the SOC of DP decreases fast at first and then slowly, while the SOC of DDQN decreases approximately linearly. In the second flight, DP consumes more electric energy to achieve lower fuel consumption, so its SOC decreases by about 0.13, while DDQN consumes more fuel and its SOC decreases by about 0.11; hence, the final SOC of DDQN is higher than that of DP.

The fuel economy of the flying car is influenced by both generator and engine efficiency. Fig. 12 shows the working points of the generators and turboshaft engines. It can be seen that the generator working points of DDQN are more dispersed than those of DP. The working points of turboshaft engine 1 of the two algorithms are all located in the high-efficiency area, while several working points of turboshaft engine 2 are located in the low-efficiency area. Fig. 13 illustrates the power split of the two algorithms. In the DP algorithm, the engines generally output higher power than in DDQN and work for a shorter time. The engines are shut down at the end of the flight, and the battery follows the power demand. In the DDQN algorithm, the twin turboshaft engines remain switched on and operate at relatively lower power during both flights.

As can be seen from Fig. 12(a), the working points of generator 1 in the DDQN algorithm are located more in the high-efficiency zone, yet the fuel economy of DDQN is lower than that of the DP algorithm. The reason is that, in the DP algorithm, the optimality of some generator working points is sacrificed in exchange for a shorter working time. In addition, because the cost of engine on/off is considered in the reward, no frequent engine start-stop occurs.

Fig. 10 The convergence of the EMS based on offline DDQN training

Fig. 11 The SOC profiles in DDQN and DP for test case 1 (the range between SOC = 0.30 and 0.35 is the buffer zone)

Fig. 12 Working points of the generators (a) and turboshaft engines (b) in DDQN and DP for test case 1
Fig. 13 Power split results in DDQN and DP for test case 1: (a) battery power, (b) TGS1 power, (c) TGS2 power

Detailed results are listed in Table 4, and Table 5 shows the computation time of the two algorithms. Taking the DP algorithm as the benchmark, the total cost of DDQN is 11.20% higher, but the computation time is 63.01% shorter. Thus, the power distribution strategy based on the DDQN algorithm generates a total cost close to that of the DP algorithm with a much shorter computation time.

Table 4. Simulation results of test case 1

Method    Final SOC    Fuel consumption (kg)    Electricity consumption (kW·h)    Cost (CNY)    Cost gap (%)

DP 0.3419 31.12 33.01 298.08 -

DDQN 0.3642 35.15 29.27 331.47 11.20%


Table 5. Computation time of DP and DDQN for test case 1

Method    Computation time (s)    Relative reduction (%)
DP    3934    -
DDQN    1455    63.01%

4.3 Test case 2

The UAM flight mission is implemented in this test case and is assumed to be previously unknown. Different from test case 1, this test case focuses on investigating the power flow distribution mode under continuous long flight conditions, because if the SOC reaches the lower limit prematurely under such conditions, the output power of the battery will be limited, leading to failure to follow the power demand. Once the power provided by the TGSs and battery packs fails to meet the power demand during the flight phase, flight instability may occur, resulting in unsafe flight. As before, the initial SOC is set to 60%, and the maximum and minimum SOC values are 80% and 30%, respectively.

Fig. 14 shows the SOC curves of the DP and DDQN algorithms. The SOC of both algorithms fluctuates between the upper and lower limits. During flight, the SOC of DP decreases by about 0.23 and the SOC of DDQN decreases by about 0.25, so the final SOC of DDQN is lower than that of DP. Owing to the SOC buffer zone, the DDQN algorithm is able to meet the high power demand in the flight phase under continuous long flight conditions.

Fig. 14 The SOC profiles in DDQN and DP for test case 2 (the range between SOC = 0.30 and 0.35 is the buffer zone)

Fig. 15 Working points of the generators (a) and turboshaft engines (b) in DDQN and DP for test case 2

Fig. 16 Power split results in DDQN and DP for test case 2: (a) battery power, (b) TGS1 power, (c) TGS2 power

Fig. 15 shows the working points of the generators and turboshaft engines. It can be seen that the generator working points of DDQN are also more dispersed than those of DP in test case 2. The working points of the turboshaft engines of the two algorithms are all located in the high-efficiency area. The results of the power flow distribution are shown in Fig. 16. As in test case 1, in the DP algorithm the engines generally output higher power than in DDQN and work for a shorter time. At the end of the flight, the engine is switched off and the battery is used to follow the power demand. In the DDQN algorithm, the turboshaft engines work for a longer time to prevent the SOC from dropping too quickly and to ensure the power supply at the end of the flight.

Detailed results are listed in Table 6, and Table 7 shows the computation time of the two algorithms. The total cost of DDQN is 1.81% higher than that of the benchmark, and the computation time is 69.66% shorter. The power distribution strategy based on the DDQN algorithm again generates a total cost close to that of the DP algorithm with a much shorter computation time, which further demonstrates the optimality and rapidity of the DDQN algorithm.

Table 6. Simulation results of test case 2

Method    Final SOC    Fuel consumption (kg)    Electricity consumption (kW·h)    Cost (CNY)    Cost gap (%)
DP    0.3336    43.17    33.18    405.30    -
DDQN    0.3134    43.83    35.44    412.64    1.81%

Table 7. Computation time of DP and DDQN for test case 2

Method    Computation time (s)    Relative reduction (%)
DP    4795    -
DDQN    1455    69.66%

5. Conclusion

The optimal EMS of a series hybrid electric flying car considering ground and air dual-mode driving cycles is derived by applying the DDQN algorithm. Simulation studies are conducted to verify the effectiveness of the DDQN-based EMS using two completely different hypothetical driving scenarios. The results based on the SAR and UAM driving cycles show that the designed EMS achieves a total cost close to that of DP while greatly reducing the computation time. The results of both driving scenarios show that, under the turboshaft engine on/off constraint, the TGSs do not work in the ground driving mode, which helps reduce the noise of running on the ground. In the flight phase, both TGSs work to ensure the power supply. The DP algorithm makes the TGSs work at higher power by sacrificing the optimality of the generator operating points; thus, one TGS can be turned off at the end of the flight while the battery provides the remaining power, shortening the operating time of the TGS and reducing fuel consumption. The DDQN algorithm makes the TGSs work at lower power with better generator operating points; the operating time of the TGS is longer, and it remains on during the whole flight phase. Although the fuel consumption is higher than that of DP, this helps stabilize the output power of the TGS during flight and avoids drastic changes in the battery discharge power.

Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 51975048 and 52275047). The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] Rajashekara K, Wang Q, Matsuse K. Flying cars: Challenges and propulsion strategies. IEEE Electrification Magazine, 2016, 4(1): 46-57.
[2] Kasliwal A, Furbush N J, Gawron J H, et al. Role of flying cars in sustainable mobility. Nature Communications, 2019, 10(1): 1-9.
[3] Postorino M N, Sarné G M L. Reinventing mobility paradigms: Flying car scenarios and challenges for urban mobility. Sustainability, 2020, 12(9): 3581.
[4] Biradar A, DeBitetto P, Phan L, Duang L, Sarma S. Hybrid-electric powered aerospace systems and the battery energy density revolution. In: 2018 IEEE Aerospace Conference, 2018, pp. 1-6.
[5] Wang W, Chen Y, Yang C, et al. An enhanced hypotrochoid spiral optimization algorithm based intertwined optimal sizing and control strategy of a hybrid electric air-ground vehicle. Energy, 2022, 257: 124749.
[6] Yang C, Lu Z, Wang W, et al. Energy management of hybrid electric propulsion system: recent progress and a flying car perspective under three-dimensional transportation networks. Green Energy and Intelligent Transportation, 2022: 100061.
[7] Feng Y, Dong Z. Optimal energy management with balanced fuel economy and battery life for large hybrid electric mining truck. Journal of Power Sources, 2020, 454: 227948.
[8] Enang W, Bannister C. Robust proportional ECMS control of a parallel hybrid electric vehicle. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 2017, 231(1): 99-119.
[9] Zhang J, Roumeliotis I, Zolotas A. Model-based fully coupled propulsion-aerodynamics optimization for hybrid electric aircraft energy management strategy. Energy, 2022, 245: 123239.
[10] Doff-Sotta M, Cannon M, Bacic M. Optimal energy management for hybrid electric aircraft. IFAC-PapersOnLine, 2020, 53(2): 6043-6049.
[11] Donateo T, Ficarella A. Designing a hybrid electric powertrain for an unmanned aircraft with a commercial optimization software. SAE International Journal of Aerospace, 2017, 10(1): 1-11.
[12] Xie Y, Savvaris A, Tsourdos A. Fuzzy logic based equivalent consumption optimization of a hybrid electric propulsion system for unmanned aerial vehicles. Aerospace Science and Technology, 2019, 85: 13-23.
[13] Pornet C, Gologan C, Vratny P C, et al. Methodology for sizing and performance assessment of hybrid energy aircraft. Journal of Aircraft, 2015, 52(1): 341-352.
[14] Hu X, Liu T, Qi X, et al. Reinforcement learning for hybrid and plug-in hybrid electric vehicle energy management: Recent advances and prospects. IEEE Industrial Electronics Magazine, 2019, 13(3): 16-25.
[15] Zhang Z, Zhang T, Hong J, et al. Double deep Q-network guided energy management strategy of a novel electric-hydraulic hybrid electric vehicle. Energy, 2023: 126858.
[16] Liu T, Zou Y, Liu D, et al. Reinforcement learning of adaptive energy management with transition probability for a hybrid electric tracked vehicle. IEEE Transactions on Industrial Electronics, 2015, 62(12): 7837-7846.
[17] Zou Y, Liu T, Liu D, et al. Reinforcement learning-based real-time energy management for a hybrid tracked vehicle. Applied Energy, 2016, 171: 372-382.
[18] Zou R, Fan L, Dong Y, et al. DQL energy management: An online-updated algorithm and its application in fix-line hybrid electric vehicle. Energy, 2021, 225: 120174.
[19] Sun M, Zhao P, Lin X. Power management in hybrid electric vehicles using deep recurrent reinforcement learning. Electrical Engineering, 2022, 104(3): 1459-1471.
[20] Tang X, Chen J, Liu T, et al. Distributed deep reinforcement learning-based energy and emission management strategy for hybrid electric vehicles. IEEE Transactions on Vehicular Technology, 2021, 70(10): 9922-9934.
[21] Donateo T, De Pascalis C L, Strafella L, et al. Off-line and on-line optimization of the energy management strategy in a hybrid electric helicopter for urban air-mobility. Aerospace Science and Technology, 2021, 113: 106677.
[22] Wei Z, Ma Y, Xiang C, et al. Power prediction-based model predictive control for energy management in land and air vehicle with turboshaft engine. Complexity, 2021, 2021.
[23] Donateo T, Carlà A, Avanzini G. Fuel consumption of rotorcrafts and potentiality for hybrid electric power systems. Energy Conversion and Management, 2018, 164: 429-442.
[24] Sheng H, Chen Q, Li J, et al. Research on dynamic modeling and performance analysis of helicopter turboshaft engine's start-up process. Aerospace Science and Technology, 2020, 106: 106097.
[25] Sutton R S, Barto A G. Reinforcement learning: An introduction. Second edition. Cambridge, Massachusetts: The MIT Press; 2018, p. 1-2.
[26] Tang X, Chen J, Pu H, et al. Double deep reinforcement learning-based energy management for a parallel hybrid electric vehicle with engine start-stop strategy. IEEE Transactions on Transportation Electrification, 2021, 8(1): 1376-1388.
[27] Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, 2016, 30(1).
Highlights

⚫ A hybrid electric propulsion system model with dual turbines is established.

⚫ A DDQN-based energy management strategy considering ground and air dual-mode is proposed.

⚫ A simplified method for the number of control variables is designed.

⚫ The search and rescue scenario and urban air mobility scenario are applied.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests:
