2021 Depp Qnetwork For Solving TNEP

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 17

Received May 4, 2021, accepted May 20, 2021, date of publication May 24, 2021, date of current version

June 2, 2021.
Digital Object Identifier 10.1109/ACCESS.2021.3083266

Transmission Network Dynamic Planning Based


on a Double Deep-Q Network With Deep ResNet
YUHONG WANG1 , (Member, IEEE), XU ZHOU 1 , HONG ZHOU2 , LEI CHEN1 ,
ZONGSHENG ZHENG1 , (Member, IEEE), QI ZENG1 , SHAORONG CAI2 , AND QING WANG2
1 College of Electrical Engineering, Sichuan University, Chengdu 610065, China
2 State Grid Southwest China Branch, Chengdu 610041, China
Corresponding author: Qi Zeng (zengqi-hk@163.com)
This work was supported by the Science and Technology Program of State Grid Southwest China Branch under Grant
SGSW0000GHJS1900117.

ABSTRACT Based on a Double Deep-Q Network with deep ResNet (DDQN-ResNet), this paper proposes
a novel method for transmission network expansion planning (TNEP). Since TNEP is a large scale and
mixed-integer linear programming (MILP) problem, as the transmission network scale and the optimal
constraints increase, the numerical calculation and heuristic learning-based methods suffer from heavy com-
putational complexities in calculation and training. Besides, due to the black box characteristic, the solution
processes of the heuristic learning-based methods are inexplicable and usually require repeated training.
By using DDQN-ResNet, this paper constructs a high-performance and flexible method to solve large-
scale and complex-constrained TNEP problem. Firstly, we form a two-objective TNEP model, in which
one objective is to minimize the comprehensive cost, and another objective is to maximize the transmission
network reliability. The comprehensive cost takes into account the expansion cost, the network loss
cost, and the maintenance cost. The transmission network reliability is evaluated by the expected energy
not served (EENS) and the electrical betweenness. Secondly, from the TNEP model, the TNEP task is
constructed based on the Markov decision process. By abstracting the task, the TNEP environment is
obtained for DDQN-ResNet. In addition, to identify the construction value of lines, an agent is establish
based on DDQN-ResNet. Finally, we perform the static planning and visualize the reinforcement learning
process. The dynamic planning is realized by reusing the training experience. The validity and flexibility of
DDQN-ResNet are verified on RTS 24-bus test system.

INDEX TERMS Transmission network expansion planning (TNEP), reinforcement learning, Double
Deep-Q Network (DDQN), deep learning, ResNet.

I. INTRODUCTION topology becomes more and more complex and huge. At the
A. SUMMARY OF TRANSMISSION EXPANSION same time, with the intervention of uncertain factors such as
NETWORK PLANNING renewable energies [3], the phenomenon of cascading failures
The power industry is a pillar industry for social development. caused by transmission network line failures has become
With the upgrading of energy consumption in various indus- more frequent [4], which results in the stagnation of social
tries [1], there is a certain imbalance between the rapid growth production and a huge waste of social resources. Therefore,
of energy demand and the construction of regional power it is significant to construct a high-performance method for
grids. Specially, the regional power grids with concentrated large scale TNEP by considering both the reliability and the
loads or power sources operate under extreme conditions. The economy.
reliability of the transmission network has huge safety haz- Power system reliability is an index that measures the
ard [2]. As the backbone of the power system, the transmis- power system to maintain stable transmission power under
sion network is related to the reliability of power supply. Its multiple disturbance conditions. Reliability evaluation is a
complex problem, which is closely related to equipment
The associate editor coordinating the review of this manuscript and quality, operation and maintenance conditions, operating
approving it for publication was Ramazan Bayindir . status, and weather [5]. In the power system reliability

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 76921
Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

evaluation (PSRE), component reliability is the linchpin to three-stage TNEP with the objective of minimizing load shed-
the assessment. For the transmission network, the reliabil- ding, an algorithm was proposed by combing the Harmony
ity assessment of lines and power generation sets is more Search (HS) metaheuristic with the Branch & Bound (B&B).
valuable [6]. The power system reliability research is mainly Although the multi-stage TNEP methods loss the perception
divided into reliability influencing factors and evaluation of some changes in the system, they retain the high efficiency
methods. Reference [7] analyzed the impact of the relay of static planning. Moreover, the sequence of line construc-
protection device failure on the system reliability. The system tion is clearly in these methods, which is more suitable for
voltage violation caused by the shortage of reactive power current engineering needs.
is also a factor that affects the power system reliability.
Reference [8] proposed a calculation index to measure the 3) DYNAMIC PLANNING APPROACH
lack of reactive power for the PSRE. On the other hand, In the situation of longer planning time span, it is necessary to
power system reliability assessment methods are consisting divide the planning into several level years and determine the
of calculation methods and simulation methods. In [9], an details of TNEP construction scheme year by year. Compared
improved sequential cross-entropy method was presented with the sequential static planning method, the dynamic plan-
to accelerate the calculation of expected energy not sup- ning method not only needs to calculate the annual construc-
plied (EENS), loss of load frequency (LOLF), and loss of tion details, but also needs to complete the grid planning task
load probability (LOLP) for the composite systems. Besides, under various unpredictable human interventions, which are
power system component failures are often considered to be both increase the amount of calculation. The adaptive robust
independent, which is inconsistent with the real situation. For optimization was used in reference [15], which completely
this reason, the Markov cut set method based on DC-OPF retains the dynamic characteristics of the TNEP problem in a
was proposed in reference [10] to calculate the PSRE under small area. Nonetheless, the massive calculation and complex
independent faults. model construction of these methods is still a challenge.
TNEP is an influential part of power system planning, and
it consists of the expansion plan generation and selection [11]. B. STUDY OF SYSTEMATIC METHODS TO TNEP
Its purpose is to provide a periodic power system expansion The essence of TNEP task is to build a transmission
plan for the regional power load growth. On the basis of grid structure at a minimize cost within the acceptable
ensuring the transmission network reliability, it is still a major range of environment, political, and economy [16]. In addi-
challenge to improve the reliability and quality of power tion, the structure also should comply various constraints,
supply by constructing new power equipment with minimize such as the power flow constraints, the generator out-
cost [12]. In addition, the TNEP task often regards about put constraints, the line transmission constraints and the
10 years later as the target year, but most power operation data load fluctuation constraints. Since the TNEP task is a
can only be obtained through prediction, which would bring mixed-integer linear programming (MILP) problem, it is usu-
huge uncertainty. Generally, the following three methods are ally non-convex. Therefore, finding a high-speed and reli-
adopted to simplify this dynamic complexity: able solution method is still the focus of current work [17].
At present, the methods for TNEP are mainly divided into
1) STATIC PLANNING APPROACH numerical calculation and heuristic learning-based methods.
This type of method only considers the grid structure of the In terms of numerical calculation methods, reference [18]
target year and calculates the total cost during the planning constructed the multi-level equilibrium model through the
period. In reference [13] the uncertainty parameter range of classification of TNEP constraints, which greatly improves
renewable energy and load were structured, which is used the solution of MILP problems. Similarly, the algorithm
to constructs the uncertainty robust optimization formula for proposed in reference [19] also adopted the layered think-
TNEP under the least-cost objective. Although, the static ing, and it combined the bender decomposition to solve the
approaches have less process constraints, simple objective TNEP model, which considers the representative scenarios
function and low computational complexity, it is not flexible characteristics of renewable generation. In addition, refer-
enough to adapt to the construction of power system with ence [20] took load volatility constraint into TNEP model and
various changes. solved it by Taguchi orthogonal method. Furthermore, in [21],
the game theory was used to calculate the transmission
2) SEQUENTIAL STATIC PLANNING APPROACH expansion planning (TEP) and generation expansion planning
Although static planning approaches have the advantages (GEP), and the result of this method obtained a win-win
of fast calculation and intuitive model construction, such grid structure for generation companies and transmission
methods are difficult to identify the construction sequence companies. Although the numerical calculation approach has
of transmission lines. Besides, they cannot adapt to human a strict theoretical basis, with the gradual expansion of the
intervention during the planning period. Therefore, by setting power grid, the complexity and inefficiency of the solution
multiple target years and taking TNEP for each target year, have become a problem that cannot be ignored. At the same
the flexibility of the static planning method is improved to time, along with the development of heuristic learning-based
form the sequential static planning approach. In [14], to solve methods, they have been considered as the best way to

76922 VOLUME 9, 2021


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

solve large-scale optimization problems [22]. Compared with 3) This paper builds an agent based on the DDQN-ResNet
method in [21], the improved NSGA algorithm is more effec- to judge the construction value of different lines to
tive in solving GEP and TEP problems [23]. In reference [24], complete the TNEP task in a new perspective.
the combination of PSO algorithm and QP algorithm was 4) DDQN-ResNet is an interactive and flexible machine
used to solve the TNEP model with the objective of mini- learning method for TNEP. Through in-depth analysis
mizing transmission loss under N-1 contingency. In terms of of the interaction process, we realize the explanation of
dynamic TNEP problems, the heuristic learning-based algo- the TNEP solution process and apply the agent to solve
rithm is still powerful, and the dynamic programming genetic TNEP task in various scenarios.
algorithm has been proven to be effective in the dynamic In conclusion, this paper proposes a high-performance
planning of large-scale power distribution networks [25]. DDQN-ResNet based on deep learning and reinforcement
In [26], the simulated annealing algorithm and commercial learning technology to solve TNEP task. It has the charac-
TNEP software LINGO were applied to solve a small-scale teristics of high reusability, interpretable, and flexible, and
TNEP problem. The comparison of the results further verified can well adapt various scenarios of TNEP task. Compared
the effectiveness and superiority of heuristic learning-based with other TNEP methods, it is more in line with the needs of
algorithm. engineering construction.
As the latest generation of machine learning technology,
deep learning and reinforcement learning have been widely D. PAPER STRUCTURE
used in power system control and design [27]. In recent This paper is organized as follows: Section II constructs the
years, with the tremendous growth of computer computing TNEP reliability and economy model. Section III builds the
power and data volume, deep learning has been rapidly com- deep reinforcement learning TNEP solution method based on
mercialized. In the field of load forecasting, a more promi- DDQN-ResNet. Section IV takes the DDQN-ResNet to com-
nent method that combines ResNet with LSTM improves plete TNEP tasks on IEEE RTS 24-bus test system. Section V
the accuracy of MAPE data by 21.3% [28]. Reference [29] makes a conclusion for this paper.
focuses on fault identification in the DC distribution network.
In [30], to distinguish multiple types of faults, the convolu- II. TRANSMISSION NETWORK EXPANSION
tional Neural Network (CNN) was applied to process fault PLANNING MODEL
signals transformed by Hilbert transformer. At present, more In this section, a TNEP model is proposed, which considers
and more researches turn to reinforcement learning. Because the reliability and economy of the system. The reliability
reinforcement learning makes the machine have the ability index of this model is composed of EENS, electrical between-
of autonomous learning and thinking through human-like ness, and N-1 contingency. And the economic index is repre-
interaction modes, researchers believe that it is closer to sented by comprehensive cost. The objection function is
real artificial intelligence than deep learning. Reinforcement
learning has been used to coordinate several photovoltaic gen- f (X ) = min[EENS(X ), GB (X ), CT (X )] (1)
erators [31]. Compared with control battery energy storage where X is the transmission network; EENS(X) is the
system (BESS), the reinforcement learning is more econom- expected energy not supplied; GB is the electrical between-
ical and effective in reducing the voltage problems. Refer- ness; CT is the comprehensive cost.
ence [32], proposed an Automatic Generation Control (AGC)
control method based on multi-agent reinforcement learn- A. POWER FLOW CONSTRAINT
ing, which takes system economy and security into account.
1) NORMAL STATE POWER FLOW CONSTRAINT
Latterly, the reinforcement learning was used to construct
In the normal state of the power system, the active balance
an online DSTATCOM control strategy to improve power
constraint is
quality [33]. All in all, the above achievements show the
huge potential of the latest generation of machine learning 1Pi = PGi − PLi − Pi (X ) = 0, (2)
technology in the electrical engineering.
the generators output constraint is

C. CONTRIBUTION PGi,min ≤ PGi ≤ PGi,max , (3)


The contributions of this paper are listed below: the transmission lines constraint is
1) In this paper, the multi-objective TNEP model con- −PLi,max ≤ PLi ≤ PLi,max , (4)
sidering reliability and economy is established, which
includes the N-1 contingency, the DC power flow con- the bus voltage constraints is
straints, the EENS, the electrical betweenness and the
Ui,min ≤ Ui ≤ Ui,max , (5)
comprehensive costs.
2) The TNEP task is constructed according to the Markov where 1Pi is the net power of node i; PGi is the active power
Decision Process (MDP), and the TNEP environment output to the system by node i generator; PLi is the active
is obtained by abstracting task for DDQN-ResNet. power input of the transmission line i in (MW); Pi (X ) is the

VOLUME 9, 2021 76923


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

load power; PGi,min is the maximum active power of node i the possible states of the generators are more diverse than the
generator set; PGi,max is the minimum active power of node lines, it is necessary to subdivide the state interval to make it
i generator set; Ui is the voltage of node i; PLi,max is the more practical. The simulation state etj of generator j is
maximum transmission power of line of node i; Ui,max is the 
maximum voltage of node i; Ui,min is the minimum voltage 
 0, others
 X1
of node i. 

 G1 , (t − 1)/T + 1 − Pgeni ≤ sj


 i=0

 ≤ (t − 1)/T + 1 − Pgen0
2) N-1 CONTINGENCY POWER FLOW CONSTRAINT 

 X2
Under the N-1 contingency, the constraints are etj G2 , (t − 1)/T + 1 − i=0
Pgeni ≤ sj j ∈ NG ,

 X 1
1Pi = PN −1 N −1 ≤ (t − 1)/T + 1 − Pgeni

Gi − PLi − Pi (X ) = 0, (6) 
 i=0
..


PN −1
≤ PGi,max ,

PGi,min ≤ Gi (7) 


 .
PLi ≤ PLi,max ,
N −1

−PLi,max ≤ (8) Gj , (t − 1)/T ≤ sj ≤ (t − 1)/T + Pgenj

Ui,min ≤ UiN −1 ≤ Ui,max , (9) (13)

where PN −1
Gi is the active power output of node i under where NG is the total of generators; eti is the simulation state
−1
the N-1 contingency; PN Li is the active power input of the of generator j in the t-th interval.
transmission line i under the N-1 contingency; UiN −1 is the Then, in the t-th interval, all component states are pro-
active power input of the transmission line i under the N-1 cessed according to (12)-(13), and the grid simulation state
contingency. E t is
E t = (et1 · · · etNL , · · · etNL +NG ). (14)
B. IMPROVED TRANSMISSION GRID EXPECTED ENERGY
NOT SUPPLIED BASED ON AVERAGE AND Therefore, for each sampling, the total power shortage
SCATTERED SAMPLING GLoss (E t ) in the t-th interval can be calculated through
The insufficient power supply caused by generators or lines XT
GLoss (E t ) = Gloss (E t )/T , (15)
outage is the major factor affecting the power system reliabil- t=1
ity [34]. In this paper, we propose an improved EENS based where Gloss is the insufficient power for each interval; GLoss
on the average and scattered sampling. It can more accurately is insufficient power for each sampling.
evaluate the system reliability by considering the different Finally, the system EENS can be calculated through mul-
states of generators and lines. tiple Monte Carlo sampling in
Firstly, the interval [0,1] is divided into T subintervals, and X
each subinterval satisfies the average and scattered sampling EENS(X ) = t T
P(E t )GLoss (E t ), (16)
E ∈E
condition
where E T is the state collection of power system; P(E t ) is the
max{P1stop , P2stop , · · · , PN
stop } ≤ 1/T (10) probability of state E t .

where Pistop is the forced outage rate of component i; N is the C. IMPROVED ELECTRICAL BETWEENNESS OF
total of generators and lines. TRANSMISSION GRID BASED ON GINI COEFFICIENT
Secondly, a set of uniformly distributed sequence st in the EENS can evaluate the system reliability by simulating the
interval [0,1] is generated, and used to simulate the state of power system failures and comparing the power shortage.
components in the grid Different from EENS, the electrical betweenness estimates
st = (0.1, · · · ai · · · 0.3) i ∈ N (11) the system reliability through comparing the power flow
balance under normal conditions [35]. This paper uses the
where ai is the simulation state of the components i. improved electrical betweenness by the Gini coefficient to
Thirdly, the transmission line on-off state can be repre- calculate the power flow unbalance for the system reliability
sented by (0,1) variable. In the t-th subinterval, the state et i evaluation.
of line i can be obtained by comparing the random number ai Firstly, the node injected active power P based on DC
and the forced outage rate Pistop power flow is
(
t 0, others
P = Bθ (17)
ei (12)
1, (t − 1)/T ≤ Si ≤ (t − 1)/T + Pistop i ∈ NL , where B is the admittance matrix; θ is the voltage phase
matrix.
where NL is the total of transmission lines; eti
is the simulation
Secondly, the lines power flow from node i to node jPij is
state of transmission line i in t-th interval.
Similarly, for generator set, the output of each generator 1
Pij = −Bij θij = (θi − θj ), (18)
will change according to the change of the system state. Since xij
76924 VOLUME 9, 2021
Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

where Bij is the admittance of the transmission line; θij is the TABLE 1. Relationship between the reliability and the Gini electric
betweenness.
voltage phase of the transmission line.
Thirdly, (18) can be rewritten into matrix form, the DC
power flow vector F is

F = BL 8 (19)

where BL is the line admittance diagonal matrix; 8 is the


voltage phase difference matrix.
The relationship between 8 and θ is assumed to satisfy

8 = Aθ. (20) D. COMPREHENSIVE COST OF TRANSMISSION GRID


EXPANSION
Then, the (17)(19)(20) can be combined to obtain the line With the expansion of the transmission grid scale, the com-
power flow F prehensive cost should be as small as possible under the
premise of ensuring reliability. This paper adopts DC power
F = BL AB−1 P, (21) flow to calculate the comprehensive cost CT of transmission
grid. It is composed of the expansion cost C1 , the network
where P indicates nodes output power loss cost C2 , and the maintenance cost C3 .
CT = C1 + C2 + C3 . (26)
P = [P1 , P2 · · · PN ]T . (22)
In (26), the expansion cost C1 is
The electrical betweenness calculation requires only one XNL
generator set and one load to transmit unit power each time. C1 = λi Cl,i , (27)
l=1
That is, the generating node is taken as Pi = 1, the load
where λi is the line construction decision variable, namely
node is taken as Pj = −1. And the smaller value between
1 means the line is constructed, 0 means the line is not
the generation output and the load input active power is used
constructed; Cl,I is the construction price of line i.
as the weight.
In (26), the network loss cost C2 is
ωij = min{Pgen , Pload },
XNL
(23) C2 = p0 (θl,i − θl,j )2 /gl , (28)
l=1
where Pgen is the generator node j output; Pload is the load where p0 is the electricity price; gl is the line resistance.
node i input. In (26), the maintenance cost C3 is composed of the main-
Next, all the generation-load combinations in the system tenance cost of the lines and the generators,
should be traversed, and the system electrical betweenness XNL XNG
Bij is C 3 = ηl λi Cl,i + ηg Pgen , (29)
l=1 g=1
where ηl is the line annual maintenance cost coefficient; ηg
ωij BL AB−1 P,
X X
Bij = (24) is the generator annual maintenance cost coefficient.
Gj ∈NG Li ∈NL
In summary, this section constructs a TNEP model that
where Gj is the j-th generator node; Li is the i-th load node. considers the economy and reliability of the transmission
The electrical betweenness can intuitively determine the grid. And this model will be used below to build a reinforce-
significant lines in the transmission network. When these ment learning environment.
lines are outage, they will cause the large-scale power flow
transfer and even cascading failures. In this paper, the Gini III. ALGORITHM FOR SOLVING TRANSMISSION
coefficient is used to measure the uniformity of the lines elec- NETWORK PLANNING BASED ON DDQN-RESNET
trical betweenness. The larger the Gini coefficient, the more In this paper, TNEP task is reconstructed as a task to for-
unbalanced power flow and the more vulnerable the grid. mulate the strategy for adding or deleting lines based on
Finally, the improved electrical betweenness GB based on the current transmission network. We obtain the optimization
Gini coefficient is or degradation signals through the objective and constraint
XNL −1 XNL function, and judge whether the line is worthy of construction
GB = (NL − i)(BN − Bi )/NL Bi , (25) or not. If the value of the line is high, it will be retained,
i=1 i=1
otherwise it will be deleted. This operation will be repeated to
where BN is the maximum electrical betweenness; Bi is the update the transmission grid structure until the TNEP task is
electrical betweenness of the line i. completed. In fact, this is the typical limited Markov decision
The relationship between the reliability and the Gini elec- task as shown in Figure 1, and the reinforcement learning is
tric betweenness is shown in Table 1. an effective method to solve the Markov decision problems.

VOLUME 9, 2021 76925


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

forms the strategy to accomplish tasks only by maintaining


the value function. In this way, the discrete problem can be
solved more directly, and it is easier to implement off-policy,
which is a more efficient training mode of the reinforcement
learning.
DDQN is a deep reinforcement learning method based
on Q-learning and DQN [36]. It inherits the value-based
characteristics of Q-learning. However, when the system
has excessive discrete states or maintains a highly sparse
Q-table matrix, Q-learning is difficult to converge and waste
a lot of time to solve those problem. Thereby, DDQN and
DQN introduce the neural networks to predict the value
of actions to avoids those problems of Q-learning. Further-
FIGURE 1. Finite Markov decision process of TNEP task. more, the DDQN separates the action value prediction from
the maximum value action selection through constructing
two value prediction functions, alleviating the overestimation
Therefore, this paper designs an MDP model according to the problem of the DQN.
characteristics of TNEP task. The model state-space repre- The TNEP task has the characteristics of numerous con-
sentation is nection states and highly sparse connection matrix. There-
fore, this paper adopts the DDQN to accurately judge the
MDP = (S, A, R, γ ), (30) construction value of lines, and then form a high-performance
where γ is the reward conversion factor, S is the set of the method to solve the large-scale TNEP problem.
transmission gird structure, S = {s0 , s1 . . . sn }, and st repre- The MDP model stipulates that the states are independent
sents the combination of the transmission grid lines st = [l0 , of each other, and the state changes after executing actions
l1 . . . ln ], l ∈ (0, 1) and t ∈ (0, n). A is the set of action A = are deterministic. The total reward of each state v(st ) is
{a0 , a1 . . . an }, and building or deleting lines are regarded as
X
v(st ) = p(at | st )q(at |st ), (33)
an action at . If the line li is selected for construction in the at ∈a
state st , the li state will be changed to 1 by the action at ,
where p(at |st ) is the probability of each action at being
otherwise it will be changed to 0. Besides, in each state st ,
selected in state si , q(at | st ) is the cumulative reward of each
only one line can be changed each time to ensure the converge
action at .
of the reinforcement learning training. R is the set of reward
The total reward of state v(st ) can also be obtained via the
R = {R0 , R1 . . . Rn }, Rt is the action at direct reward based on
total reward expectation of each episode,
optimization or degradation signal in the state st , and it is the
most sensitive indicator that affects reinforcement learning. v(st ) = E [G(st )|st ]
The reward function Rt (at |st ) used in this paper is
"∞ #
X
=E γ Rt+k+1 |St
k
Rt (at |st ) = Vbase − (CT (st+1 ) + EENS(st+1 ))
k=0
×(1 + GB (st+1 )) (31) = E [Rt + γ G(st+1 )|st ]
where Vbase is the reward baseline. = E [Rt + γ v(st+1 )|st ] . (34)
In an episode of TNEP task, multiple attempts to change From (33) and (34), the reward of taking action at in state
the line selection are required, and the reward of the different st can be obtain q(at , st ) as
actions need to be converted according to the construction X
sequence. The total reward of each episode Gt (st ) is q(at , st ) = R (at , st ) + γ p (st+1 |st )v(st+1 ), (35)
st+1 ∈s
Gt (st ) = Rt + γ Rt+1 + γ 2 Rt+3 + · · · . (32)
where p(st+1 |st ) is the probability of transition between states
The reinforcement learning method can gradually obtain si and si+1 , and it is equal to p(st+1 |st ) in the deterministic
the construction value of each line in multiple sequence deci- model, that is, when action ai is taken, the state st must be
sion, and a strategy is formed that can obtain the best network changed to state st+1 .
structure. Ultimately, the realization of TNEP task is guided DDQN is a kind of value-based reinforcement learning,
by this strategy. it adopts the strategy of selecting actions with a fixed proba-
bility. It can be seen in (35) that only the construction value
A. DOUBLE DEEP Q-LEARNING of the lines under the various transmission grid structures is
The reinforcement learning can be divided into three cat- needed. This value judgment can be followed to achieve the
egories: value-based methods, strategy-based methods and TNEP task, and we use deep ResNet to build the judgment
actor-critic methods. The value-based reinforcement learning network.

76926 VOLUME 9, 2021


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

B. DEEP RESNET
The deep learning has two main branches: the deep convo-
lutional networks and the deep confidence networks. In the
field of artificial intelligence, the deep convolutional neural
network (CNN) is the most mature and widely used frame-
work. It applies the convolution kernel to accurately extract
multi-dimensional data features, and then cooperates with the
pooling layer to compress information of each layer. This
framework greatly reduces the number of neurons required
and accelerates network training.
Generally, the more layers of CNN, the better the extraction
effect of data features. However, as the number of layers
FIGURE 3. Value prediction function structure based on deep ResNet.
increases, the gradient disappearance and the gradient dis-
persion frequently occur during the training process, which
always lead to the failure of the CNN training.
DQN constructs two parameterized value networks Qeval
Therefore, this paper uses deep ResNet to improve it.
and Qtarget with the same structure to predict the value of
On the one hand, it can help the network converge by
actions, and selects actions based on ε-greedy strategy. The
introducing data short-circuits on CNN. On the other hand,
reward yt of each action is
the convolutional layer in CNN is retained to efficiently 
extract data features.  Rt , not done
After the short-circuits are introduced by deep ResNet, the yt = (38)
 Rt + γ max Qtarget (st+1 , at+1 , ω ), done,
0
convolutional layers are divided into several residual blocks, at+1
and the block structure is shown in Figure 2.
ω0
where is ResNet parameterize of Qtarget .
The value function of the DQN is iterated according to
Qeval (st , at , ω) ≡ Qeval (st , at , ω) + α(Rt
+γ max Qtarget (st+1 , at+1 , ω0 )
at+1
− Qeval (st , at , ω)), (39)
FIGURE 2. Deep ResNet residual block.
In DQN, Qeval is updated in every yt , and the parameters of
Qeval are copied to Qtarget regularly to improve the problem
The data flow Xli of residual block are of difficult convergence in Q-learning. However, Qtarget may
Xl−2 = BN (ReLU (Conv2D(Xl−3 ))), (36) pass the estimated loss to Qeval during the occasional itera-
tions. After multiple iterations, the loss would not be elimi-
Xl = Xl−3 + Xl−1 , (37) nated, but lead to the overestimation. Therefore, on this basis
where BN is the batch normalization layer; ReLU is the of DQN, DDQN decouples the selection of the maximum
rectified linear unit layer; Conv2D is the 2D convolutional Q value action from the estimation of the action Q value to
layer. reduce the problem of overestimation. And the reward yt of
In the forward propagation, the feature extraction of the each action is

 Rt ,
state is mainly performed by the convolutional layers. But 
 not done
in the back propagation, the loss signal is transmitted to the
yt = Rt + γ Qtarget
shallow layer through the more active short-circuits. Finally,
((st , arg max Qeval (st , ω), a), ω0 ), done,


this architecture can increase the success rate and ensure

at+1
the formation of a stable value judgment network. The deep (40)
ResNet structure is shown in Figure 3.
and the value function of DDQN is iterated according to
C. DDQN VALUE FUNCTION BASED ON DEEP RESNET Qeval (st , at , ω) ≡ Qeval (st , at , ω) + α(Rt
For the TNEP task, the goal of DDQN-ResNet is to train a
+ γ Qtarget (st+1 , arg max
value function that can accurately predict the construction at+1
value of different lines based on the current transmission grid × Qeval (st+1 , at+1 , ω), ω0 )
structure. − Qeval (st , at , ω)). (41)
The value function is parameterized by the deep ResNet.
It has the advantage of convenient derivation and training. The DDQN training flowchart is shown in Figure 4. DDQN
The parameterized value network is Q(st , at , ω) and ω is takes the current grid structure as the starting point of the task.
ResNet parameters. The Qtarget and Qeval value judgment functions are used to

VOLUME 9, 2021 76927


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

4) Accord to the current Qeval parameters, use the agent


to generate an episode grid plan, and record (st , at , Rt ,
st+1 ) into the memory.
5) Access the data of multiple steps from memory, calcu-
late the Qeval parameters through gradient descent, and
update the Qeval network.
6) Copy Qeval parameters to Qtarget every 1000 steps.
7) Compare T with Tmax . If T is greater than Tmax , return
to step (4) to continue learning, otherwise output the
TNEP scheme.
FIGURE 4. DDQN-ResNet training flowchart. The flow chart of DDQN-ResNet model solution is shown in
figure5.

select lines based on the ε-greedy strategy, and the interac-


tive information is recorded in the memory and periodically
retrieved for ResNet training. Finally, the trained Qtarget is
used to guide the DDQN agent to select high-value lines to
enhance the performance of the transmission network.

D. INTERPRETABLE TNEP BASED ON DDQN-RESNET


Most of TNEP conclusions provided by the heuristic
learning-based methods only have the grid structure of the
target planning year. It is difficult to determine the implemen-
tation plan of different lines during the construction process.
Besides, due to the black box characteristic of the heuristic
learning-based methods, it is hard to rigorously deduct and
explain the process of index optimization. This is the main
problem of the heuristic learning-based methods in engineer-
ing applications.
In this paper, DDQN-ResNet is used to construct an agent
to solve the MDP model of TNEP task. The agent attempts to
change the gird connection and calculate the action rewards
through receiving the grid evaluation indicator signals. Along
with the training process progresses, the strategy for the FIGURE 5. The flow chart of DDQN-ResNet model solution.
agent to accurately identify the value of lines construction
is formed. This strategy can be used to optimize the grid
structure until the TNEP task is complete.
Since DDQN-ResNet is trained every interaction, IV. SIMULATION AND VERIFICATION
the agent’s optimization strategy to solve the TNEP problem Currently, the IEEE RTS 24-bus test system is widely used in
can be obtained by monitoring each interaction process, the power system reliability evaluation. It contains 24 power
making up the problem that the traditional machine learning generation nodes, which include 32 generators with capacity
methods are difficult to explain and track the optimization between 12MW and 400MW, and 17 load nodes with the
process. In addition, it can also help relevant departments to maximum total power of 2850MW. The initial grid consists
formulate detailed grid expansion plans. of 38 lines connected by 3 sets of the interconnecting trans-
formers. The system has two voltage levels of 138kV and
320kV. There are 88 lines to be chosen in this paper, and
E. DDQN-RESNET FOR TNEP
the loads and generators capacity are increased to 1.15 times.
The model solving process is listed as follows: Besides, the investment cost is converted to the equivalent
1) Initialize the line set and two ResNet, and then form annual value. The detailed parameters of the system can be
the reinforcement learning environment based on the found in Appendix and reference [37].
current grid. This article uses PyCharm Community ide as the plat-
2) Establish the DDQN-ResNet agent, and set the action form, the reinforcement learning environment is built based
at , learning rate α, income conversion factor γ . on Python 3.7, and the deep residual network is built on
3) Set the maximum iteration Tmax and initialize the TensorFlow 1.14. The computer CPU adopts Intel-Core-i9-
counter T at the same time. 9900K-3.60GHz, and the memory is 32GB.

76928 VOLUME 9, 2021


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

DDQN-ResNet maximum iteration is 200 episodes with


the maximum of 40000 steps. The number of average scatter
sampling is 2000 times, and each sample is divided into
8 segments. ResNet copy parameters interval is 1000 steps.
Finally, DQN used for contrast adopts the same parameters.

A. THE STATIC TNEP OF IEEE RTS 24-BUS TEST SYSTEM


In Table 2 and Figure 6, DDQN-ResNet and DQN TNEP
results are compared with that of the reference [38] according
to the above parameters.
This paper adds the calculation of N-1 contingency and
reliability under the minimize economic objective. Among
three schemes, DDQN-ResNet result has the smallest EENS
and electric betweenness after increasing cost of 3.419M$
and 2.26M$. In actual operation, the economic loss caused by
insufficient power due to transmission interruption is usually
more than 10 times of electricity bill, so the grid structure
proposed by DDQN-ResNet is more in line with the require-
ments of high reliability and high economy.
Furthermore, the result shows that three approaches all
reduce the EENS and Gini Electric betweenness of system
through constructing the new lines to balance the power flow.
In Figure 7 and Table 3, the influence of DDQN-ResNet
and the reference [38] method on the power flow is shown.
Through the comparison of the 38 lines power flow in the
initial plan, the superiority of DDQN-ResNet result is further
explained.
Obviously, DDQN-ResNet result greatly reduces the
power flow on the lines of 11-13, 12-12, 12-23, 13-23, and FIGURE 6. TNEP result of three methods.

TABLE 2. Comparison of tnep result on ieee rts 24-bus test system.

FIGURE 7. Comparison chart of power flow between DDQN-ResNet and


reference [38] methods.

TABLE 3. Comparison of power flow between DDQN-ResNet and


literature methods. 16-19. It reduces the number of lines with the power flow
more than 200MW from 4 to 1. This further illustrates that
DDQN-ResNet result is more balanced and reliable.

B. THE DYNAMIC TNEP OF IEEE RTS 24-BUS


In the training process based on heuristic learning, only
a small amount of network training information can be

VOLUME 9, 2021 76929


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

observed, which makes the process have the characteristics


of a black box, and it is inconvenient to demonstrate the reli-
ability of the algorithm. DDQN-ResNet does not rely on the
supervised or unsupervised training methods of the heuris-
tic learning-based methods. It adopts an interactive learning
method. The transmission grid indicator signal is converted
into a reward to train the agent based on (31), and then the
reliable decision-making strategy is formed for the agent to
judge the value of different lines. The interactive data in this
mode can be monitored to make the training process transpar-
ent and interpretable, and it greatly improves the application
of machine learning in the engineering. Figures 8-10 show
the distribution of EM, EENS and comprehensive cost signals
received by DDQN-ResNet through interactions, and we can
obtain the DDQN-ResNet agent optimization strategy from FIGURE 10. The comprehensive cost of interactive feedback in
this distributions. DDQN-ResNet.

In in the first 1000 steps of Figures 8-10, the distribution


of the transmission grid structure indicators is highly random,
and only part indexes of each transmission grid structure have
good performance. This is due to the insufficient training of and this error would cause the agent unable to be accu-
the ResNet parameters. There is a large error between the rate identify the lines value. After 1500 steps, the random-
true lines value and the ResNet network prediction value, ness of the indicator distribution begins to decrease, and a
stable indicator performance state appears (EENS is about
6 M$, EM is about 0.2, and the comprehensive cost is about
16.5 M$). This shows that the agent gradually forms a sta-
ble line value function and has a relatively good strategy.
However, DDQN-ResNet adopts the ε-greedy strategy, which
allows the DDQN-ResNet agent to randomly select different
line combinations to construct the transmission grid based
on this stable structure with a probability of 10%. This
trial-development mechanism can ensure the convergence of
ResNet. At the same time, it will make the learning data cover
more connection structures and reduce the problem of local
optimization. Thereby, a high-performance agent is trained to
complete TNEP task, which is better than the other methods.
Figure 11 shows the best lines selected in 200 episodes
training. The color switch indicates that the agent is try-
ing to explore the reward of different lines to the current
FIGURE 8. EENS of interactive feedback in DDQN-ResNet. transmission network structure. The increase of the bubble
diameter indicates improvement of the predictive value of the
lines. During the training process, the agent finds 16 potential
high-value lines and finally determines 7 high-value lines.
Since the optimization process of the heuristic learning-
based method is always towards the direction of the minimize
loss, the local optimization problems are occurred frequently.
However, DDQN-ResNet is based on the ε-greedy strategy,
which drives agents to select the line that has been judged
to be the most valuable with a probability of 90% for con-
struction, and randomly to select a line for a trial with a
probability of 10%. This mechanism can effectively prevent
the agent from repeatedly increasing the value of certain lines,
so that the planning result of the agent is not limited to the
combination of these lines. The agent can greatly ensure
the diversity of the planning learning dates and avoid local
FIGURE 9. The electrical betweenness of interactive feedback in optimization problems by trying some lines that are not of
DDQN-ResNet. the greatest value.

76930 VOLUME 9, 2021


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

FIGURE 11. The best line identified by DDQN-ResNet in 200 episodes.


FIGURE 13. The changes of DQN agent’s judgment on lines value.

Both DDQN-ResNet and DQN achieve transmission grid


planning tasks by adjusting the predicted value of each line
during the training process. The change trend of the line value
during the two training processes can be compared to verify
the improvement of DDQN-ResNet in the overestimation
problems. Figures 12-14 show the value changes of the lines
during DDQN-ResNet and DQN training process. The x-axis
represents the line number, the y-axis represents the number
of training, and the z-axis represents the predicted value of
the lines by the agent.
It can be concluded from Figures 12-13 that these two
algorithms successfully construct an expansion plan at each
episode. Their agents receive index rewards, and the rewards
are distributed according to the contribution of each line in
each planning. As the training progresses, the value of the line
FIGURE 14. The difference between DDQN-ResNet and DQN agent on
with good performance will increase faster, and vice versa. lines value.
The difference of the lines value predictions between
DDQN-ResNet and DQN during the iteration is plotted
in Figure 14. Obviously, the difference between the predicted lower than that of DQN. Since the double residual network
value of DDQN-ResNet and the DQN is mostly negative, is established by DDQN-ResNet to form Qeval and Qtarget ,
which means that the predicted value of DDQN-ResNet is it realizes the separation of the best line selection and the
best line value estimation. This strategy can reduce the impact
of the accidental overestimation in DQN on the subsequent
iterations to ensure that the predicted value is consistent
with the true value. At the same time, this mechanism can
accelerate the value enhancement of high-performance lines.
The total value of each episode is shown in Figure 15. The
difference further confirms the effect of DDQN-ResNet in
reducing overestimation.
The heuristic learning-based methods can only generate
the target year structure through the line combination when
dealing with static planning problems. Although on this basis,
some scholars have considered the lines reuse and the gen-
erate of multiple level years to increase certain dynamic
characteristics, there is still a big gap between this method
and a highly flexible, observable, and interpretable method.
DDQN-ResNet adopts the complete interactive learning
FIGURE 12. The changes of DDQN-ResNet agent’s judgment on lines mode. The training objective has been changed from find-
value. ing the combination of lines to accurately distinguishing

VOLUME 9, 2021 76931


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

well coordinate the relationship among comprehensive cost,


EENS and EM. On the contrary, DQN pursues the reduction
of comprehensive cost too much, which reduces the reliability
of transmission grid. This also proves that DDQN-ResNet
weakens the impact of overestimation through the double Q
network, which makes the line value prediction more accurate
and the final plan is more in line with engineering needs.
TNEP is an idealized design of the grid in the target year
based on the current transmission grid. However, as time
goes by, the construction plan may be forced to change due
to political, economic and environmental factors. Therefore,
the dynamic programming algorithms have higher require-
ments for reusability and flexibility. DDQN-ResNet agent
can quickly adapt to these changes through the established
FIGURE 15. The total lines value predicted by DDQN-ResNet and DQN strategy and continue to complete the new TNEP tasks with-
agent.
out repeated training.
This paper designs two typical dynamic planning scenarios
based on the results of the above static planning:
the value of lines. It avoids a series of problems of heuris-
Scenario 1: Based on the static planning scheme, when the
tic learning-based methods, and it make the learning pro-
fourth line 18-19 of DDQN-ResNet is changed to 20-23, the
cess clearer. The static planning optimal lines construction
dynamic planning result is listed in Table 6. As a comparison,
sequence of DDQN-ResNet in RTS 24-bus test system is
the fourth line 15-21 of DQN is changed to 16-19, the agent
listed in Table 4, and as a comparison, the result of DQN is
dynamic planning result is listed in Table 7.
listed in Table 5.
It shows that the current electrical betweenness of
TABLE 4. Transmission network planning line sequence of DDQN-ResNet.
DDQN-ResNet scheme is in a good range (<0.2) after joining
the intervention, and it is more inclined to adjust the objective

TABLE 6. Dynamic transmission network planning line sequence of


ddqn-resnet in scenario 1.

TABLE 5. Transmission network planning line sequence of DQN.

TABLE 7. Dynamic transmission network planning line sequence of DQN


in scenario 1.

The optimal line construction sequence of DDQN-ReaNet


is listed in Table 4. As a comparison, the result of DQN
is listed in Table 5. As can be seen, DDQN-ResNet can

76932 VOLUME 9, 2021


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

of constructing the new transmission lines to reduce EENS. The results in Tables 8 and 9 show that with the increase
At the same time, the construction cost of new result is nearly of generation and load capacity, the evaluation index of the
unchanged. Although DQN can continue to optimize the transmission grid deteriorates greatly. Under this influence,
network structure after the intervention, the initial two lines both DDQN-ResNet’s and DQN’s agents choose to build
are not as good as DDQN-ResNet and the overestimation of 4 relatively high value lines to improve the performance of
the line value judgment results in poor performance. the transmission network. The new grid structure can support
Scenario 2: Based on the static planning scheme, as the regional development by adding new lines and retaining the
load and generator capacity increase from 1.15 to 1.30 times original lines. DDQN-ResNet achieves better EENS and EM
in the future, the transmission grid needs to continue to performance under almost the same comprehensive cost as
build new lines for ensuring the reliable power supply in this compared to DQN. It is because that after the intervention is
area. The dynamic planning result of DDQN-ResNet is listed added, the difference of the predicted value can be quickly
in Table 8, and the result of DQN is listed in Table 9. corrected by DDQN-ResNet, but the correction of DQN is
slower, which makes the line value judgment in the planning
scheme had a large error and the result is not as good as
DDQN-ResNet.
TABLE 8. Dynamic transmission network planning line sequence of
DDQN-ResNet in scenario 2.
In summary, DDQN-ResNet can solve TNEP problem in
a more flexible and efficient way to support regional devel-
opment. Its interactive learning method makes the construc-
tion plan clearer and more convenient for TNEP task of the
large-scale grid.

C. THE DYNAMIC TNEP OF IEEE NEW ENGLAND


39-BUS TEST SYSTEM
Furthermore, DDQN-ResNet is applied to the more complex
IEEE new England 39-bus test system, as shown in Figure 16.
It contains 10 generators with capacity between 250MW and
1000MW, and 39 load nodes with the maximum total power

TABLE 9. Dynamic transmission network planning line sequence of DQN


in scenario 2.

FIGURE 16. TNEP result of IEEE New England 39-bus system.

VOLUME 9, 2021 76933


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

TABLE 10. Transmission network planning line sequence of TABLE 11. Line parameters of IEEE RTS 24-bus test system.
DDQN-ResNet in ieee new england 39-bus test system.

of 6254.23MW. The initial grid consists of 46 lines, which are


used as the chosen lines in this test. The loads and generators
capacity are increased to 1.15 times. The TNEP result is listed
in Table 10 and the structure is shown in Figure 16.
In Table 10, it shows that the EENS of the transmission
grid is relatively large, because the IEEE New England 39 bus
system contains only 10 large-capacity power generation
nodes, and many nodes are connected through a single line.
The result of DDQN-ResNet contains a total of 8 lines.
Although EM level of the transmission network is slightly
reduced, EENS is greatly reduced. This scheme enhances
the reliability of the transmission grid and helps the system
maintain stable operation.

V. CONCLUSION
For the first time, this paper proposes a novel method to solve
TNEP tasks through the deep reinforcement learning DDQN-
ResNet. In addition, we construct a transmission network
expansion planning model. The model contains the improved
EENS based on equal dispersive sampling, the improved
electric betweenness based on Gini coefficient and the com-
prehensive cost of transmission network, and we use them
to measure the reliability and economy of the transmission
network. Furthermore, based on this model, we establish a
TNEP MDP model and a reinforcement environment, they
preserve the characteristics of TNEP task and the rules of the
Markov decision process.
DDQN-ResNet uses a completely different learning mode
from the heuristic learning-based method. We use the highly
visible and flexible interactive learning mode of this method
to monitor the TNEP task solving process, thereby improv-
ing the black block problem of the heuristic learning-based
methods and increasing the reliability of its application in
engineering. Furthermore, this method combines DDQN with
ResNet to avoid the over estimation problem and improve
training convergence. Therefore, DDQN-ResNet is a high-
performance method for TNEP task.

76934 VOLUME 9, 2021


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

TABLE 11. (Continued.) Line parameters of IEEE RTS 24-bus test system. transmission lines sequence, and the other is to expand the
generators and loads capacity. DDQN-ResNet agent com-
pletes the new TNEP tasks in two scenarios based on existing
strategy without repeated training, and it demonstrates the
high flexibility and efficiency of DDQN-ResNet.
We apply DDQN-ResNet to TNEP tasks, it can obtain a
more thorough implementation plan to help the transmission
grid construction department better decide on the construc-
tion details. At the same time, this method can flexibly accept
various actual requirements and changes in construction with-
out repetitive training, and realize TNEP tasks in multiple
scenarios. Therefore, DDQN-ResNet is a high-performance
method suitable for complex large-scale grid design and
planning.
However, the TNEP task is a complex engineering prob-
lem, and this article discussed and verified the feasibility of
deep reinforcement learning in this problem. In future work,
we will enhance the application of this method in engineer-
ing from two aspects. On the one hand, we will consider
the uncertainty of large-scale adjustable electric vehicles
load [39] and new energy [40] in the TNEP model. Further-
more, we will improve the algorithm structure to enhance its
efficiency and ability in complex model problems.

APPENDIX
The line parameters of IEEE RTS 24-bus are listed as
Table 11.

REFERENCES
[1] S. Fan, ‘‘Research on deep learning energy consumption prediction
based on generating confrontation network,’’ IEEE Access, vol. 7,
pp. 165143–165154, 2019, doi: 10.1109/ACCESS.2019.2949030.
[2] P. Yong, N. Zhang, C. Kang, Q. Xia, and D. Lu, ‘‘MPLP-based fast power
system reliability evaluation using transmission line status dictionary,’’
IEEE Trans. Power Syst., vol. 34, no. 2, pp. 1630–1640, Mar. 2019, doi: 10.
1109/TPWRS.2018.2878324.
[3] M. Ajalli and A. Pirayesh, ‘‘Effect of wind units on market clearing price,
transmission congestion and network reliability in transmission expan-
sion planning,’’ J. Eng., vol. 2018, no. 5, pp. 304–315, May 2018, doi:
10.1049/joe.2017.0909.
[4] A. Bagheri and C. Zhao, ‘‘Distributionally robust reliability assessment
for transmission system hardening plan under N − k security criterion,’’
IEEE Trans. Rel., vol. 68, no. 2, pp. 653–662, Jun. 2019, doi: 10.1109/
TR.2019.2893138.
[5] Y. Shu and Y. Tang, ‘‘Analysis and recommendations for the adaptability
of China’s power system security and stability relevant standards,’’ CSEE
J. Power Energy Syst., vol. 3, no. 4, pp. 334–339, Dec. 2017, doi: 10.
17775/CSEEJPES.2017.00650.
[6] B. Hu, K. Xie, and H.-M. Tai, ‘‘Inverse problem of power system reli-
ability evaluation: Analytical model and solution method,’’ IEEE Trans.
Power Syst., vol. 33, no. 6, pp. 6569–6578, Nov. 2018, doi: 10.1109/
TPWRS.2018.2839841.
[7] A. Safdarian, M. Farajollahi, and M. Fotuhi-Firuzabad, ‘‘Impacts of
remote control switch malfunction on distribution system reliability,’’
IEEE Trans. Power Syst., vol. 32, no. 2, pp. 1572–1573, Mar. 2017, doi:
10.1109/TPWRS.2016.2568747.
[8] W. Qin, P. Wang, X. Han, and X. Du, ‘‘Reactive power aspects in reliability
In this paper, DDQN-ResNet is applied to the transmission assessment of power systems,’’ IEEE Trans. Power Syst., vol. 26, no. 1,
network static planning. The result has the high reliability and pp. 85–92, Feb. 2011, doi: 10.1109/TPWRS.2010.2050788.
economy than the reference, and it verifies the superiority [9] Y. Zhao, Y. Tang, W. Li, and J. Yu, ‘‘Composite power system reliability
evaluation based on enhanced sequential cross-entropy Monte Carlo simu-
of this method. According to the static planning result, two lation,’’ IEEE Trans. Power Syst., vol. 34, no. 5, pp. 3891–3901, Sep. 2019,
dynamic scenarios are designed, one is to change the optimal doi: 10.1109/TPWRS.2019.2909769.

VOLUME 9, 2021 76935


Y. Wang et al.: Transmission Network Dynamic Planning Based on DDQN-ResNet

[10] Y. Liu and C. Singh, ‘‘Reliability evaluation of composite power systems using Markov cut-set method,’’ IEEE Trans. Power Syst., vol. 25, no. 2, pp. 777–785, May 2010, doi: 10.1109/TPWRS.2009.2033802.
[11] J. Cervantes and F. Fred Choobineh, ‘‘A quantile-based approach for transmission expansion planning,’’ IEEE Access, vol. 8, pp. 82630–82640, 2020, doi: 10.1109/ACCESS.2020.2991127.
[12] N. G. Ude, H. Yskandar, and R. C. Graham, ‘‘A comprehensive state-of-the-art survey on the transmission network expansion planning optimization algorithms,’’ IEEE Access, vol. 7, pp. 123158–123181, 2019, doi: 10.1109/ACCESS.2019.2936682.
[13] X. Han, L. Zhao, J. Wen, X. Ai, J. Liu, and D. Yang, ‘‘Transmission network expansion planning considering the generators’ contribution to uncertainty accommodation,’’ CSEE J. Power Energy Syst., vol. 3, no. 4, pp. 450–460, Dec. 2017, doi: 10.17775/CSEEJPES.2015.01190.
[14] L. E. D. Oliveira, J. T. Saraiva, P. V. Gomes, and F. D. Freitas, ‘‘A three-stage multi-year transmission expansion planning using heuristic, metaheuristic and decomposition techniques,’’ in Proc. IEEE Milan PowerTech, Jun. 2019, pp. 1–6, doi: 10.1109/PTC.2019.8810478.
[15] R. Garcia-Bertrand and R. Minguez, ‘‘Dynamic robust transmission expansion planning,’’ IEEE Trans. Power Syst., vol. 32, no. 4, pp. 2618–2628, Jul. 2017, doi: 10.1109/TPWRS.2016.2629266.
[16] R. Hemmati, R.-A. Hooshmand, and A. Khodabakhshian, ‘‘State-of-the-art of transmission expansion planning: Comprehensive review,’’ Energy Rev., vol. 165, no. 1, pp. 842–862, Nov. 2020.
[17] A. Moreira, G. Strbac, R. Moreno, A. Street, and I. Konstantelos, ‘‘A five-level MILP model for flexible transmission network planning under uncertainty: A min–max regret approach,’’ IEEE Trans. Power Syst., vol. 33, no. 1, pp. 486–501, Jan. 2018, doi: 10.1109/TPWRS.2017.2710637.
[18] D. Pozo, E. Sauma, and J. Contreras, ‘‘A three-level static MILP model for generation and transmission expansion planning,’’ in Proc. IEEE PES T&D Conf. Expo., Apr. 2014, p. 1, doi: 10.1109/TDC.2014.6863239.
[19] Y. Li, J. Wang, and T. Ding, ‘‘Clustering-based chance-constrained transmission expansion planning using an improved benders decomposition algorithm,’’ IET Gener., Transmiss. Distrib., vol. 12, no. 4, pp. 935–946, Feb. 2018, doi: 10.1049/iet-gtd.2017.0117.
[20] H. Yu, C. Y. Chung, and K. P. Wong, ‘‘Robust transmission network expansion planning method with Taguchi’s orthogonal array testing,’’ IEEE Trans. Power Syst., vol. 26, no. 3, pp. 1573–1580, Aug. 2011, doi: 10.1109/TPWRS.2010.2082576.
[21] M. Z. Jahromi, M. M. H. Bioki, M. Rashidinejad, and R. Fadaeinedjad, ‘‘Transmission and generation expansion planning considering loadability limit using game theory & ANN,’’ in Proc. 11th Int. Conf. Environ. Electr. Eng., May 2012, pp. 661–666, doi: 10.1109/EEEIC.2012.6221459.
[22] H. Li, X. Liu, Z. Huang, C. Zeng, P. Zou, Z. Chu, and J. Yi, ‘‘Newly emerging nature-inspired optimization–algorithm review, unified framework, evaluation, and behavioural parameter optimization,’’ IEEE Access, vol. 8, pp. 72620–72649, 2020, doi: 10.1109/ACCESS.2020.2987689.
[23] P. Murugan, S. Kannan, and S. Baskar, ‘‘Application of NSGA-II algorithm to single-objective transmission constrained generation expansion planning,’’ IEEE Trans. Power Syst., vol. 24, no. 4, pp. 1790–1797, Nov. 2009, doi: 10.1109/TPWRS.2009.2030428.
[24] L. F. F. Ledezma and G. G. Alcaraz, ‘‘Hybrid binary PSO for transmission expansion planning considering N-1 security criterion,’’ IEEE Latin Amer. Trans., vol. 18, no. 3, pp. 545–553, Mar. 2020, doi: 10.1109/TLA.2020.9082726.
[25] E. G. Carrano, R. T. N. Cardoso, R. H. C. Takahashi, C. M. Fonseca, and O. M. Neto, ‘‘Power distribution network expansion scheduling using dynamic programming genetic algorithm,’’ IET Gener., Transmiss. Distrib., vol. 2, no. 3, pp. 444–455, May 2008, doi: 10.1049/iet-gtd:20070174.
[26] X. Z. Liu, R. X. Yuan, and D. C. Liu, ‘‘Application of simulated annealing algorithm on transmission network expansion planning,’’ Proc. Chin. Soc. Univ. Electr. Power Syst. Autom., vol. 22, no. 2, pp. 11–14, 2010.
[27] L. Cheng and T. Yu, ‘‘A new generation of AI: A review and perspective on machine learning technologies applied to smart energy and electric power systems,’’ Int. J. Energy Res., vol. 43, no. 6, pp. 1928–1973, May 2019, doi: 10.1002/er.4333.
[28] H. Choi, S. Ryu, and H. Kim, ‘‘Short-term load forecasting based on ResNet and LSTM,’’ in Proc. IEEE Int. Conf. Commun., Control, Comput. Technol. Smart Grids (SmartGridComm), Oct. 2018, pp. 1–6, doi: 10.1109/SmartGridComm.2018.8587554.
[29] L. Guomin, T. Yingjie, Y. Changyuan, L. Yinglin, and H. Jinghan, ‘‘Deep learning-based fault location of DC distribution networks,’’ J. Eng., vol. 2019, no. 16, pp. 3301–3305, Mar. 2019, doi: 10.1049/joe.2018.8902.
[30] M.-F. Guo, N.-C. Yang, and W.-F. Chen, ‘‘Deep-learning-based fault classification using Hilbert–Huang transform and convolutional neural network in power distribution systems,’’ IEEE Sensors J., vol. 19, no. 16, pp. 6905–6913, Aug. 2019, doi: 10.1109/JSEN.2019.2913006.
[31] M. Al-Saffar and P. Musilek, ‘‘Reinforcement learning-based distributed BESS management for mitigating overvoltage issues in systems with high PV penetration,’’ IEEE Trans. Smart Grid, vol. 11, no. 4, pp. 2980–2994, Jul. 2020, doi: 10.1109/TSG.2020.2972208.
[32] J. Li, T. Yu, H. Zhu, F. Li, D. Lin, and Z. Li, ‘‘Multi-agent deep reinforcement learning for sectional AGC dispatch,’’ IEEE Access, vol. 8, pp. 158067–158081, 2020, doi: 10.1109/ACCESS.2020.3019929.
[33] M. Bagheri, V. Nurmanova, O. Abedinia, and M. S. Naderi, ‘‘Enhancing power quality in microgrids with a new online control strategy for DSTATCOM using reinforcement learning algorithm,’’ IEEE Access, vol. 6, pp. 38986–38996, 2018, doi: 10.1109/ACCESS.2018.2852941.
[34] G. Li, Y. Huang, and Z. Bie, ‘‘Reliability evaluation of smart distribution systems considering load rebound characteristics,’’ IEEE Trans. Sustain. Energy, vol. 9, no. 4, pp. 1713–1721, Oct. 2018, doi: 10.1109/TSTE.2018.2810220.
[35] D.-S. Yang, Y.-H. Sun, B.-W. Zhou, X.-T. Gao, and H.-G. Zhang, ‘‘Critical nodes identification of complex power systems based on electric cactus structure,’’ IEEE Syst. J., vol. 14, no. 3, pp. 4477–4488, Sep. 2020, doi: 10.1109/JSYST.2020.2967403.
[36] H. Van Hasselt, A. Guez, and D. Silver, ‘‘Deep reinforcement learning with double Q-learning,’’ in Proc. AAAI, vol. 16, Feb. 2016, pp. 2094–2100.
[37] P. Subcommittee, ‘‘IEEE reliability test system,’’ IEEE Trans. Power App. Syst., vol. PAS-98, no. 6, pp. 2047–2054, Nov. 1979, doi: 10.1109/TPAS.1979.319398.
[38] H. Zhang, V. Vittal, G. T. Heydt, and J. Quintero, ‘‘A mixed-integer linear programming approach for multi-stage security-constrained transmission expansion planning,’’ IEEE Trans. Power Syst., vol. 27, no. 2, pp. 1125–1133, May 2012, doi: 10.1109/TPWRS.2011.2178000.
[39] C. Kong, R. Jovanovic, I. Bayram, and M. Devetsikiotis, ‘‘A hierarchical optimization model for a network of electric vehicle charging stations,’’ Energies, vol. 10, no. 5, p. 675, May 2017.
[40] B. Zeng, Y. Liu, F. Xu, Y. Liu, X. Sun, and X. Ye, ‘‘Optimal demand response resource exploitation for efficient accommodation of renewable energy sources in multi-energy systems considering correlated uncertainties,’’ J. Cleaner Prod., vol. 288, Mar. 2021, Art. no. 125666, doi: 10.1016/j.jclepro.2020.125666.

YUHONG WANG (Member, IEEE) received the B.S. and M.S. degrees in power system and its automation from Chongqing University, Chongqing, China, in 1993 and 1995, respectively, and the Ph.D. degree in power system and its automation from Southwest Jiaotong University, Chengdu, China, in 2008.
She is currently a Professor with Sichuan University. Her research interests include power system stability and control, HVDC and FACTS, renewable energy integration, and artificial intelligence.

XU ZHOU received the B.S. degree in electrical engineering from Sichuan University, Chengdu, China, in 2019, where he is currently pursuing the degree in electrical engineering.
His current research interests include the application of deep learning and reinforcement learning for power system problems.


HONG ZHOU received the M.S. degree in electrical engineering from Sichuan University, Chengdu, China.
He is currently an Electric Power Engineer with the State Grid Southwest China Branch, Chengdu. His current research interest includes the optimization and planning of electrical power systems.

LEI CHEN received the B.S. degree in electrical engineering from Sichuan University, Chengdu, China, in 2019, where he is currently pursuing the degree in electrical engineering.
His current research interest includes the application of machine learning for power system planning.

ZONGSHENG ZHENG (Member, IEEE) received the B.S. degree in bioinformatics and the Ph.D. degree in electrical engineering from Southwest Jiaotong University, Chengdu, China, in 2013 and 2020, respectively.
From 2018 to 2019, he was a Visiting Scholar with the Bradley Department of Electrical and Computer Engineering, Virginia Tech-Northern Virginia Center, Falls Church, VA, USA. He is currently an Associate Research Fellow with the College of Electrical Engineering, Sichuan University. His research interests include uncertainty quantification, parameter, and state estimation.

QI ZENG received the Ph.D. degree from Sichuan University, China.
She is currently an Associate Professor with Sichuan University. Her main research interests include power system stability analysis and control, robust control theory, and HVDC transmission technologies.

SHAORONG CAI received the M.S. degree in landscape architecture from Nanjing Forestry University, Nanjing, China.
He is currently an Electric Power Engineer with the State Grid Southwest China Branch, Chengdu. His current research interest includes the optimization and planning of electrical power systems.

QING WANG received the M.S. degree in electrical engineering and automation from North China Electric Power University, Beijing, China.
He is currently an Electric Power Engineer with the State Grid Southwest China Branch, Chengdu. His current research interest includes the optimization and planning of electrical power systems.
