Fig. 5. Sample selection of random action.

Fig. 6. Comparison of limit state function, normalized limit state function, and reward function.

The global agent does not interact with the environment. At each training step of a thread, the actor network takes an action in the environment after observing the input state, and the critic network outputs a state value of the input state. The state value is given by:

V(s_t, \theta_v') = \begin{cases} 0 & \text{for terminal } s_t \\ V(s_t, \theta_v') & \text{for non-terminal } s_t \end{cases}   (14)

where V(s_t, θ_v′) is the state value at the t-th training step in the thread, s_t is the input state, and θ_v′ denotes the parameters of the critic network of the parallel agent. In this paper, the actor and critic networks are updated separately, and the updating gradient for each part of the global agent can be calculated by [40]:

d_a = \nabla_{\theta_a'} \log \pi(a_t \mid s_t; \theta_a') \bigl( r(a_t, s_t) + V(s_{t+1}; \theta_v') - V(s_t; \theta_v') \bigr)   (15)

d_v = \partial \bigl( r(a_t, s_t) + V(s_{t+1}; \theta_v') - V(s_t; \theta_v') \bigr)^2 / \partial \theta_v'   (16)

where d_a is the updating gradient of the actor network of the global agent, θ_a′ denotes the parameters of the actor network of the parallel agent, d_v is the updating gradient of the critic network of the global agent, and θ_v′ denotes the parameters of the critic network of the parallel agent. According to the updating gradients calculated by the parallel agents, the parameters of the global agent are updated at each training step. To improve the efficiency of the optimization, a variant of gradient descent called RMSProp [48] is used in training. The updating equations are:

E[d^2]_t = \gamma E[d^2]_{t-1} + (1 - \gamma)\, d_t^2   (17)

\theta_{t+1} = \theta_t - \frac{\eta\, d}{\sqrt{E[d^2]_t + \xi}}   (18)

where d is the gradient of the actor or critic network, θ denotes the parameters of the actor or critic network, η is the learning rate, and γ and ξ are constants. Finally, the updated parameters of the global agent are shared with the parallel agents.
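As a concrete illustration of Eqs. (15)-(18), the NumPy sketch below computes the one-step advantage that appears in the actor and critic gradients and applies the RMSProp-style parameter update. The function names and default hyperparameters are our assumptions for illustration; the paper does not provide an implementation.

```python
import numpy as np

def advantage(reward, v_next, v_curr, terminal):
    """One-step advantage r(a_t, s_t) + V(s_{t+1}) - V(s_t) used in Eqs. (15)-(16).
    V(s_{t+1}) is taken as 0 for a terminal state, as in Eq. (14)."""
    v_next = 0.0 if terminal else v_next
    return reward + v_next - v_curr

def rmsprop_update(theta, d, avg_sq, eta=1e-3, gamma=0.9, xi=1e-8):
    """Apply Eqs. (17)-(18) to the parameters theta with accumulated gradient d."""
    avg_sq = gamma * avg_sq + (1.0 - gamma) * d ** 2   # Eq. (17): running average of d^2
    theta = theta - eta * d / np.sqrt(avg_sq + xi)     # Eq. (18): RMSProp step
    return theta, avg_sq

# Example: update a global parameter vector with a gradient reported by one worker.
theta = np.zeros(4)
avg_sq = np.zeros(4)
grad = np.array([0.2, -0.1, 0.05, 0.3])
theta, avg_sq = rmsprop_update(theta, grad, avg_sq)
```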
The global state is used to hold and share the experimental points of the parallel agents, as shown in Fig. 7. Once an experimental point of an element is selected, the limit state function value and the reward of that experimental point are saved to the global state. When new actions are taken in the same element, the limit state function value and the reward can be retrieved from the corresponding element of the global state. This avoids repeated calls to the limit state function when overlapping actions are taken in the same element.
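A simple way to realize this caching behaviour is a dictionary keyed by the element index, so that the expensive limit state function is evaluated at most once per element. The class and function names below are illustrative, not taken from the paper.

```python
class GlobalState:
    """Sketch of the shared global state that stores experimental points."""

    def __init__(self, limit_state_fn, reward_fn):
        self.limit_state_fn = limit_state_fn   # expensive limit state function g(x)
        self.reward_fn = reward_fn             # reward derived from g(x)
        self.cache = {}                        # element index -> (g value, reward)

    def evaluate(self, element_idx, point):
        """Return (g, reward) for an element, calling g(x) only on the first visit."""
        if element_idx not in self.cache:
            g_value = self.limit_state_fn(point)
            self.cache[element_idx] = (g_value, self.reward_fn(g_value))
        return self.cache[element_idx]
```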
Once the parameters of the agents are initialized, the random action can be started at a point on the boundary of the sampling space. Simultaneously, another point on the boundary is selected as the terminal point. When the random action of any parallel agent reaches the terminal point, the optimization is terminated. In addition, the convergence of the failure probability can also be used as the termination criterion of the DRL optimization [30,36]. The newly added experimental points can be used to iteratively update the surrogate model as well as the failure probability, and the DRL optimization is terminated when the relative error of the failure probability converges.
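If the convergence of the failure probability is used as the stopping rule, one plausible check is to monitor the relative change between successive estimates of the failure probability. The function and tolerance below are illustrative assumptions, not values reported by the authors.

```python
def failure_probability_converged(pf_history, tol=1e-2):
    """Stop the DRL optimization once the relative error between successive
    failure probability estimates drops below tol (illustrative criterion)."""
    if len(pf_history) < 2 or pf_history[-2] == 0.0:
        return False
    rel_err = abs(pf_history[-1] - pf_history[-2]) / abs(pf_history[-2])
    return rel_err < tol
```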
3.6. Construction of the surrogate model
When the DRL optimization is finished, the experimental points of the global state are used to train the surrogate model for structural reliability analysis. Because of its strong ability to approximate non-linear data, a deep neural network is used as the surrogate model in this study. The construction of the deep neural network is shown in Fig. 8. The network consists of an input layer, an output layer, and several fully connected hidden layers. The input of the deep neural network is the vectors of the experimental points, and the output is the predicted value of the limit state function. The hidden layers are activated by the ReLU activation function in Eq. (10), while the output layer uses a linear activation function. Before training, the selected experimental points are divided into a training set and a testing set according to a certain ratio. The training set is used to train the deep neural network with the mini-batch gradient descent method and RMSProp [48] optimization. The testing set is used to check the accuracy of the trained deep neural network. Finally, the well-trained deep neural network is applied for structural reliability assessment.
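A surrogate of this kind could be set up as in the following Keras sketch: ReLU hidden layers, a linear output, a train/test split, and mini-batch training with RMSProp. The layer widths, the 80/20 split, and the training hyperparameters are assumptions for illustration, not the settings used in the paper.

```python
from tensorflow import keras

def build_surrogate(n_inputs, hidden=(64, 64, 64)):
    """Fully connected network: ReLU hidden layers and a linear output for g(x)."""
    model = keras.Sequential()
    model.add(keras.layers.Input(shape=(n_inputs,)))
    for width in hidden:
        model.add(keras.layers.Dense(width, activation="relu"))   # hidden layers
    model.add(keras.layers.Dense(1, activation="linear"))         # predicted g(x)
    model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=1e-3), loss="mse")
    return model

def train_surrogate(x, y, split=0.8, epochs=200, batch_size=32):
    """Split the experimental points, train with mini-batch RMSProp, report test MSE."""
    n_train = int(split * len(x))
    x_train, y_train = x[:n_train], y[:n_train]
    x_test, y_test = x[n_train:], y[n_train:]
    model = build_surrogate(x.shape[1])
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0)
    test_mse = model.evaluate(x_test, y_test, verbose=0)
    return model, test_mse
```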
4. Numerical examples

In this section, two examples are used to test the accuracy and
