
International Journal of Control, Automation and Systems 17(4) (2019) 822-835
ISSN: 1598-6446, eISSN: 2005-4092
http://dx.doi.org/10.1007/s12555-018-9308-5
http://www.springer.com/12555

On-orbit Reconfiguration Using Adaptive Dynamic Programming for Multi-mission-constrained Spacecraft Attitude Control System

Yue-Hua Cheng, Bin Jiang*, Huan Li, and Xiao-Dong Han

Abstract: For the on-orbit reconfiguration problem of spacecraft attitude control systems under multi-mission constraints, the idea of a reinforcement-learning algorithm is adopted, and an adaptive dynamic programming algorithm for on-orbit reconfiguration decision-making based on a dual optimization index is proposed. Two optimization objectives, total mission reward and total control cost (energy consumption), are defined to obtain the optimal reconfiguration policy of the spacecraft attitude control system, and the on-orbit reconfiguration model under multi-mission constraints is established. Then, based on the Bellman optimality principle, the optimal reconfiguration policy formulated by the discrete HJB equation is obtained. Since the HJB equation is difficult to solve accurately, a bi-objective adaptive dynamic programming method is proposed to obtain the optimal reconfiguration policy. This method constructs a mission network and an energy network, and then adopts a Q-learning-based algorithm to train the networks to estimate the total mission reward and total control cost, so as to achieve the on-orbit optimal reconfiguration decision under multi-mission constraints. Simulation results for different cases demonstrate the validity and rationality of the proposed method.

Keywords: Adaptive dynamic programming, attitude control system, multi-mission constraints, on-orbit reconfiguration, reinforcement learning.

1. INTRODUCTION

With developments in the reliability and safety of spacecraft, there is an increasing requirement for attitude control systems (ACS) that are capable of autonomous on-orbit reconfiguration to stabilize the spacecraft and implement predesigned missions in the possible case of an unexpected fault [1, 2]. The autonomous on-orbit reconfiguration technology of ACS can reduce the impact of faults on established mission planning and enhance the ability to cope with them; on the other hand, information exchange with ground stations can be reduced to cut down the cost of ground support, and task implementation capabilities can be effectively improved to maximize the use of platform resources.

Current studies on fault reconfiguration for ACS have mostly focused on the design of fault-tolerant controllers (FTC) [3]. For instance, utilizing the analytical redundancy relationship of the system, a model-based method was used to reconfigure the controller to achieve fault tolerance [4]. Generally, FTC schemes can be classified into the active approach and the passive approach, according to whether fault detection and isolation is required. Many advanced methods focus on designing an observer or estimator to perform fault detection or estimation, such as the robust observer [5], adaptive observer [6], and adaptive fuzzy estimator [7], among others [8]; the fault information is then used to reconfigure the controller. Fault-tolerant controller (or control law) design is another main line of work in ACS; in particular, adaptive control [9, 10], sliding mode control [11, 12], and control allocation [13] are the most widely used techniques. It is worth mentioning that Markovian jump systems (MJS) have recently attracted much attention, since they give a better description of systems with stochastic abrupt structural changes, such as failures or the switching of multiple modes of the system. Based on the asynchronization phenomenon, in which the controller mode is asynchronous with the system mode, a novel asynchronous control via a sliding mode approach has been studied in [14, 15] for MJS.

Due to the variety of failures and differing mission requirements, it is difficult to design reconfiguration policies in advance.

Manuscript received May 9, 2018; revised October 25, 2018; accepted December 11, 2018. Recommended by Associate Editor Niket Kaisare under the direction of Editor Jay H. Lee. This work is supported by the Natural Science Foundation of China (Grant No. 61673206), the 13th Five-Year Equipment Pre-Research Projects of China (Grant No. 30501050403), the Fundamental Research Funds for the Central Universities of China (Grant No. NZ2016111), and the Science and Technology on Space Intelligent Control Laboratory of Beijing (No. ZDSYS-2017-01).

Yue-Hua Cheng and Huan Li are with the College of Astronautics Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 210016, P. R. China (e-mails: chengyuehua@nuaa.edu.cn, verccino@live.com). Bin Jiang is with the College of Automation Engineering, Nanjing University of Aeronautics & Astronautics, Nanjing 211106, P. R. China (e-mail: binjiang@nuaa.edu.cn). Xiao-Dong Han is with the China Academy of Space Technology, Beijing 100094, P. R. China (e-mail: willingdong@163.com).
* Corresponding author.

© ICROS, KIEE and Springer 2019

For this reason, the reconfiguration decision-making of ACS needs to be developed to meet the requirement of autonomous online learning and optimization. Since the pros and cons of a reconfiguration policy are reflected in the subsequent mission execution, the reinforcement-learning algorithm, with its features of online learning and delayed reward, provides a feasible approach for reconfiguration policy decision-making. Recently, a few researchers have attempted to introduce reinforcement learning (RL) methods [16] into intelligent control [17]. RL refers to an actor or agent that interacts with its environment and modifies its actions or control policies based on information received in response to its actions [18, 19]. As an application of RL, adaptive dynamic programming (ADP), which is used to overcome the curse of dimensionality of the HJB equation, is becoming a promising research field for intelligent control [20]. A large body of ADP-based work has been done by D. Liu and Q. Wei: both the policy iteration ADP method in [21] and the novel Q-learning-based policy iteration ADP algorithm in [22] are used to obtain the iterative control law for the discrete-time optimal control of nonlinear systems; [23] is concerned with a novel generalized value iteration ADP algorithm, in which neural networks are used to approximate the iterative performance index function and compute the iterative control policy, respectively, to implement the iterative ADP algorithm and solve the optimal tracking control problem of discrete-time nonlinear systems; based on [23], the work in [24] introduced a nonquadratic performance function to overcome the saturation nonlinearity in actuators, and two neural networks were used to approximate the control vectors and the performance index function. A new Q-learning algorithm is proposed in [25] for solving the LQR problem, and [26] concerns ADP-based control for MIMO systems. To the best of our knowledge, there are still no discussions focused on ADP algorithms for spacecraft reconfiguration that consider system configuration, task requirements, and constraints to achieve autonomous optimal decisions, which motivates our research.

The operating state and working mode of a spacecraft under failures differ, and therefore the decision-making of the control system should also be treated differently [27]. What's more, mission requirements, specific running states, and the work mode of a spacecraft in the case of a fault should be taken into account when making on-orbit decisions for a fault-tolerant controller [28]. For an on-orbit spacecraft failure under the demands of multi-task sequences, how to balance the energy consumption of the reconfiguration process against the attitude pointing accuracy requirements under the mission constraints, and how to seek reasonable and feasible on-orbit reconfiguration policies that guarantee mission implementation as far as possible, is a difficult problem that needs to be solved at present.

Motivated by this requirement, this paper provides a novel way to autonomously obtain an ACS reconfiguration policy under multi-mission constraints. The idea of a Q-learning-based algorithm is adopted, and an ADP algorithm for on-orbit reconfiguration decisions based on a dual optimization index is proposed. Moreover, two neural networks are constructed to approximate the iterative performance index functions, total mission reward and total energy consumption, respectively, to implement the iterative ADP algorithm and obtain the optimal reconfiguration policy.

This paper is organized as follows: Section 2 describes the problem and constructs the on-orbit reconfiguration decision framework. The reconfiguration model under multi-mission constraints is established in Section 3, where two performance indices, total mission reward and total energy cost, are defined for the reconfiguration policy based on the analysis of system state transition processes. Section 4 presents an estimation method for the action-related variables. In Section 5, the optimal reconfiguration strategy based on the discrete HJB equation is obtained by Bellman's principle of optimality. Then, bi-objective adaptive dynamic programming is introduced to solve the HJB equation and achieve an optimal reconfiguration policy. Case simulations demonstrate the effectiveness of the proposed method in Section 6, and Section 7 concludes the paper.

2. PROBLEM DESCRIPTION

The reconfiguration policy of the ACS of a spacecraft can affect the implementation of the space mission. To bring the spacecraft into full play and increase the space mission reward, the mission requirements should be taken into consideration in the design of ACS reconfiguration policies. We analyze the constraints and reconfiguration process of the ACS, and then construct the on-orbit reconfiguration decision framework.

2.1. Mission constraints

In this paper, the following multi-mission constraints are considered.

a) Mission time window constraint

In most cases, a space mission is carried out during a specified time window. To implement a mission, the ACS is supposed to bring the spacecraft to a required attitude before the time window closes. Suppose two missions are scheduled as shown in Fig. 1, where the time windows of the two missions are $[t_1, t_2]$ and $[t_3, t_4]$, respectively. Suppose that the fault occurs at $t_0$ and $t_R$ is the time needed to reconfigure the post-fault ACS; then

$$
\begin{cases}
t_0 + t_R < t_1, \quad t_2 + t_R < t_3, & \\
t_0 < t_0 + t_R < t_3, & \text{if } \Gamma_1 \text{ is cancelled due to the fault.}
\end{cases}
\tag{1}
$$
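To make the constraint in (1) concrete, the following minimal sketch classifies a fault at time $t_0$ with reconfiguration duration $t_R$ against the two mission windows of Fig. 1. The function name, argument layout, and numerical values are illustrative assumptions of this sketch, not quantities from the paper.

```python
# Illustrative check of the mission time window constraint in Eq. (1).
# All names and numbers here are assumptions for this sketch, not
# identifiers from the paper.

def reconfiguration_feasible(t0, tR, t1, t2, t3):
    """Classify a fault at time t0 with reconfiguration time tR against
    mission windows [t1, t2] and [t3, t4] (t4 is not used by Eq. (1))."""
    if t0 + tR < t1 and t2 + tR < t3:
        return "both missions can be kept"            # first case of Eq. (1)
    if t0 < t0 + tR < t3:
        return "mission 1 cancelled, mission 2 kept"  # second case of Eq. (1)
    return "neither mission can be kept"

# Example: fault at t0 = 100 s, reconfiguration takes tR = 50 s,
# windows [t1, t2] = [200, 300] s and [t3, t4] = [500, 600] s.
print(reconfiguration_feasible(100.0, 50.0, 200.0, 300.0, 500.0))
# -> "both missions can be kept", since 150 < 200 and 350 < 500.
```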
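The bi-objective estimation idea outlined in the introduction can likewise be sketched in miniature. Below, tabular estimators stand in for the mission network and the energy network, and both are updated with Q-learning-style backups along the action that is greedy for a scalarized score. The state/action sizes, learning constants, scalarization weight, and environment interface are all assumptions of this sketch, not the paper's implementation.

```python
import numpy as np

N_STATES, N_ACTIONS = 10, 4        # assumed toy problem dimensions
GAMMA, LR, WEIGHT = 0.95, 0.1, 0.5  # assumed discount, step size, trade-off

mission_q = np.zeros((N_STATES, N_ACTIONS))  # estimates total mission reward
energy_q = np.zeros((N_STATES, N_ACTIONS))   # estimates total control cost

def choose_action(s):
    # Greedy action for the scalarized objective: reward minus weighted cost.
    return int(np.argmax(mission_q[s] - WEIGHT * energy_q[s]))

def update(s, a, reward, cost, s_next):
    # Back both estimates up along the action the combined policy would take.
    a_next = choose_action(s_next)
    mission_q[s, a] += LR * (reward + GAMMA * mission_q[s_next, a_next]
                             - mission_q[s, a])
    energy_q[s, a] += LR * (cost + GAMMA * energy_q[s_next, a_next]
                            - energy_q[s, a])

# One interaction step; reward, cost, and s_next would come from a simulator.
s = 0
a = choose_action(s)
update(s, a, reward=1.0, cost=0.2, s_next=1)
```

Keeping the two estimates separate, rather than learning a single scalarized value, mirrors the paper's use of distinct mission and energy networks and lets the trade-off weight be varied at decision time.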
