
DATA-DRIVEN OPTIMIZATION UNDER UNCERTAINTY IN THE ERA OF BIG

DATA AND DEEP LEARNING: GENERAL FRAMEWORKS, ALGORITHMS,

AND APPLICATIONS

A Dissertation

Presented to the Faculty of the Graduate School

of Cornell University

In Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

by

Chao Ning

August 2020
© 2020 Chao Ning
DATA-DRIVEN OPTIMIZATION UNDER UNCERTAINTY IN THE ERA OF BIG

DATA AND DEEP LEARNING: GENERAL FRAMEWORKS, ALGORITHMS,

AND APPLICATIONS

Chao Ning, Ph.D.


Cornell University 2020

This dissertation deals with the development of fundamental data-driven optimization

under uncertainty, including its modeling frameworks, solution algorithms, and a wide

variety of applications. Specifically, three research aims are proposed, including data-

driven distributionally robust optimization for hedging against distributional

uncertainties in energy systems, online learning based receding-horizon optimization

that accommodates real-time uncertainty data, and an efficient solution algorithm for

solving large-scale data-driven multistage robust optimization problems.

There are two distinct research projects under the first research aim. In the first related

project, we propose a novel data-driven Wasserstein distributionally robust mixed-

integer nonlinear programming model for the optimal biomass with agricultural waste-
to-energy network design under uncertainty. A data-driven uncertainty set of feedstock

price distributions is devised using the Wasserstein metric. To address computational

challenges, we propose a reformulation-based branch-and-refine algorithm. In the

second related project, we develop a novel deep learning based distributionally robust

joint chance constrained economic dispatch optimization framework for a high

penetration of renewable energy. By leveraging a deep generative adversarial network

(GAN), an f-divergence-based ambiguity set of wind power distributions is constructed

as a ball in the probability space centered at the distribution induced by a generator


neural network. To facilitate its solution process, the resulting distributionally robust

chance constraints are equivalently reformulated as ambiguity-free chance constraints,

which are further tackled using a scenario approach. Additionally, we derive an a priori

bound on the required number of synthetic wind power data generated by f-GAN to

guarantee a predefined risk level. To facilitate large-scale applications, we further

develop a prescreening technique to increase computational and memory efficiencies

by exploiting problem structure.

The second research aim addresses the online learning of real-time uncertainty data for

receding-horizon optimization-based control. In the related project, data-driven

stochastic model predictive control is proposed for linear time-invariant systems under

additive stochastic disturbance, whose probability distribution is unknown but can be

partially inferred from real-time disturbance data. The conditional value-at-risk

constraints on system states are required to hold for an ambiguity set of disturbance

distributions. By leveraging a Dirichlet process mixture model, the first- and second-order moment information of each mixture component is incorporated into the ambiguity set. As more data are gathered during the runtime of the controller, the ambiguity

set is updated based on real-time data. We then develop a novel constraint tightening

strategy based on an equivalent reformulation of distributionally robust constraints over

the proposed ambiguity set. Additionally, we establish theoretical guarantees on

recursive feasibility and closed-loop stability of the proposed model predictive control.

The third research aim focuses on algorithm development for data-driven multistage

adaptive robust mixed-integer linear programs. In the related project, we propose a

multi-to-two transformation theory and develop a novel transformation-proximal

bundle algorithm. By partitioning recourse decisions into state and control decisions,

affine decision rules are applied exclusively on the state decisions. In this way, the

original multistage robust optimization problem is shown to be transformed into an


equivalent two-stage robust optimization problem, which is further addressed using a

proximal bundle method. The finite convergence of the proposed solution algorithm is

guaranteed for the multistage robust optimization problem with a generic uncertainty

set. To quantitatively assess solution quality, we further develop a scenario-tree-based

lower bounding technique. The effectiveness and advantages of the proposed algorithm

are fully demonstrated in inventory control and process network planning.


BIOGRAPHICAL SKETCH

Chao Ning grew up in Shanxi, China. He graduated from the University of Electronic

Science and Technology of China in 2012 with a Bachelor’s degree in

Automation. He received the M.S. degree in Control Science and Engineering from

Tsinghua University, China, in 2015. He joined Professor Fengqi You’s research group

in late 2015 at Northwestern University to pursue a Ph.D. degree. In the summer of 2016, he

transferred to Cornell University with Professor You to continue his Ph.D. program. His

research interests include data-driven optimization under uncertainty, learning for

dynamics and control, big data analytics and machine learning, power systems

operations, and renewable energy systems.

ACKNOWLEDGMENTS

First and foremost, I would like to express my sincerest thanks to my advisor, Professor

Fengqi You, for his kind help, constant support, and heartfelt encouragement.

Throughout my PhD study, he spent great effort helping me learn how to do high-impact

research, having fruitful discussions with me on research ideas, and encouraging me to

work through many research challenges. Without his kind help and constant support of

my research, I would never have completed this PhD. His vision for research directions, broad

knowledge, unbounded energy, and great enthusiasm for research have always been a true

inspiration to me and will have a great impact on my future career. I feel greatly proud

and honored to be his student.

I am also thankful to my committee members, Professor Lindsay Anderson and

Professor Oliver Gao, for their kind guidance and help. They have offered me very

constructive comments and valuable feedback to make this dissertation possible.

My thanks also go to all my colleagues and friends in the PEESE group, who have made

my Ph.D. study life wonderful and enjoyable. Dr. Dajun Yue helped me a lot with using

some software tools, drawing high-quality figures, as well as my group presentations. He

also helped me get into the background of batch process scheduling. Dr. Jian Gong was

always the go-to person when I encountered any type of question in the lab. He helped

me a lot in learning robust optimization and in providing valuable suggestions on

manuscript writing. Dr. Jiyao Gao was always willing to help me and gave constructive

comments on my presentations. Dr. Daniel Garcia helped me a lot by kindly providing

critical comments on my presentations and editing manuscripts. I really learned a lot

from him on biomass process network. Dr. Karson Leperi helped me a lot by kindly

providing critical comments on my presentations. Dr. Chao Shang and I had great

discussions on data-driven optimization. Dr. Inkyu Lee kindly taught me how to draw

fantastic figures with PowerPoint and had great discussions with me on energy systems.

Xueyu Tian, Wei-Han Chen, Yanqiu Tao, Ning Zhao, Jiwei Yao, Jack Nicoletti, Raaj

Bora, Akshay Ajagekar, Abdulelah Alshehri, and Xiang Zhao, thank you all for

your kind help and the wonderful time we shared. Most importantly, you are amazing

friends, and I will never forget the happy hours we spent together. It was a pleasure to

discuss electric power systems with Haifeng Qiu, and I learned a lot from our discussions.

Many thanks to Natalia Lujan Juncua for her kind help in the unit commitment project.

Visiting scholars including Dr. Minbo Yang, Dr. Yuting Tang, Dr. Hua Zhou, Dr. Na

Luo, Dr. Zuwei Liao, Dr. Liang Zhao, Dr. Li Sun, and Dr. Runda Jia helped me with

both life and research, and provided me with valuable guidance and suggestions on

future career.

Last but not least, I want to express my deepest gratitude to my father and mother

for their unconditional love, support, and encouragement along the way.

TABLE OF CONTENTS

BIOGRAPHICAL SKETCH ......................................................................................... vi 


ACKNOWLEDGMENTS ............................................................................................ vii 
TABLE OF CONTENTS .............................................................................................. ix 
LIST OF FIGURES ...................................................................................................... xii 
LIST OF TABLES ...................................................................................................... xvi 
INTRODUCTION .......................................................................................................... 1 
1.1  Background on optimization under uncertainty............................................... 2 

1.2  Existing methods for data-driven optimization under uncertainty ................ 10 

1.3  Various types of deep learning techniques and their potentials..................... 29 

1.4  Outline of the dissertation .............................................................................. 32 

DATA-DRIVEN WASSERSTEIN DISTRIBUTIONALLY ROBUST


OPTIMIZATION FOR BIOMASS WITH AGRICULTURAL WASTE-TO-ENERGY
NETWORK DESIGN UNDER UNCERTAINTY ...................................................... 36 
2.1  Introduction .................................................................................................... 36 

2.2  Problem statement.......................................................................................... 42 

2.3  Mathematical formulation.............................................................................. 47 

2.4  Solution methodology .................................................................................... 55 

2.5  Case studies.................................................................................................... 60 

2.6  Summary ........................................................................................................ 75 

2.7  Appendix: Derivation of Wasserstein distributionally robust counterpart .... 77 

2.8  Nomenclature ................................................................................................. 80 

DEEP LEARNING BASED AMBIGUOUS JOINT CHANCE CONSTRAINED


ECONOMIC DISPATCH UNDER SPATIAL-TEMPORAL CORRELATED WIND
POWER UNCERTAINTY ........................................................................................... 82 
3.1  Introduction .................................................................................................... 82 

3.2  Mathematical formulation.............................................................................. 87 

3.3  Deep learning based ambiguous joint chance constrained economic dispatch

optimization .............................................................................................................. 90 

3.4  Solution methodology .................................................................................... 96 

3.5  Computational experiments ......................................................................... 100 

3.6  Summary ...................................................................................................... 109 

3.7  Nomenclature ............................................................................................... 110 

ONLINE LEARNING BASED RISK-AVERSE STOCHASTIC MODEL


PREDICTIVE CONTROL OF CONSTRAINED LINEAR UNCERTAIN SYSTEMS
.................................................................................................................................... 112 
4.1  Introduction .................................................................................................. 112 

4.2  Problem setup and preliminaries.................................................................. 118 

4.3  Online learning based risk-averse stochastic MPC...................................... 123 

4.4  The theoretical properties of the proposed online learning based risk-averse

stochastic MPC ....................................................................................................... 135 

4.5  Numerical examples .................................................................................... 138 

4.6  Summary ...................................................................................................... 144 

4.7  Appendix A: The derivation of control objective ........................................ 145 

4.8  Appendix B. Proof of Theorem 4.1 ............................................................. 147 

4.9  Appendix C: Proof of Proposition 4.1 ......................................................... 151 

4.10  Appendix D: Proof of Theorem 4.2 ......................................................... 152 

4.11  Appendix E: Proof of Theorem 4.3 .......................................................... 154 

A TRANSFORMATION-PROXIMAL BUNDLE ALGORITHM FOR SOLVING
LARGE-SCALE MULTISTAGE ADAPTIVE ROBUST OPTIMIZATION
PROBLEMS ............................................................................................................... 156 
5.1  Introduction .................................................................................................. 156 

5.2  The multi-to-two transformation scheme .................................................... 160 

5.3  Transformation-proximal bundle algorithm ................................................ 165 

5.4  The lower bounding technique .................................................................... 183 

5.5  Applications ................................................................................................. 187 

5.6  Summary ...................................................................................................... 209 

5.7  Appendix: Tables of computational results in Application 1 ...................... 210 

5.8  Nomenclature ............................................................................................... 212 

CONCLUSIONS ........................................................................................................ 215 


REFERENCES ........................................................................................................... 222 

LIST OF FIGURES

Figure 1. The data-driven uncertainty model based on the Dirichlet process mixture

model. ........................................................................................................................... 22 

Figure 2. The structure of the biomass with agricultural waste-to-energy network

considered in this work. ................................................................................................ 44 

Figure 3. Illustrative figure on the biomass with agricultural waste-to-energy network

with the corresponding data-driven Wasserstein DRO model. .................................... 45 

Figure 4. The pseudocode of the proposed reformulation-based branch-and-refine

algorithm for solving (WDRO) problem. ..................................................................... 59 

Figure 5. The empirical probability distributions of total cost for (a) the stochastic

programming method, (b) the proposed data-driven Wasserstein DRO approach. ..... 64 

Figure 6. The optimal bioconversion network design determined by the stochastic

programming approach. The optimal production capacity is displayed under processes.

...................................................................................................................................... 66 

Figure 7. The optimal bioconversion network design determined by the data-driven

Wasserstein DRO approach. The optimal production capacity is displayed under

processes. ...................................................................................................................... 67 

Figure 8. Cost breakdowns determined by (a) the stochastic programming method, (b)

the proposed data-driven Wasserstein DRO approach. ................................................ 68 

Figure 9. Capital cost distributions determined by (a) the stochastic programming

method, (b) the proposed data-driven Wasserstein DRO approach. ............................ 69 

Figure 10. Sensitivity analysis of discount rate for the data-driven Wasserstein DRO

approach. ...................................................................................................................... 70 

Figure 11. Sensitivity analysis of the in-sample objective value, out-of-sample average

cost, and computational time with different radii of Wasserstein balls. ...................... 71 

Figure 12. Upper and lower bounds in each iteration of the reformulation-based branch-

and-refine algorithm for global optimization of the (WDRO) problem in the case study.

...................................................................................................................................... 72 

Figure 13. Out-of-sample performance of stochastic programming and the proposed

Wasserstein DRO method based on the testing of 100 uncertainty scenarios. ............ 74 

Figure 14. The dependences of the average cost reduction and standard deviation

reduction on the number of testing samples. ................................................................ 75 

Figure 15. The schematic of the six-bus system........................................................ 101 

Figure 16. The training process of f-GAN................................................................. 102 

Figure 17. The cost breakdown of (a) the DRCCED method with moment information,

and (b) the proposed ED approach. ............................................................................ 104 

Figure 18. The power dispatch of each conventional generator determined by the

proposed ED approach. .............................................................................................. 105 

Figure 19. The spatial correlations of the ten wind farm energy outputs for (a) real wind

power data, and (b) wind power data generated by f-GAN. The color darkness of one

single cell represents the level of spatial correlation coefficient for corresponding two

wind farms. Comparison of spatial correlations can be made by focusing on the darkness

patterns of heat maps. The temporal correlations of WF10 for (c) real wind power data,

and (d) wind power data generated by f-GAN. The level of auto-correlation coefficient

is depicted by bar height. Comparison of temporal correlations can be made by considering

the height of each bar for every time lag. ................................................................... 106 

Figure 20. The empirical distribution of the wind power utilization efficiency for (a)

DRCCED with moment information, and (b) the proposed approach. ...................... 108 

Figure 21. The pseudocode of the proposed online-learning based risk-averse stochastic

MPC algorithm. .......................................................................................................... 134 

Figure 22. The average computational times of the proposed online learning based

risk-averse stochastic MPC method over 2,000 time steps. ....................................... 141 

Figure 23. (a): The closed-loop trajectories of system states for the proposed online

learning based risk-averse stochastic MPC with 100 realizations of disturbance

sequences, (b): The zoom-in view of state trajectories near the upper limit of x(2). . 143 

Figure 24. The online adaption of constraint tightening parameters in the proposed MPC

for time-varying disturbance distribution in a simulation. ......................................... 144 

Figure 25. The pseudocode of the transformation-proximal bundle algorithm. .............. 172

Figure 26. Inventory profiles determined by different control policies under the worst-

case uncertainty realization. ....................................................................................... 191 

Figure 27. Cost breakdowns determined by (a) the affine control policy, (b) the

proposed control policy. ............................................................................................. 193 

Figure 28. Lower bounds of multi-period inventory cost determined by the proposed

method and the data-driven approach......................................................................... 193 

Figure 29. The impacts of the number of uncertainty scenarios on the generated lower

bound of the original multistage ARO problem and computational time in the data-

driven approach. ......................................................................................................... 194 

Figure 30. The schematic of a small-scale process network. ..................................... 196 

Figure 31. The optimal design and planning decisions at the end of the planning horizon

determined by (a) the affine decision rule method, and (b) the transformation-proximal

bundle algorithm. ........................................................................................................ 201 

Figure 32. Optimal capacity expansion decisions over the entire planning horizon

determined by (a) the affine decision rule method, and (b) the transformation-proximal

bundle algorithm. ........................................................................................................ 202 

Figure 33. Revenues and cost breakdown determined by the affine decision rule method

and the transformation-proximal bundle algorithm. ................................................... 203 

Figure 34. The schematic of a large-scale petrochemical process network where

chemical names are listed. .......................................................................................... 204 

Figure 35. Revenues and cost breakdown at each time period determined by the affine

decision rule method (denoted by ADR in the figure) and the transformation-proximal

bundle algorithm (denoted by TPB in the figure). ..................................................... 206 

Figure 36. Optimal capacity expansion decisions over the entire planning horizon

determined by the transformation-proximal bundle algorithm. ...................... 207

Figure 37. Optimal feedstock purchase at each time stage determined by (a) the affine

decision rule method, and (b) the transformation-proximal bundle algorithm. ......... 208 

Figure 38. Spider charts showing optimal sale quantities (kt/y) of final products at each

time stage determined by (a) the affine decision rule method, and (b) the transformation-

proximal bundle algorithm. ........................................................................................ 209 

LIST OF TABLES

Table 1. Comparisons of problem sizes and computational results of the deterministic

optimization, the conventional stochastic programming method and the data-driven

Wasserstein DRO approach. ......................................................................................... 63 

Table 2. The out-of-sample performance of the deterministic optimization, the

conventional stochastic programming method and the data-driven Wasserstein DRO

approach. ...................................................................................................................... 64 

Table 3. The out-of-sample performance of the deterministic optimization, the

conventional stochastic programming method and the data-driven WDRO approach

when the number of training data N=100. .................................................................... 74 

Table 4. Comparisons of problem sizes and computational results for the DRCCED

method with moment information and the proposed ED method with/without

prescreening in six-bus test system. ........................................................................... 103 

Table 5. Comparisons of problem sizes and computational results for the DRCCED

method with moment information and the proposed ED method with/without

prescreening in IEEE 118 bus system. ....................................................................... 108 

Table 6. Mass balance relationships for different processes. ..................................... 197 

Table 7. Computational results of different methods in the process network planning

application. ................................................................................................................. 200 

Table 8. Computational performances of different solution algorithms in the multistage

robust inventory control problem under demand uncertainty for T=5. ...................... 210 

Table 9. Computational performances of different solution algorithms in the multistage

robust inventory control problem under demand uncertainty for T=10. .................... 211 

Table 10. Computational performances of different solution algorithms in the multistage

robust inventory control problem under demand uncertainty for T=15. .................... 212 

CHAPTER 1

INTRODUCTION

Optimization applications abound in many areas of science and engineering [1-3]. In

real practice, some parameters involved in optimization problems are subject to

uncertainty due to a variety of reasons, including estimation errors and unexpected

disturbance [4]. Such uncertain parameters can be product demands in process planning

[5], kinetic constants in reaction-separation-recycling system design [6], and task

durations in batch process scheduling [7], among others. The issue of uncertainty could

unfortunately render the solution of a deterministic optimization problem (i.e. the one

disregarding uncertainty) suboptimal or even infeasible [8]. Infeasibility, i.e., the

violation of constraints in an optimization problem, can have disastrous consequences for

solution quality. Motivated by this practical concern, optimization under uncertainty has

attracted tremendous attention from both academia and industry [4, 9-11].

In the era of big data and deep learning, intelligent use of data has a great potential to

benefit many areas. Although there is no rigorous definition of big data [12], people

typically characterize big data with five Vs, namely, volume, velocity, variety, veracity

and value [13]. Torrents of data are routinely collected and archived in process

industries, and these data are becoming an increasingly important asset in process

control, operations and design [14-18]. Nowadays, a wide array of emerging machine

learning tools can be leveraged to analyze data and extract accurate, relevant, and useful

information to facilitate knowledge discovery and decision-making. Deep learning, one

of the most rapidly growing machine learning subfields, demonstrates remarkable

power in deciphering multiple layers of representations from raw data without any

domain expertise in designing feature extractors [19]. More recently, dramatic progress

of mathematical programming [20], coupled with recent advances in machine learning

[21], especially in deep learning over the past decade [22], sparks a flurry of interest in

data-driven optimization [23-36]. In the data-driven optimization paradigm, the uncertainty

model is formulated based on data, thus allowing uncertainty data to “speak” for

themselves in the optimization algorithm. In this way, rich information underlying

uncertainty data can be harnessed in an automatic manner for smart and data-driven

decision making.

In this chapter, we summarize and classify the existing contributions of data-driven

optimization under uncertainty, highlight the current research trends, point out the

research challenges, and introduce promising methodologies that can be used to tackle

these challenges. We briefly review conventional mathematical programming

techniques for hedging against uncertainty, alongside their wide spectrum of

applications in Process Systems Engineering (PSE). We then summarize the existing

research papers on data-driven optimization under uncertainty and classify them into

four categories according to their unique approach for uncertainty modeling and distinct

optimization structures. Based on the literature survey, we identify three promising

research directions on optimization under uncertainty in the era of big data and deep

learning and highlight respective research challenges and potential methodologies.

1.1 Background on optimization under uncertainty

In recent years, mathematical programming techniques for decision making under

uncertainty have gained tremendous popularity among the PSE community, as

witnessed by various successful applications in process synthesis and design [10, 37],

production scheduling and planning [7, 38], and process control [35, 39-42]. In this

section, we present some background knowledge of methodologies for optimization

under uncertainty, along with computational algorithms and applications in PSE.

Specifically, we briefly review three leading modeling paradigms for optimization

under uncertainty, namely stochastic programming, chance-constrained programming,

and robust optimization. For extensive and detailed surveys in the field of conventional

optimization under uncertainty methods, we refer the reader to the previous reviews on

this subject [43, 44].

1.1.1 Stochastic programming

Stochastic programming is a powerful modeling paradigm for decision making under

uncertainty that aims to optimize the expected objective value across all the uncertainty

realizations [45]. The key idea of the stochastic programming approach is to model the

randomness in uncertain parameters with probability distributions [46]. For instance,

product demands are assumed to follow normal distributions in stochastic

programming-based supply chain model [47]. In general, the stochastic programming

approach can effectively accommodate decision making processes with various time

stages. In single-stage stochastic programs, there are no recourse variables and all the

decisions must be made before knowing uncertainty realizations. By contrast, stochastic

programming with recourse can take corrective actions after uncertainty is revealed.

Among the stochastic programming approach with recourse, the most widely used one

is the two-stage stochastic program, in which decisions are partitioned into “here-and-

now” decisions and “wait-and-see” decisions.

The general mathematical formulation of a two-stage stochastic programming problem

is given as follows [45].

$$\min_{x \in X} \; c^{\mathrm{T}} x + \mathbb{E}_{\omega}\!\left[ Q(x, \omega) \right] \quad \text{s.t.} \;\; Ax \le d \tag{1.1}$$

The recourse function Q(x, ω) is defined by

$$Q(x, \omega) = \min_{y(\omega) \in Y} \; b(\omega)^{\mathrm{T}} y(\omega) \quad \text{s.t.} \;\; W(\omega)\, y(\omega) \le h(\omega) - T(\omega)\, x \tag{1.2}$$

where x represents first-stage decisions made “here-and-now” before the uncertainty ω

is realized, while the second-stage decisions y are postponed in a “wait-and-see” manner

after observing the uncertainty realization. The objective of the two-stage stochastic

programming model includes two parts: the first-stage objective $c^{T}x$ and the expectation of the second-stage objective $b(\omega)^{T}y(\omega)$. The constraints associated with the first-stage decisions are $Ax \le d$, $x \in X$, and the constraints on the second-stage decisions are $W(\omega)y(\omega) \le h(\omega) - T(\omega)x$ and $y(\omega) \in Y$. Sets X and Y can include nonnegativity, continuity or integrality restrictions.
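For a finite set of scenarios, problem (1.1)–(1.2) can be written as a single deterministic equivalent (the so-called extensive form) and handed to an LP solver. The sketch below is only a minimal illustration, not part of the original formulation: a toy capacity problem with two equally likely demand scenarios, solved with SciPy; all cost values and scenario data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-stage stochastic LP in extensive form (hypothetical data).
# First stage: choose capacity x at unit cost 1, with 0 <= x <= 10.
# Second stage: cover shortfall y_s >= demand_s - x at unit cost 3.
demands = np.array([2.0, 4.0])
probs = np.array([0.5, 0.5])

# Decision vector: [x, y_1, y_2]; objective = x + sum_s p_s * 3 * y_s.
c = np.concatenate(([1.0], 3.0 * probs))

# Shortfall constraints -x - y_s <= -demand_s  (i.e., x + y_s >= demand_s).
A_ub = np.array([[-1.0, -1.0, 0.0],
                 [-1.0, 0.0, -1.0]])
b_ub = -demands

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0.0, 10.0), (0.0, None), (0.0, None)])
print(res.x[0], res.fun)  # optimal capacity and expected total cost
```

Because expected recourse here is more expensive than first-stage capacity, the solver builds enough capacity to cover the larger demand scenario. The extensive form grows linearly in the number of scenarios, which is exactly why the decomposition algorithms discussed next become necessary at scale.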

The resulting two-stage stochastic programming problem is computationally expensive

to solve because of the growth of computational time with the number of scenarios. To

this end, decomposition based algorithms have been developed in the existing literature,

including Benders decomposition or the L-shaped method [48, 49], and Lagrangean

decomposition [50]. The location of binary decision variables is critical for the design

of computational algorithms. For stochastic programs with integer recourse, the


expected recourse function is no longer convex, and even discontinuous, thus hindering

the employment of the conventional L-shaped method. As a result, research efforts have been made on computational algorithms for efficient solution of two-stage stochastic mixed-

integer programs [51], such as Lagrangian relaxation [52], branch-and-bound scheme

[53], and an improved L-shaped method [54].

Stochastic programming has demonstrated various applications in PSE, such as design

and operation of batch processes [55-57], optimization of flow sheets [58], energy

systems [59, 60], and supply chain management [61-64]. Due to its wide applicability,

immense research efforts have been made on the variants of stochastic programming

approach. For instance, the two-stage formulation in (1.1) can be readily extended to a

multi-stage stochastic programming setup by utilizing scenario trees. Other extensions

include stochastic nonlinear programming [65], and stochastic programs with

endogenous uncertainties [66, 67].

1.1.2 Chance constrained optimization

As another powerful paradigm for optimization under uncertainty, chance constrained

programming aims to optimize an objective while ensuring constraints to be satisfied

with a specified probability in an uncertain environment [68]. As in the stochastic

programming approach, probability distribution is the key uncertainty model to capture

the randomness of uncertain parameters in chance constrained optimization. The chance

constrained program was first introduced in the seminal work of [69], and has attracted

considerable attention ever since. Such chance constraints or probabilistic constraints

are flexible enough to quantify the trade-off between objective performance and system

reliability [70].

The generic formulation of a chance constrained optimization problem is presented as

follows,

\[
\begin{aligned}
\min_{x \in X} \quad & f(x) \\
\text{s.t.} \quad & \mathbb{P}\left\{ \xi \in \Xi : G(x,\xi) \le 0 \right\} \ge 1 - \varepsilon
\end{aligned}
\tag{1.3}
\]

where x represents the vector of decision variables, X denotes the deterministic feasible

region, f is the objective function to be minimized, ξ is a random vector following a

known probability distribution $\mathbb{P}$ with the support set Ξ, $G = (g_1, \ldots, g_m)$ represents a

constraint mapping, 0 is a vector of all zeros, and parameter ε is a pre-specified risk

level.

The chance constraint $\mathbb{P}\left\{ \xi \in \Xi : G(x,\xi) \le 0 \right\} \ge 1 - \varepsilon$ guarantees that decision x satisfies constraints with a probability of at least 1−ε. Note that when the number of constraints

m=1, the above optimization model is an individual chance constrained program; for

m>1, it is called a joint chance constrained program [71]. A salient merit of chance constrained programs is that they allow decision makers to choose their own risk levels in exchange for improvement in objectives. To model sequential decision-making processes, two-

stage chance constrained optimization with recourse was recently studied and has found

various applications [72, 73].

Despite its promising modeling power, the resulting chance constrained program is

generally computationally intractable for the following two main reasons. First,

calculating the probability of constraint satisfaction for a given x involves a multivariate

integral, which is believed to be computationally prohibitive. Second, the feasible

region is not convex even if set X is convex and G(x, ξ) is convex in x for any

realizations of uncertain vector ξ [68]. In light of these computational challenges, a large

body of related literature is devoted to the development of solution algorithms for

chance constrained optimization problems, such as sample average approximation [74],

sequential approximation [75, 76], and convex conservative approximation schemes

[77]. Note that chance constrained programs admit convex reformulation for some very

special cases. For example, individual chance constrained programs are endowed with

tractable convex reformulations for normal distributions [45]. Chance constraints with

right-hand-side uncertainty are convex if uncertain parameters are independent and

follow log-concave distributions [68].
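As a concrete instance of such a tractable special case, an individual linear chance constraint $\mathbb{P}\{\xi^{T}x \le b\} \ge 1-\varepsilon$ with $\xi \sim N(\mu, \Sigma)$ is equivalent to the deterministic constraint $\mu^{T}x + z_{1-\varepsilon}\sqrt{x^{T}\Sigma x} \le b$, where $z_{1-\varepsilon}$ is the standard normal quantile. The minimal sketch below checks this condition; the numerical data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def normal_chance_constraint_holds(x, mu, Sigma, b, eps):
    """Check P{xi^T x <= b} >= 1 - eps for xi ~ N(mu, Sigma) via the
    equivalent deterministic constraint mu^T x + z_{1-eps}*sqrt(x^T Sigma x) <= b."""
    x, mu, Sigma = np.asarray(x), np.asarray(mu), np.asarray(Sigma)
    lhs = mu @ x + norm.ppf(1.0 - eps) * np.sqrt(x @ Sigma @ x)
    return lhs <= b

# Hypothetical data: two jointly normal uncertain coefficients.
mu = np.array([1.0, 1.0])
Sigma = np.array([[0.1, 0.0], [0.0, 0.1]])
x = np.array([1.0, 1.0])

print(normal_chance_constraint_holds(x, mu, Sigma, b=3.0, eps=0.05))  # True
print(normal_chance_constraint_holds(x, mu, Sigma, b=2.5, eps=0.05))  # False
```

Since the left-hand side is a second-order cone function of x, embedding this condition in an optimization model yields a convex program, consistent with the convex reformulations cited above.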

In the PSE community, chance constraints are usually employed for customer demand

satisfaction, product quality specification, service level, and reliability level of chemical

processes [78-81]. Due to its practical relevance, chance constrained optimization has

been applied in numerous applications, including model predictive control [82, 83],

process design and operation [84], refinery blend planning [85], biopharmaceutical

manufacturing [86], and supply chain planning problem [87-90].

1.1.3 Robust optimization

As a promising alternative paradigm, robust optimization does not require accurate

knowledge of the probability distributions of uncertain parameters [91-94]. Instead, it models uncertain parameters using an uncertainty set, which includes possible uncertainty realizations. It is worth noting that the uncertainty set is a paramount ingredient

in robust optimization framework [94]. Given a specific uncertainty set, the idea of

robust optimization is to hedge against the worst case within the uncertainty set. The

worst-case uncertainty realization is defined based on different contexts: it could be the

realization giving rise to the largest constraint violation, the realization leading to the

lowest asset return [95] or the one resulting in the highest regret [96].

The conventional box uncertainty set, defined as follows [97], is often a poor choice since it includes the unlikely-to-happen scenario where all uncertain parameters simultaneously increase to their highest values.

\[
U_{\text{box}} = \left\{ u \,\middle|\, u_i^{L} \le u_i \le u_i^{U}, \; \forall i \right\}
\tag{1.4}
\]

where $U_{\text{box}}$ is a box uncertainty set, u is a vector of uncertain parameters, and $u_i$ is the i-th component of uncertainty vector u. $u_i^{L}$ and $u_i^{U}$ represent the lower bound and the upper bound of uncertain parameter $u_i$, respectively. The box uncertainty set simply defines the range of each uncertain parameter in vector u, so its size cannot easily be tuned to match the decision maker's risk attitude. To this end, researchers proposed the following budgeted uncertainty set [93].

\[
U_{\text{budget}} = \left\{ u \,\middle|\, u_i = \bar{u}_i + \hat{u}_i z_i , \; -1 \le z_i \le 1, \; \sum_{i} \left| z_i \right| \le \Gamma , \; \forall i \right\}
\tag{1.5}
\]

where $U_{\text{budget}}$ denotes a budgeted uncertainty set, u and $u_i$ have the same definitions as in (1.4), $\bar{u}_i$ is the nominal value of $u_i$, $\hat{u}_i$ is the largest possible deviation of uncertain parameter $u_i$, $z_i$ denotes the extent and direction of the parameter deviation, and Γ is an uncertainty budget.
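For a linear constraint $a^{T}u \le \beta$, the worst case over the budgeted set (1.5) can be computed greedily: start from the nominal value $a^{T}\bar{u}$ and spend the budget Γ on the deviations with the largest impact $|a_i|\hat{u}_i$. A minimal sketch of this computation, with hypothetical numbers:

```python
import numpy as np

def worst_case_lhs(a, u_bar, u_hat, gamma):
    """Worst-case value of a^T u over the budgeted set (1.5):
    allocate the budget gamma greedily to the largest deviations |a_i|*u_hat_i."""
    a, u_bar, u_hat = map(np.asarray, (a, u_bar, u_hat))
    impact = np.sort(np.abs(a) * u_hat)[::-1]  # largest impact first
    total, budget = float(a @ u_bar), float(gamma)
    for d in impact:
        take = min(1.0, budget)  # each z_i is limited to [-1, 1]
        total += take * d
        budget -= take
        if budget <= 0.0:
            break
    return total

# Hypothetical data: nominal value a^T u_bar = 6; deviation impacts 0.5, 1.0, 1.5.
a = [1.0, 2.0, 3.0]
u_bar = [1.0, 1.0, 1.0]
u_hat = [0.5, 0.5, 0.5]
print(worst_case_lhs(a, u_bar, u_hat, gamma=0.0))  # nominal only: 6.0
print(worst_case_lhs(a, u_bar, u_hat, gamma=1.5))  # 6 + 1.5 + 0.5*1.0 = 8.0
```

With Γ = 0 the set collapses to the nominal point, and with Γ equal to the number of uncertain parameters it recovers the box set, which is precisely how the budget controls conservatism.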

Traditional robust optimization approaches, also known as static robust optimization

[98], make all the decisions at once. This modeling framework cannot adequately represent sequential decision-making problems [5, 99-105]. Adaptive robust optimization (ARO)

was proposed to offer a new paradigm for optimization under uncertainty by

incorporating recourse decisions [106]. Due to the flexibility of adjusting recourse

decisions after observing uncertainty realizations, ARO typically generates less

conservative solutions than static robust optimization [105, 107-109]. The general form

of a two-stage adaptive robust mixed-integer programming model is given as follows:

\[
\begin{aligned}
\min_{x} \quad & c^{T}x + \max_{u \in U} \min_{y \in \Omega(x,u)} b^{T}y \\
\text{s.t.} \quad & Ax \le d, \; x \in \mathbb{R}^{n_1} \times \mathbb{Z}^{n_2} \\
& \Omega(x,u) = \left\{ y \in \mathbb{R}^{n_3} : Wy \le h - Tx - Mu \right\}
\end{aligned}
\tag{1.6}
\]

where x is the first-stage decision made before uncertainty u is realized, while the

second-stage decision y is postponed in a “wait-and-see” manner. x includes both

continuous and integer variables, while y only includes continuous variables. c and b

are the vectors of the cost coefficients. U is an uncertainty set that characterizes the

region of uncertainty realizations.
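For a fixed first-stage decision x and a box uncertainty set, the inner max–min in (1.6) can be evaluated exactly by enumerating the vertices of the box and solving one second-stage LP per vertex, since for right-hand-side uncertainty the LP optimal value is convex in u and thus attains its maximum at a vertex. The sketch below is a brute-force illustration for intuition only; practical algorithms (e.g., column-and-constraint generation) avoid full enumeration, and all problem data are hypothetical.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def worst_case_recourse(x, b, W, h, T, M, u_lo, u_hi):
    """Evaluate max_{u in box} min_{y >= 0 : Wy <= h - Tx - Mu} b^T y by
    enumerating the vertices of the box [u_lo, u_hi]."""
    worst = -np.inf
    for vertex in itertools.product(*zip(u_lo, u_hi)):
        u = np.array(vertex)
        rhs = h - T @ x - M @ u
        res = linprog(b, A_ub=W, b_ub=rhs, bounds=[(0.0, None)] * len(b))
        if res.status == 0:
            worst = max(worst, res.fun)
    return worst

# Hypothetical data: second stage covers the shortfall y >= u - x at unit
# cost 1, with uncertain demand u in the interval [2, 4].
x = np.array([1.0])
b = np.array([1.0])     # second-stage cost vector
W = np.array([[-1.0]])  # -y <= h - Tx - Mu  <=>  y >= u - x
h = np.array([0.0])
T = np.array([[-1.0]])
M = np.array([[1.0]])
print(worst_case_recourse(x, b, W, h, T, M, u_lo=[2.0], u_hi=[4.0]))  # 3.0
```

The cost of this enumeration grows exponentially in the uncertainty dimension, which is why the decomposition algorithms developed for ARO in the literature matter in practice.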

Besides the two-stage ARO framework, the multistage ARO method has attracted

immense attention due to its unique feature in reflecting sequential realizations of

uncertainties over time [110, 111]. In multistage ARO, decisions are made sequentially,

and uncertainties are revealed gradually over stages. Note that the additional value

delivered by ARO over static robust optimization is its adjustability of recourse

decisions based on uncertainty realizations [106]. Accordingly, the multistage ARO

method has demonstrated applications in process scheduling and planning [104, 105,

112].

Despite the popularity of the above three leading paradigms for optimization under

uncertainty, these approaches have their own limitations and specific application

scopes. To this end, research efforts have been made on “hybrid” methods that leverage

the synergy of different optimization approaches to inherit their corresponding strengths

and complement respective weaknesses [113-120]. For instance, stochastic

programming was integrated with robust optimization for supply chain design and

operation under multi-scale uncertainties [114]. Robust chance constrained optimization

along with global solution algorithms were developed and applied to process design

under price and demand uncertainties [120].

1.2 Existing methods for data-driven optimization under uncertainty

In this section, we review the recent advances in optimization under uncertainty in the

era of big data and deep learning. Recent years have witnessed a rapidly growing

number of publications on data-driven optimization under uncertainty, an active area

integrating machine learning and mathematical programming. These publications cover

various topics and can be roughly classified into four categories, namely data-driven

stochastic program, data-driven chance constrained program, data-driven robust

optimization, and data-driven scenario-based optimization. Unlike the conventional

mathematical programming techniques, these data-driven approaches do not presume

the uncertainty model is perfectly given a priori; rather, they all focus on the practical

setting where only uncertainty data are available.

1.2.1 Data-driven stochastic program and distributionally robust

optimization

The literature review of data-driven stochastic program, also known as distributionally

robust optimization (DRO), is presented in detail in this subsection. The motivation of

this emerging paradigm on data-driven optimization under uncertainty is first presented,

followed by its model formulation. In this modeling paradigm, the uncertainty is

modeled via a family of probability distributions that well capture uncertainty data on

hand. This set of probability distributions is referred to as ambiguity set. We then present

and analyze various types of ambiguity sets alongside their corresponding strengths and

weaknesses. Finally, the extension of DRO to the multistage decision-making setting is

also discussed, as well as their recent applications in PSE.

In the stochastic programming approach, it is assumed that the probability distribution

of uncertain parameters is perfectly known. However, such precise information of the

uncertainty distribution is rarely available in practice. Instead, what the decision maker

has is a set of historical and/or real-time uncertainty data and possibly some prior

structure knowledge of the probability. Moreover, the assumed probability in

conventional stochastic programming might deviate from the true distribution.

Therefore, relying on a single probability distribution could result in sub-optimal

solutions, or even lead to the deterioration in out-of-sample performance [121].

Motivated by these weaknesses of stochastic programming, DRO emerges as a new

data-driven optimization paradigm which hedges against the worst-case distribution in

an ambiguity set. Rather than assuming a single uncertainty distribution, the DRO

approach constructs an uncertainty set of probability distributions from uncertainty data


through statistical inference and big data analytics. In this way, DRO is capable of

hedging against the distribution errors, and accounts for the input of uncertainty data.

The general model formulation of data-driven stochastic programming is presented as

follows [122].

\[
\min_{x \in X} \max_{\mathbb{P} \in \mathcal{P}} \; \mathbb{E}_{\mathbb{P}}\left[ l(x,\xi) \right]
\tag{1.7}
\]

where x is the vector of decision variables, X is the feasible set, l is the objective function, and ξ represents a random vector whose probability distribution $\mathbb{P}$ is only known to reside in an ambiguity set $\mathcal{P}$. The DRO approach aims for optimal decisions under the worst-case distribution, and as a result offers a performance guarantee over the entire family of distributions.

The DRO or data-driven stochastic optimization framework enjoys two salient merits

compared with the conventional stochastic programming approach. First, it allows the

decision maker to incorporate partial distribution information learned from uncertainty

data into the optimization. As a result, the data-driven stochastic programming approach

greatly mitigates the issue of optimizer’s curse and improves the out-of-sample

performance. Second, data-driven stochastic programming inherits the computational

tractability from robust optimization and some resulting problems can be solved exactly

in polynomial time without resorting to the approximation scheme via sampling or

discretization. For example, optimization problem (1.7) for a convex program with

continuous variables and a moment-based ambiguity set is proved to be solvable in

polynomial time [122].

The choice of ambiguity sets plays a critical role in the performance of DRO. When

choosing an ambiguity set, the decision maker needs to consider the following three factors,
namely tractability, statistical meaning, and performance [123]. First, the data-driven

stochastic programming problem with the ambiguity set should be computationally

tractable, meaning the resulting optimization could be formulated as linear, conic

quadratic or semidefinite programs. Second, the derived ambiguity set should have clear

statistical meaning. Therefore, various ways of constructing ambiguity sets based on

uncertainty data were extensively studied [122, 124, 125]. Third, the devised ambiguity

set should be tight to increase the performance of resulting decisions.

One commonly used approach to constructing ambiguity sets is the moment-based approach, in which first- and second-order moment information is extracted from uncertainty data using statistical inference [126]. The ambiguity set that specifies the support, first and second moment information is shown as follows,

\[
\mathcal{P} = \left\{ \mathbb{P} \in \mathcal{M}_{+} \,\middle|\,
\begin{array}{l}
\mathbb{P}\left( \xi \in \Xi \right) = 1 \\[2pt]
\mathbb{E}_{\mathbb{P}}\left[ \xi \right] = \mu \\[2pt]
\mathbb{E}_{\mathbb{P}}\left[ (\xi - \mu)(\xi - \mu)^{T} \right] = \Sigma
\end{array}
\right\}
\tag{1.8}
\]

where ξ represents the uncertainty vector, Ξ is the support, $\mathbb{P}$ represents the probability distribution of ξ, $\mathcal{M}_{+}$ denotes the set of all probability measures, and $\mathbb{E}_{\mathbb{P}}$ denotes the expectation with respect to distribution $\mathbb{P}$. Parameters μ and Σ represent the mean vector and covariance matrix estimated from uncertainty data, respectively.
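In practice, the nominal moments μ and Σ in (1.8) are not given but must be estimated from uncertainty data. A minimal sketch of this estimation step on a synthetic dataset (the data-generating parameters below are invented purely for illustration):

```python
import numpy as np

# Synthetic uncertainty data: 500 observations of a 2-dimensional uncertain
# parameter (in a real application these would be historical records).
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=[1.0, 2.0], scale=0.5, size=(500, 2))

# Point estimates of the mean vector and covariance matrix used in (1.8).
mu = data.mean(axis=0)
Sigma = np.cov(data, rowvar=False)

print(mu.round(2))     # close to the true mean [1.0, 2.0]
print(Sigma.round(2))  # close to 0.25 * identity
```

Because such estimates carry sampling error, the ambiguity set in (1.9) replaces the exact moment equalities with confidence regions around these estimates.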

The ambiguity set in (1.8) fails to account for the fact that the mean and covariance

matrix are also subject to uncertainty. To this end, an ambiguity set was proposed based

on the distribution’s support information as well as the confidence regions for the mean

and second-moment matrix in the work of [122]. The resulting DRO problem could be

solved efficiently in polynomial time.


\[
\mathcal{P} = \left\{ \mathbb{P} \in \mathcal{M}_{+} \,\middle|\,
\begin{array}{l}
\mathbb{P}\left( \xi \in \Xi \right) = 1 \\[2pt]
\left( \mathbb{E}_{\mathbb{P}}\left[ \xi \right] - \mu \right)^{T} \Sigma^{-1} \left( \mathbb{E}_{\mathbb{P}}\left[ \xi \right] - \mu \right) \le \psi_{1} \\[2pt]
\mathbb{E}_{\mathbb{P}}\left[ (\xi - \mu)(\xi - \mu)^{T} \right] \preceq \psi_{2} \Sigma
\end{array}
\right\}
\tag{1.9}
\]

where ξ represents the uncertainty vector, Ξ is the support, and $\mathbb{P}$ represents the probability distribution of ξ. The constraint $\mathbb{P}\left( \xi \in \Xi \right) = 1$ enforces that all uncertainty realizations reside in the support set Ξ. Parameters $\psi_1$ and $\psi_2$ are used to define the sizes of the confidence regions for the first and second moment information, respectively.

The moment-based ambiguity sets typically enjoy the advantage of computational

tractability. For example, DRO with the ambiguity set based on principal component

analysis and first-order deviation functions was developed [125]. Additionally, the

computational effectiveness of this data-driven DRO method was demonstrated via

process network planning and batch production scheduling [125]. Recently, a data-

driven DRO model was developed for the optimal design and operations of shale gas

supply chains to hedge against uncertainties associated with shale well estimated

ultimate recovery and product demand [127]. However, the moment-based ambiguity

set is not guaranteed to converge to the true probability distribution as the number of

uncertainty data goes to infinity. Consequently, this type of ambiguity set suffers from conservatism even with a moderate amount of uncertainty data. To address the above issue with

moment-based methods, ambiguity sets based on statistical distance between

probability distributions were developed, as shown below,


     d  , 0     (1.10)

where $\mathbb{P}$ is the probability distribution of uncertain parameters, $\mathbb{P}_{0}$ represents the reference distribution such as the empirical distribution, d denotes some statistical distance between two distributions, and θ stands for the confidence level.
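As a concrete one-dimensional illustration of (1.10), the sketch below checks whether an empirical distribution lies within a Wasserstein ball of radius θ around a reference sample, using SciPy's `wasserstein_distance`; the sample values are hypothetical:

```python
from scipy.stats import wasserstein_distance

def in_wasserstein_ball(samples, reference_samples, theta):
    """Membership test for the ambiguity set (1.10) with d taken as the
    1-D Wasserstein distance between the two empirical distributions."""
    return wasserstein_distance(samples, reference_samples) <= theta

reference = [0.0, 1.0, 2.0, 3.0]  # reference (empirical) distribution P_0
print(in_wasserstein_ball([0.1, 1.1, 2.1, 3.1], reference, theta=0.5))  # True
print(in_wasserstein_ball([5.0, 6.0, 7.0, 8.0], reference, theta=0.5))  # False
```

In the full DRO setting one optimizes over all distributions in this ball rather than merely testing membership, but the radius θ plays the same role: it trades off robustness against conservatism.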

The ambiguity set in (1.10) can be further classified based on the adopted distance metric,

such as Kullback-Leibler divergence [128] and Wasserstein distance [124]. For

example, a DRO model was proposed for the lot-sizing problem, in which the chi-square

goodness-of-fit test and robust optimization were combined. The ambiguity set of

demand was constructed from uncertainty data by using a hypothesis test in statistics,

called the chi-square goodness-of-fit test [129]. This set is well defined by linear

constraints and second-order cone constraints. It is worth noting that the input of their model is histograms, which makes it possible to use a finite-dimensional probability vector to characterize the distribution. The adopted statistic belongs to the phi-divergences, which motivated researchers to construct distributional uncertainty sets by using phi-divergences [130].

To account for the sequential decision-making process, researchers recently developed

the adaptive DRO method by incorporating recourse decision variables [131, 132]. A

general two-stage data-driven stochastic programming model is presented in the

following form:

\[
\begin{aligned}
\min_{x \in X} \quad & c^{T}x + \max_{\mathbb{P} \in \mathcal{P}} \mathbb{E}_{\mathbb{P}}\left[ Q(x,\xi) \right] \\
\text{s.t.} \quad & Ax \le d \\
& Q(x,\xi) = \min_{y \in Y} \left\{ b(\xi)^{T} y \;:\; T(\xi)x + W(\xi)y \le h(\xi) \right\}
\end{aligned}
\tag{1.11}
\]
where x represents the vector of first-stage decision variables that need to be determined before observing uncertainty realizations, y denotes the vector of second-stage decision variables that are adjusted based on the realized uncertain parameters ξ, sets X and

Y can include nonnegativity, continuity or integrality restrictions, and Q represents the

recourse function. The objective of the above data-driven stochastic program is to

minimize the worst-case expected cost with respect to all possible uncertainty

distributions $\mathbb{P}$ within the ambiguity set $\mathcal{P}$. Based on the literature, multistage data-

driven DRO is becoming a rapidly evolving research direction.

Data-driven stochastic programming has several salient merits over the conventional

stochastic programming approach. However, based on the existing literature, there are

few papers on its PSE applications [125, 127]. In real world applications, the trend of

big data has fueled the increasing popularity of data-driven stochastic programming in

many areas, especially in power systems operation, where DRO has found various applications such as unit commitment problems [133-136] and optimal power flow [137, 138].

1.2.2 Data-driven chance constrained program

In contrast to the data-driven stochastic programming approach, data-driven chance

constrained programming is another paradigm focusing on chance constraint

satisfaction under the worst-case probability instead of optimizing the worst-case

expected objective. Although both data-driven chance constrained program and DRO

adopt ambiguity sets in the uncertainty models, they have distinct model structures.

Specifically, data-driven chance constrained program features constraints subject to

uncertainty in probability distributions, while DRO typically only involves the worst-

case expectation of an objective function with respect to a family of probability

distributions. The chance constrained programming approach assumes the complete

distribution information is perfectly known. However, the decision maker only has

access to a finite number of uncertainty realizations or uncertainty data. On one hand,

such complete knowledge of the distribution is usually estimated from a limited number of

uncertainty data or obtained from expert knowledge. On the other hand, even if the

probability distribution is available, the chance constrained program is computationally

cumbersome. In practice, one can only have partial information on the probability

distribution of uncertainty. Therefore, data-driven chance constrained optimization

emerges as another paradigm for hedging against uncertainty in the era of big data.

The general form of a data-driven chance constrained program is given by,

\[
\begin{aligned}
\min_{x \in X} \quad & f(x) \\
\text{s.t.} \quad & \min_{\mathbb{P} \in \mathcal{P}} \; \mathbb{P}\left\{ \xi \in \Xi : G(x,\xi) \le 0 \right\} \ge 1 - \varepsilon
\end{aligned}
\tag{1.12}
\]


where x represents the vector of decision variables, X denotes the deterministic feasible

region, f is the objective function, ξ is a random vector following a probability

distribution $\mathbb{P}$ that belongs to an ambiguity set $\mathcal{P}$, $G = (g_1, \ldots, g_m)$ represents a

constraint mapping, 0 is a vector of all zeros, and parameter ε is a pre-specified risk

level. The data-driven chance constraints enforce classical chance constraints to be

satisfied for every probability distribution within the ambiguity set.

The computational tractability of the resulting data-driven chance constrained program

can vary depending on both the ambiguity sets and the structure of the optimization
problem. In the following, we summarize the relevant papers according to the adopted

uncertainty set of distributions and optimization structures.

Distributionally robust individual linear chance constraints under the ambiguity set

comprised of all distributions sharing the same known mean and covariance were

reformulated as convex second-order cone constraints [126]. The deterministic convex

conditions to enforce distributionally robust chance constraints were provided under

distribution families of (a) independent random variables with box-type support and (b)

radially symmetric non-increasing distributions over the orthotope support. The worst-

case conditional value-at-risk (CVaR) approximation for distributionally robust joint

chance constraints was studied assuming known first and second moments [139], and the

resulting conservative approximation can be cast as semidefinite program. In addition

to moment information, a specific structural information of distributions called

unimodality was incorporated into the ambiguity set, and the corresponding ambiguous

risk constraints were reformulated as a set of second-order cone constraints

[140]. Instead of assuming unimodality of distributions, data-driven robust individual

chance constrained programs along with convex approximations were recently

developed using a mixture distribution-based ambiguity set with fixed component

distribution and uncertain mixture weights [141].

In real world applications, exact moment information can be challenging to obtain, and

can only be estimated through confidence intervals from uncertainty realizations [122].

To accommodate this moment uncertainty, attempts were made in the context of

distributionally robust chance constraints, including constructing convex moment

ambiguity set [142], employing Chebyshev ambiguity set with bounds on second-order

moment [143], characterizing a family of distributions with upper bounds on both mean

and covariance [144]. Ambiguous joint chance constraints were studied where the

ambiguity set was characterized by the mean, convex support, and an upper bound on

the dispersion [145], and the resulting constraints were conic representable for right-

hand-side uncertainty. In addition to generalized moment bounds [146], structural

properties of distributions, such as symmetry, unimodality, multimodality and

independence, were further integrated into distributionally robust chance constrained

programs leveraging a Choquet representation [123]. Nonlinear extensions of

distributionally robust chance constraints were made under the ambiguity sets defined

by mean and variance [147], convex moment constraints [148], mean absolute deviation

[149], and a mixture of distributions [150].

Although moment-based ambiguity sets achieve certain success, they do not converge

to the true probability distribution as the number of available uncertainty data increases.

Consequently, the resulting data-driven chance-constrained programs tend to generate

conservative solutions. To this end, data-driven chance-constrained programs with

distance-based ambiguity set were proposed to alleviate the undesirable consequence of

moment-based data-driven chance-constrained programs. The ambiguity set defined by

the Prohorov metric was introduced into the distributionally robust chance constraints,

and the resulting optimization problem was approximated by using robust sampled

problem [151]. Distributionally robust chance constraints with the ambiguity set

containing all distributions close to a reference distribution in terms of Kullback-Leibler

divergence were cast as classical chance constraints with an adjusted risk level [128].

Data-driven chance constrained programs with ϕ-divergence based ambiguity set were

proposed [152], and further extensions were made using the kernel smoothing method

[27, 31]. Recently, data-driven chance constraints over Wasserstein balls were exactly

reformulated as mixed-integer conic constraints [153, 154]. Leveraging the strong

duality result [155], distributionally robust chance constrained programs with

Wasserstein ambiguity set were studied for linear constraints with both right and left

hand uncertainty [156], as well as for general nonlinear constraints [157].

Data-driven chance constrained programs have successful applications in a number of

areas, such as power system [158], stochastic control [159], and vehicle routing problem

[160].

1.2.3 Data-driven robust optimization

As a paramount ingredient in robust optimization, uncertainty sets endogenously

determine robust optimal solutions and therefore should be devised with special care.

However, uncertainty sets in the conventional robust optimization methodology are

typically set a priori using a fixed shape and/or model without providing sufficient

flexibility to capture the structure and complexity of uncertainty data. For example, the

geometric shapes of uncertainty sets in (1.4) and (1.5) do not change with the intrinsic

structure and complexity of uncertainty data. Furthermore, these uncertainty sets are

specified by finite number of parameters, thereby having limited modeling flexibility.

Motivated by this knowledge gap, data-driven robust optimization emerges as a

powerful paradigm for addressing uncertainty in decision making.

A data-driven ARO framework that leverages the power of Dirichlet process mixture

model was proposed [32]. The data-driven approach for defining uncertainty set was

developed based on Bayesian machine learning. This machine learning model was then

integrated with the ARO method through a four-level optimization framework. This

developed framework effectively accounted for the correlation, asymmetry and

multimode of uncertainty data, so it generated less conservative solutions. Its salient

feature is that multiple basic uncertainty sets are used to provide a high-fidelity

description of uncertainties. Although the data-driven ARO has a number of attractive

features, it does not account for an important evaluation metric, known as regret, in

decision-making [161]. Motivated by this knowledge gap, a data-driven bi-criterion

ARO framework was developed that effectively accounted for the conventional

robustness as well as minimax regret [162].

In some applications, uncertainty data in large datasets are usually collected under

multiple conditions. A data-driven stochastic robust optimization framework was

proposed for optimization under uncertainty leveraging labeled multi-class uncertainty

data [163]. Machine learning methods including Dirichlet process mixture model and

maximum likelihood estimation were employed for uncertainty modeling, which is

illustrated in Figure 1. This framework was then formulated based on the data-driven

uncertainty model through a bi-level optimization structure. The outer optimization

problem followed the two-stage stochastic programming approach, while ARO was

nested as the inner problem for maintaining computational tractability.

Figure 1. The data-driven uncertainty model based on the Dirichlet process mixture

model.

To mitigate computational burden, research effort has been made on convex polyhedral

data-driven uncertainty set based on machine learning techniques, such as principal

component analysis and support vector clustering. A data-driven robust optimization

framework that leveraged the power of principal component analysis and kernel

smoothing for decision-making under uncertainty was studied [34]. In this approach,

correlations between uncertain parameters were effectively captured, and latent

uncertainty sources were identified by principal component analysis. To account for

asymmetric distributions, forward and backward deviation vectors were utilized in the

uncertainty set, which was further integrated with robust optimization models. A data-

driven static robust optimization framework based on support vector clustering that aims

to find the hypersphere with minimal volume to enclose uncertainty data was proposed

[164]. The adopted piecewise linear kernel incorporates the covariance information,

thus effectively capturing the correlation among uncertainties. These two data-driven

robust optimization approaches utilized polyhedral uncertainty sets learned from data, and thus enjoyed computational efficiency. Various types of data-driven uncertainty sets

were developed for static robust optimization based on statistical hypothesis tests [165],

copula [166], and probability density contours [167].

To address multistage decision making under uncertainty, a data-driven approach for

optimization under uncertainty based on multistage ARO and nonparametric kernel

density M-estimation was developed [112]. The salient feature of the framework was

its incorporation of distributional information to address the issue of over-conservatism.

Robust kernel density estimation was employed to extract probability distributions from

data. This data-driven multistage ARO framework exploited robust statistics to be

immunized to data outliers. An exact robust counterpart was developed for solving the

resulting data-driven ARO problem.

In recent years, data-driven robust optimization has been applied to a variety of areas,

such as power systems [33], industrial steam systems [168], planning and scheduling

[112, 166], process control [35], and transportation systems [169].

1.2.4 Scenario optimization approach for chance constrained programs

A salient feature of scenario-based optimization is that it does not require the explicit

knowledge of probability distribution as in the stochastic programming approach.

Additionally, scenario-based optimization uses uncertainty scenarios to seek an optimal

solution having a high probabilistic guarantee of constraint satisfaction instead of

utilizing scenarios or samples to approximate the expectation term as in stochastic

programming. Although scenario-based optimization can be regarded as a special

type of robust optimization with a discrete uncertainty set consisting of the uncertainty

data, it can provide a probabilistic guarantee for unobserved uncertainty data in the

testing data set. Note that the scenario-based optimization approach provides a viable

and data-driven route to achieving approximate solutions of chance-constrained

programs. The scenario-based optimization approach is a general data-driven

optimization under uncertainty framework in which uncertainty data or random samples

are utilized in a more direct manner compared with other data-driven optimization

methods. This data-driven optimization framework was first introduced in [170], and

has gained great popularity within the systems and control community [171]. As in data-

driven chance constrained programs, scenario optimization does not require knowledge

of the true underlying uncertainty distribution, only a finite number of uncertainty

realizations. Specifically, the scenario approach enforces constraint satisfaction with

N independent and identically distributed uncertainty samples u(1), …, u(N). The resulting

scenario optimization problem is given by,

$$\min_{x \in X} \; c^{T} x \quad \text{s.t.} \quad f\left(x, u^{(i)}\right) \le 0, \quad i = 1, \ldots, N \qquad (1.13)$$
where x is the vector of decision variables, X represents a deterministic convex and

closed set unaffected by uncertainty, c is the vector of cost coefficients, and f denotes

the constraint function affected by uncertainty u. Note that function f is typically

assumed to be convex in x, and can have arbitrarily nonlinear dependence on u, as

opposed to data-driven nonlinear chance constrained programs, which assume the

constraint function is quasi-convex in u [147]. Additionally, scenario-based optimization

can be considered as a special case of data-driven robust optimization when the

uncertainty set is constructed as a union of u(1), …, u(N).
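To make problem (1.13) concrete, the following sketch (with hypothetical cost vector, uncertainty distribution, and sample size of our choosing) solves a small scenario linear program by simply imposing one linear constraint per uncertainty sample:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N = 200  # number of i.i.d. uncertainty samples u^(1), ..., u^(N)
# Hypothetical uncertain constraint coefficients, u ~ Uniform[0.5, 1.5]^2
U = rng.uniform(0.5, 1.5, size=(N, 2))

# Scenario program (1.13): min c^T x  s.t.  f(x, u^(i)) = u^(i)^T x - 1 <= 0
# for every sample i, with x >= 0.
res = linprog(c=[-1.0, -1.0], A_ub=U, b_ub=np.ones(N), bounds=[(0, None)] * 2)
x_star = res.x  # the (random) scenario solution x*(omega)
```

Since the scenario program is an ordinary linear program, it inherits the tractability of its deterministic counterpart; only the number of constraints grows with N.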

 
In the scenario optimization literature, ω = (u(1), …, u(N)) is referred to as the multi-

sample or scenario that is drawn from the product probability space. Due to the random

nature of the multi-sample, the optimal solution of the scenario optimization problem

(1.13), denoted as x*(ω), is also random. One key merit of the scenario approach is that

the scenario optimization problem admits the same problem type as its deterministic

counterpart, so that it can be solved efficiently by convex optimization algorithms when

f(x, u) is convex in x [172]. Moreover, the optimal solution x*(ω) is guaranteed to satisfy

the constraints with other unseen uncertainty realizations with a high probability [173].

For the sake of clarity, we revisit the following definition and theorem [173].

Definition 1.1 (Violation probability) The violation probability of a given decision x is

defined as follows:

 
V  x    u   f  x, u   0 (1.14)

where V(x) denotes the probability of violation for a given x, and Ξ represents the

support of uncertainty u. We say a decision x is ε-feasible if V(x) ≤ ε.

Theorem 1.1 Assume that x*(ω) is the unique optimal solution of the scenario

optimization problem (1.13). Then, for all ε ∈ (0, 1), it holds that

$$\mathbb{P}^{N}\left\{ V\left(x^{*}(\omega)\right) > \varepsilon \right\} \le \sum_{i=0}^{n-1} \binom{N}{i} \varepsilon^{i} \left(1-\varepsilon\right)^{N-i} \qquad (1.15)$$

where n is the number of decision variables, N denotes the number of uncertainty data,

and $\mathbb{P}^{N}$ is the product probability measure governing the sample generation.

The above theorem implies that the optimal solution x*(ω) satisfies the corresponding

chance constraint with a certain confidence level. The proof of this theorem depends on

the fundamental fact that the number of support constraints, the removal of which

changes the optimal solution, is upper bounded by the number of decision variables

[170]. Note that (1.15) holds with equality for the fully-supported convex optimization

problem [173], meaning that the probability bound is tight. Additionally, the result holds

true irrespective of probability distribution information or even its support set.
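The bound in (1.15) is straightforward to evaluate numerically. The helper below (the function names are ours) computes the right-hand side of (1.15) and inverts it to find the smallest sample size N that guarantees a violation level ε with confidence 1 − β:

```python
from math import comb

def violation_bound(n, N, eps):
    # Right-hand side of (1.15): sum_{i=0}^{n-1} C(N, i) eps^i (1 - eps)^(N - i)
    return sum(comb(N, i) * eps**i * (1 - eps)**(N - i) for i in range(n))

def required_samples(n, eps, beta):
    # Smallest N such that the bound does not exceed beta (confidence 1 - beta)
    N = n
    while violation_bound(n, N, eps) > beta:
        N += 1
    return N

# e.g., n = 5 decision variables, eps = 0.1: the bound decays as N grows
b_100, b_200 = violation_bound(5, 100, 0.1), violation_bound(5, 200, 0.1)
```

For n = 1 the bound reduces to (1 − ε)^N, which makes the rapid decay in N explicit.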

By exploiting the structured dependence on uncertainty, the sample size required by the

scenario optimization problem was reduced through a tighter bound on Helly’s

dimension [174]. Rather than focusing on the constraint violation probability,

considerable research efforts have been made on the degree of violation [175], expected

probability of constraint violation [176], and the performance bounds for objective

values [177]. To make a trade-off between feasibility and performance, the case was

studied where some of the sampled constraints were allowed to be violated for

improving the performance of the objective [178]. Subsequent work along this direction

includes a sampling-and-discarding method [179]. A wait-and-judge scenario

optimization framework was proposed in which the level of robustness was assessed a

posteriori after the optimal solution was obtained [180]. Recently, the extension of

scenario-based optimization to the multistage decision making setting was made [181,

182].

While scenario optimization problems with continuous decision variables have been

extensively studied [171], mixed-integer scenario optimization is less developed.

An attempt to extend the scenario theory to random convex programs with mixed-

integer decision variables was made [183], and the Helly dimension in the mixed-integer

scenario program was proved to depend geometrically on the number of integer

variables. This result suggests that the required sample size can be prohibitively large

for scenario programs with many discrete variables. Along this research direction, two

sampling algorithms within the framework of S-optimization were recently developed

for solving mixed-integer convex scenario programs [184].

In some real-world applications, the required sample size can be very large, resulting in

a great computational burden for scenario optimization problems with a huge number

of sampled constraints. One way to circumvent this difficulty is to devise sequential

solution algorithms. Along this direction, sequential randomized algorithms were

developed for convex scenario optimization problems [185], and fell into the

framework of Sequential Probabilistic Validation (SPV) [186]. The motivation behind

these sequential algorithms is that validating a given solution with a large number of

samples is less computationally expensive than solving the corresponding scenario

optimization problem. Recently, a repetitive scenario design approach was proposed by

iterating between reduced-size scenario optimization problems and the probabilistic

feasibility check [187]. The trade-off between the sample size and the expected number

of repetitions was also revealed in the repetitive scenario design [187]. Note that the

classical scenario-based approach is an extreme situation in the trade-off curve, where

one seeks to find the solution at one step. Another effective way to reduce the

computation cost of large-scale scenario optimization is to employ distributed

algorithms [188-190]. Particularly, the sampled constraints were distributed among

multiple processors of a network, and the large-scale scenario optimization problems

can be efficiently solved via constraint consensus schemes [190]. Along this direction,

a distributed computing framework was developed for the scenario convex program

with multiple processors connected by a graph [188]. The major advantage of this

approach is that the computational cost for each processor becomes lower and the

original scenario optimization problem can be solved collaboratively. Another

contribution to reducing the computational cost is based on a non-iterative two-step

procedure, i.e., an optimization step and a detuning step [191]. As a consequence, the total

sample complexity was greatly decreased.
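The iterate-and-validate idea behind these sequential and repetitive schemes can be illustrated on a toy one-dimensional problem (all numbers below are hypothetical): solve a small scenario program, cheaply check its empirical violation probability on a large validation sample, and redraw scenarios only if the check fails:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.05                  # target violation level
N_small, N_val = 40, 20000  # small design sample vs. large validation sample

# Toy uncertain program: min x  s.t.  x >= u with u ~ N(0, 1);
# the scenario solution for N samples is simply x* = max_i u^(i).
for trial in range(50):
    x_star = rng.standard_normal(N_small).max()           # 1) solve small scenario program
    v_hat = (rng.standard_normal(N_val) > x_star).mean()  # 2) cheap probabilistic validation
    if v_hat <= eps:                                      # 3) accept; otherwise redraw
        break
```

Validation is just counting constraint violations, which is far cheaper than re-solving an optimization problem with many more constraints; this is exactly the trade-off exploited by the repetitive scenario design.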

Traditionally, the field of scenario optimization has focused on convex optimization

problems, in which the number of support constraints is upper bounded by the number

of decision variables. However, such upper bounds are no longer available in nonconvex

scenario optimization problems, giving rise to research challenges of extending the

scenario theory to the nonconvex setting. To date, few works have considered

nonconvex uncertain programs using the scenario approach. One contribution is that of

[192], in which the generalization of the optimal solution was assessed in a wait-and-judge

manner through the concept of support sub-samples. The proposed

approach can be applied to general nonconvex setups, including mixed-integer

scenario optimization problems. Another attempt to address nonconvex scenario

optimization made use of the statistical learning theory for bounding the violation

probability, and devised a randomized solution algorithm [193]. The statistical learning

theory-based method provided the probabilistic guarantee for all feasible solutions, as

opposed to the convex scenario approach where such guarantee is valid only for the

optimal solution. This unique feature regarding probabilistic guarantees for all feasible

solutions granted by the statistical learning based method is of practical relevance [194],

since it is computationally challenging to solve nonconvex optimization problems to

global optimality. A class of non-convex scenario optimization problems, which have

non-convex objective functions and convex constraints, was recently studied [195]. Since

Helly's dimension for the optimal solution of such a non-convex scenario program

can be unbounded, the direct application of scenario approaches based on Helly's

theorem is not possible. To overcome this challenge, the feasible region was

restricted to the convex hull of a few optimizers, thus enabling the application of sample

complexity results [173].

1.3 Various types of deep learning techniques and their potentials

In this subsection, we present three types of deep learning techniques, including deep

belief networks, convolutional neural networks, and recurrent neural networks, and

explore their potential applications in data-driven optimization under uncertainty.

 Deep belief networks

Among deep learning techniques, deep belief networks (DBNs) are becoming

increasingly popular, primarily because of their unique capability of capturing a hierarchy of

latent features [196]. DBNs essentially belong to probabilistic graphical models and are

structured by stacking a series of restricted Boltzmann machines (RBMs). This specific

network structure is designed based on the fact that a single RBM with only one hidden

layer falls short of capturing the intrinsic complexity of high-dimensional data. As the

building blocks of DBNs, RBMs comprise two layers of neurons, namely a

hidden layer and a visible layer. Note that the hidden layer can be regarded as an abstract

representation of the visible layer. There are undirected connections between these two

layers, while there exist no intra-connections within each layer. The training process of

DBNs typically involves the pre-training and fine-tuning procedures in a layer-wise

scheme. Armed with multiple layers of hidden variables, DBNs enjoy unique power in

extracting a hierarchy of latent features automatically, which is desirable in many

practical applications. As a result, DBNs have been applied in a wide spectrum of areas,

including fault diagnosis [197], soft sensor [198], and drug discovery [199]. DBNs can

decipher complicated nonlinear correlations among uncertain parameters. Recently, the

deep Gaussian process model was proposed as a special type of DBN based on Gaussian

process mappings. Due to its unique advantage in nonlinear regression, the deep Gaussian

process model could be used to characterize the relationship between uncertain

parameters, such as product price and demand.
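As an illustration of this layer-wise idea (a minimal sketch, not a production DBN; all layer sizes, learning rates, and data are arbitrary), two binary RBMs trained with one-step contrastive divergence can be stacked so that the second learns from the hidden representation of the first:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, lr=0.1, epochs=30):
    """Train a binary RBM with one-step contrastive divergence (CD-1)."""
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        ph = sigmoid(V @ W + b_h)                      # P(h = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)  # sample hidden units
        pv = sigmoid(h @ W.T + b_v)                    # reconstruct visible layer
        ph2 = sigmoid(pv @ W + b_h)
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)     # positive minus negative phase
        b_v += lr * (V - pv).mean(axis=0)
        b_h += lr * (ph - ph2).mean(axis=0)
    return W, b_h

V = rng.integers(0, 2, size=(100, 12)).astype(float)     # toy binary "uncertainty" data
W1, c1 = train_rbm(V, n_hidden=8)                        # first RBM: visible -> hidden 1
H1 = sigmoid(V @ W1 + c1)                                # abstract representation of V
W2, c2 = train_rbm((H1 > 0.5).astype(float), n_hidden=4) # stacked second RBM
```

Each stacked layer is a more abstract representation of the one below it, mirroring the greedy layer-wise pre-training described above; in practice the stack would then be fine-tuned end to end.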

 Convolutional neural networks

Convolutional neural networks (CNNs) are one specialized version of deep neural

networks [200], and they have become increasingly popular in areas such as image

classification, speech recognition, and robotics. Inspired by the visual neuroscience,

CNNs are designed to fully exploit the three main ideas, namely sparse connectivity,

weight sharing, and equivariant representations [19]. This kind of neural network is

suited for processing data in the form of multiple arrays, particularly two-dimensional

image data. The architecture of a CNN typically consists of convolution layers,

nonlinear layers, and pooling layers. In convolution layers, feature maps are extracted

by performing convolutions between local patches of data and filters. The filters share the

same weights when moving across the input, leading to a reduced number of parameters

in networks. The obtained results are further passed through a nonlinear activation

function, such as rectified linear unit (ReLU). After that, pooling layers, such as max

pooling and average pooling, are applied to aggregate semantically similar features.

These different types of layers are alternately connected to extract hierarchical features

with various abstractions. For the purpose of classification, a fully connected layer is

stacked after extracting the high-level features. Although CNNs are mainly used for

image classification, they have been used to learn spatial features of traffic flow data at

nearby locations which exhibit strong spatial correlations [201]. Given its unique power

in spatial data modeling, CNNs hold the potential to model uncertainty data with large

spatial correlations, such as demand data in different adjacent market locations. In

addition, CNNs can be trained on labeled multi-class uncertainty data to perform

classification. Therefore, the output of the CNN can potentially act as the

probability weights used in the data-driven stochastic robust optimization framework.
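The conv → ReLU → pool pipeline described above can be sketched in a few lines of NumPy (a didactic toy, not an efficient implementation; the 8×8 "image" and 3×3 filter are arbitrary), which makes weight sharing explicit: the same nine filter weights are reused at every spatial position:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution; the single filter k is shared across all positions."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s-by-s max pooling aggregates nearby features."""
    H, W = x.shape
    return x[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))    # toy two-dimensional input
kernel = rng.standard_normal((3, 3))   # one shared filter: only 9 weights
feature_map = max_pool(np.maximum(conv2d(image, kernel), 0.0))  # conv -> ReLU -> pool
```

Stacking several such conv/ReLU/pool stages, followed by a fully connected layer, yields the standard CNN classifier architecture discussed above.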

 Recurrent neural networks

Besides the aforementioned models for spatial data, recurrent neural networks (RNNs)

are widely recognized as the state-of-the-art deep learning technique for processing time

series data, especially those from language and speech [202]. RNNs can be considered

as feedforward neural networks if they are unfolded along the time dimension. The

architecture of an RNN possesses a unique structure of directed cycles among hidden

units. In addition, the inputs of the hidden unit come from both the hidden unit of

the previous time step and the input unit at the current time step. Accordingly, these hidden units in the

architecture of RNNs constitute the state vectors and store the historical information of

past input data. With this special architecture, RNNs are well-suited for feature learning

for sequential data and demonstrate successful applications in various areas, including

natural speech recognition [202], and load forecasting [203]. However, one drawback

of RNNs is their weakness in storing long-term memory due to the vanishing and

exploding gradient problems. To address this issue, research efforts have been made on variants

of RNNs, such as long short-term memory (LSTM) and gated recurrent unit (GRU)

[204]. By explicitly incorporating input, output and forget gates, LSTM enhances the

capability of memorizing the long-term dependency among sequential data. In

sequential mathematical programming under uncertainty, massive time series of

uncertain parameters are collected. Uncertainty data realized at different time stages

often exhibit temporal dynamics. To this end, deep learning techniques, such as deep

RNNs and LSTM, could be leveraged to decipher the temporal dynamics and

trajectories of uncertainty over time stages.
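The recurrence at the heart of an RNN, in which the state vector carries past information forward, fits in a few lines (a bare vanilla-RNN sketch with arbitrary random weights; training, gates, and the LSTM machinery are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, T = 3, 5, 20
W_xh = 0.3 * rng.standard_normal((n_in, n_hid))   # input-to-hidden weights
W_hh = 0.3 * rng.standard_normal((n_hid, n_hid))  # hidden-to-hidden: the directed cycle

u = rng.standard_normal((T, n_in))  # a toy time series of uncertainty realizations
h = np.zeros(n_hid)                 # state vector storing historical information
states = []
for t in range(T):                  # unfolding the network in time
    h = np.tanh(u[t] @ W_xh + h @ W_hh)  # current input plus previous hidden state
    states.append(h)
states = np.array(states)           # hidden trajectory, shape (T, n_hid)
```

Because the same W_xh and W_hh are reused at every time step, the unfolded network is a deep feedforward network with tied weights, which is also why gradients can vanish or explode over long horizons.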

1.4 Outline of the dissertation

This dissertation focuses on the data-driven optimization under uncertainty. The

roadmap of the dissertation is provided as follows.

In Chapter 2, we propose a novel data-driven Wasserstein distributionally robust

optimization model for hedging against uncertainty in the optimal biomass with

agricultural waste-to-energy network design under uncertainty. Instead of assuming

perfect knowledge of probability distribution for uncertain parameters, we construct a

data-driven ambiguity set of candidate distributions based on the Wasserstein metric,

which is utilized to quantify their distances from the data-based empirical distribution.

Equipped with this ambiguity set, the two-stage distributionally robust optimization

model not only accommodates the sequential decision making at design and operational

stages, but also hedges against the distributional ambiguity arising from finite amount

of uncertainty data. A solution algorithm is further developed to solve the resulting two-

stage distributionally robust mixed-integer nonlinear program.

In Chapter 3, we propose a novel deep learning based ambiguous joint chance

constrained economic dispatch (ED) framework for high penetration of renewable

energy. By leveraging a deep generative adversarial network (GAN), an f-divergence-

based ambiguity set of wind power distributions is constructed as a ball in the

probability space centered at the distribution induced by a generator network.

Specifically, wind power data are utilized to train f-GAN, in which its discriminator

network criticizes the performance of the generator network in terms of f-divergence.

Based upon this ambiguity set, a data-driven joint chance constrained ED model is

developed to hedge against distributional uncertainty present in multiple constraints

regarding wind power utilization. To facilitate its solution process, the resulting

distributionally robust chance constraints are equivalently reformulated as ambiguity-

free chance constraints, which are further tackled using a scenario approach. Theoretical

a priori bound on the required number of synthetic wind power data generated by f-

GAN is explicitly derived for the multi-period ED problem to guarantee a predefined

risk level. By exploiting the ED problem structure, a prescreening technique is employed

to greatly boost both computational and memory efficiencies.

In Chapter 4, we investigate the problem of designing data-driven stochastic Model

Predictive Control (MPC) for linear time-invariant systems under additive stochastic

disturbance, whose probability distribution is unknown but can be partially inferred

from data. We propose a novel online learning-based risk-averse stochastic MPC

framework in which Conditional Value-at-Risk (CVaR) constraints on system states are

required to hold for a family of distributions called an ambiguity set. The ambiguity set

is constructed from disturbance data by leveraging a Dirichlet process mixture model

that is self-adaptive to the underlying data structure and complexity. Specifically, the

structural property of multimodality is exploited, so that the first and second-order

moment information of each mixture component is incorporated into the ambiguity set.

A novel constraint tightening strategy is then developed based on an equivalent

reformulation of distributionally robust CVaR constraints over the proposed ambiguity

set. As more data are gathered during the runtime of the controller, the ambiguity set is

updated online using real-time disturbance data, which enables the risk-averse

stochastic MPC to cope with time-varying disturbance distributions. The guarantees on

recursive feasibility and closed-loop stability of the proposed MPC are established via

a safe update scheme.

In Chapter 5, we develop a novel transformation-proximal bundle algorithm for

Multistage Adaptive Robust Mixed-Integer Linear Programs (MARMILPs). By

partitioning recourse decisions into state and control decisions, the proposed algorithm

applies affine control policy only to state decisions and allows control decisions to be

fully adaptive to uncertainty. In this way, the MARMILP is proved to be transformed

into an equivalent two-stage Adaptive Robust Optimization (ARO) problem. The

proposed multi-to-two transformation remains valid for other types of causal control

policies besides the affine one. The proximal bundle method is developed for the

resulting two-stage problem. We theoretically show finite convergence of the proposed

algorithm with any positive tolerance. To quantitatively assess solution quality, we

develop a scenario-tree-based lower bounding technique.

The dissertation concludes in Chapter 6.

CHAPTER 2

DATA-DRIVEN WASSERSTEIN DISTRIBUTIONALLY ROBUST


OPTIMIZATION FOR BIOMASS WITH AGRICULTURAL WASTE-TO-ENERGY
NETWORK DESIGN UNDER UNCERTAINTY

2.1 Introduction

With growing concerns over energy crisis and global warming, the utilization of

renewable energy sources is growing rapidly around the globe [205]. As a renewable

energy source, biomass can be easily stored until needed and has the potential to be

converted into a plethora of biofuels and bioproducts [206]. Biomass feedstock has

gained tremendous popularity in industries [207], and the associated processing

technologies, such as pretreatment [208], pyrolysis [209], gasification [210], and

fractionation [211], have advanced significantly in recent years [212]. There are

different generations of biofuels in the existing literature [213]. The first-generation

biofuels typically feature the feedstocks of edible energy crops, such as corn and

sugarcane, and lead to the competition between food and fuel. To address this issue, the

second-generation biofuels utilize lignocellulosic biomass as feedstocks. The third

generation is known to be produced from algae and can reduce land use compared with

the second-generation biofuels [214]. As a result, renewable energy and renewable

chemicals/materials produced from biomass hold promise to replace their non-

renewable, petroleum-based counterparts to mitigate the issues of energy and climate.

Additionally, agricultural and organic waste sources, like animal manure and slurry

[215], can be converted to energy or value-added products through biorefinery

technologies [216]. From the perspective of sustainability, the conversion of agricultural


waste serves as an environmentally friendly avenue to satisfy the ever-increasing

renewable energy demand [217]. For instance, food waste is considered as a valuable

resource of sustainable energy because of its higher biodegradability, moisture and

organic content [218]. Given a myriad of possible feedstocks and technologies, unraveling

the optimal biomass processing routes from a process and product network in a

systematic way is of paramount importance for the economic competitiveness as well

as environmental sustainability [219]. Meanwhile, uncertain parameters involved in

network design add more complexity in the decision-making process [220]. The

historical realizations of uncertain parameters are usually available in the energy

industry, and these data hold huge potential to support the network design. Recently,

data-driven optimization has become an emerging paradigm to address uncertainty by

employing the power of machine learning techniques [221] that include, but are not

limited to, Bayesian nonparametric models [32], kernel learning [164], principal

component analysis [34], and robust kernel density estimation [222]. Nowadays, a wide

array of machine learning techniques can be leveraged to excavate useful uncertainty

information for better decisions in the bioconversion network design and operation.

Therefore, it is crucial to design biomass with agricultural waste-to-energy network with

the explicit consideration of uncertainty by exploiting the organic integration of

machine learning and mathematical programming.

Due to the significance of energy systems design, a growing body of literature leverages

systematic mathematical programming models to address such a problem. Some

research studies focused on deterministic models without the consideration of

uncertainty [223], including rule-based method [224], superstructure optimization

[225], and life cycle optimization [226]. Nevertheless, the issue of uncertainty could

render the solution of a deterministic optimization problem suboptimal or even

infeasible [8]. To this end, the bioenergy system design subject to uncertainty has been

extensively investigated in the existing literature [227]. There are various types of

uncertainties in the biomass network design problem, including biomass supply,

feedstock prices, bioproduct demand, technological conversion rates, policies, and

environmental impacts [228]. One such method is robust optimization, in which

uncertainty is modeled with an uncertainty set [98]. By introducing recourse decisions

[106], adaptive robust optimization based network design method was proposed to

identify economical and efficient biofuel and bioproduct production pathways [108].

While robust optimization has achieved success in various applications, this method

typically generates over-conservative solutions, because it always hedges against the

worst-case uncertainty realization. The stochastic programming approach has gained

popularity due to the fact that it can incorporate the probability distribution information

to alleviate the conservatism, yet it generally scales poorly in the problem dimensions

[45]. Recently, the biodiesel production model considering diversified raw materials

subject to feedstock composition uncertainty was formulated as a chance-constrained

stochastic program to guarantee the technical performance on biodiesel [229]. To

account for risk aversion, stochastic programming models based on conditional value at

risk and downside risk were proposed for the optimal network design of hydrocarbon

biorefinery under supply and demand uncertainties [63]. The stochastic programming

approach was applied to an integrated hydrocarbon biofuel and petroleum network

design problem, and the resulting optimization model aimed to minimize the

expectation of costs under a number of scenarios associated with biomass availability,

fuel demand, and technology evolution [230]. To address the design of sustainable

biomass conversion network, a stochastic mixed-integer linear programming (MILP)

model was presented, in which uncertain purchase prices were assumed to follow

normal distributions [231].

The employment of the stochastic programming method is widespread in this area, and

most existing studies typically use the Monte Carlo method to generate uncertainty data

or scenarios based on a predefined probability distribution. In practice, such perfect

information about the true probability distribution of uncertain parameters is rarely

known, and it can only be observable through a finite number of uncertainty data. Due

to such limited amount of uncertainty data, the assumed probability distribution could

significantly deviate from the underlying true distribution. If the stochastic program for

biomass with agricultural waste-to-energy network design is calibrated to a given

uncertainty dataset, the resulting out-of-sample performance tends to be disappointing

when evaluating its optimal solution with a testing dataset [121]. The out-of-sample

performance is the actual performance, in terms of objective values, of a given optimal

solution evaluated at some uncertainty scenarios, which are different from the ones used

to obtain that solution. Consequently, the conventional stochastic programming

approach could have poor out-of-sample performance. The aforementioned issue

prompts the development of data-driven distributionally robust optimization (DRO).

The moment-based ambiguity set in DRO is not guaranteed to converge to the true

probability distribution, as the number of uncertainty data increases. Therefore, this type

of ambiguity set suffers from the conservatism issue [124]. Thus, it is imperative to

develop a novel optimization method for biomass network design that can (a) effectively

hedge against the distributional ambiguity; (b) leverage the value of uncertainty data via

statistical machine learning; (c) lead to tractable model formulations that are amenable

for applications; and (d) provide optimal solutions with better out-of-sample

performance in terms of lower average cost and lower variance compared with

conventional stochastic programming.

To fill this knowledge gap, we propose a novel data-driven two-stage Wasserstein

distributionally robust network design model, in which technology selection and sizing

are made at the first stage, while operation decisions are made at the second stage.

Rather than assuming perfect knowledge of probability distribution, we consider a more

realistic setting where the true probability distribution can be inferred from a set of

historical uncertainty data. Based on the Wasserstein metric, we construct the data-

driven ambiguity set as a ball (a.k.a. Wasserstein ball) in the probability space centered

at the uniform distribution on uncertainty data. Although the Lévy-Prokhorov metric

can be used to measure the distance between probability distributions, our research work

adopts the Wasserstein metric rather than using the Lévy-Prokhorov metric in the DRO

framework following the literature [38]. The ambiguity set based on the Wasserstein

metric has a better out-of-sample performance compared with other moment-based

methods [124], as well as better computational tractability [131]. Recently, the

Wasserstein ambiguity set has gained increasing popularity, and is widely adopted in

multistage adaptive DRO [232], adaptive robust stochastic optimization [233], and

distributionally robust chance-constrained optimization [153, 234]. The two-stage

stochastic programming model can be considered as a special case of the proposed DRO

model when the “radius” of the Wasserstein ball is tuned to be zero. Nonlinear scaling

functions are introduced in the objective function to accurately calculate each

technology’s capital cost associated with the corresponding capacity. Notably, this

research work involves all the three generations of biofuels. According to the taxonomy

of uncertainty types [228], uncertainty can be classified into three categories, namely

randomness, epistemic, and deep uncertainty. In the studied problem, the uncertainty

belongs to deep uncertainty that is characterized by insufficient knowledge of the

underlying probability distribution. The data-driven DRO approach is suitable to

address this type of uncertainty, because it hedges against the ambiguity of distribution

resulted from such lack of probability information by using a set of plausible

distributions. The data-driven Wasserstein DRO model harnesses the advantages of both

robust optimization and stochastic programming. Specially, adopting a worst-case

orientated approach regularizes the optimization problem and effectively hedges against

the worst-case distribution within the ambiguity set [235], thereby remedying the

drawback of the stochastic programming method. To the best of our knowledge, the

proposed model represents the first attempt to employ the data-driven Wasserstein DRO

to address the biomass network design problem under uncertainty. The resulting

problem is formulated as a multi-level mixed-integer nonlinear program (MINLP),

which cannot be solved directly by any off-the-shelf optimization solvers. The “multi-

level” means that the resulting optimization problem has a “min-max-min” optimization

structure. To address this computational challenge, a solution strategy is further

developed by integrating the reformulation of the worst-case expectation [124], and a

branch-and-refine algorithm [236]. In case studies, we consider the feedstock price

uncertainty to demonstrate the effectiveness of the proposed approach. The better out-

of-sample performance of the data-driven Wasserstein DRO in terms of lower average

cost and lower variance is validated in a case study of a biomass with agricultural waste-

to-energy network involving 216 technologies and 172 materials/compounds. A

sensitivity analysis is also performed to evaluate the impact of the ambiguity set’s size

on its corresponding DRO solution.

2.2 Problem statement

In this section, we formally state the problem of biomass with agricultural waste-to-

energy network design considering uncertain parameters as follows. As depicted in

Figure 2, we consider a comprehensive biomass with agricultural waste-to-energy

network. This network has various conversion pathways featuring a diversified portfolio

of biomass feedstocks, organic and agricultural waste feedstocks, biofuel, and

bioproducts. The purpose of this comprehensive network is to convert a variety of raw

materials or feedstocks into sustainable energy and useful bioproducts such as biofuels

and biogas. Accordingly, the network holds great value for not only producing clean

energy, but also for managing agricultural waste. There is a total of 216 processing and

upgrading technologies, as well as 172 materials/compounds in this network. The

feedstocks include soybean, corn, sugarcane, hard wood, soft wood, switchgrass, algae,

cassava, brown grease, corn stover, tomato peels, potato peels, orange peels, olive

waste, municipal solid waste, dairy manure, poultry litter, and swine manure. These

various types of feedstocks in the network are converted to energy and bioproducts in

the following way. First, feedstocks are decomposed into some basic chemical

compounds via processing technologies, such as hydrothermal liquefaction [237]. Those

chemical compounds are then used for producing biofuels or bioproducts through

upgrading technologies. The final products are sold to the market. Note that some of the

potential pathways in this network are “waste-to-energy” pathways, meaning that they

convert waste materials into energy-rich bioproducts and biofuels [238]. One main

component in waste feedstocks is the agricultural waste, including food waste and

animal manure [239]. Specifically, tomato peels, potato peels, and orange peels can

serve as feedstocks to produce chemical materials, like beta carotene, chlorogenic acid,

caffeic acid, and pectin. Different types of anaerobic digesters (ADs), such as the mixed

plug AD and the horizontal plug flow AD, serve as technologies that convert dairy

animal manure, poultry litter, and swine manure into biogas [240]. As a fuel source, the

biogas can be further used to produce heat and electricity [241], thus providing immense

environmental benefits [242]. Meanwhile, methane can be extracted from municipal

landfills using the methane extraction technology [243].

Figure 2. The structure of the biomass with agricultural waste-to-energy network

considered in this work.

The most recognized type of uncertainty is the volatility in purchasing prices of biomass

resources. In this research work, the feedstock price uncertainty is considered for the

following reasons. On one hand, biomass feedstock prices typically fluctuate

due to policy changes and energy markets. Given the lifetime of equipment, the

uncertainty of feedstock prices can significantly affect the economic performance of

the optimal bioconversion network design. On the other hand, real feedstock price data

are well documented and can be easily acquired to validate the effectiveness of the

proposed data-driven approach. Within the proposed Wasserstein DRO model, useful

statistical information embedded in the uncertainty data is leveraged, and then the

ambiguity set based on the Wasserstein metric is constructed. Note that policy changes

and energy markets could lead to time-variant price distributions, which further cause

the ambiguity of probability distributions. For these two sources of uncertainty, the

DRO approach works because it uses an ambiguity set to hedge against the distributional

uncertainty. For those uncertainties whose probability distributions are time-invariant,

the DRO method works, since their underlying true distributions can be partially known

due to the limited number of uncertainty data. If the probability distribution can be

perfectly known to the decision maker, adding the distributional robustness is not

necessary.

Figure 3. Illustrative figure on the biomass with agricultural waste-to-energy network

with the corresponding data-driven Wasserstein DRO model.

In this problem, we aim to identify cost-effective processing pathways from the biomass

with agricultural waste-to-energy network by minimizing the worst-case expected total

annualized cost. This worst-case expected cost is taken with respect to all feedstock

price distributions within the Wasserstein ambiguity set. This data-driven ambiguity set

is defined as a ball in the probability space centered at the uniform probability

distribution on biomass feedstock price data. An illustrative figure on the biomass with

agricultural waste-to-energy network along with the data-driven Wasserstein DRO

framework is provided in Figure 3. In the two-stage optimization structure of this data-


driven Wasserstein DRO model, the first-stage decisions are the design decisions made

prior to the feedstock price uncertainty realization. The second-stage decisions are

operational decisions that are postponed in a “wait-and-see” manner after knowing the

uncertainty realization. The selected technologies in the network are assumed to be

ready for operation at the second stage. Details of these decisions are summarized as

follows:

First-stage design decisions:

• Technology pathway selection;

• Capacity of each technology in pathways;

Second-stage operation decisions:

• Operation level of each technology in pathways;

• Quantities of biomass feedstock to use;

• Bioproduct and/or biofuel sale amount.

These design and operation decisions are optimized based on the following given

parameters:

• The upper and lower bounds of the capacity of each processing and upgrading technology;

• Conversion coefficients for each technology;

• The availability of each biomass feedstock;

• A base capacity of each technology;

• An initial capital cost corresponding to the base capacity for each technology;

• Expected life span in years of the processing pathway;

• Discount rate;

• The fixed operating expense (OPEX) for each technology;

• An initial, variable OPEX corresponding to the base capacity for each technology;

• Prices of bioproducts and biofuels;

• Feedstock price data.

2.3 Mathematical formulation

In this section, we first present a data-driven approach to construct the Wasserstein

ambiguity set for the feedstock price uncertainty. With this ambiguity set, a two-stage

adaptive distributionally robust mixed-integer nonlinear optimization model is then

proposed for the biomass with agricultural waste-to-energy network design. Finally, a

solution strategy integrating the reformulation of worst-case expectation and the branch-

and-refine algorithm is developed for solving the resulting non-convex optimization

problem.

2.3.1 Data-driven ambiguity set using Wasserstein metric

As mentioned in the problem statement, feedstock prices are subject to uncertainty and

the decision maker has access to the price dataset $D_{\text{train}} = \{\xi^{(1)}, \ldots, \xi^{(N)}\}$, where $\xi^{(n)} = \big[c_{3,1}^{(n)}, \ldots, c_{3,I}^{(n)}\big]^{T}$ denotes the $n$-th data vector of feedstock prices, and $N$ represents the

number of data samples. For the stochastic programming-based network design, the

assumed probability distribution might deviate from the underlying true distribution due

to the finite amount of feedstock price data. In addition, relying on a single probability

distribution could lead to the deterioration in out-of-sample performance [121]. To


address this issue, we construct a family of probability distributions, also referred to as

ambiguity set, by using historical feedstock price data on hand. To quantitatively

measure the distance between feedstock price distributions, we define the Wasserstein

metric or Wasserstein distance as follows [244].

For any probability distributions $\mathbb{P}_1, \mathbb{P}_2 \in \mathcal{M}(\Xi)$, the Wasserstein metric or distance between these two distributions is given by,

$$
\begin{aligned}
d_{W}\left(\mathbb{P}_1, \mathbb{P}_2\right) = \min_{\Pi} \;& \int_{\Xi^{2}} \left\| \xi_1 - \xi_2 \right\| \, \Pi\left(d\xi_1, d\xi_2\right) \\
\text{s.t.} \;& \Pi \text{ is a joint distribution of } \xi_1 \text{ and } \xi_2 \\
& \text{with marginal distributions } \mathbb{P}_1 \text{ and } \mathbb{P}_2
\end{aligned}
\tag{2.1}
$$

where $\mathcal{M}(\Xi)$ represents the set of all probability distributions with support set $\Xi$, and $\|\cdot\|$ denotes the norm of a vector. We adopt the $l_1$ norm in this work due to its computational

benefits in DRO [124].

From the definition, we can see that the Wasserstein metric is defined through an

optimization problem where the decision variable is a probability distribution. By

considering the decision variable Π as a transportation plan, the Wasserstein metric is

essentially the minimum transportation cost of moving probability mass from

distribution $\mathbb{P}_1$ to distribution $\mathbb{P}_2$.
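The transportation-plan interpretation can be made concrete with a small linear program. The sketch below is purely illustrative (it is not part of this work's GAMS implementation, and it assumes SciPy is available): it computes the Wasserstein distance of definition (2.1) between two discrete distributions under the l1 ground metric by solving the transportation LP directly.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_l1(p, q, supp_p, supp_q):
    """1-Wasserstein distance between two discrete distributions with an
    l1 ground metric, solved as the transportation LP in definition (2.1)."""
    n, m = len(p), len(q)
    # cost[i, j] = ||xi_i - xi_j||_1; the transport plan Pi is the LP variable
    cost = np.array([[np.sum(np.abs(np.asarray(a) - np.asarray(b)))
                      for b in supp_q] for a in supp_p])
    A_eq, b_eq = [], []
    for i in range(n):                       # row marginals: sum_j Pi[i, j] = p[i]
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(p[i])
    for j in range(m):                       # column marginals: sum_i Pi[i, j] = q[j]
        col = np.zeros(n * m); col[j::m] = 1.0
        A_eq.append(col); b_eq.append(q[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

# Both distributions keep mass 0.5 at 1; moving the other 0.5 from 0 to 2 costs 0.5*2
d = wasserstein_l1([0.5, 0.5], [0.5, 0.5], supp_p=[[0.0], [1.0]], supp_q=[[2.0], [1.0]])
```

Solving this LP exactly becomes expensive for large supports; the reformulation developed in Section 2.4 is precisely what lets the proposed model avoid computing the metric directly.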

Based on the Wasserstein metric in (2.1), the data-driven ambiguity set for feedstock

prices is represented by,

$$
\mathcal{P} = \left\{ \mathbb{P} \in \mathcal{M}(\Xi) \;\middle|\; d_{W}\!\left(\mathbb{P}, \hat{\mathbb{P}}_N\right) \le \theta \right\} \tag{2.2}
$$

where $\hat{\mathbb{P}}_N$ denotes the empirical distribution, namely the uniform distribution on the $N$ available feedstock price data, i.e. $\hat{\mathbb{P}}_N = \frac{1}{N} \sum_{n=1}^{N} \delta_{\xi^{(n)}}$, where $\delta_{\xi^{(n)}}$ represents the Dirac measure at the price data point $\xi^{(n)}$. Note that $\hat{\mathbb{P}}_N$ is a discrete distribution and serves as an estimation of the underlying true distribution $\mathbb{P}_{\text{true}}$. $\theta$ is a parameter used for controlling the size of the data-driven ambiguity set $\mathcal{P}$. The support set $\Xi$ can be specified via the upper and lower bounds of uncertain parameters and is shown as follows.

$$
\Xi = \left\{ \xi \;\middle|\; \xi_i^{\min} \le \xi_i \le \xi_i^{\max}, \; \forall i \right\} \tag{2.3}
$$

The data-driven ambiguity set $\mathcal{P}$ contains all probability distributions whose

Wasserstein distances from the empirical distribution are no larger than $\theta$. Therefore, the

ambiguity set $\mathcal{P}$ can be interpreted as a Wasserstein ball of radius $\theta$ centered at the

empirical distribution $\hat{\mathbb{P}}_N$. Note that the size of the data-driven ambiguity set can be

adjusted by using the tuning parameter θ. Specifically, decreasing the value of parameter

θ reduces the size of the Wasserstein ambiguity set. The decision maker can utilize the

radius of the Wasserstein ball to control the level of conservatism.

The radius θ of the Wasserstein ball can be calculated by [245],

$$
\theta = \begin{cases} \left( \dfrac{\log\left(C_1 \beta^{-1}\right)}{C_2 N} \right)^{1/m}, & \text{if } \left( \dfrac{\log\left(C_1 \beta^{-1}\right)}{C_2 N} \right)^{1/m} \le 1 \\[3mm] \left( \dfrac{\log\left(C_1 \beta^{-1}\right)}{C_2 N} \right)^{1/\alpha}, & \text{else} \end{cases} \tag{2.4}
$$

where $\beta$ denotes the confidence level, $m$ ($m>2$) represents the dimension of the uncertainty vector, and $N$ is the number of feedstock price data. Here it is assumed that there exist $\alpha>1$ and $\rho>0$ satisfying $\int_{\Xi} e^{\rho \|x\|^{\alpha}} \, \mathbb{P}_{\text{true}}(dx) < \infty$. Note that $C_1$ and $C_2$ are positive constant

numbers. In general, Equation (2.4) is not a practicable way to obtain the Wasserstein

radius, because the constants C1 and C2 are difficult to estimate and could give loose

bounds. For this reason, cross-validation can be used as an empirical way to tune the

Wasserstein radius [124].

The steps of k-fold cross validation to tune the Wasserstein radius are given as follows

[246]. First, the price data $\xi^{(1)}, \ldots, \xi^{(N)}$ are partitioned into $k$ subsets. For each holdout run, only one

validation dataset. Second, the Wasserstein radius is tuned such that the corresponding

average cost for the validation dataset is minimized. Lastly, the optimal Wasserstein

radius from cross-validation is set to be the average of the optimal radii determined in

the k holdout runs.
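The three steps above can be sketched in code. The solver and cost evaluator below are hypothetical stand-ins (a closed-form ordering rule on a toy newsvendor-style problem, not an actual (WDRO) solve), so the sketch only illustrates the data-splitting and averaging mechanics of the tuning procedure.

```python
import numpy as np

def tune_radius_kfold(data, radii, k, solve_dro, avg_cost):
    """k-fold tuning of the Wasserstein radius: each fold in turn serves as the
    training set, the remaining folds as the validation set, and the per-fold
    best radii are averaged."""
    folds = np.array_split(np.asarray(data), k)
    best = []
    for i in range(k):
        train = folds[i]
        valid = np.concatenate([folds[j] for j in range(k) if j != i])
        costs = [avg_cost(solve_dro(train, r), valid) for r in radii]
        best.append(radii[int(np.argmin(costs))])
    return float(np.mean(best))

# Hypothetical stand-ins: a larger radius yields a more conservative decision
rng = np.random.default_rng(0)
demand = rng.uniform(20.0, 80.0, size=60)
solve_dro = lambda train, r: np.mean(train) + r * np.std(train)   # stand-in for a (WDRO) solve
avg_cost = lambda x, valid: np.mean(9.0 * np.maximum(valid - x, 0.0)
                                    + 1.0 * np.maximum(x - valid, 0.0))

theta = tune_radius_kfold(demand, radii=[0.0, 0.25, 0.5, 1.0, 2.0], k=5,
                          solve_dro=solve_dro, avg_cost=avg_cost)
```

In the case study, the inner call is the full (WDRC) solve, so each candidate radius costs one MILP solve per holdout run.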

The data-driven ambiguity set for the feedstock price is cast as a set of possible

probability distributions that are “close” to the empirical distribution in the sense of the

Wasserstein metric. There are several merits of the Wasserstein ambiguity set. First, this

ambiguity set directly leverages the uncertainty data information via the empirical

distribution, while at the same time effectively hedging against the distributional

uncertainty based upon the Wasserstein metric. This feature is useful in the network

design problem where the distribution of uncertain feedstock prices is only observable

through a finite amount of price data. Second, there exists a statistical guarantee that the

Wasserstein ambiguity set contains the unknown true distribution with a certain

confidence level [245]. Specifically, with the Wasserstein radius in (2.4), it can be

guaranteed that

$$
\mathbb{P}\left[ d_{W}\!\left(\mathbb{P}_{\text{true}}, \hat{\mathbb{P}}_N\right) \le \theta \right] \ge \begin{cases} 1 - C_1 e^{-C_2 N \theta^{m}}, & \text{if } \theta \le 1 \\ 1 - C_1 e^{-C_2 N \theta^{\alpha}}, & \text{else} \end{cases}
$$

This favorable feature
equips the resulting DRO solution with better out-of-sample performance in terms of

lower average cost and lower variance. Such out-of-sample performance is of practical

relevance, since price data different from the training dataset Dtrain are used to test the

data-driven network design decision. Third, the decision maker can readily adjust the

level of conservatism by tuning the radius θ of the Wasserstein ball. Lastly, the DRO

problem with the Wasserstein ambiguity set admits a tractable reformulation, which

endows the resulting biomass with agricultural waste-to-energy network design problem

with computational efficiency.
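The coverage guarantee rests on the empirical distribution concentrating around the true one as N grows, which is easy to check numerically in one dimension. The experiment below is illustrative only (uniform synthetic data, with a large sample standing in as a proxy for the true distribution); it is not part of the case study.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
ref = rng.uniform(0.0, 1.0, size=20000)   # large-sample proxy for the true distribution

def avg_emp_distance(n, reps=200):
    """Average 1-Wasserstein distance between the (proxy) true distribution and
    an empirical distribution built from n samples."""
    return float(np.mean([wasserstein_distance(rng.uniform(0.0, 1.0, size=n), ref)
                          for _ in range(reps)]))

d_small, d_large = avg_emp_distance(10), avg_emp_distance(200)
```

The distance shrinks with the sample size, which is why the radius θ delivering a fixed confidence level β in (2.4) decreases as more price data become available.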

2.3.2 Data-driven Wasserstein distributionally robust network design

model

In this section, we present a novel data-driven Wasserstein distributionally robust

biomass with agricultural waste-to-energy network design model using the data-driven

ambiguity set presented in the previous section. In a biomass with agricultural waste-

to-energy network, biomass feedstocks, such as microalgae [247], and dairy manure

[248], are converted into a variety of biofuels and bioproducts via different processing

and upgrading technologies [219]. One needs to make decisions on the selection of

technology pathway, capacity and operating level of each technology, purchase amounts

of feedstocks and quantities of products to sell. The objective is to minimize the worst-

case expected total annualized cost with regard to the Wasserstein ambiguity set. Since

the proposed model aggregates yearly operations, the issue of biomass feedstock

seasonality is not considered in this work. In this research work, we focus on the

selection of technologies for the biomass network design, and do not consider the issue

of inventory. Therefore, the assumption on yearly operation is reasonable. Additionally,

this assumption is adopted in the existing literature [219, 225].

The data-driven Wasserstein DRO model for the network design under uncertainty can

be cast as a multi-level MINLP. The two-stage optimization structure allows for

selection and capacity decisions to be made before uncertainty realizations, while also

allowing for production, purchasing, and sale decisions to be made after uncertainty has

been realized. Specifically, the first-stage decision variables are decisions on the

selection and capacity of technology. The second-stage decisions include production

levels, quantity of biomass to use, and amounts of products to sell. The objective

function of the biomass with agricultural waste-to-energy network design is shown in

(2.5). The constraints include technology capacity constraint (2.6), production level

constraint (2.7), mass balance constraint (2.8), biomass feedstock availability constraint

(2.9), biofuel/bioproduct demand satisfaction constraint (2.10), non-negativity and

integrality constraints (2.11)-(2.12). The data-driven ambiguity set of feedstock prices is

shown in (2.13). A list of indices/sets, parameters and variables is given in the

Nomenclature section, where all parameters are denoted in lower-case symbols, and all

variables are denoted in upper-case symbols. The two-stage Wasserstein DRO (WDRO)

model formulation is presented as follows:

$$
\text{(WDRO)} \quad \min_{Y, Q} \; \sum_{j \in J} c_{1,j} Q_j^{sf_j} + \max_{\mathbb{P} \in \mathcal{P}} \; \mathbb{E}_{\mathbb{P}} \left[ \min_{W, P, S} \left( \sum_{j \in J} c_{2,j} W_j + \sum_{i \in I} c_{3,i} P_i - \sum_{i \in I} c_{4,i} S_i \right) \right] \tag{2.5}
$$

$$
\text{s.t.} \quad a_{1,j} Y_j \le Q_j \le a_{2,j} Y_j, \; \forall j \in J \tag{2.6}
$$

$$
W_j \le Q_j, \; \forall j \in J \tag{2.7}
$$

$$
P_i + \sum_{j} \varphi_{ij} W_j - S_i = 0, \; \forall i \in I \tag{2.8}
$$

$$
P_i \le b_i, \; \forall i \in I \tag{2.9}
$$

$$
S_i \ge d_i, \; \forall i \in I \tag{2.10}
$$

$$
Q_j, P_i, S_i, W_j \ge 0, \; \forall i \in I, \; j \in J \tag{2.11}
$$

$$
Y_j \in \{0, 1\}, \; \forall j \in J \tag{2.12}
$$

$$
\mathcal{P} = \left\{ \mathbb{P} \in \mathcal{M}(\Xi) \;\middle|\; d_{W}\!\left(\mathbb{P}, \hat{\mathbb{P}}_N\right) \le \theta, \; \hat{\mathbb{P}}_N = \frac{1}{N} \sum_{n=1}^{N} \delta_{c_{3}^{(n)}} \right\} \tag{2.13}
$$

where c1,j, c2,j, c3,i, and c4,i respectively represent economic evaluation parameters for

the capital cost associated with technology j, the operating cost associated with

technology j, the purchase cost of biomass feedstock i, and the selling price of

bioproduct i. At the first stage (a.k.a. the design stage), “here-and-now” decisions

involve binary variables Yj indicating the selection of technology j as well as continuous

variables Qj representing the capacity of technology j. These decisions are termed as

“here-and-now”, since they should be made prior to any uncertain feedstock price

realizations. At the second stage or the operational stage, the decision variables,

including the operating level of each technology Wj, the amount of feedstock purchased

Pi, and the amount of product sold Si, can be postponed in a “wait-and-see” manner after

knowing the uncertainty realization.

The objective function can be roughly divided into two terms. The first term represents

the first-stage cost, namely the total capital cost. The nonlinearity arises within the
relation between the facility’s capital cost and its capacity [249]. Specifically, the

nonlinear functions, namely the power functions $Q_j^{sf_j}$, are employed to evaluate technology

capital costs in the (WDRO) model. Following the literature [250], sfj is typically set to

be 0.6. The second term is the worst-case expectation of the second-stage costs, and as

a result, the proposed optimization model is capable of hedging against the worst-case

feedstock price distribution within the Wasserstein ball. Based on Constraint (2.7), it

becomes clear that the decision variable for operating level Wj can be adjusted in the

range from zero to the total capacity of technology j. Therefore, the (WDRO) model

appropriately accommodates the fact that real facilities do not always operate at the

maximum capacity.
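For a fixed first-stage design, the inner "wait-and-see" problem in the model is an ordinary linear program in W, P, and S. The toy instance below (hypothetical numbers, one feedstock converted into one product at a 0.9 yield; it is not drawn from the 216-technology network) solves one such recourse problem for a single realized feedstock price.

```python
import numpy as np
from scipy.optimize import linprog

# Fixed first-stage design: one selected technology with capacity Q
Q, conv, b, d = 50.0, 0.9, 100.0, 10.0   # capacity, yield, feed availability, product demand
c2, c4, xi = 1.0, 10.0, 2.0              # operating cost, product price, realized feed price

# Second-stage variables x = [W, P, S]: operating level, feed purchased, product sold
c = np.array([c2, xi, -c4])              # min c2*W + xi*P - c4*S
A_eq = np.array([[-1.0, 1.0, 0.0],       # feed balance:    P - W = 0
                 [conv, 0.0, -1.0]])     # product balance: conv*W - S = 0
b_eq = np.zeros(2)
A_ub = np.array([[1.0, 0.0, 0.0],        # W <= Q   (capacity, cf. (2.7))
                 [0.0, 1.0, 0.0],        # P <= b   (availability, cf. (2.9))
                 [0.0, 0.0, -1.0]])      # S >= d   (demand, cf. (2.10))
b_ub = np.array([Q, b, -d])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method="highs")
# Each unit of operation nets 0.9*10 - 1 - 2 = 6, so the plant runs at full capacity
```

The worst-case expectation in (2.5) averages the optimal values of many such LPs, one per price realization, under the least favorable distribution in the ball.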

It is worth noting that the proposed (WDRO) model reduces to the stochastic

programming when the value of parameter θ is set to be 0, since the induced ambiguity

set changes to the singleton set $\{\hat{\mathbb{P}}_N\}$. In summary, we develop a data-driven two-stage

Wasserstein distributionally robust MINLP model where the distribution of biomass

feedstock price can only be inferred from a finite training dataset. To effectively hedge

against the distributional uncertainty, the (WDRO) model employs the objective

function of worst-case expected cost with respect to the data-driven Wasserstein

ambiguity set. The proposed biomass with agricultural waste-to-energy network design

model has the following merits. First, it directly incorporates uncertain feedstock price

data into the optimization model. Second, the (WDRO) model effectively accounts for

the ambiguity of the feedstock price distribution, thereby enjoying a better out-of-

sample performance in terms of lower average cost and lower variance compared with

conventional stochastic programming.

However, the multi-level optimization structure, coupled with nonconvex terms in the

objective function, makes the resulting optimization problem computationally

challenging. To address this computational challenge, we further develop a solution

method that works in solving the resulting Wasserstein distributionally robust MINLP

problem in the sequel.

2.4 Solution methodology

In this section, we develop a tailored solution method to globally optimize the (WDRO)

problem, which involves a nonconvex objective function. Problem (WDRO) cannot be

solved directly by any off-the-shelf optimization solvers due to the multilevel

optimization structure, as well as the infinite number of probability distributions

involved in the ambiguity set. The concave function $Q_j^{sf_j}$ renders the optimization

problem non-convex, which leads to increased computational difficulty. Furthermore,

existing solution methods for two-stage DRO problems cannot handle a mixed-integer

nonconvex objective function [251].

To tackle this computational challenge, we present a tailored solution algorithm based

on the special structure of (WDRO) problem. Specifically, we employ a combination of

the reformulation of the worst-case expectation and a branch-and-refine algorithm to

globally optimize the (WDRO) problem. First, we reformulate the worst-case

expectation problem and obtain a single-level MINLP problem. The branch-and-refine

algorithm is then adopted to solve the resulting single-level optimization problem by

leveraging a successive piecewise linear approximation technique.

To facilitate its implementation, we present the explicit form of the two-stage

Wasserstein distributionally robust counterpart (WDRC) for the biomass with

agricultural waste-to-energy network design with uncertain feedstock price as follows.

The step-by-step derivation is provided in Appendix A. Note that the derivation to

reformulate the (WDRO) problem into (WDRC) comes from the literature [124].

$$
\text{(WDRC)} \quad \min \; \sum_{j \in J} c_{1,j} Q_j^{sf_j} + \lambda \theta + \frac{1}{N} \sum_{n=1}^{N} s_n \tag{2.14}
$$

$$
\text{s.t.} \quad a_{1,j} Y_j \le Q_j \le a_{2,j} Y_j, \; \forall j \in J \tag{2.15}
$$

$$
W_{nj} \le Q_j, \; \forall j \in J, \; n \in N_d \tag{2.16}
$$

$$
P_{ni} + \sum_{j} \varphi_{ij} W_{nj} - S_{ni} = 0, \; \forall i \in I, \; n \in N_d \tag{2.17}
$$

$$
P_{ni} \le b_i, \; \forall i \in I, \; n \in N_d \tag{2.18}
$$

$$
S_{ni} \ge d_i, \; \forall i \in I, \; n \in N_d \tag{2.19}
$$

$$
Q_j, P_{ni}, S_{ni}, W_{nj} \ge 0, \; \forall i \in I, \; j \in J, \; n \in N_d \tag{2.20}
$$

$$
Y_j \in \{0, 1\}, \; \forall j \in J \tag{2.21}
$$

$$
\sum_{j \in J} c_{2,j} W_{nj} + \sum_{i \in I} c_{3,i}^{(n)} P_{ni} - \sum_{i \in I} c_{4,i} S_{ni} + \sum_{i \in I} \left( c_{3,i}^{\max} - c_{3,i}^{(n)} \right) \gamma_{ni}^{1} + \sum_{i \in I} \left( c_{3,i}^{(n)} - c_{3,i}^{\min} \right) \gamma_{ni}^{2} \le s_n, \; \forall n \in N_d \tag{2.22}
$$

$$
\left| P_{ni} - \gamma_{ni}^{1} + \gamma_{ni}^{2} \right| \le \lambda, \; \forall i \in I, \; n \in N_d \tag{2.23}
$$

$$
\gamma_{ni}^{1}, \gamma_{ni}^{2} \ge 0, \; \forall i \in I, \; n \in N_d \tag{2.24}
$$

where $\gamma_n = \left[ \begin{smallmatrix} \gamma_n^1 \\ \gamma_n^2 \end{smallmatrix} \right]$, and $\gamma_{ni}^{1}$ and $\gamma_{ni}^{2}$ are the $i$-th entries of vectors $\gamma_n^1$ and $\gamma_n^2$, respectively. $c_{3,i}^{\max}$ and $c_{3,i}^{\min}$ represent the upper and lower bounds for the price of feedstock $i$, respectively. Here $\lambda$ and $s_n$ are auxiliary variables introduced by the reformulation of the worst-case expectation, and $N_d$ denotes the index set of price data samples.

By employing the reformulation techniques, we obtain the (WDRC) model which is

equivalent to the (WDRO) model. The (WDRC) for the biomass with agricultural waste-

to-energy network design is formulated as a single-level MINLP problem. One salient

feature of (WDRC) is that its model size, namely the number of decision variables and

the number of constraints, scales linearly with the number of price data N. Moreover,

the uncertain price data are directly incorporated into the proposed (WDRC) model as

witnessed in (2.22).
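The reformulation can be sanity-checked on a one-dimensional instance where the worst-case expectation has a closed form. For a fixed purchase quantity P, the loss P·ξ on the support [ξmin, ξmax] attains its worst-case expectation by shifting probability mass toward ξmax, giving Ê[P·ξ] + P·θ whenever θ is small enough. The LP below is an illustration with made-up numbers (it mirrors the structure of (2.14) and (2.22)-(2.24), not the full (WDRC) model) and recovers exactly that value.

```python
import numpy as np
from scipy.optimize import linprog

P, theta = 2.0, 0.5                  # fixed purchase quantity, Wasserstein radius
xi = np.array([1.0, 3.0, 5.0])       # price data samples
xi_min, xi_max, N = 0.0, 10.0, len(xi)

# Variables x = [lam, s_1..s_N, g1_1..g1_N, g2_1..g2_N]
nv = 1 + 3 * N
c = np.zeros(nv); c[0] = theta; c[1:1 + N] = 1.0 / N   # min lam*theta + (1/N) sum s_n

A_ub, b_ub = [], []
for n in range(N):
    # P*xi_n + g1_n*(xi_max - xi_n) + g2_n*(xi_n - xi_min) <= s_n   (cf. (2.22))
    row = np.zeros(nv); row[1 + n] = -1.0
    row[1 + N + n] = xi_max - xi[n]; row[1 + 2 * N + n] = xi[n] - xi_min
    A_ub.append(row); b_ub.append(-P * xi[n])
    # |P - g1_n + g2_n| <= lam, split into two linear rows            (cf. (2.23))
    for sgn in (1.0, -1.0):
        row = np.zeros(nv); row[0] = -1.0
        row[1 + N + n] = -sgn; row[1 + 2 * N + n] = sgn
        A_ub.append(row); b_ub.append(-sgn * P)

bounds = [(0, None)] + [(None, None)] * N + [(0, None)] * (2 * N)
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds, method="highs")

closed_form = P * xi.mean() + P * theta   # mass shifted toward xi_max
```

At the optimum, λ equals the slope P of the loss, consistent with its role as the dual price of transporting probability mass.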

The resulting (WDRC) problem turns out to be a nonconvex MINLP with separable

concave terms in its objective function (2.14). These concave terms appear due to the

calculation of capital cost based on “six-tenths rule” scaling with technology capacity

[250]. Although this single-level MINLP can be solved directly using some off-the-shelf

optimization solvers such as BARON, it turns out to be computationally expensive when

applied to network design problems [252]. By leveraging the structure of separable

concave terms, we adopt a branch-and-refine algorithm based on a successive piecewise

linear approximation to solve the (WDRC) problem to its global optimality [236]. The

key idea is to approximate the concave capital cost using a series of piecewise linear

underestimates that are formulated via special ordered sets of type 1 (SOS1) variables.

The piecewise linear under-estimator for the capital cost of technology j, denoted by Ej,

is formulated in (2.25)-(2.27).
$$
E_j = \sum_{p=1}^{NP} fe_{jp} \cdot PW_{jp}, \; \forall j \in J \tag{2.25}
$$

$$
Q_j = \sum_{p=1}^{NP} fx_{jp} \cdot PW_{jp}, \; \forall j \in J \tag{2.26}
$$

$$
fe_{jp} = fx_{jp}^{0.6}, \; \forall j \in J, \; p \in P \tag{2.27}
$$

where $fx_{jp}$ is the predefined partition point value, $fe_{jp}$ denotes the corresponding power function value, $p$ is the index of partition points, $NP$ is the total number of partition points, and $PW_{jp}$ is a weight for the corresponding partition point.

Constraints on weighting factor PWjp and position indicator PEjp are defined in (2.28)-

(2.33).

$$
\sum_{p=1}^{NP} PW_{jp} = 1, \; \forall j \in J \tag{2.28}
$$

$$
\sum_{p=1}^{NP-1} PE_{jp} = 1, \; \forall j \in J \tag{2.29}
$$

$$
PW_{j1} \le PE_{j1}, \; \forall j \in J \tag{2.30}
$$

$$
PW_{jp} \le PE_{jp} + PE_{j,p-1}, \; \forall j \in J, \; 2 \le p \le NP-1 \tag{2.31}
$$

$$
PW_{j,NP} \le PE_{j,NP-1}, \; \forall j \in J \tag{2.32}
$$

$$
PW_{jp} \ge 0, \; PE_{jp} \in \text{SOS1}, \; \forall j \in J \tag{2.33}
$$

where $PE_{jp}$ is defined as a SOS1 variable such that only one interval is selected.
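The validity of this construction follows from concavity: every chord of a concave function lies below its graph, so the interpolant through the partition points under-estimates the capital cost everywhere, and refining the partition tightens the approximation. A quick numerical check (illustrative only) on the six-tenths-rule function:

```python
import numpy as np

def pwl_under(x, bp, f):
    """Evaluate the piecewise linear interpolant of f through breakpoints bp at x."""
    return np.interp(x, bp, f(bp))

f = lambda q: q ** 0.6                   # concave "six-tenths rule" capital-cost scaling
x = np.linspace(1.0, 100.0, 2001)

bp_coarse = np.linspace(1.0, 100.0, 3)   # coarse partition
gap_coarse = float(np.max(f(x) - pwl_under(x, bp_coarse, f)))

bp_fine = np.linspace(1.0, 100.0, 9)     # refined partition
gap_fine = float(np.max(f(x) - pwl_under(x, bp_fine, f)))
```

The interpolant never exceeds the true function, and the maximum approximation gap strictly decreases as partition points are added, which is the property the branch-and-refine iterations exploit.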

Algorithm. The proposed solution algorithm
1: Set LB ← −∞, UB ← +∞, iter ← 0, and tolerance ζ;
2: Reformulate problem (WDRO) into problem (WDRC);
3: While UB − LB > ζ
4:   iter ← iter + 1;
5:   Solve problem (WDRC) with the piecewise linear objective function, and obtain the solution Q*_iter and objective value OBJ*;
6:   Update LB ← max{LB, OBJ*};
7:   Evaluate the original nonlinear objective value OBJ∆ using Q*_iter;
8:   Update UB ← min{UB, OBJ∆};
9:   Add a new partition point at the candidate solution Q*_iter;
10: End
11: Return the optimal solution

Figure 4. The pseudocode of the proposed reformulation-based branch-and-refine

algorithm for solving (WDRO) problem.

Since the capital cost is underestimated when substituting $Q_j^{sf_j}$ with $E_j$, the resulting

MILP problem provides a valid lower bound. Note that the optimal solution of the MILP

problem, which can be regarded as a candidate solution, is feasible to the original

(WDRC) problem. Accordingly, a valid upper bound for the total annualized cost can

be obtained by calculating the original nonconvex objective value with the candidate

solution. The gap is then computed as the difference between the upper and lower

bounds and is utilized for determining whether a new partition point is needed to further

refine the piecewise linear approximation. Partition points are added at the candidate

solutions iteratively until the gap between the upper and lower bounds reaches below a

predefined tolerance. In Figure 4, we present the pseudocode of the developed solution

method for the global optimization of the (WDRO) problem in detailed steps. Note that

the tolerance for the optimality gap is denoted by ζ. The number of partition points

increases by one only for those selected technologies in each iteration of the solution

algorithm.
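The LB/UB bookkeeping of the algorithm can be expressed generically. The skeleton below sketches only the loop: the callables stand in for the piecewise linear MILP solve, the evaluation of the original nonconvex objective, and the partition refinement (here instantiated with a trivial one-dimensional toy, not the GAMS/CPLEX implementation used in the case study).

```python
import numpy as np

def branch_and_refine(solve_pwl, true_obj, refine, tol=1e-6, max_iter=100):
    """LB/UB loop of the algorithm above. `solve_pwl` returns a candidate
    solution and the under-estimated objective (a valid lower bound);
    `true_obj` evaluates the original nonconvex objective at the candidate
    (a valid upper bound, since the candidate is feasible); `refine` adds a
    partition point at the candidate."""
    LB, UB, best = -np.inf, np.inf, None
    for _ in range(max_iter):
        q, obj_lp = solve_pwl()
        LB = max(LB, obj_lp)
        obj_true = true_obj(q)
        if obj_true < UB:
            UB, best = obj_true, q
        if UB - LB <= tol:
            break
        refine(q)
    return best, LB, UB

# Toy instantiation: minimize f(Q) = Q**0.6 over [1, 100] via successive
# piecewise linear under-estimation evaluated on a dense grid.
f = lambda q: q ** 0.6
bps = [1.0, 100.0]
grid = np.linspace(1.0, 100.0, 4001)

def solve_pwl():
    vals = np.interp(grid, np.sort(bps), f(np.sort(bps)))
    k = int(np.argmin(vals))
    return float(grid[k]), float(vals[k])

def refine(q):
    bps.append(q)

best, LB, UB = branch_and_refine(solve_pwl, f, refine)
```

On this monotone toy the gap closes in one pass because the candidate lands on a breakpoint; in the (WDRC) model the candidate comes from an MILP whose constraints make repeated refinement necessary.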

2.5 Case studies

To demonstrate the effectiveness of the proposed data-driven Wasserstein DRO based

network design approach and the solution algorithm, we consider a specific biomass

with agricultural waste-to-energy network design in this section. Optimal solutions are

found and validated through the optimization process of the network design problem

based on the proposed method.

2.5.1 Case description

In the considered biomass with agricultural waste-to-energy network, there are 216

processing and upgrading technologies as well as 172 materials/compounds. In the case

study, we consider the energy market, which involves the demands for biodiesel,

gasoline, ethanol, methane, and biogas. The problem parameters for technologies, such

as mass balance coefficients, generally are not influenced by the geographical region,

whereas the price parameters hold for the geographical region of USA. Note that the

datasets of problem parameters used in this work can be found in the recently published

papers [240, 252]; thus, the data are recent. Because of

market fluctuations, feedstock prices are subject to uncertainty. Since the feedstock

price data are well-documented [253], we use real price data for the case study.

We also implement the deterministic optimization method and the conventional two-

stage stochastic programming method using the same price data as scenarios, in addition

to the proposed data-driven Wasserstein DRO approach for the purpose of comparison.
All optimization problems are modelled in GAMS 25.0.3 [254]. The computational

experiments are performed on a computer with an Intel (R) Core (TM) i7-6700 CPU @

3.40 GHz and 32 GB RAM. In each iteration of the developed solution algorithm, an

MILP problem is solved with the solver CPLEX 12.8.0. The optimality gap for CPLEX

12.8.0 is set to be 0, and the optimality tolerance for the reformulation-based branch-

and-refine algorithm is 10-6. In the case studies, the radius of Wasserstein ball θ is

obtained through cross validation [124]. The lower and upper bounds of uncertain

parameters in support set Ξ are estimated using the empirical bounds, which are directly

obtained from the training data. Note that the average price is used as a nominal value

for the deterministic optimization method. For the stochastic programming and DRO

approaches, price uncertainty data, which represent possible price realizations, are used

in their corresponding optimization problems. At the end of the case study, a sensitivity

analysis is performed to investigate how the value of θ influences the Wasserstein DRO

solution.

2.5.2 Results and discussions

The problem sizes and computational results of different methods are summarized in

Table 1. A total of 12 training samples is used for these optimization methods. From the

table, it can be observed that the number of continuous decision variables and the

number of constraints in the reformulated problem (WDRC) are both larger than those

in the two-stage stochastic program. This is because auxiliary variables and constraints

are introduced to reformulate the worst-case expectation problem. Although the mixed-

integer optimization problem for biomass network design is NP-hard, it can be solved

within a reasonable amount of time empirically. From Table 1, we can see that the data-

driven Wasserstein DRO problem takes about 50% longer computational time to solve

compared with the deterministic optimization and stochastic programming methods.

Note that the biomass with agricultural waste-to-energy network design problem, which

includes both continuous and discrete decision variables, is an NP-hard problem.

Although the stochastic program and the DRO problem have a similar number of

decision variables and constraints, their model structures are quite different, leading to

different computational times. The successive piecewise linear approximation is

employed to solve the two-stage stochastic program with a nonconvex objective

function. The objective value determined by the conventional stochastic programming

method is $16.37 MM, whereas the objective value determined by the proposed data-

driven Wasserstein DRO approach is $21.67 MM. The reason is that the conventional

stochastic programming method minimizes the expected total cost based on a single

empirical distribution, while the data-driven Wasserstein DRO approach aims for the

lowest worst-case expected cost with respect to a family of candidate feedstock price

distributions. The expected value of perfect information (EVPI) is an important criterion

in decision making under uncertainty, and is used to measure the largest amount a

decision maker would be willing to pay in return for perfect information [45]. Given its

definition, EVPI is suitable for the conventional stochastic programming method rather

than for DRO, because under DRO the true probability distribution required to calculate

the expectation is not known. In this case, the EVPI is $0.42MM for the stochastic programming

method. As mentioned before, the conventional stochastic programming method can be

considered as a special case of the proposed data-driven DRO approach, when the radius

of the Wasserstein ball is set to be zero.
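The EVPI calculation can be illustrated with a minimal sketch: the expected cost of the here-and-now stochastic solution (RP) minus the wait-and-see cost under perfect foresight (WS) gives the EVPI. All numbers and the quadratic cost function below are hypothetical placeholders, not the case-study model.

```python
from statistics import mean

# Two equally likely feedstock-price scenarios (hypothetical numbers).
scenarios = [14.0, 18.0]

def cost(x, xi):
    """Toy quadratic mismatch cost between a decision x and a realized price xi."""
    return (x - xi) ** 2

# Here-and-now: one decision fixed before the uncertainty resolves.
x_sp = mean(scenarios)                       # minimizer of the expected quadratic cost
rp = mean(cost(x_sp, xi) for xi in scenarios)

# Wait-and-see: a clairvoyant decision tailored to each scenario.
ws = mean(cost(xi, xi) for xi in scenarios)

evpi = rp - ws   # value of perfect information
```

In this toy, the EVPI equals the scenario variance of the price, since perfect foresight removes all mismatch cost.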

Table 1. Comparisons of problem sizes and computational results of the deterministic

optimization, the conventional stochastic programming method and the data-driven

Wasserstein DRO approach.

                             Binary      Continuous                  Objective      CPU
                             variables   variables    Constraints    value ($MM)    time (s)
Deterministic optimization     216          777          1,165         16.36          6.0
Stochastic program             216        6,937          9,217         16.37          6.1
WDRC                           216        7,094          9,373         21.67          9.7

To demonstrate the advantages of the proposed approach, we perform an out-of-sample

simulation with the testing feedstock price data. The testing dataset consists of different

feedstock price realizations which are obtained from the same source [253]. The number

of testing samples is 60 in this case study. For each method, we calculate their average

cost, worst-case cost, best-case cost, and standard deviation of cost under different

uncertainty scenarios. These statistics of simulation results are summarized in Table 2.

It can be seen from the table that the average cost and the worst-case cost of the proposed

data-driven DRO approach using Wasserstein metric are 5.7% and 17.4% lower than

those determined by the conventional stochastic programming method, respectively.

Additionally, compared to the stochastic programming method, the data-driven

Wasserstein DRO approach leads to a biomass with agricultural waste-to-energy

network design that is less sensitive to feedstock price variations. Specifically, the costs

under different feedstock price scenarios determined by the data-driven Wasserstein

DRO approach feature a 37.1% smaller standard deviation ($3.41MM vs $5.42MM)

than its stochastic programming counterpart. The simulation results clearly demonstrate

that the data-driven Wasserstein DRO approach compares favorably against the

deterministic optimization method and the conventional stochastic programming

method in terms of the out-of-sample performance. To showcase the difference between

the stochastic programming and DRO, we present the empirical probability distributions

of total cost for both methods in Figure 5.
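The out-of-sample statistics of this kind can be computed with a routine like the following sketch; the cost values below are hypothetical placeholders, not the case-study results.

```python
from statistics import mean, pstdev

def out_of_sample_stats(costs):
    """Summarize simulated total costs ($MM) over the testing scenarios."""
    return {
        "average": mean(costs),
        "worst": max(costs),   # highest cost is the worst case for a minimization problem
        "best": min(costs),
        "std": pstdev(costs),
    }

# Hypothetical cost realizations for two designs under the same five testing
# scenarios (placeholder numbers, not the case-study data).
sp_costs = [15.0, 20.0, 25.0, 30.0, 33.0]    # stochastic-programming design
wdro_costs = [17.0, 19.0, 22.0, 25.0, 27.0]  # DRO design: tighter spread

sp_stats = out_of_sample_stats(sp_costs)
wdro_stats = out_of_sample_stats(wdro_costs)
```

The same four statistics (average, worst-case, best-case, standard deviation) are the ones reported for each method in Table 2.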

Table 2. The out-of-sample performance of the deterministic optimization, the

conventional stochastic programming method and the data-driven Wasserstein DRO

approach.

                             Average cost   Worst-case     Best-case      Standard
                             ($MM)          cost ($MM)     cost ($MM)     deviation ($MM)
Deterministic optimization      22.53          33.38          11.98            5.42
Stochastic program              22.53          33.38          11.98            5.42
Data-driven WDRO                21.24          27.57          14.42            3.41

Figure 5. The empirical probability distributions of total cost for (a) the stochastic

programming method, (b) the proposed data-driven Wasserstein DRO approach.

The network designs determined by the conventional stochastic programming method

and the proposed data-driven WDRO approach are presented in Figure 6 and Figure 7,

respectively. Note that the deterministic optimization method, which uses the average

value of feedstock prices, generates the same optimal network design as the stochastic

programming method. The similar results generated by the deterministic optimization

and stochastic programming methods are ascribed to the parameter setup and the

ranges of uncertainties in this specific case study. For all these methods, the biomass

feedstock of soybeans is selected to produce biodiesel through technologies of handling

and extraction, transesterification and distillation. Polyhydroxybutyrate (PHB – a

biodegradable plastic) is produced as a byproduct. The soybean-based biodiesel

process yields glycerol, which can be used to synthesize PHB. The pyrolysis of

switchgrass is selected in the network because of its ability to produce raw bio-oil. This

type of bio-oil can be transformed into a number of products [255], boosting the

process’s flexibility. Fermentation of cassava is also selected by both methods to make

ethanol. Note that the anaerobic digester (AD) converts dairy manure into biogas, which

can be used as a fuel source or further utilized as a material in other chemical reactions

[256]. As shown in Figure 6 and Figure 7, the best way to produce biogas from dairy

manure is through horizontal plug flow AD. Meanwhile, municipal solid waste is used

to produce methane via landfill methane extraction. By comparing the optimal

processing pathways in Figure 6 and Figure 7, the pathway of glycerol to isobutanol is

selected only in the optimal network determined by the stochastic programming method.

Figure 6. The optimal bioconversion network design determined by the stochastic

programming approach. The optimal production capacity is displayed under processes.

Figure 7. The optimal bioconversion network design determined by the data-driven

Wasserstein DRO approach. The optimal production capacity is displayed under

processes.

The details on cost breakdowns, including capital cost, operating cost, and feedstock

cost are shown in Figure 8. From the donut charts, we can see that more than half of the

total annualized cost comes from purchasing feedstocks for both the stochastic

programming method and the Wasserstein DRO approach. Additionally, the percentage

of the feedstock cost determined by the stochastic programming method is 3% higher,

because a larger quantity of soybeans is purchased in the optimal network design. For

both optimization methods, the capital cost contributes to the second largest portion,

meaning that the selection and capacities of technologies play a critical role in lowering

the total annualized cost.

Figure 8. Cost breakdowns determined by (a) the stochastic programming method, (b)

the proposed data-driven Wasserstein DRO approach.

To take a closer look at the capital costs of different approaches, we present the capital

cost distributions determined by the stochastic programming and Wasserstein DRO

methods in Figure 9 (a) and (b), respectively. From Figure 9 (a), we can see that the

majority of capital cost (71.1%) determined by the stochastic programming method

comes from landfill methane extraction and glycerol to isobutanol process. It indicates

that the processing pathways used to produce methane and isobutanol are expensive to

build. Switchgrass pyrolysis accounts for 6.8% of the capital cost, showing that this

technology is a costly component in the pathway from switchgrass to diesel and

gasoline. As for the capital cost distribution determined by the data-driven Wasserstein

DRO approach, we can see from Figure 9 (b) that the landfill methane extraction

contributes to 61.7% of the capital cost and accounts for the largest portion, which is

similar to the result of the stochastic programming method. The second largest cost

(9.4%) comes from switchgrass pyrolysis, thus again showing the significance of this

technology in the switchgrass processing pathway. Additionally, technologies regarding

the cassava, including cassava peeling and crushing, cassava fermentation, and cassava

distillation, together account for merely 2.2% of the capital cost, implying that the

pathway producing ethanol from cassava is economically favorable. Note that the ratios

between capital investment costs for the two optimization methods can differ, because

the optimal capacities of some technologies obtained by the stochastic programming

method and the DRO approach are not the same.

Figure 9. Capital cost distributions determined by (a) the stochastic programming

method, (b) the proposed data-driven Wasserstein DRO approach.

Following the existing literature [252], the discount rate is set to be 10%. To investigate

the impact of the discount rate on the computational results of the Wasserstein DRO

approach, we conduct a sensitivity analysis and present the result in Figure 10. From

the figure, we can see that the objective value of the DRO approach increases by 18.0%

when the discount rate changes from 5% to 10%. Additionally, the objective value

grows by 17.3% if we further increase the discount rate from 10% to 15%. Note that the
optimal investment decisions, i.e. technology selection and capacity, do not change

when the discount rate increases from 5% to 15%.

Figure 10. Sensitivity analysis of discount rate for the data-driven Wasserstein DRO

approach.

To investigate how the in-sample objective value, out-of-sample average cost, and

computational time of (WDRO) change with the radius of Wasserstein ball, we perform

a sensitivity analysis and present results under different values of parameter θ in Figure

11. The value of θ specifies the size of the Wasserstein ambiguity set, so the decision

maker can use it to adjust the level of conservatism. The ambiguity set encapsulates

more probability distributions as the value of θ increases, meaning that more

distributions are hedged against in the (WDRO) model. Since the (WDRO) model

optimizes the worst-case expected cost with respect to the ambiguity set, increasing the

value of parameter θ leads to a higher in-sample objective value as observed in Figure

11. Additionally, we can observe that the out-of-sample average cost corresponding to

the testing samples decreases from $22.53MM to $21.24MM, when the radius of

Wasserstein ball changes from 0.01 to 0.03. When the radius further increases from 0.03

to 0.15, the out-of-sample performance in terms of average cost remains the same. The

optimal Wasserstein radius obtained from cross-validation is 0.25, which results in the

same out-of-sample average cost ($21.24MM) as the radii that perform best on the

testing data in Figure 11. From the orange line in Figure 11, we can see that increasing

the radius of the Wasserstein ball does not add a significant computational burden:

the computational time for solving the corresponding (WDRO) problem varies only

from 7.1 s to 15.8 s.
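The radius-tuning procedure via cross-validation can be sketched as the loop below; `solve_wdro` is a stand-in stub that only mimics a design growing more conservative with θ, not the actual mixed-integer reformulation, and all data are synthetic.

```python
import random

def solve_wdro(train, theta):
    """Stub standing in for the (WDRO) solve: in the real model this is the
    mixed-integer reformulation; here the 'design' is just a scalar cost
    target that grows more conservative with the ball radius theta."""
    return sum(train) / len(train) + 10.0 * theta

def validation_cost(design, val):
    """Toy out-of-sample score: mean absolute mismatch on held-out scenarios."""
    return sum(abs(design - v) for v in val) / len(val)

random.seed(0)
data = [random.gauss(20.0, 3.0) for _ in range(60)]  # synthetic price samples
train, val = data[:40], data[40:]                    # hold-out split

radii = [0.0, 0.01, 0.03, 0.05, 0.1, 0.15, 0.25]
scores = {t: validation_cost(solve_wdro(train, t), val) for t in radii}
best_theta = min(scores, key=scores.get)
```

In practice, each candidate radius would require a full (WDRO) solve per fold, which is why the reported tuning relies on the modest per-instance solution times above.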

Figure 11. Sensitivity analysis of the in-sample objective value, out-of-sample average

cost, and computational time with different radii of Wasserstein balls.

To demonstrate the efficiency of the proposed solution algorithm, we display the upper

and lower bounds in each iteration of the algorithm for the instance with Wasserstein

radius of 0.1 in Figure 12. In this figure, the green dots represent the upper bounds, and

the yellow circles stand for the lower bounds. The X-axis represents the iteration

number, and the Y-axis denotes the objective function values. From the figure, it can be

seen that the relative optimality gap decreases significantly from 57.9% to 9.8% during

the first two iterations. The reformulation-based branch-and-refine algorithm takes only

three iterations to reduce the relative optimality gap to 0.0%. The result demonstrates

that this solution algorithm is effective for solving the (WDRO) network design problem.
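The stopping logic behind such a run reduces to tracking the relative optimality gap between bounds; a minimal sketch follows, with bound values chosen to mimic the reported gap sequence (57.9% → 9.8% → 0.0%) rather than taken from the solver's raw output.

```python
def relative_gap(ub, lb):
    """Relative optimality gap between an upper and a lower bound."""
    return (ub - lb) / abs(ub)

# Hypothetical bound sequences chosen to mimic the reported gaps.
upper = [26.0, 24.0, 21.67]
lower = [10.95, 21.648, 21.67]

gaps, tol, iterations = [], 1e-6, 0
for ub, lb in zip(upper, lower):
    iterations += 1
    gaps.append(relative_gap(ub, lb))
    if gaps[-1] <= tol:   # converged: the bounds have met
        break
```

A branch-and-refine scheme tightens the lower bound by refining the piecewise linear relaxation and the upper bound by evaluating feasible solutions, terminating once this gap reaches the tolerance.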

Figure 12. Upper and lower bounds in each iteration of the reformulation-based

branch-and-refine algorithm for global optimization of the (WDRO) problem in the

case study.

To further explore the impacts of the number of training uncertainty data on the

computational results, we increase the amount of training samples used in the

optimization methods. In this case study, the number of training samples N increases

from 12 to 100. Specifically, we consider a case study in which 100 feedstock price

uncertainty realizations are used in the optimization problems, and another 100

uncertainty data are utilized for testing their out-of-sample performances. As the number

of training data increases, the computational times of both stochastic programming and

data-driven Wasserstein DRO methods grow to 14.9 seconds and 30.5 seconds,

respectively. The size of training samples does not influence the problem size of

deterministic optimization, which utilizes the average of training data as the nominal

values of parameters. By contrast, the size of training data has an impact on the problem

sizes of stochastic programming and data-driven Wasserstein DRO. This is because the

number of constraints and continuous variables for both methods increases, as the

amount of training samples grows. The results of the out-of-sample performance for

different methods are plotted in Figure 13, where the green diamonds denote the

Wasserstein DRO solution, and the orange circles represent the stochastic programming

solution. For clear visualization, their average costs over all testing scenarios are

represented as the horizontal lines in the figure. Additionally, the statistics of out-of-

sample performances under the larger amount of uncertainty data are summarized in

Table 3. As can be seen from the table, the data-driven Wasserstein DRO approach still

outperforms the conventional stochastic programming method by lowering the average

cost by 3.8% for the testing dataset. We investigate the dependence of this average cost

reduction as well as the dependence of standard deviation reduction on the number of

testing samples, and present the results in Figure 14. It can be observed from the figure

that both the average cost reduction and the standard deviation reduction change slightly

when the number of testing samples is above the threshold of 60.

Figure 13. Out-of-sample performance of stochastic programming and the proposed

Wasserstein DRO method based on the testing of 100 uncertainty scenarios.

Table 3. The out-of-sample performance of the deterministic optimization, the

conventional stochastic programming method and the data-driven WDRO approach

when the number of training data N=100.

                             Average cost   Worst-case     Best-case      Standard
                             ($MM)          cost ($MM)     cost ($MM)     deviation ($MM)
Deterministic optimization      20.43          33.38          10.39            5.94
Stochastic program              20.43          33.38          10.39            5.94
Data-driven WDRO                19.66          27.57          13.04            3.90

Figure 14. The dependences of the average cost reduction and standard deviation

reduction on the number of testing samples.

2.6 Summary

In this work, we proposed a data-driven two-stage Wasserstein distributionally robust

optimization model for the biomass with agricultural waste-to-energy network design

subject to feedstock price uncertainty. Based on the Wasserstein metric and support set,

the data-driven ambiguity set was constructed that encompassed all candidate

distributions of feedstock price. This ambiguity set was formulated as the Wasserstein

ball with a variable radius, which granted more flexibility in adjusting the level of

conservatism compared with other moment-based ambiguity sets. The resulting

Wasserstein distributionally robust optimization approach not only endowed the

operational decisions with full adaptability, but also hedged against the distributional

ambiguity by considering the worst-case expected cost. To improve its tractability, we

derived an equivalent distributionally robust counterpart for the network design problem

by taking advantage of various reformulation techniques to handle the worst-case

expectation. A case study on the biomass with agricultural waste-to-energy network

design was presented. Computational results showed that increasing the size of

ambiguity set did not result in more computational time, and that the proposed method

worked under a large amount of training data. As for the out-of-sample performance,

the proposed approach compared favorably against both deterministic optimization and

stochastic programming by achieving a lower average cost. Specifically, through cross

validation, the optimal Wasserstein radius was tuned to be 0.25, which generated the

best out-of-sample average cost of $21.24MM on the testing samples. The advantage of

the distributionally robust optimization approach lies in its robustness to hedge against

distributional ambiguity. If the underlying true distribution is invariant and the number

of training samples is large enough, the ambiguity of probability distribution is

insignificant. Therefore, the advantage of using ambiguity set, whose size is nonzero, is

not evident. However, when the number of training samples is limited or the probability

distribution is time-variant, the distributional ambiguity is remarkable. In this case, the

merit of the distributionally robust optimization approach becomes more manifest. The

dependence results of average cost reduction and standard deviation reduction on the

number of testing samples showed that these reduction values remained relatively stable

when the number of testing samples was above 60.

2.7 Appendix: Derivation of Wasserstein distributionally robust

counterpart

For the ease of exposition, we present the (WDRO) model for biomass with agricultural

waste-to-energy network design in the following abstract form, in which vectors and

matrices are introduced.

min_{x∈X}  f(x) + max_{P∈𝒫} E_P[ l(x, ξ) ]

s.t.  Ax ≥ g                                                  (2.34)

l(x, ξ) = min_y { c_y^T y + (Gξ)^T y :  W y ≥ h(x) }

where x denotes the vector of all first-stage decisions including Yj and Qj; y is the vector

of second-stage decisions Wj, Pi, and Si; f(x) represents the first-stage cost; ξ is the vector

of uncertain parameters c3,i; and l(x, ξ) represents the second-stage cost. Note that the

second-stage cost is divided into two parts, namely the deterministic cost c_y^T y, which

is unaffected by uncertainty, and the random cost (Gξ)^T y, which depends on the specific

uncertainty realization.

In the following paragraphs, we explain the reformulation of the worst-case expectation

max_{P∈𝒫} E_P[ l(x, ξ) ] step by step. Based on the definition of the data-driven Wasserstein

ambiguity set in (2.2), we can re-express the worst-case expectation max_{P∈𝒫} E_P[ l(x, ξ) ]

as the following optimization problem (2.35)-(2.37) for any feasible first-stage

decisions [245]:

max_{Π_n ∈ 𝓜(Ξ)}  (1/N) Σ_{n=1}^{N} ∫_Ξ l(x, ξ) Π_n(dξ)                        (2.35)

s.t.  (1/N) Σ_{n=1}^{N} ∫_Ξ ‖ξ − ξ^(n)‖ Π_n(dξ) ≤ θ                            (2.36)

      (1/N) ∫_Ξ Π_n(dξ) = 1/N,  ∀n ∈ N_d                                        (2.37)

where  denotes the set of measures, and n is the conditional probability

distribution of ξ1=ξ given that ξ2=ξ(n).

According to the strong duality of the generalized moment problem [257], we can obtain

its dual optimization problem given by (2.38)-(2.40).

min_{λ, τ_n}  λθ + (1/N) Σ_{n=1}^{N} τ_n                                        (2.38)

s.t.  l(x, ξ) − λ‖ξ − ξ^(n)‖ ≤ τ_n,  ∀ξ ∈ Ξ, ∀n ∈ N_d                          (2.39)

      λ ≥ 0                                                                     (2.40)

where λ and τn are the dual variables corresponding to constraints (2.36) and (2.37),

respectively.

Since constraint (2.39) holds for any uncertainty realizations within the support set Ξ,

we can reformulate constraint (2.39) below.

max l  x, ξ     ξ  ξ      n , n  N d
n
(2.41)
ξ  

The left-hand side of constraint (2.41) can be re-expressed as follows:

max_{ξ∈Ξ} [ l(x, ξ) − max_{‖z^n‖_* ≤ λ} (z^n)^T (ξ − ξ^(n)) ]

  = max_{ξ∈Ξ} min_{‖z^n‖_* ≤ λ} [ l(x, ξ) − (z^n)^T (ξ − ξ^(n)) ]              (2.42)

  = min_{‖z^n‖_* ≤ λ} max_{ξ∈Ξ} [ l(x, ξ) − (z^n)^T (ξ − ξ^(n)) ]

where  * denotes the dual norm and zn is the introduced decision variables. Since l1

norm is adopted in this work, the corresponding dual norm is l∞ norm. Note that the

second equality is based on the minimax theorem [124].
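The l1/l∞ duality invoked here can be checked numerically: the dual norm is the support function of the unit ball, and for the l1 ball that supremum is attained at one of the signed unit vectors, recovering the l∞ norm. A small self-contained check:

```python
def linf_norm(z):
    """The l-infinity norm of a vector."""
    return max(abs(v) for v in z)

def dual_of_l1(z):
    """Evaluate sup { z.x : ||x||_1 <= 1 } by scanning the l1 ball's vertices
    (the signed unit vectors +/- e_i); this supremum is exactly the dual norm."""
    best = float("-inf")
    for i in range(len(z)):
        for sign in (1.0, -1.0):
            best = max(best, sign * z[i])   # dot product of z with +/- e_i
    return best

z = [3.0, -7.0, 2.0]
```

Because the l1 ball is a polytope, the linear objective attains its maximum at a vertex, so scanning the 2n signed unit vectors suffices.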

Based on (2.41) and (2.42), we further reformulate constraint (2.41) as follows.

max l  x, ξ   z n T ξ  ξ 
ξ 
 n
   , n  N
n d (2.43)

zn *
  , n  N d (2.44)

For the ease of derivation, we express the support set (2.3) in the following compact

matrix form:

  ξ Cξ  d (2.45)

I   ξ max 
where C    and d   min  .
 I  ξ 

According to the definition of l(x, ξ) in (2.34), we further reformulate the left-hand side

of constraint (2.43) below.


max_{ξ: Cξ ≤ d} [ min_{y^n: W y^n ≥ h(x)} ( c_y^T y^n + (Gξ)^T y^n ) − (z^n)^T (ξ − ξ^(n)) ]

  = min_{y^n: W y^n ≥ h(x)}  max_{ξ: Cξ ≤ d} [ c_y^T y^n + (Gξ)^T y^n − (z^n)^T (ξ − ξ^(n)) ]      (2.46)

  = min_{y^n: W y^n ≥ h(x)}  min_{γ^n: C^T γ^n = G^T y^n − z^n, γ^n ≥ 0} [ d^T γ^n + c_y^T y^n + (z^n)^T ξ^(n) ]

where γ^n represents the vector of dual variables corresponding to the constraints Cξ ≤ d.

Note that the first equality in (2.46) is due to the minimax theorem [124], and the second

equality holds because of the strong duality of linear programs.

By substituting z^n = G^T y^n − C^T γ^n into constraints (2.43) and (2.44), and replacing the

left-hand side of constraint (2.43) with (2.46), we equivalently reformulate the (WDRO)

model (2.34) as (2.47).

min_{x, λ, τ_n, y^n, γ^n}  f(x) + λθ + (1/N) Σ_{n=1}^{N} τ_n

s.t.  Ax ≥ g,  x ∈ X

      (Gξ^(n))^T y^n + c_y^T y^n + (d − Cξ^(n))^T γ^n ≤ τ_n,  ∀n ∈ N_d          (2.47)

      W y^n ≥ h(x),  ∀n ∈ N_d

      ‖G^T y^n − C^T γ^n‖_* ≤ λ,  ∀n ∈ N_d

      γ^n ≥ 0,  ∀n ∈ N_d
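To illustrate the structure of the dual reformulation above, the following sketch evaluates the worst-case expectation for a toy one-dimensional linear loss l(ξ) = ξ on the support [0, hi] by grid search over the dual multiplier λ, and checks it against the closed-form primal optimum. This is a didactic analogue chosen for its simple closed form, not the mixed-integer network design model.

```python
def dual_worst_case(samples, theta, hi):
    """Grid-search the dual multiplier lambda for the loss l(xi) = xi on [0, hi].
    For this loss, sup over the support of (xi - lam*|xi - s|) equals
    max(s, hi - lam*(hi - s)), so the dual objective (cf. (2.38)) is
    lam*theta plus the sample average of that supremum."""
    n = len(samples)
    best, lam = float("inf"), 0.0
    while lam <= 2.0:
        inner = sum(max(s, hi - lam * (hi - s)) for s in samples) / n
        best = min(best, lam * theta + inner)
        lam += 0.001
    return best

def primal_worst_case(samples, theta, hi):
    """Closed-form primal optimum for the same toy: the adversary transports
    probability mass upward within the Wasserstein budget theta, capped by hi."""
    m = sum(samples) / len(samples)
    return m + min(theta, hi - m)

samples = [1.0, 2.0, 3.0]
```

For a small radius the dual minimum sits at λ = 1 (the Lipschitz constant of the loss), and for a radius exceeding the remaining headroom it sits at λ = 0, matching the primal cap at hi; this mirrors how the λθ term in (2.47) prices the ball radius.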

2.8 Nomenclature

Sets

I set of compounds, indexed by i

J set of technologies, indexed by j

Nd set of training uncertainty data, indexed by n

P set of partition points, indexed by p

Parameters

a1,j lower bound of the capacity of technology j

a2,j upper bound of the capacity of technology j

bi availability of compound i

c1,j coefficient for economic evaluation

c2,j coefficient for economic evaluation

c3,i coefficient for economic evaluation of feedstock price

c4,i price of bioproduct i

κij conversion coefficient of compound i in technology j

θ radius of the Wasserstein ball

Binary variable

Yj selection of technology j

SOS1 variable

PEj,p indicator for interval p of technology j

Continuous variables

Ej approximated nonlinear term of technology j

Pi quantity of feedstock i to purchase

PWj,p weighting factor for partition point p of technology j

Qj the capacity of technology j

Si quantity of compound i to sell

Wj operating level of technology j

CHAPTER 3
DEEP LEARNING BASED AMBIGUOUS JOINT CHANCE CONSTRAINED
ECONOMIC DISPATCH UNDER SPATIAL-TEMPORAL CORRELATED WIND
POWER UNCERTAINTY

3.1 Introduction

Economic dispatch (ED) is one of the most fundamental decision-making problems in

power systems operations [258]. It seeks to determine the optimal power output of

available generators for serving electricity demand with the minimum operating cost

[259]. With a pressing need to reduce carbon emissions from fossil fuels, the penetration

of renewable energy sources, especially wind power, into power grids has increased

rapidly in recent years. However, such a high penetration poses a threat to the security

and reliability of large-scale power systems due to the intermittency of renewable power

generation. Therefore, it is of particular importance to investigate the ED problem

subject to the wind power uncertainty [260].

Tremendous research efforts have been devoted to developing ED optimization under

uncertainty models for an effective utilization of wind power. Methods along this

direction can be broadly categorized into three paradigms, namely, robust optimization,

stochastic optimization, and distributionally robust optimization [261]. A robust multi-

period ED model was presented based on a dynamic uncertainty set, which modeled the

temporal and spatial correlations of wind output using linear systems [262]. By

regarding uncertainty sets as decision variables, a robust dispatch framework was

proposed to identify the do-not-exceed limit of renewable generations [263]. In [264],

the storage device dynamics and renewable energy variability were considered in an
affinely adjustable robust multi-period power dispatch. A two-level robust method was

developed to address the multi-microgrid ED problem subject to wind power and tie-line

disconnection uncertainties [265]. To reduce the curtailment of wind power, an interval

output schedule of wind farms and a set-point schedule of conventional generators were

introduced in a look-ahead robust ED formulation [266], as well as in an adaptive robust

formulation [267]. The aforementioned robust ED methods typically suffer from the

issue of conservatism, because they aim to immunize against the worst-case realization

and fail to leverage the available information of wind power distribution.

An alternative approach is stochastic ED [268, 269], which mitigates the conservatism

by exploiting the probability distribution of renewable energy generation. The

percentage of wind energy utilization in ED was ensured by casting it as chance

constraints [270]. In [271, 272], versatile distribution models were suggested and

applied to the stochastic ED problem. To cope with non-Gaussian distributions and

spatial correlations among multiple wind farms, the Gaussian mixture model was

introduced to chance constrained ED problems [273, 274]. In [275], beta kernel density

representation was employed to model wind power distribution in a look-ahead ED

problem with individual chance constraints. To account for joint chance constraints in

power dispatch, an iterative bounding technique was developed using support vector

classification [276]. Recently, chance constrained ED problems were handled with a

scenario approach [277, 278]. In this scenario-based ED method, a quantifiable risk of

constraint violation was guaranteed with a sufficiently large number of uncertainty data

drawn from the underlying true distribution. The above research works on stochastic

ED heavily hinge on the accurate knowledge regarding wind power distributions.

However, such perfect information on the probability distribution of uncertain wind

power is far-fetched in practice. Instead, power system operators typically only have

access to a certain amount of renewable energy data for the ED problem.

Nowadays, data-driven optimization under uncertainty is gaining tremendous popularity

[221], and it emerges as a promising paradigm for decision making in electric power

systems [136, 279-282]. The distributionally robust chance constrained ED framework

is one of those attempts, aiming to take advantage of both robust optimization and

stochastic programming. Rather than assuming an exactly known distribution, it

constructs a family of probability distributions, called the ambiguity set, from a finite

number of data. For distributionally robust power dispatch problems, ambiguity sets for

renewable energy were commonly characterized by moment information [283-287], as

well as additional unimodality information [288]. By using both mean and covariance

to describe the ambiguity set of wind power, a distributionally robust ED model was

proposed for hydro-thermal-wind power systems [289]. To further improve the

flexibility of power systems, the co-optimization of ED and the do-not-exceed limit was

cast as a distributionally robust joint chance constrained program, which focused on the

marginal distribution information without incorporating correlations [290]. A two-sided

distributionally robust chance constraint on transmission line capacity limit, which was

then reformulated as second-order cone constraints, was proposed in [291]. A robust

chance constrained power dispatch problem was investigated based on an ambiguity set

consisting of Gaussian distributions whose mean and variance were within certain

ranges [292]. The co-dispatch of energy, reserve and storage was suggested, and the

confidence bands of cumulative distribution function for a one-dimensional random

variable were used for constructing ambiguity sets [293]. In addition, another widely

adopted means of constructing an ambiguity set is via the notion of statistical distance

between probability distributions [137, 294-297]. The existing distributionally robust

chance constrained ED models primarily focus on individual chance constraints,

ignoring the spatial-temporal correlations of wind power uncertainty embedded within

different constraints. However, we should consider simultaneous constraint satisfaction

for the reliability of power systems, and incorporate the complicated, and likely

nonlinear, correlations in order to further reduce the conservatism of the optimal ED

solutions.

To fill the knowledge gap, we propose a novel data-driven ambiguous joint chance

constrained ED optimization framework by taking advantage of a powerful

unsupervised deep learning technique. To decipher the hidden spatial-temporal

correlations of wind power uncertainty, a class of generative adversarial networks

(GANs), namely f-GAN, is employed. Through a competition by a pair of neural

networks, namely generator and discriminator, the underlying true distribution is

implicitly modeled as a mapping represented by the generator network from a latent

space to an uncertainty space. Notably, GANs are extremely effective in learning

complex distributions of high-dimensional data samples, without presuming any

specific forms of probability distributions [298]. These salient features make GANs

desirable for optimization under uncertainty [221, 299]. Based on the extracted

uncertainty information, an ambiguity set for wind power distributions is devised as an

f-divergence ball in the probability space centered around the distribution embodied in

the generator network. Note that the f-divergence not only plays a key role in designing

a unified framework of deep GANs, but also in providing a natural way to characterize

the distance-based ambiguity set. Rather than disregarding uncertainty correlations and

enforcing ambiguous individual chance constraints separately, the proposed framework

accounts for the simultaneous satisfaction of multiple constraints. As a result, the

proposed framework is capable of greatly alleviating the conservatism of ED solutions.
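As a concrete illustration of an f-divergence-based ambiguity set, the sketch below uses the KL divergence (the f-divergence generated by f(t) = t log t) to test whether a candidate discrete distribution lies within a ball of radius ρ around a hypothetical generator histogram; all numbers are illustrative assumptions, not outputs of a trained f-GAN.

```python
from math import log

def kl_divergence(p, q):
    """KL divergence (the f-divergence with f(t) = t*log t) between two
    discrete distributions on the same finite support."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical histograms: p_gen mimics the distribution represented by the
# trained generator; q_alt is a candidate distribution to screen.
p_gen = [0.5, 0.3, 0.2]
q_alt = [0.4, 0.4, 0.2]

rho = 0.05  # radius of the f-divergence ball (illustrative value)
inside_ambiguity_set = kl_divergence(q_alt, p_gen) <= rho
```

Other choices of f (e.g., total variation or chi-square generators) yield different balls; the framework's appeal is that the same f ties the GAN training objective to the ambiguity set.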

The proposed ED optimization framework ensures that the worst-case probability of

violating constraints with regard to wind utilization is below a tunable risk level. To

facilitate its implementation in large-scale power systems, we develop an efficient

solution method by combining a reformulation technique and a scenario approach.

Additionally, a theoretical bound on the required number of wind scenarios generated

by f-GAN is established for the multiperiod ED problem. To reduce the number of

constraints in the resulting scenario program, a prescreening technique is further

developed through the exploitation of the ED problem structure. Illustrative six-bus

and IEEE 118-bus test systems are used to demonstrate the effectiveness of the proposed

ED framework.
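For intuition on the data complexity involved, a commonly quoted sufficient sample-size bound from the scenario-approach literature (in the style of Calafiore and Campi) is N ≥ (2/ε)(ln(1/δ) + d), with risk level ε, confidence parameter δ, and d decision variables; the tailored bound established in this chapter may differ. A quick calculator:

```python
from math import ceil, log

def scenario_count(epsilon, delta, d):
    """A classical sufficient scenario count for chance-constrained convex
    programs: N >= (2/epsilon)*(ln(1/delta) + d). Illustrative only; the
    chapter's tailored bound for the multi-period ED problem may differ."""
    return ceil((2.0 / epsilon) * (log(1.0 / delta) + d))

# e.g. 24 dispatch variables, 5% risk level, 99.9% confidence (assumed values)
n_scenarios = scenario_count(0.05, 1e-3, 24)
```

The bound grows linearly in the number of decision variables and in 1/ε, which is why the prescreening technique that removes redundant constraints matters for large systems.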

The major contributions of this chapter are summarized as follows.

 A novel deep learning based ambiguous joint chance constrained ED

optimization framework that accounts for the spatial-temporal correlations of

wind power;

 An f-divergence based ambiguity set for wind power distributions using f-GAN,

of which the training objective intimately aligns with the choice of divergence

for characterizing ambiguity set;

 A tailored solution method for ambiguous joint chance constrained ED problem

based on the integration of the reformulation technique with the scenario

approach;

 Theoretical bound on the data complexity of f-GAN for the ambiguous joint

chance constrained multi-period ED problem to guarantee constraint violation

within a tunable risk level.

3.2 Mathematical formulation

In this section, a distributionally robust or ambiguous joint chance constrained ED model

formulation considering intermittent wind power is presented. For the multi-period ED

problem, one schedules energy production and allocation at each time period to minimize

the total cost. The decisions include conventional thermal energy dispatch, wind power

dispatch, and load shedding amount. The available wind power is assumed to be

uncertain.

The multi-period ambiguous joint chance constrained ED optimization model is

formulated as follows. The objective of the ED problem is to minimize the total cost in

eq. (3.1). The total cost includes the operating cost of thermal units, as well as electricity

load shedding cost. Eq. (3.2) enforces the energy balance for each time period. The

minimum and maximum power output limits of each thermal unit are specified in

Constraint (3.3). Constraints (3.4)-(3.5) enforce the ramping rate limits of thermal units

for each time period. The capacity constraints for transmission lines are described in

Constraints (3.6)-(3.7). Constraint (3.8) specifies the non-negativity of ED decisions.

The ambiguous joint chance constraint (3.9) requires that, with a worst-case probability of at least 1 − ε, the power outputs of wind farms cannot exceed the random wind power [133], and that the percentage of wind utilization is at least β. The satisfaction of these two requirements is a random event because the available wind power is random and the wind power dispatch wbt is a scheduled quantity [270]. The wind power Wbt is subject to uncertainty, and 𝒟 is a set of its probability distributions, or ambiguity set. A list of

indices/sets, parameters, and variables is given in the Nomenclature section. The

ambiguous joint chance constraint enjoys the following advantages. First, it provides a

stronger guarantee on overall power systems security than individual chance constraints

by enforcing simultaneous constraint satisfaction. Second, unlike robust optimization, it

enables a systematic trade-off between economic performance and the risk level of

constraint violation [32]. Additionally, compared with conventional joint chance

constraints, it is capable of hedging against the distributional ambiguity caused by the finite amount of available wind power data.

 
\min_{p_{it},\, w_{bt},\, q_{bt}} \;\; \sum_{i} \sum_{t} C_i^V p_{it} + \sum_{b} \sum_{t} C_b^{LS} q_{bt}   (3.1)

\text{s.t.} \;\; \sum_{b} \Big( \sum_{i \in G_b} p_{it} + w_{bt} \Big) = \sum_{b} D_{bt} - \sum_{b} q_{bt}, \quad \forall t   (3.2)

P_i^{\min} \le p_{it} \le P_i^{\max}, \quad \forall i, t   (3.3)

p_{it} - p_{i,t-1} \le RU_i, \quad \forall i, t   (3.4)

p_{i,t-1} - p_{it} \le RD_i, \quad \forall i, t   (3.5)

\sum_{b} K_{bl} \Big( \sum_{i \in G_b} p_{it} + w_{bt} - D_{bt} + q_{bt} \Big) \le F_l, \quad \forall l, t   (3.6)

\sum_{b} K_{bl} \Big( \sum_{i \in G_b} p_{it} + w_{bt} - D_{bt} + q_{bt} \Big) \ge -F_l, \quad \forall l, t   (3.7)

p_{it},\, w_{bt},\, q_{bt} \ge 0, \quad \forall i, b, t   (3.8)

\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P} \Big\{ w_{bt} \le W_{bt},\; \forall b, t; \;\; \sum_{b} \sum_{t} w_{bt} \ge \beta \sum_{b} \sum_{t} W_{bt} \Big\} \ge 1 - \epsilon   (3.9)
The ED problem is cast as the above ambiguous joint chance constrained program.

Notably, the joint feature of ambiguous chance constraints not only leads to a less

conservative model formulation, but also enables further incorporation of correlations

among uncertain wind power at different buses and time periods. By contrast,

conventional distributionally robust chance constrained ED focuses on ambiguous

individual chance constraints (3.10)-(3.11) for the ease of reformulation. Such

ambiguous individual chance constraints can be unreasonably conservative since the

constraints for each bus and each time period are respected separately.

 
\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P} \left\{ w_{bt} \le W_{bt} \right\} \ge 1 - \hat{\epsilon}_{bt}, \quad \forall b, t   (3.10)

\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P} \Big\{ \sum_{b} \sum_{t} w_{bt} \ge \beta \sum_{b} \sum_{t} W_{bt} \Big\} \ge 1 - \hat{\epsilon}   (3.11)

Given the ambiguous individual chance constraints, the existing works typically

construct the ambiguity set 𝒟 by employing the mean and variance of wind power

generation. The knowledge gap to fill is a data-driven ambiguity set that is capable of

accurately capturing complicated spatial-temporal correlations among wind power

sources. Such an informative ambiguity set can be seamlessly integrated with the

developed ambiguous joint chance constrained ED formulation.

3.3 Deep learning based ambiguous joint chance constrained

economic dispatch optimization

In this section, we propose a novel deep learning based ambiguous joint chance

constrained optimization for ED with a high penetration of renewable wind energy. We

first present an introduction to the f-divergence and f-GANs. Then, we develop a deep

learning based ambiguity set of wind power distributions with the f-divergence. Finally,

a data-driven ambiguous joint chance constrained ED model is presented based upon

the mathematical formulation in Section 3.2.

3.3.1 The f-Divergence and f-GANs

In this subsection, we first present the f-divergence. To measure the discrepancy between

two probability distributions P and Q, we introduce the f-divergence, also known as ϕ-

divergence, as follows:

D_f(P \,\|\, Q) = \int f\!\left( \frac{dP}{dQ} \right) dQ   (3.12)

where function f is a convex, lower semi-continuous function satisfying f(1) = 0. The

f-divergence is widely used in the field of information theory and machine learning. It is

general enough to encompass a wide variety of divergences, such as the KL divergence (f(r) = r log r), total variation (f(r) = |r − 1|), and the Pearson χ²-divergence (f(r) = (r − 1)²), among others.

The f-divergence serves two functions in this work. On the one hand, the divergence in eq. (3.12) provides a powerful way to characterize ambiguity sets. On the other hand, it plays a critical role in defining the objective function of an f-GAN model.
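As a minimal numerical illustration (ours, not from the dissertation) of eq. (3.12) for discrete distributions, where the integral reduces to a sum over the common support:

```python
import math

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x)/q(x)) for discrete P, Q on a common support."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

f_kl = lambda r: r * math.log(r)     # KL divergence generator
f_tv = lambda r: abs(r - 1.0)        # total variation generator
f_chi = lambda r: (r - 1.0) ** 2     # Pearson chi^2 generator

p = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
d_kl = f_divergence(p, q, f_kl)
d_chi = f_divergence(p, q, f_chi)

# Every f-divergence vanishes between identical distributions since f(1) = 0.
assert f_divergence(p, p, f_chi) == 0.0
```

The distributions here are toy examples; in the chapter the relevant pair is the generated and the true wind power distributions.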

In this sense, the f-divergence offers a unique vehicle to link the ambiguity set used in

ambiguous joint chance constraints with the deep learning model.

The promise of GANs is to decipher rich and hierarchical structures to characterize the

complicated wind power distribution, which exhibits spatial-temporal correlations. As a

powerful generalization of vanilla GAN, the f-GAN has achieved great success in

machine learning and computer vision [300]. The f-GAN model leverages the variational

formulation of eq. (3.12), which is presented below [301]:

 
D_f(P \,\|\, Q) = \max_{\varphi} \left\{ \mathbb{E}_{x \sim P}\left[ \varphi(x) \right] - \mathbb{E}_{x \sim Q}\left[ f^*(\varphi(x)) \right] \right\}   (3.13)

where 𝔼 denotes the expectation, and the maximum in the above variational formulation is taken over all possible functions φ. This set of functions can be well approximated by the expressive class of deep neural networks. Note that f^* in eq. (3.13) represents the convex conjugate function of f, which is defined as f^*(t) = \sup_u \{ tu − f(u) \}.
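To make the conjugate and the variational bound concrete, the following sketch (ours, not from the dissertation) checks numerically that for the Pearson χ² generator f(r) = (r − 1)² the conjugate is f*(t) = t + t²/4, and that evaluating the variational objective of eq. (3.13) at the optimal critic φ(x) = f′(p(x)/q(x)) recovers the χ²-divergence for discrete distributions:

```python
def f(r):       return (r - 1.0) ** 2       # Pearson chi^2 generator
def f_prime(r): return 2.0 * (r - 1.0)      # its derivative
def f_star(t):  return t + t * t / 4.0      # closed-form convex conjugate

def f_star_numeric(t, grid):
    # brute-force sup_u { t*u - f(u) } over a grid of u values
    return max(t * u - f(u) for u in grid)

grid = [i / 1000.0 for i in range(0, 5001)]   # u in [0, 5]
assert abs(f_star(0.8) - f_star_numeric(0.8, grid)) < 1e-3

p = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
chi2 = sum((px - qx) ** 2 / qx for px, qx in zip(p, q))

# Variational value E_P[phi] - E_Q[f*(phi)] at the optimal critic.
phi = [f_prime(px / qx) for px, qx in zip(p, q)]
var_value = (sum(px * ph for px, ph in zip(p, phi))
             - sum(qx * f_star(ph) for qx, ph in zip(q, phi)))
assert abs(var_value - chi2) < 1e-9
```

In the f-GAN, the discriminator plays the role of this critic, with the grid search replaced by gradient-based training of a neural network.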

In the f-GAN architecture, there are two deep neural networks, namely generator network

G and discriminator network T. Suppose the generator network is parametrized by θ,

while the discriminator network is parametrized by ω. The functions representing the generator and discriminator are denoted by G_θ(·) and T_ω(·), respectively.

Generator: The generator network takes the noise vector Z with a known distribution ℙ_Z as input, and outputs data samples through the up-sampling operations of deconvolutional layers and fully connected layers. Note that ℙ_Z is typically selected as a uniform or Gaussian distribution. The goal of the generator network G is to fool the

discriminator network by generating data samples that mimic the real wind power data.

This is accomplished by tuning the parameter θ to transform the distribution ℙ_Z into some distribution ℙ_G that is “close” to the underlying true data distribution ℙ_r in terms of the f-divergence. The loss function of the generator network G, denoted by L_G, is given as follows.

  
L_G = -\mathbb{E}_{Z \sim \mathbb{P}_Z} \left[ f^*\!\left( T_\omega( G_\theta(Z) ) \right) \right]   (3.14)

where LG is utilized to update parameter θ. A small value of LG implies that the data

samples produced by generator G look realistic through the eyes of discriminator T.

Discriminator: The generative model is pitted against the discriminator model. Given a

generator network, the real data samples and the samples generated by network G are fed

into the discriminator network T. Specifically, the discriminator network aims to

distinguish the generated wind power data from real ones. It employs a series of down-

sampling operations to generate a scalar value T_ω(x), where x is either a sample from the true data distribution or a sample from the distribution ℙ_G induced by the generator network. Let X be a random variable drawn from the true data distribution ℙ_r. The loss function that the

discriminator seeks to minimize is expressed by,

  
L_T = -\mathbb{E}_{X \sim \mathbb{P}_r} \left[ T_\omega(X) \right] + \mathbb{E}_{Z \sim \mathbb{P}_Z} \left[ f^*\!\left( T_\omega( G_\theta(Z) ) \right) \right]   (3.15)

where LT represents the loss function of the discriminator, and is employed to update

parameter ω. In light of eq. (3.13), the discriminator network T serves as a critic that

quantifies the f-divergence between ℙ_r and ℙ_G.

Formally, the f-GAN is framed as the following two-player minimax game with value function V(G_θ, T_ω):

\min_{\theta} \max_{\omega} V(G_\theta, T_\omega) = \mathbb{E}_{X \sim \mathbb{P}_r} \left[ T_\omega(X) \right] - \mathbb{E}_{Z \sim \mathbb{P}_Z} \left[ f^*\!\left( T_\omega( G_\theta(Z) ) \right) \right]   (3.16)

where T_ω(x) = g_f(H_ω(x)). The output activation function g_f : ℝ → dom f^* is introduced to respect the domain of the conjugate function f^* in eq. (3.16), while H_ω(x) is a function parameterized by ω that is exempt from any output restriction.

The competition between the generator network and the discriminator network in the

above minimax game drives both deep neural networks to refine their model parameters.

Eventually, wind power data produced by the generator network become

indistinguishable from the true data [300].

Notably, f-GAN is general enough to incorporate several types of GANs. For example,

the vanilla GAN can be regarded as a special case when f-divergence is chosen as the

Jensen-Shannon divergence [302]. Additionally, by feeding random noise vectors, the

generator network in f-GAN is capable of efficiently sampling “realistic” wind power

data from a probability distribution induced by the weights of the feedforward neural

network. This salient feature is of particular significance in solving the resulting

ambiguous joint chance constrained ED problem using a scenario approach.
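To illustrate this implicit-sampling mechanism, the toy sketch below stands in for a trained generator: a fixed affine map of Gaussian noise rather than a deep network (all names and numbers are illustrative). Pushing shared noise through the map already shows how spatially correlated "wind power" samples arise.

```python
import random

def toy_generator(z1, z2):
    # Stand-in for G_theta: maps a 2-D noise vector to a pair of
    # "wind power" outputs; the shared component z1 induces correlation.
    w1 = 50.0 + 20.0 * z1
    w2 = 45.0 + 15.0 * z1 + 5.0 * z2
    return max(w1, 0.0), max(w2, 0.0)   # wind power is non-negative

random.seed(42)
# Sampling "realistic" data = pushing noise through the (toy) generator.
samples = [toy_generator(random.gauss(0, 1), random.gauss(0, 1))
           for _ in range(2000)]
```

A real f-GAN generator replaces the affine map with trained deconvolutional and fully connected layers, but the sampling interface is identical: noise in, scenario out.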

3.3.2 Deep learning based ambiguity set

In this subsection, we develop a novel ambiguity set for wind power distributions based

on f-GANs. Given infinite capacities of networks G and T, and access to an infinite

amount of wind power data, the distribution ℙ_G is exactly the same as the distribution ℙ_r from a theoretical point of view [302], and their f-divergence reduces to zero accordingly. However, in practice, the probability distribution ℙ_G induced by the generator network G

might not be perfect due to the finite amount of available training wind power data.

Therefore, one needs to construct a family of distributions based on the information

obtained by deep learning, rather than relying on a single probability distribution.

To hedge against the ambiguity of wind power distribution, we develop a set of

distributions using the f-divergence and the deep generator network in f-GAN, which is

expressed by,


  w   D f  w G     (3.17)

where w denotes the probability distribution for the random vector of wind power Wbt,

 represents the set of all probability distributions, and ρ is the divergence tolerance

or the radius of the divergence ball. Parameter ρ can be used to adjust the size of the ambiguity

set, which reflects the risk-aversion level of decision-makers.

The developed ambiguity set is devised as an f-divergence ball centered around the distribution ℙ_G induced by the generator network in the f-GAN. Unlike typical divergence-based ambiguity sets that adopt the discrete empirical distribution as a reference distribution, the proposed ambiguity set utilizes the continuous wind power distribution induced by the generator network, namely ℙ_G, as the reference. As a result, the proposed set is effective in encompassing the continuous underlying true distribution ℙ_r because it avoids being restricted to a discrete support.

The proposed deep learning based ambiguity set enjoys several advantages. First,

compared with conventional moment-based ambiguity sets, the proposed deep learning

based ambiguity set can accurately capture the spatial-temporal correlations of wind

power at different buses and time periods. Second, the proposed method makes no

assumption on the specific form of wind power distributions owing to the power of deep

generative modeling. Another nice feature is stated as follows: if a certain f-divergence is chosen in the f-GAN to learn the distribution ℙ_G, the same type of f-divergence can be employed for constructing the ambiguity set. Specifically, if the χ²-divergence is adopted in the f-GAN, we know from the previous subsection that the learned distribution ℙ_G should be quite close to the true data distribution ℙ_r in terms of the χ²-divergence. Thus, employing

the χ2-divergence to characterize the ambiguity set, rather than using other types of f-

divergence, potentially mitigates the conservatism issue of the corresponding

distributionally robust ED solutions.

3.3.3 Data-driven joint chance constrained ED model

Equipped with the data-driven ambiguity set (3.17), a deep learning based ambiguous

joint chance constrained ED optimization model, denoted by (DL-AJCC-ED), is

formulated as follows. For the ease of exposition, the model is represented in a compact

form with matrices and vectors.

\min_{x} \; c^T x   (3.18)

\text{s.t.} \;\; x \in S   (3.19)

\mathbb{P} \left\{ a_i(x)^T \xi \le b_i(x), \; \forall i \right\} \ge 1 - \epsilon, \quad \forall \mathbb{P} \in \mathcal{D}   (3.20)

where x denotes the vector of decision variables including pit, wbt, and qbt; c denotes the

vector of cost coefficients, and ξ is the vector of uncertain wind power. The objective

function (3.18) represents eq. (3.1), while set S stands for a domain defined by

deterministic constraints without uncertain parameters, including constraints (3.2)-(3.8). Notably, the data-driven joint chance constraint (3.20) corresponds to constraint (3.9), leveraging the deep learning based ambiguity set 𝒟.

The ED problem with a large penetration of wind power is formulated as an ambiguous

joint chance constrained program. However, the resulting optimization problem cannot

be solved directly by any off-the-shelf optimization solvers, since constraint (3.20)

involves an infinite number of probability distributions.

3.4 Solution methodology

To address the computational challenge of solving large-scale (DL-AJCC-ED) problem,

we develop an efficient solution method that integrates reformulation and constraint

sampling techniques in this section.

3.4.1 Ambiguous joint chance constraint reformulation

For constraint (3.20), the corresponding joint chance constraints must be satisfied for all

probability distributions within the deep learning based ambiguity set. Therefore, we

consider the worst-case distribution, and ensure that the joint chance constraints are

respected under the worst-case distribution, as dictated in (3.21).

\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P} \left\{ a_i(x)^T \xi \le b_i(x), \; \forall i \right\} \ge 1 - \epsilon   (3.21)

where the infimum is taken over the wind power distributions in the deep learning based

ambiguity set.

Since the χ2-divergence is appropriate for small risk levels [280], we employ it for

training f-GAN and constructing the ambiguity set for the rest of this chapter. The

ambiguous joint chance constraint (3.21) can be further reformulated based on the

following proposition, of which the proof can be found in [152].

Proposition 3.1: Given a χ²-divergence based ambiguity set, the ambiguous joint chance constraint (3.21) is respected if and only if the following classical joint chance constraint (3.22) is satisfied.

\mathbb{P}_G \left\{ a_i(x)^T \xi \le b_i(x), \; \forall i \right\} \ge 1 - \bar{\epsilon}   (3.22)

where \bar{\epsilon} denotes the adjusted risk level, given as follows.

\bar{\epsilon} = \epsilon - \frac{ \sqrt{ \rho^2 + 4\rho(\epsilon - \epsilon^2) } - (1 - 2\epsilon)\rho }{ 2\rho + 2 }   (3.23)

By applying Proposition 3.1, we transform the ambiguous joint chance constrained program (DL-AJCC-ED) given in eqs. (3.18)-(3.20) into an ambiguity-free counterpart, given in eqs. (3.18), (3.19), and (3.22), with the probability distribution ℙ_G induced by the deep generator network.
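A quick numerical sketch of eq. (3.23) (the function name is ours), evaluated at the parameter values used later in the case studies (ε = 0.1, ρ = 0.05):

```python
import math

def adjusted_risk(eps, rho):
    """Adjusted risk level of eq. (3.23) for a chi^2-divergence ball of radius rho."""
    return eps - (math.sqrt(rho**2 + 4 * rho * (eps - eps**2))
                  - (1 - 2 * eps) * rho) / (2 * rho + 2)

eps_bar = adjusted_risk(0.1, 0.05)   # roughly 0.0509

# The adjustment is conservative: the ambiguity-free constraint (3.22)
# must be enforced at a stricter risk level than the nominal eps.
assert 0 < eps_bar < 0.1
# Shrinking the ball to radius 0 recovers the nominal risk level.
assert abs(adjusted_risk(0.1, 0.0) - 0.1) < 1e-12
```

A larger radius ρ yields a smaller adjusted risk level, i.e., a more conservative dispatch.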

3.4.2 Scenario approach and sample complexity

Since f-GAN is an implicit generative model, there is no explicit expression for

the distribution ℙ_G. As a result, it is computationally challenging to further derive an

analytical reformulation for constraint (3.22). To address this challenge, we leverage the

efficient sampling power of GANs, and employ the scenario approach.

The scenario approach, a.k.a. constraint sampling, has been widely used in a variety of

optimization problems. The key idea of the scenario approach is to draw independent and identically distributed random scenarios from the probability distribution and enforce

the constraints with respect to all sampled uncertainty scenarios. It is worth noting that

the scenario approach is well suited for the deep learning based joint chance constraint

(3.22), because it only requires scenario sampling from ℙ_G rather than an explicit expression of the wind power distribution. Therefore, the scenario approach facilitates

approximate solutions of the resulting ambiguity-free joint chance-constrained program.

 
Suppose \xi_G^{(1)}, \ldots, \xi_G^{(K)} are the wind power samples produced by the generator network in the f-GAN, where K represents the number of samples. Constraint (3.22) can then be approximated by,

a_i(x)^T \xi_G^{(k)} \le b_i(x), \quad \forall i, \; \forall k \in [K]   (3.24)

where the set [K] := \{1, 2, \ldots, K\}.

The scenario approach provides a theoretical guarantee that, with a sufficiently large

value of K, the optimal solution of the corresponding scenario program satisfies the

ambiguity-free chance constraints (3.22) with a high probability. By exploiting the

structure of the multi-period ED problem, we explicitly derive the data complexity for

the generator network in f-GAN, as provided in the following proposition.

Proposition 3.2: Given a confidence level 1 − β ∈ (0, 1) and a risk level ε ∈ (0, 1) for the (DL-AJCC-ED) problem, if K ≥ N(β, ε), then the optimal solution of the corresponding scenario program is guaranteed to satisfy constraint (3.21) with a probability of at least 1 − β. The explicit expression of N(β, ε) is given in (3.25).

N(\beta, \epsilon) = \frac{ 4(\rho + 1) \left( \ln\frac{1}{\beta} + B_W \cdot N_T + 1 \right) }{ (2\rho + 2)\epsilon + (1 - 2\epsilon)\rho - \sqrt{ \rho^2 + 4\rho(\epsilon - \epsilon^2) } }   (3.25)

where BW represents the number of buses having wind farms, NT denotes the total
number of time periods, and ρ is the radius of the deep learning based ambiguity set.

Proof. The required number of wind power samples is dependent on the number of

support constraints. Note that the support constraints are defined as those constraints,

the removal of which changes the optimal solution. For fixed b and t, the constraint w_{bt} \le W_{bt} is active only for the scenario with the smallest wind power value at bus b and time t among the K samples. Similarly, the constraint \sum_{b} \sum_{t} w_{bt} \ge \beta \sum_{b} \sum_{t} W_{bt} is supported by the scenario with the largest total wind power over all buses and time periods. Therefore, the number of support constraints is at most BW · NT + 1 for the multi-period

ED problem.

Based on the notion of support constraints [277, 303], we obtain the following bound on the data complexity.

N(\beta, \epsilon) \ge \frac{2}{\bar{\epsilon}} \left( \ln\frac{1}{\beta} + B_W \cdot N_T + 1 \right)   (3.26)

Substituting the adjusted risk level of eq. (3.23) for the χ²-divergence into (3.26) yields eq. (3.25), which completes the proof. □

Remark 3.1 The sample complexity in eq. (3.25) is for the generator network in f-GANs.

In contrast to the finite amount of real historical wind power data, the unique merit of f-

GAN lies in its capability of efficiently generating as many data samples as required in

eq. (3.25).
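Eq. (3.25) can be evaluated directly; the sketch below (function name ours) reproduces the scenario count reported for the IEEE 118-bus case study in Section 3.5 (BW = 10, NT = 24, ε = 0.1, confidence 99.9%, ρ = 0.05):

```python
import math

def sample_complexity(beta, eps, rho, bw, nt):
    """N(beta, eps) of eq. (3.25) for a chi^2-divergence ambiguity set."""
    num = 4 * (rho + 1) * (math.log(1 / beta) + bw * nt + 1)
    den = (2 * rho + 2) * eps + (1 - 2 * eps) * rho \
          - math.sqrt(rho**2 + 4 * rho * (eps - eps**2))
    return num / den

# IEEE 118-bus setting: 10 wind farms, 24 periods, 99.9% confidence.
n = sample_complexity(beta=0.001, eps=0.1, rho=0.05, bw=10, nt=24)
K = math.ceil(n)   # number of wind power scenarios to generate
```

The raw bound evaluates to about 9,747.2, so K = 9,748 scenarios are drawn from the generator network, matching the setting reported with Table 5.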

Remark 3.2 We can further leverage a prescreening technique, described as follows. After sampling N(β, ε) wind power scenarios from the generator network, instead of putting all scenarios into (3.24), a prescreening technique can be leveraged to select the most critical uncertainty scenarios. Specifically, the lowest level of wind power at each bus and time period, as well as the highest level of total wind power over all BW buses and NT time periods, are selected beforehand among the N(β, ε) generated data. This technique greatly increases the computational efficiency of solving the (DL-AJCC-ED) problem without affecting the solution quality of the optimal ED decisions.
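Following the support-constraint argument of Proposition 3.2, prescreening reduces the K generated scenarios to at most BW · NT + 1 candidates. A list-based sketch on synthetic data (sizes and values are illustrative):

```python
import random

def prescreen(scenarios):
    """Keep only potentially binding scenarios: the per-component minima
    (supporting w_bt <= W_bt) and the maximum-total scenario (supporting
    the wind-utilization requirement)."""
    n_comp = len(scenarios[0])
    keep = set()
    for j in range(n_comp):
        keep.add(min(range(len(scenarios)), key=lambda k: scenarios[k][j]))
    keep.add(max(range(len(scenarios)), key=lambda k: sum(scenarios[k])))
    return sorted(keep)

random.seed(0)
K, n_comp = 1000, 6   # toy sizes, e.g., 3 buses x 2 periods
scenarios = [[random.uniform(0.0, 100.0) for _ in range(n_comp)]
             for _ in range(K)]
kept = prescreen(scenarios)
assert len(kept) <= n_comp + 1   # at most BW*NT + 1 scenarios survive
```

Only the kept scenarios need to enter (3.24), which explains the dramatic reduction in constraint count reported in Tables 4 and 5.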

3.5 Computational experiments

In this section, case studies on the six-bus and IEEE 118-bus systems are presented. To

demonstrate the advantages of the proposed approach, we also implement the

conventional distributionally robust chance constrained ED (DRCCED) method with an

ambiguity set based on first- and second-order moment information of Wbt, of which the model summary is given as follows.

min eq. (3.1)

s.t. eqs. (3.2)-(3.8), (3.10)-(3.11)

All optimization problems are solved with CPLEX 12.8.0, implemented on a computer

with an Intel (R) Core (TM) i7-6700 CPU @ 3.40 GHz and 32 GB RAM. The optimality

tolerance for CPLEX 12.8.0 is set to 0. In the case studies, the confidence level is set to 99.9%, ρ is 0.05, and the risk level is set to 10%.

3.5.1 Illustrative six-bus system

The six-bus system has three conventional thermal generators and 11 transmission lines

[27]. Additionally, three wind farms are installed at Buses 1-3. To promote a high

penetration of wind energy, the percentage of wind utilization β is set to be 30%. The

load shedding cost is set to be $5/MW [270]. The wind power data we use comes from

the NREL Wind Integration Dataset [304].


The system diagram of the six-bus system is shown in Figure 15 [1, 2]. From Figure 15,

we can see that there are three conventional thermal generators and 11 transmission lines.

Additionally, three wind farms are installed at Buses 1-3, as illustrated in Figure 15.

Figure 15. The schematic of the six-bus system.

The training process of the f-GAN in the six-bus case study is shown in Figure 16. In Figure 16, the x-axis represents the number of iterations, while the y-axis denotes the value of the losses. From Figure 16, we can readily observe that training progresses rapidly at the beginning. As the training moves on, the f-divergence between the real wind power distribution and the generated distribution approaches zero. This implies that the wind data generated by the generator neural

network look as realistic as the real ones, and that they cannot be distinguished by the

discriminator network.

Figure 16. The training process of f-GAN.

Due to the high wind utilization percentage of 30%, the resulting DRCCED with moment information turns out to be infeasible when both constraint (3.10) and constraint (3.11) are imposed. For the ease of comparison, we implement the different ED methods without constraint (3.11) on the efficiency of wind utilization. The number of wind power scenarios used in the proposed approach is determined by Proposition 3.2. According to eq. (3.25), N(β, ε) = 3,102.48, and the number of scenarios is chosen to be 3,103.

The computational results are provided in Table 4. Compared with DRCCED with

moment information, the proposed ED problem has a larger number of constraints

because it introduces a set of constraints for each generated wind power scenario. As a

result, it consumes 2.6 more CPU seconds of solution time. In terms of economic

performance, the proposed ED method is more cost-effective than the DRCCED method

with moment information, slashing the total cost by 33.3%. As can be observed from

the results in Table 4, the proposed method with the prescreening technique significantly

reduces its memory and computational time, which are comparable with those of the

DRCCED method with moment information.

Table 4. Comparisons of problem sizes and computational results for the DRCCED method with moment information and the proposed ED method with/without prescreening in the six-bus test system.

Parameter     | DRCCED with moment information | Proposed ED method | Proposed ED with prescreening
Cont. Var.    | 364      | 364      | 364
Constraints   | 985      | 450,776  | 985
Min. cost ($) | 42,438.8 | 28,308.3 | 28,308.3
CPU time (s)  | 1.2      | 3.8      | 1.1

To illustrate the benefit of the derived theoretical bound in Proposition 3.2, we compare

it with an arbitrary scenario number of 50 in a testing case. The optimal ED decision

determined by this arbitrary scenario number has a constraint violation probability of

28% based on 100 testing wind power scenarios generated by f-GAN. This constraint

violation probability is much higher than the prescribed risk level of 10%, thus

jeopardizing the security of power systems under intermittent wind energy. By contrast,

the theoretical bound leads to an optimal ED decision without constraint violation,

which satisfies the requirement on risk level. Based on the comparison results, the

benefit of the derived theoretical bound lies in that it quantitatively dictates the required

number of scenarios to guarantee a predefined risk level.

To take a closer look at the cost breakdowns in the six-bus case study, we

present the cost distributions determined by the DRCCED method with moment

information and the proposed approach in Figure 17(a) and Figure 17(b), respectively.

From Figure 17(a) and Figure 17(b), we can readily see that the load shedding costs

account for more than 25% of the total costs for both methods. This is ascribed to the

relatively low load-shedding price. Notably, the percentage of load shedding determined

by the DRCCED method with moment information is 23% higher than that of the

proposed approach. The reason is as follows. The DRCCED method

with moment information is less effective in wind power utilization compared with the

proposed approach. Thus, it more frequently resorts to load shedding as an alternative.

Figure 17. The cost breakdown of (a) the DRCCED method with moment

information, and (b) the proposed ED approach.

When the load-shedding price increases from $5/MW to $15/MW, no electricity load is

shed over the entire time horizon (NT=24). The power outputs of conventional

generation units are displayed in Figure 18. From Figure 18, we can see that most of the

conventional power comes from Generator 1. Accordingly, the operating cost of

Generator 1 contributes to 61.24% of the total cost for the proposed ED approach.

Figure 18. The power dispatch of each conventional generator determined by the

proposed ED approach.

3.5.2 IEEE 118-bus system

In this subsection, we consider a large-scale IEEE 118-bus system to demonstrate the

scalability and effectiveness of the proposed deep learning based ED approach. This

system consists of 118 buses, 54 thermal generators, 186 transmission lines, and 91

loads [305]. Moreover, ten wind farms (denoted by WF1-WF10) are installed at Buses

8, 12, 23, 36, 42, 56, 69, 77, 88 and 93. In this case study, the percentage of wind power

utilization β is set to be 15%, and the load shedding cost is $15/MW.

Figure 19. The spatial correlations of the ten wind farm energy outputs for (a) real

wind power data, and (b) wind power data generated by f-GAN. The color darkness of

one single cell represents the level of spatial correlation coefficient for corresponding

two wind farms. Comparison of spatial correlations can be made by focusing on the

darkness patterns of heat maps. The temporal correlations of WF10 for (c) real wind

power data, and (d) wind power data generated by f-GAN. The level of auto-

correlation coefficient is depicted by bar height. Comparison of spatial correlations

can be done by considering the height of each bar for every time lag.

To demonstrate that the adopted f-GAN is able to capture spatial and temporal

correlations, we compute the Pearson correlation and auto-correlation coefficients for

the wind farms located at different buses. Figure 19(a) and 19(b) visualize the spatial

correlation results of the real wind power data and generated data, respectively. Note

that darker colors indicate stronger correlations in these heat maps. By comparing

Figure 19(a) and 19(b), we can easily identify the resembling patterns of spatial

correlation for the real data and generated data. For example, from Figure 19(b), we

observe that the wind power output at WF1 has strong positive correlations with the

ones at WF4, WF7 and WF10, which is exactly consistent with the underlying true

information indicated by Figure 19(a). To characterize the degree of temporal

correlations in wind power time series, the autocorrelation coefficients are calculated

for each wind site. The results of real data and generated data for WF10 are displayed

in Figure 19(c) and 19(d). By inspection, the generated wind output series of WF10 has

similar auto-correlation coefficients as the real ones. Note that the auto-correlation

coefficients of the generated wind data are close to the real ones for the other wind farms as well.

Thus, the wind scenarios generated by f-GAN retain both spatial and temporal

correlations among multiple wind farms.
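The diagnostics in Figure 19 use standard estimators; a pure-Python sketch on synthetic series (stand-ins for the NREL data) shows how the spatial (Pearson) and temporal (auto-)correlation coefficients are computed:

```python
import math, random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def autocorr(x, lag):
    """Temporal (auto-)correlation of a series at a given time lag."""
    return pearson(x[:-lag], x[lag:]) if lag else 1.0

random.seed(1)
base = [random.gauss(0, 1) for _ in range(500)]      # shared weather driver
wf1 = [b + 0.1 * random.gauss(0, 1) for b in base]   # site coupled to driver
wf2 = [random.gauss(0, 1) for _ in range(500)]       # independent site

spatial = pearson(wf1, base)   # strong spatial correlation, near 1
indep = pearson(wf1, wf2)      # near 0
```

Applying `pearson` to every pair of wind farm series yields the heat maps of Figures 19(a)-(b), and `autocorr` over a range of lags yields the bar plots of Figures 19(c)-(d).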

The problem sizes and computational results are summarized in Table 5. By setting ε̂_bt = ε̂ = 0.1/241 based on the Bonferroni approximation, the corresponding DRCCED problem with moment information is infeasible. Thus, in Table 5, the computational results for the DRCCED method with moment information (ε̂_bt = ε̂ = 0.1) are provided. Note that N(β, ε) equals 9,747.19 based on eq. (3.25); thus, the number of wind power scenarios K is set to 9,748. In contrast to the DRCCED method with moment information, the proposed approach generates a more cost-effective ED solution

($401,477.9 vs $825,403.8). With the prescreening technique, the proposed ED problem


not only has much fewer constraints for memory savings, but also can be solved around

one order of magnitude faster.

Table 5. Comparisons of problem sizes and computational results for the DRCCED method with moment information and the proposed ED method with/without prescreening in the IEEE 118-bus system.

Parameter     | DRCCED with moment information | Proposed ED method | Proposed ED with prescreening
Cont. Var.    | 7,133     | 7,133      | 7,133
Constraints   | 17,088    | 28,780,485 | 17,088
Min. cost ($) | 825,403.8 | 401,477.9  | 401,477.9
CPU time (s)  | 22.3      | 223.7      | 22.6

Figure 20. The empirical distribution of the wind power utilization efficiency for (a)

DRCCED with moment information, and (b) the proposed approach.

To examine the wind energy utilizations of the DRCCED method with moment

information and the proposed ED approach, we compute the percentage of wind

utilization for each generated wind power scenario and obtain its empirical probability

distributions, as shown in Figure 20. As can be observed from Figure 20(a), the wind

utilization percentage for the DRCCED method with moment information ranges from

18.02% to 29.01%. By contrast, the proposed approach promotes a higher penetration

of renewable energy, with a minimum wind-utilization percentage of 59.11%, which is much higher than the prescribed wind power utilization percentage of 15%. Moreover,

the probability distribution in Figure 20(b) is lopsided with more probability mass

concentrated at higher values. This observation again illustrates that the proposed approach

facilitates an effective use of intermittent renewable wind power.

3.6 Summary

In this work, a novel f-GAN based ambiguous joint chance constrained ED optimization

framework was proposed. The deep learning based ambiguity set well captured the

spatial-temporal correlations of wind power uncertainty. To address the resulting ED

problem, a solution method integrating a reformulation technique with the scenario

approach was developed. Additionally, we derived a theoretical bound on the required

number of generated wind power data, which depended on the number of installed wind

farms and the number of time periods in the ED problem. The comparison results with

an arbitrarily chosen number of scenarios showed that the developed theoretical bound

enjoyed a quantitative guarantee on constraint satisfaction in ED. A prescreening

technique could be further leveraged to speed up the solution process, thus facilitating

the scalability of the proposed approach in the large-scale IEEE 118 bus system.

Computational results showed that the proposed approach outperformed the

conventional method by generating a less conservative ED solution while ensuring a

predefined risk level.

3.7 Nomenclature

Sets and Indices

b Index for buses

i Index for generators

l Index for transmission lines

t Index for time periods

𝒟 Ambiguity set based on deep learning

Parameters

C_i^V Fuel cost of generator i

C_b^LS Load shedding cost of bus b

D_bt Load demand located at bus b at period t

F_l Capacity of transmission line l

K_bl Power flow distribution factor for transmission line l due to the net injection at bus b

P_i^min Minimum power output of generator i

P_i^max Maximum power output of generator i

RU_i Ramp-up rate of generator i

RD_i Ramp-down rate of generator i

W_bt Uncertain wind power generation at bus b at period t

θ Weights of the generator network in f-GANs

ω Weights of the discriminator network in f-GANs

β Prescribed percentage of wind power utilization

ε Risk level for the ambiguous joint chance constraint

ξ Random vector of wind power parameters

Decision Variables

p_it Power output of generator i at time period t

q_bt Load shedding amount at bus b at time period t

w_bt Power dispatch of the wind farm at bus b at time period t

CHAPTER 4
ONLINE LEARNING BASED RISK-AVERSE STOCHASTIC MODEL
PREDICTIVE CONTROL OF CONSTRAINED LINEAR UNCERTAIN SYSTEMS

4.1 Introduction

Over the past few decades, model predictive control (MPC) has established itself as a

modern control strategy with theoretical grounding and a wide variety of applications

[306-309], because of its distinguishing feature of addressing multivariate dynamic

systems subject to control input and state constraints. Implemented in a receding horizon

fashion, MPC solves a finite-horizon optimal control problem at each sampling instant

and only performs the first control action. This procedure is repeated at the next instant

with a new measurement update. However, the presence of uncertainty could inflict

severe performance degradation or even loss of feasibility on conventional MPC if the

uncertainty is not explicitly accounted for [310].
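The receding-horizon procedure described above can be sketched in a few lines; the optimizer stub and plant below are hypothetical placeholders for a scalar integrator, not anything from this work:

```python
def solve_finite_horizon_ocp(x, N):
    # Hypothetical optimizer stub: for the scalar integrator x_{k+1} = x_k + u_k,
    # the first move -x drives the state to the origin; the rest of the plan is zero.
    return [-x] + [0.0] * (N - 1)

def plant_step(x, u):
    # True plant; in the uncertain case a disturbance would enter here.
    return x + u

def receding_horizon(x0, N=5, T=10):
    # At each instant: solve the finite-horizon problem, apply ONLY the first
    # control action, then re-solve at the next instant with the new measurement.
    x, traj = x0, [x0]
    for _ in range(T):
        u_seq = solve_finite_horizon_ocp(x, N)   # full open-loop plan
        x = plant_step(x, u_seq[0])              # first action only
        traj.append(x)                           # new measurement closes the loop
    return traj

traj = receding_horizon(3.0)
```

Re-solving with fresh measurements is what gives MPC feedback, and also what exposes it to the uncertainty issues discussed next.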

Motivated by this fact, considerable research effort has been devoted to designing MPC that accounts for the uncertainty of predictions. Robust MPC strategies, in which disturbances

are modeled using a bounded and deterministic set, aim to satisfy the hard constraints

of states and control inputs for all possible uncertainty realizations [311-314]. Designing

robust MPC is necessary and efficient when state and input constraints need to be

immunized against uncertainty. By exploiting the probabilistic nature of uncertainty in

controller designs, stochastic MPC methods are capable of tolerating constraint

violations in a systematic way [315]. Additionally, stochastic MPC can increase the

112
region of attraction by the means of chance constraints, thus allowing for the systematic

trade-off between constraint satisfaction and control performance [316].

Due to these attractive features, stochastic MPC has stimulated considerable research

interest from the control community [315-317]. The existing literature in stochastic

MPC can be typically grouped into two main categories depending on whether the

knowledge of uncertainty distribution is perfectly known or not. The first category

subsumes research that focuses on tightening or adaptive tightening for chance

constraints via the explicit use of uncertainty distributions [318-325]. The conventional

stochastic MPC strategies rely heavily on the assumption that the probability

distribution of disturbance is known a priori. However, such perfect knowledge of

distribution is rarely available in practice. Instead, only partial information regarding

disturbance distribution can be inferred from data. In such data-driven settings, chance

constraints in these stochastic MPC frameworks are no longer ensured because there

always exists a gap between the underlying true distribution and the estimated one. For

the case with unknown disturbance distribution, the probabilistic constraints can be

reformulated using the scenario-based approach [326-328], or by means of the

Chebyshev-Cantelli inequality [329-331]. Specifically, the Chebyshev-Cantelli

inequality based method guarantees constraint satisfaction for any distributions sharing

the same mean-covariance information, which is in the same spirit as an emerging

paradigm called distributionally robust control [159, 332-334]. In the paradigm of

distributionally robust control, an ambiguity set is a set comprising all possible

probability distributions characterized by certain known properties of stochastic

disturbances. Distributionally robust control methodologies have been proposed for

113
linear systems with probabilistic constraints assuming the first and second-order

moment information of uncertainty distribution. However, only global mean and

covariance information is utilized, and the existing ambiguity set is not updated in an

online manner based on newly collected disturbance data.

With ever-increasing availability of data in control systems, there is a growing trend to

leverage the information embedded within data to improve control performance. The

remarkable progress in machine learning and big data analytics leads to a broad range

of opportunities to integrate data-driven systems with model-based control systems [35].

Additionally, the dramatic growth of computing power has enabled such organic

integration. Recently, learning-based MPC has attracted increasing attention from the

control community [335-338]. One such method leveraged statistical identification tools

to build a data-driven system model for enhanced performance, while utilizing an

approximate model with uncertainty bounds for constraint tightening in robust tube

MPC [335, 339]. Along the same research direction, a learning-based robust MPC was

developed to integrate control design with offline system model learning via a set

membership identification [340]. For single-input single-output systems under stochastic

uncertainty, an adaptive dual MPC was designed [341], where control served as a

probing role to learn system model [342, 343]. To learn system nonlinearities from data,

several MPC strategies leveraged Gaussian process regression which provides system

dynamic model as well as residual bounds [344, 345]. Learning-based MPC is well

suited to exploiting the value of data to enhance control performance [336], and as such is a practical and appealing control tool in the era of big data [346].

However, most existing learning-based MPC methods focus on learning system models, which essentially learn a function by means of regression techniques along

with their error bounds. Although some of these learning-based MPC methods allow for

online or adaptive model learning, most of them focus on the robust control framework.

Few studies have organically integrated online learning with stochastic MPC.

Therefore, from both theoretical and practical standpoints, a systematic investigation on

online learning based stochastic MPC with theoretical guarantees on recursive

feasibility and closed-loop stability is needed.

To fill this research gap, there are several computational and theoretical challenges that

need to be addressed. The first research challenge is how to devise a data-driven

ambiguity set of disturbance distributions in a nonparametric manner such that it is

adaptive to data complexity automatically. In the context of online learning, one can

hardly pin down the complexity of disturbance model at the beginning, since more data

stream in over the runtime of MPC and data complexity can grow over time. Another

key research challenge is how to develop a framework that organically integrates online

learning with MPC for intelligent control. In particular, with more and more data

collected, the online learning method needs to be scalable with sample size in terms of

both memory and computational time. The third challenge lies in the development of a

computationally tractable constraint tightening method which hedges against

distributional ambiguity in stochastic MPC while leveraging structural properties of

uncertainty distribution. This calls for the theory extension in distributionally robust

optimization with the nonparametric ambiguity set, since there are no theoretical results

available for direct use. The fourth key research challenge is how to guarantee recursive

feasibility and stability of online learning-based stochastic MPC. This challenge arises

115
from the integration between online learning with stochastic MPC. Specifically, the

online update of disturbance distribution information could jeopardize the property of

recursive feasibility.

This work proposes an online learning-based risk-averse stochastic MPC framework for

linear time-invariant systems affected by additive disturbance. Instead of assuming

perfect knowledge of disturbance probability distribution, we consider a more realistic

setting where the distribution can be partially inferred from data. To immunize the

control strategy against the distributional ambiguity, Conditional Value-at-Risk (CVaR)

constraints are required to be observed for all candidate distributions in an ambiguity

set. As opposed to the conventional ambiguity set using global mean-covariance

information, a nonparametric data-driven ambiguity set is constructed by taking

advantage of a Dirichlet process mixture model (DPMM) [347]. Specifically, we devise

the ambiguity set based on the structural property, namely multimodality, along with

local first and second-order moment information of each mixture component, the

number of which is automatically derived from disturbance data. During the runtime of

the controller, real-time disturbance data are exploited to adapt the uncertainty model in

an online fashion based on an incremental variational inference algorithm. The

developed distribution-learning-while-control scheme alternates between learning

ambiguity set from real-time disturbance data and controlling the system with updated

uncertainty information. Afterwards, we propose a constraint tightening technique for

the resulting distributionally robust CVaR constraints over the DPMM-based ambiguity

set. Specifically, the constraints are equivalently reformulated as Linear Matrix

Inequality (LMI) constraints, which are amenable for efficient computation.

116
Additionally, we introduce a safe online update scheme for ambiguity set such that the

recursive feasibility and closed-loop stability are ensured. Numerical simulation and

comparison studies show that the proposed MPC method enjoys less-conservative

control performance compared with the conventional distributionally robust control that

uses global mean and covariance information of disturbance. Additionally, thanks to the

online learning scheme, the proposed MPC is advantageous in terms of low constraint

violation percentage under time-varying disturbance distribution.

The major contributions of this work are summarized as follows:

• A novel online Bayesian learning based risk-averse stochastic MPC framework that improves control performance based on real-time data;

• An online data-driven approach with the DPMM to devise ambiguity sets that are self-adaptive to the underlying structure and complexity of real-time disturbance data;

• A novel constraint tightening method for risk-averse stochastic MPC, in which data-driven CVaR constraints over the DPMM-based ambiguity set are equivalently reformulated as computationally tractable LMIs;

• Theoretical guarantees on recursive feasibility and stability of the proposed MPC with the introduction of a novel safe update scheme for ambiguity sets.

One important contribution of this work is a novel and organic integration of online

learning with stochastic MPC under time-varying uncertainty. Moreover, a novel

nonparametric ambiguity set based on the DPMM is first developed in this manuscript.

Note that the objective of the constraint tightening method is to facilitate the solution of

the proposed MPC. To the best of our knowledge, few research works investigate

theoretical guarantees on recursive feasibility and stability of stochastic MPC in the face

117
of time-varying disturbance distribution. Therefore, the establishment of recursive

feasibility and stability with the safe update scheme is another novel contribution,

although some widely used control techniques are employed.

Notation: The notation used in this chapter is standard. For sets 𝒜 and ℬ, 𝒜 ⊕ ℬ = {a + b : a ∈ 𝒜, b ∈ ℬ} and 𝒜 ⊖ ℬ = {a : a + b ∈ 𝒜 for all b ∈ ℬ} denote the Minkowski set addition and the Pontryagin set difference, respectively. [A]_j represents the j-th row of the matrix A, and [a]_j denotes the j-th entry of the vector a. For matrices A and B, their Frobenius inner product is represented as A • B. The concatenated vector of c_{0|k}, …, c_{N−1|k} is denoted by c_{N|k}. For integers p and q, the notation [p, q] represents all the integers between p and q. For a real number α, (α)_+ = max{α, 0}.
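For intervals on the real line, the Minkowski sum and Pontryagin difference just defined reduce to simple endpoint arithmetic; a minimal illustration (the interval representation is ours, purely for intuition):

```python
def minkowski_sum(A, B):
    # A + B = {a + b : a in A, b in B} for intervals A = [a1, a2], B = [b1, b2]
    return (A[0] + B[0], A[1] + B[1])

def pontryagin_diff(A, B):
    # A - B = {a : a + b in A for all b in B}; empty if B is "wider" than A
    lo, hi = A[0] - B[0], A[1] - B[1]
    return (lo, hi) if lo <= hi else None

X = (-4.0, 4.0)   # state constraint interval
W = (-1.0, 1.0)   # disturbance interval
assert minkowski_sum(X, W) == (-5.0, 5.0)
assert pontryagin_diff(X, W) == (-3.0, 3.0)   # the tightened constraint interval
```

The Pontryagin difference is exactly the tightening operation used later in this chapter: states kept inside X ⊖ W remain inside X for every admissible disturbance in W.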


4.2 Problem setup and preliminaries

In this section, we describe the problem setup, including system dynamics,

distributionally robust CVaR constraints on system states, hard constraints on control

inputs, objective function, as well as some basics on risk-averse stochastic MPC.

Consider the following linear, time-invariant uncertain system with additive stochastic disturbance.

x_{k+1} = A x_k + B u_k + w_k    (4.1)

where x_k ∈ ℝ^n is the system state, u_k ∈ ℝ^m denotes the control input, and w_k ∈ ℝ^n represents the additive disturbance. Let 𝕏 = {x ∈ ℝ^n : Hx ≤ h} and 𝕌 = {u ∈ ℝ^m : Gu ≤ g} be the polytopic constraints on state and input, both of which contain the origin in the interior.

We make the following assumptions regarding the system and additive disturbance.
Assumption 4.1 The measurement of the system state x_k is available at time k.

This is a common assumption. At each sampling time t+1, the disturbance realization at the previous time instant can be obtained as w_t = x_{t+1} − A x_t − B u_t. Therefore, we have access to real-time disturbance data for online learning, which refines the knowledge of uncertainty.
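Assumption 4.1 makes the realized disturbance directly computable from consecutive state measurements; a small numpy check (the system matrices below are illustrative, not from this work):

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative double-integrator dynamics
B = np.array([[0.005], [0.1]])

def recover_disturbance(x_next, x, u):
    # w_t = x_{t+1} - A x_t - B u_t  (states measured per Assumption 4.1)
    return x_next - A @ x - B @ u

x_t = np.array([1.0, 0.5])
u_t = np.array([0.2])
w_true = np.array([0.01, -0.02])
x_next = A @ x_t + B @ u_t + w_true          # simulate one step of (4.1)
w_hat = recover_disturbance(x_next, x_t, u_t)
assert np.allclose(w_hat, w_true)            # exact recovery of the disturbance
```

These recovered samples w_t are precisely the data stream fed to the online learning scheme later in the chapter.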

Assumption 4.2 The matrix pair (A, B) is controllable.

Assumption 4.3 The additive disturbance w has a bounded and convex support set 𝕎 = {w : Ew ≤ f}, which contains the origin in the interior.

Given the state measurement at time k, the prediction model in MPC is given by

x_{l+1|k} = A x_{l|k} + B u_{l|k} + w_{l|k},  x_{0|k} = x_k    (4.2)

where x_{l|k} and u_{l|k} represent the l-step-ahead state and control input predicted at time k, respectively. The disturbance w_{l|k} denotes the l-step-ahead stochastic disturbance with distributional properties available at time k.

By leveraging the probability distribution of disturbance w, stochastic MPC allows

constraint violation via the use of chance constraints. Following the literature [318, 348,

349], we consider chance constraints on the states, as in (4.3). The advantage of such "one-step-ahead" chance constraints is that they facilitate the recursive feasibility analysis and ensure the closed-loop performance.

ℙ( [H]_i x_{l+1|k} ≤ [h]_i | x_{l|k} ) ≥ 1 − [ε]_i,  i ∈ [1, p]    (4.3)

where H ∈ ℝ^{p×n}, h ∈ ℝ^p, and the parameter [ε]_i is a pre-specified risk level for the i-th constraint on the system state. Note that (4.3) becomes a hard constraint when the probability mass of all disturbances is strictly greater than zero and [ε]_i is set to zero.

Chance constraints ensure that the state constraints are satisfied with a probability of at least 1 − [ε]_i, relying on the assumption that the disturbance distribution is perfectly known.

Stochastic MPC can enlarge the feasible region of the corresponding finite-horizon

optimal control problem via tunable risk levels in chance constraints, thus improving

upon objective performance. By assuming disturbance distribution, the tightening

parameters can be computed offline by the inverse of the cumulative distribution. In

practice, however, such precise knowledge of probability is rarely available, and only

partial information can be inferred from historical as well as real-time disturbance data.

Due to the finite amount of uncertainty data, the assumed disturbance distribution could deviate from the underlying true distribution. Consequently, the actual constraint violation resulting from conventional stochastic MPC could become worse than the pre-specified one. Additionally, the chance constraints per se (in the form of (4.3)) focus on the frequency of constraint violation and fail to account for the violation magnitude, unless the magnitude is penalized in the cost function as implemented with soft constraints [159, 350]. Therefore, we introduce the definitions of CVaR and distributionally robust CVaR constraints as follows.

Definition 4.1 (Conditional Value-at-Risk). For a given measurable loss L: ℝ^n → ℝ, probability distribution ℙ on ℝ^n, and tolerance level ε ∈ (0, 1), the Conditional Value-at-Risk (CVaR) of the random loss L at level ε with respect to ℙ is defined below.

ℙ-CVaR_ε(L) = inf_{ζ∈ℝ} { ζ + (1/ε) 𝔼_ℙ[ (L − ζ)_+ ] }    (4.4)

where 𝔼_ℙ denotes the expectation with respect to the probability ℙ. The CVaR can be interpreted as the conditional expectation of the loss L above the (1 − ε)-quantile of the probability distribution of L.
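Under a discrete empirical distribution, the infimum in (4.4) is attained at one of the loss values, and CVaR_ε reduces to the average of the largest ε-fraction of losses; a minimal numpy sketch:

```python
import numpy as np

def cvar(losses, eps):
    """Evaluate P-CVaR_eps(L) = inf_zeta { zeta + E[(L - zeta)_+] / eps }
    for the empirical distribution placing equal mass on `losses`."""
    losses = np.sort(np.asarray(losses, dtype=float))
    # The objective is convex and piecewise linear in zeta, with kinks at the
    # loss values, so scanning the losses themselves finds the infimum.
    return min(z + np.maximum(losses - z, 0.0).mean() / eps for z in losses)

# 10 equally likely losses 1..10; at eps = 0.2, CVaR is the mean of the top 20%
L = np.arange(1.0, 11.0)
assert np.isclose(cvar(L, 0.2), 9.5)   # mean of {9, 10}
```

The "average of the worst cases" reading is what makes CVaR penalize the magnitude of violations, not merely their frequency.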

To address these issues of conventional chance constraints, we propose to use a distributionally robust CVaR version of the constraints (4.3), which is provided as follows.

sup_{ℙ∈𝒫(k)} ℙ-CVaR_{[ε]_i}( [H]_i x_{l+1|k} − [h]_i ) ≤ 0    (4.5)

where 𝒫(k) is an ambiguity set constructed based on the disturbance data information available up to time k, and ℙ denotes the conditional probability given x_{l|k}. Unlike the chance constraints, the distributionally robust CVaR constraints (4.5) not only hedge against the distributional ambiguity, but also penalize severe constraint violations that could be detrimental to system safety.

Meanwhile, hard constraints are imposed on the control inputs due to the physical limitations of actuators, as shown below.

G u_{l|k} ≤ g    (4.6)

where G ∈ ℝ^{q×m} and g ∈ ℝ^q.

Following the common practice in stochastic MPC, we split the predicted state into its nominal part and stochastic error part as follows:

x_{l|k} = z_{l|k} + e_{l|k}    (4.7)

where z_{l|k} and e_{l|k} denote the nominal part and the stochastic error part of the predicted state x_{l|k}, respectively.

An MPC strategy with a prediction horizon N is considered. By employing error feedback, the predicted input for the uncertain system can be represented by

u_{l|k} = K e_{l|k} + v_{l|k}    (4.8)

where K is a stabilizing feedback gain, and v_{l|k} represents the predicted control input for the nominal system corresponding to z_{l|k}. The nominal control v_{l|k} can be formulated as follows.

v_{l|k} = K z_{l|k} + c_{l|k}    (4.9)

where c_{l|k} ∈ ℝ^m, l = 0, …, N−1, are decision variables in the receding-horizon optimal control problem, and c_{l|k} = 0 for l ≥ N.

With the predictive control laws (4.8) and (4.9), the system dynamics for the nominal state and the stochastic error are given below.

z_{l+1|k} = A z_{l|k} + B v_{l|k},  z_{0|k} = x_k    (4.10)

e_{l+1|k} = Φ e_{l|k} + w_{l|k},  e_{0|k} = 0    (4.11)

where Φ = A + BK is Schur stable with the stabilizing feedback gain K.
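The decomposition (4.7)-(4.11) can be checked numerically: propagating the nominal state z and the error e separately reproduces the state of the uncertain system exactly. Matrices, gain, and disturbance bounds below are illustrative assumptions:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative system
B = np.array([[0.005], [0.1]])
K = np.array([[-1.0, -1.5]])             # illustrative stabilizing gain
Phi = A + B @ K                          # error-dynamics matrix in (4.11)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0]); z = x.copy(); e = np.zeros(2)   # x_{0|k} = z_{0|k}, e_{0|k} = 0
for c_l in [np.array([0.1]), np.array([0.0]), np.array([0.0])]:
    w = rng.uniform(-0.01, 0.01, size=2)   # bounded additive disturbance
    v = K @ z + c_l                        # nominal input (4.9)
    u = K @ e + v                          # actual input (4.8)
    x = A @ x + B @ u + w                  # uncertain system (4.1)
    z = A @ z + B @ v                      # nominal dynamics (4.10)
    e = Phi @ e + w                        # error dynamics (4.11)
assert np.allclose(x, z + e)               # decomposition (4.7) holds at every step
```

Because the error e collects all disturbance effects, constraints can be tightened for z alone, which is the basis of the tube-style tightening used below.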

We consider the cost of the nominal system in this work. Specifically, the control objective is to minimize the infinite-horizon cost at sampling time k, as given below.

J = Σ_{l=0}^{∞} ( z_{l|k}^T Q z_{l|k} + v_{l|k}^T R v_{l|k} )    (4.12)

where Q ∈ ℝ^{n×n}, Q ⪰ 0, R ∈ ℝ^{m×m}, and R ≻ 0.

Furthermore, the following assumption on detectability is made such that there exists an LQR solution.

Assumption 4.4 The matrix pair (A, Q^{1/2}) is detectable.

Suppose the feedback gain matrix K is chosen to be LQR-optimal; then we can further rewrite (4.12) as (4.13). The detailed derivation is provided in Appendix A.

J = Σ_{l=0}^{N−1} c_{l|k}^T ( R + B^T P B ) c_{l|k} + x_k^T P x_k    (4.13)

where P denotes the solution of the Lyapunov equation Φ^T P Φ + Q + K^T R K = P. Note that the second term x_k^T P x_k is a constant. A quadratic finite-horizon cost is given by

J_N( c_{N|k} ) = Σ_{l=0}^{N−1} c_{l|k}^T ( R + B^T P B ) c_{l|k}    (4.14)

where c_{N|k} = [ c_{0|k}^T, …, c_{N−1|k}^T ]^T represents the decision vector at time k with a horizon of length N.
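The matrix P in (4.13) solves a discrete Lyapunov equation; since Φ is Schur stable, a plain fixed-point iteration converges to it without any specialized solver. The matrices and gain below are illustrative, not from this work:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K = np.array([[-1.0, -1.5]])            # assumed given (e.g., LQR-optimal)
Q = np.eye(2); R = np.eye(1)

Phi = A + B @ K                          # Schur stable for this illustrative K
Qbar = Q + K.T @ R @ K
# Fixed-point iteration for Phi^T P Phi + Qbar = P, i.e.,
# P = sum_{l>=0} (Phi^T)^l Qbar Phi^l, which converges since Phi is Schur.
P = Qbar.copy()
for _ in range(1000):
    P = Phi.T @ P @ Phi + Qbar
assert np.allclose(Phi.T @ P @ Phi + Qbar, P, atol=1e-6)  # Lyapunov residual
S = R + B.T @ P @ B                      # weight on c_{l|k} in (4.13)-(4.14)
assert S[0, 0] > 0                       # positive definite, as R > 0
```

With P in hand, the finite-horizon cost (4.14) is just a block-diagonal quadratic form in the decision vector c_{N|k}.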

4.3 Online learning based risk-averse stochastic MPC

In this section, we propose a novel risk-averse stochastic MPC framework based on

online Bayesian learning. We first develop a data-driven approach to construct an

ambiguity set for the stochastic disturbance based on the DPMM. Then, an efficient

constraint tightening method for the CVaR constraints on system states over the

ambiguity set is developed for the synthesis of stochastic predictive controller. Finally,

based on an online safe update scheme, the predictive control algorithm that organically

integrates online learning with risk-averse stochastic MPC is described.

4.3.1 Online Bayesian learning for streaming disturbance measurement

data

To automatically decipher the structural property of the disturbance distribution, we

employ a nonparametric Bayesian model, called DPMM, which is briefly described as

follows.

The Dirichlet process (DP) constitutes a fundamental building block for the DPMM that

relies on mixtures to characterize data distribution. The DP is technically a probability

distribution over distributions. Suppose a random distribution G follows a DP

parameterized by a concentration parameter α and a base measure G0 over space Θ0,

denoted as G ~ DP(α, G₀). For any fixed partition (A₁, …, A_r) of Θ₀, we have the following.

( G(A₁), …, G(A_r) ) ~ Dir( αG₀(A₁), …, αG₀(A_r) )    (4.15)

where Dir represents the Dirichlet distribution.

Following the stick-breaking procedure [351], a random draw from the DP can be expressed as G = Σ_{k=1}^{∞} π_k δ(φ_k), where π_k = β̄_k Π_{j=1}^{k−1} (1 − β̄_j) is the weight, φ_k is sampled from G₀, and δ(φ_k) denotes the Dirac delta function at φ_k. The parameter β̄_k represents the proportion broken off the remaining stick, and follows a Beta distribution, denoted as β̄_k ~ Beta(1, α).
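The stick-breaking construction can be simulated directly by truncating the infinite sequence; a minimal numpy sketch:

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Draw the first K stick-breaking weights pi_k = beta_k * prod_{j<k}(1 - beta_j),
    with beta_k ~ Beta(1, alpha)."""
    betas = rng.beta(1.0, alpha, size=K)
    # Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(42)
pi = stick_breaking_weights(alpha=1.0, K=50, rng=rng)
assert np.all(pi >= 0.0)
assert pi.sum() < 1.0      # truncation leaves an (exponentially tiny) unbroken stub
```

Smaller α concentrates mass on the first few sticks (few clusters); larger α spreads it out, which is how the DP adapts the number of mixture components to the data.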

The Bayesian nonparametric model, i.e., the DPMM, employs φ_k as the parameters of some data distribution. Based on the DP, we summarize the basic form of a DPMM as follows [352, 353]:

{ π_k, φ_k }_{k=1}^{∞} ~ DP(α, G₀)
l_n ~ Mult(π)    (4.16)
o_n ~ F(φ_{l_n})

where Mult denotes a multinomial distribution, l_n is the label indicating the component or cluster of observation o_n, n is an index ranging from 1 to N_d, and the data o₁, …, o_{N_d} are distributed according to a family of distributions F. Based on the stick-breaking process, G is discrete with probability one. Such discreteness further induces the clustering of the data.
data.

Due to its computational efficiency, the variational inference has become a method of

choice for approximating the conditional distribution of latent variables in the DPMM

given observed data [353]. In the variational inference, the problem of computing the

posterior distribution is formulated as an optimization problem, which can be solved

using a coordinate ascent method. Following the literature [352], we use mixtures of Gaussians in this work. Therefore, the base measure can be chosen as a normal–Wishart (NW) distribution, so that φ_k = (μ_k, H_k) comprises the mean vector and precision matrix of the k-th Gaussian component. A variational distribution is used to approximate the true posterior in terms of the Kullback–Leibler divergence [353, 354].

For the online learning setting, suppose that the real-time data wt is collected for the

control system. To learn from the streaming data, we employ an online variational

inference algorithm in this work [347]. This algorithm features faster computation and

bounded memory requirement for each round of learning. It is well-suited to learning the

distribution online from real-time data over the runtime of MPC. The algorithm iterates

between a model-building phase and a compression phase. In the model-building phase, a clump C_s is a set of indices such that, for all i, j ∈ C_s, the disturbance data w_i and w_j are generated from the same mixture component. Disturbance data within the same clump are summarized via their average sufficient statistics, which encapsulate all the information needed for the purpose of inference [347]. The new disturbance data at time t are used to update the inference results and are then discarded to reduce the memory overhead. The clump constraints are determined in the compression phase in a top-down recursive fashion. As a result, the algorithm is attractive in terms of both bounded memory requirement and fast computation: the computational burden of each model update does not grow with the amount of disturbance data already processed. For more

details on this online learning algorithm for the DPMM, we refer the readers to [347].

Based on the online variational inference results available up to time k, we propose a

data-driven ambiguity set, as given in Definition 4.2, by leveraging both multimodality

and moment information of each mixture component.

Definition 4.2 (Data-driven ambiguity set). The ambiguity set based on the DPMM, denoted as 𝒫(k), is defined as follows.

𝒫(k) = Σ_{j=1}^{m(k)} π_j^(k) 𝒫_j( μ_j^(k), Σ_j^(k) )    (4.17)

where 𝕎 = {w : Ew ≤ f} is the support set of the additive disturbance under Assumption 4.3, m(k) denotes the number of mixture components, the mixing weight π_j^(k) indicates the occurring probability of each mixture component, 𝒫_j represents a basic ambiguity set, and μ_j^(k) and Σ_j^(k), respectively, denote the mean and covariance estimates for the j-th mixture component obtained from the online learning algorithm.

Note that the support set plays a key role in ensuring recursive feasibility by means of the terminal set.

The data-driven ambiguity set for the stochastic disturbance based on the DPMM is

devised as a weighted Minkowski sum of several basic ambiguity sets, the number of

which is automatically determined from disturbance data using the online variational

inference algorithm. Each basic ambiguity set 𝒫_j is cast as follows.

𝒫_j( μ_j^(k), Σ_j^(k) ) = { ρ ∈ ℳ_+ :  ∫_𝕎 ρ(dξ) = 1,  ∫_𝕎 ξ ρ(dξ) = μ_j^(k),  ∫_𝕎 ξ ξ^T ρ(dξ) = Σ_j^(k) + μ_j^(k) ( μ_j^(k) )^T }    (4.18)

where ℳ_+ represents the set of positive Borel measures on ℝ^n, and ρ is a positive measure.

There are several highlights of the proposed data-driven ambiguity set. First, the ambiguity set is devised in a nonparametric manner so that it automatically accommodates its complexity to the underlying structure and complexity of the disturbance data. Second, each basic ambiguity set is devised using mean and covariance information, which endows the resulting stochastic MPC with enormous computational benefits. Third, the proposed ambiguity set leverages fine-grained distribution information, namely local moment information. This feature implies that the resulting stochastic MPC enjoys less conservative control performance compared with a control method using a conventional ambiguity set based on global moment information.
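Given component assignments from the inference step (here replaced by hypothetical hard labels rather than the DPMM posterior), the ingredients of (4.17) — mixing weights, local means, and local covariances — are simple per-cluster moments; a numpy sketch:

```python
import numpy as np

def component_moments(W, labels):
    """Estimate (pi_j, mu_j, Sigma_j) for each mixture component from
    disturbance samples W (one sample per row) and cluster labels."""
    comps, n = {}, len(W)
    for j in np.unique(labels):
        Wj = W[labels == j]
        pi_j = len(Wj) / n                             # occurring probability
        mu_j = Wj.mean(axis=0)                         # local mean
        Sigma_j = np.cov(Wj, rowvar=False, bias=True)  # local covariance
        comps[int(j)] = (pi_j, mu_j, Sigma_j)
    return comps

rng = np.random.default_rng(1)
W = np.vstack([rng.normal(-2.0, 0.1, (60, 2)),   # bimodal disturbance data
               rng.normal(+2.0, 0.1, (140, 2))])
labels = np.array([0] * 60 + [1] * 140)
comps = component_moments(W, labels)
assert abs(comps[0][0] - 0.3) < 1e-9 and abs(comps[1][0] - 0.7) < 1e-9
```

A single global mean/covariance for this bimodal data would sit between the two modes with an inflated covariance, which is exactly the conservatism the local-moment ambiguity set avoids.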

4.3.2 A novel constraint tightening based on distributionally robust CVaR

constrained optimization

In this section, we present a novel constraint tightening technique for the distributionally robust CVaR constraints (4.5) over the data-driven ambiguity set 𝒫(k), as well as constraint tightening for the input constraints G u_{l|k} ≤ g. The purpose of constraint tightening is to obtain constraints on the states of the nominal system such that the state constraints of the uncertain system are satisfied with a frequency of at least 1 − [ε]_i.

For state constraints, the corresponding distributionally robust CVaR constraints can be

equivalently reformulated as LMI constraints on the predicted nominal system states, as

given in the following theorem.

Theorem 4.1 (Constraint tightening). The system satisfies the data-driven distributionally robust CVaR constraints (4.5) if and only if the nominal system satisfies the tightened constraints z_{l+1|k} ∈ 𝕏̄(k) ⊖ ( ⊕_{i=1}^{l} Φ^i 𝕎 ), with 𝕏̄(k) given below.

𝕏̄(k) = { z ∈ ℝ^n : Hz ≤ h − η(k) }    (4.19)

where [η(k)]_i is the optimal objective value of the following problem (4.20).

min η

s.t.  [ε]_i β_i + Σ_{j=1}^{m} π_j ( t_ij + μ_j^T λ_ij + ( Σ_j + μ_j μ_j^T ) • Λ_ij ) ≤ 0

[ Λ_ij                          ½( λ_ij + E^T τ_ij ) ]
[ ½( λ_ij + E^T τ_ij )^T        t_ij − f^T τ_ij      ] ⪰ 0,  ∀j        (4.20)

[ Λ_ij                                     ½( λ_ij − [H]_i^T + E^T θ_ij ) ]
[ ½( λ_ij − [H]_i^T + E^T θ_ij )^T         t_ij + β_i + η − f^T θ_ij      ] ⪰ 0,  ∀j

Λ_ij ⪰ 0,  τ_ij ≥ 0,  θ_ij ≥ 0,  ∀j

Note that we drop the index k from m(k), π_j^(k), μ_j^(k), and Σ_j^(k) for notational simplicity.
k k k

The constraints in (4.20) are LMI constraints. The proof of Theorem 4.1 is provided in

Appendix B.

Remark 4.1 The support set is assumed to be a polyhedron in Assumption 4.3, because

it facilitates the LMI reformulation in the proposed constraint tightening. Specifically,

constraints with a polyhedral support set can be reformulated as computationally

efficient LMI constraints in the proposed MPC. In principle, the assumption on

polyhedral support can be relaxed to a general compact convex set. In this case, those

constraints still admit robust counterparts, yet more complicated than LMIs.

The optimization problem in (4.20) is a computationally tractable semi-definite program

(SDP), which can be solved efficiently using the off-the-shelf optimization solvers, such

as SEDUMI and MOSEK. The distribution information of disturbance is learned online

through the incremental variational inference algorithm, and is further incorporated to

(4.20) to perform constraint tightening, thus improving the control performance in an

online fashion.
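For intuition about what (4.20) computes, consider the standard special case of a single component and no support constraint: the worst-case CVaR of a linear loss a^T w − b over all distributions with mean μ and covariance Σ admits the closed form a^T μ − b + sqrt((1−ε)/ε) · sqrt(a^T Σ a). This is a well-known moment-ambiguity result, not the full mixture SDP (4.20); a numpy sketch of the resulting tightening offset:

```python
import numpy as np

def wc_cvar_offset(a, mu, Sigma, eps):
    """Worst-case CVaR_eps tightening for a halfspace a^T w <= b under a
    single mean-covariance ambiguity set (closed form, no support set):
    offset = a^T mu + kappa * sqrt(a^T Sigma a), kappa = sqrt((1-eps)/eps)."""
    kappa = np.sqrt((1.0 - eps) / eps)
    return a @ mu + kappa * np.sqrt(a @ Sigma @ a)

a = np.array([1.0, 0.0])       # constraint row [H]_i
mu = np.zeros(2)
Sigma = np.diag([0.04, 0.01])  # local covariance estimate
eta = wc_cvar_offset(a, mu, Sigma, eps=0.1)
assert np.isclose(eta, 0.6)    # kappa = 3 for eps = 0.1, std = 0.2
```

Problem (4.20) generalizes this intuition to several weighted components and a polyhedral support, which is why it takes the form of an SDP rather than a closed formula.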
For the hard constraints on control inputs G u_{l|k} ≤ g, the corresponding constraint tightening result is obtained as follows, based on the tube-based strategy.

v_{l|k} ∈ 𝕌 ⊖ K( ⊕_{i=0}^{l−1} Φ^i 𝕎 )    (4.21)

where 𝕌 = { u ∈ ℝ^m : Gu ≤ g }.

4.3.3 The proposed online learning based risk-averse stochastic MPC

algorithm with a safe update scheme

In this section, we first introduce the finite-horizon optimal control problem of the proposed MPC with a safe ambiguity set update scheme. Then, the overall description of the resulting online learning based risk-averse stochastic MPC algorithm is provided in detail.

In the online learning based risk-averse stochastic MPC paradigm, the finite-horizon optimal control problem is solved repeatedly online. The optimal control problem to be solved at time k is denoted as (OL-SMPC_k), given in (4.22)-(4.27).

min_{c_{N|k}}  J_N( c_{N|k} )    (4.22)

s.t.  z_{l+1|k} = A z_{l|k} + B v_{l|k},  z_{0|k} = x_k    (4.23)

v_{l|k} = K z_{l|k} + c_{l|k}    (4.24)

z_{l+1|k} ∈ ℤ_{l+1}^(k),  l ∈ [0, N−1]    (4.25)

v_{l|k} ∈ 𝕍_l,  l ∈ [0, N−1]    (4.26)

z_{N|k} ∈ ℤ_f^(k)    (4.27)

where 𝕍_l denotes the tightened input constraint set in (4.21), and the state constraint sets ℤ_{l+1}^(k) and the terminal set ℤ_f^(k) are defined using a safe update scheme described as follows.

In the developed update scheme, the condition is identified under which it remains safe

to utilize the tightened constraints with the updated ambiguity set in the risk-averse

stochastic MPC. Specifically, if a candidate solution satisfies the tightened constraints resulting from the current results of online variational inference, one can safely incorporate

the newly-learned uncertainty information into the control problem; otherwise, one

resorts to the tightened constraints from the previous sampling time.

To describe the condition checked by the safe update scheme, we need the definition of

candidate solution given as follows.

Definition 4.3 (Candidate solution) Given an optimal solution $\mathbf{c}_{N|k}^* = \left\{ c_{0|k}^*, c_{1|k}^*, \ldots, c_{N-1|k}^* \right\}$ to the MPC problem at time k, the candidate solution at time instant k+1 is defined by

$$\tilde{\mathbf{c}}_{k+1} = \left\{ c_{1|k}^*, \ldots, c_{N-1|k}^*, 0 \right\} \quad (4.28)$$

Note that it is common to employ the candidate solution as the shifted optimal input augmented by zero [314, 355]. This work employs the dual-mode prediction paradigm [356]: Mode 1 corresponds to $v_{l|k} = K z_{l|k} + c_{l|k}$, $l = 0, \ldots, N-1$, while Mode 2 corresponds to $v_{l|k} = K z_{l|k}$, $l \ge N$. As a result, the terminal controller $v_{l|k} = K z_{l|k}$ for the nominal system $z_{l+1|k} = A z_{l|k} + B v_{l|k}$ steers the nominal system state (in the terminal set) to the origin. The last zero term in the candidate solution comes from the shifted solution in Mode 2 ($c_{N|k} = 0$), so requiring the last term to be zero is not restrictive. The explicitly given candidate solution plays a critical role not only in establishing recursive feasibility, but also in proving closed-loop stability [314, 355].
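The shift-and-append-zero construction of Definition 4.3 can be sketched in a few lines (illustrative code, not from the dissertation):

```python
import numpy as np

# Build the shifted candidate solution of Definition 4.3 from the optimal
# perturbation sequence c*_{0|k}, ..., c*_{N-1|k}.
def candidate_solution(c_opt: np.ndarray) -> np.ndarray:
    """Shift the optimal input perturbations one step and append zero."""
    # c_opt has shape (N, m): one m-dimensional perturbation per step.
    return np.vstack([c_opt[1:], np.zeros((1, c_opt.shape[1]))])

c_opt = np.array([[1.0], [0.5], [0.25]])       # N = 3, m = 1
cand = candidate_solution(c_opt)
print(cand.ravel())                            # [0.5  0.25 0.  ]
```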

Based on the candidate solution, the safe update scheme checks the following

conditions:

$$\tilde{z}_{l|k+1} \in \hat{\mathbb{Z}}_l^{(k+1)} \quad (4.29)$$

$$\tilde{z}_{N|k+1} \in \hat{\mathbb{Z}}_f^{(k+1)} \quad (4.30)$$

where $\tilde{z}_{l|k+1}$ denotes the predicted state of the nominal system corresponding to the candidate solution $\tilde{\mathbf{c}}_{k+1}$, set $\hat{\mathbb{Z}}_l^{(k+1)} = \mathbb{X}^{(k+1)} \ominus \left( \bigoplus_{i=1}^{l} \Phi^i \mathbb{W} \right)$, and set $\hat{\mathbb{Z}}_f^{(k+1)}$ is defined using terminal constraints.

To formally define the terminal set $\hat{\mathbb{Z}}_f^{(k+1)}$, we define the (maximal) robust positively invariant set below [357].

Definition 4.4 (Robust positively invariant set) A set Ω is a robust positively invariant set for the system $x_{k+1} = f\left( x_k, w_k \right)$ and constraint sets $\left( \mathbb{X}, \mathbb{W} \right)$ if $\Omega \subseteq \mathbb{X}$ and $x_{k+1} \in \Omega$, $\forall w_k \in \mathbb{W}$, for every $x_k \in \Omega$.

Definition 4.5 (Maximal robust positively invariant set) A set Ω is a maximal robust positively invariant (MRPI) set for the system $x_{k+1} = f\left( x_k, w_k \right)$ and constraint sets $\left( \mathbb{X}, \mathbb{W} \right)$ if Ω is a robust positively invariant set and contains all robust positively invariant sets.

Remark 4.2 The MRPI sets are computed by using a standard approach based on the

recursion of predecessor sets, a.k.a. backward reachable sets [357, 358].
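The predecessor-set recursion of Remark 4.2 is easiest to see in one dimension, where each set is an interval. The sketch below (illustrative, not the dissertation's polytope implementation) recurses $\Omega_{i+1} = \Omega_i \cap \text{Pre}(\Omega_i)$ for $x^+ = a x + w$, $|w| \le \bar{w}$, under $|x| \le \bar{x}$:

```python
# Minimal sketch: MRPI set of a scalar system x+ = a*x + w, |w| <= w_bar,
# subject to |x| <= x_bar, via backward reachable (predecessor) sets.
def mrpi_interval(a: float, w_bar: float, x_bar: float, max_iter: int = 100):
    b = x_bar                      # current set is the interval [-b, b]
    for i in range(max_iter):
        # Pre([-b, b]) = {x : a*x + w in [-b, b] for all |w| <= w_bar}
        pre = (b - w_bar) / abs(a)
        if pre < 0:
            return None, i + 1     # no nonempty RPI set exists
        b_new = min(b, pre)
        if b_new == b:             # finite-step convergence to the MRPI set
            return b, i + 1
        b = b_new
    raise RuntimeError("did not converge")

bound, iters = mrpi_interval(a=0.5, w_bar=0.3, x_bar=1.0)
print(bound, iters)                # 1.0 1  (the constraint set is itself RPI)
```

The recursion terminates in a finite number of steps, matching the finite-determination property cited in Remark 4.6; it returns an empty answer when the disturbance is too large relative to the constraint set, which is the failure mode discussed for time-varying supports in Section 4.4.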

Definition 4.6 (Terminal set) The terminal set $\hat{\mathbb{Z}}_f^{(k)}$ is defined as the maximal robust positively invariant (MRPI) set for the system $z_{N|k+1} = \Phi z_{N|k} + \Phi^N w_k$ that satisfies the tightened constraints $z_{N|k} \in \hat{\mathbb{Z}}_N^{(k)}$ and $K z_{N|k} \in \mathbb{V}_N$.

Based on Definition 4.6, the terminal set must be an MRPI set satisfying $\Phi \hat{\mathbb{Z}}_f^{(k)} \oplus \Phi^N \mathbb{W} \subseteq \hat{\mathbb{Z}}_f^{(k)}$ and $z \in \hat{\mathbb{Z}}_f^{(k)} \Rightarrow z \in \hat{\mathbb{Z}}_N^{(k)}, \; K z \in \mathbb{V}_N$.

Note that the terminal controller using the feedback gain K respects state and input

constraints, when operated in the terminal set.

It is safe to update the ambiguity set when an indicator called flag equals 1; otherwise, when flag = 0, updating the ambiguity set could jeopardize the recursive feasibility of the proposed MPC. Therefore, the corresponding tightened constraints adopted in the online learning based stochastic MPC are given by

$$\mathbb{Z}_l^{(k+1)} = \text{flag} \cdot \hat{\mathbb{Z}}_l^{(k+1)} + \left( 1 - \text{flag} \right) \cdot \mathbb{Z}_l^{(k)} \quad (4.31)$$

Similarly, we can safely update the terminal set in the following way:

$$\mathbb{Z}_f^{(k+1)} = \text{flag} \cdot \hat{\mathbb{Z}}_f^{(k+1)} + \left( 1 - \text{flag} \right) \cdot \mathbb{Z}_f^{(k)} \quad (4.32)$$
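The flag logic of the safe update scheme reduces to a membership check on the candidate trajectory followed by a selection between the new and old sets. A minimal sketch (all names illustrative, sets encoded as interval bounds for simplicity):

```python
# Safe update scheme sketch: keep the newly tightened sets only if the
# candidate solution's predicted nominal states satisfy them; otherwise
# fall back to the sets from the previous sampling time.
def safe_update(z_candidate, in_hat_Z, in_hat_Zf, hat_sets, old_sets):
    """z_candidate: predicted nominal states under the candidate solution
    (last entry plays the terminal-state role); in_hat_Z(l, z) and
    in_hat_Zf(z) are membership tests for the updated tightened sets."""
    *z_path, z_terminal = z_candidate
    flag = all(in_hat_Z(l, z) for l, z in enumerate(z_path)) \
        and in_hat_Zf(z_terminal)
    return (1, hat_sets) if flag else (0, old_sets)

# Toy usage with 1-D sets encoded as per-step interval bounds.
hat_bounds = [0.8, 0.7, 0.6]     # updated (tighter) bounds
old_bounds = [1.0, 0.9, 0.8]     # bounds from the previous sampling time
flag, used = safe_update(
    [0.5, 0.65, 0.3],
    in_hat_Z=lambda l, z: abs(z) <= hat_bounds[l],
    in_hat_Zf=lambda z: abs(z) <= 0.5,
    hat_sets=hat_bounds, old_sets=old_bounds)
print(flag, used)                # 1 [0.8, 0.7, 0.6]
```

When any predicted state violates the updated sets, the function returns flag = 0 and the previous sets, which is exactly the fallback that preserves recursive feasibility.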

The proposed MPC algorithm is detailed in Figure 21. The online-learning based risk-

averse stochastic MPC algorithm can be roughly divided into two blocks: (i) offline

computation of the sets, ambiguity set construction based on historical disturbance data,

and (ii) online learning from real-time disturbance data and online optimization. At each

time, the MPC strategy only implements the first control action. From Figure 21, we can see that the algorithm alternates between online optimal control of the system and online learning from real-time disturbance data. If the candidate solution satisfies the conditions $\tilde{z}_{l|k+1} \in \hat{\mathbb{Z}}_l^{(k+1)}$ and $\tilde{z}_{N|k+1} \in \hat{\mathbb{Z}}_f^{(k+1)}$, the newly learned uncertainty information is incorporated into the

predictive control strategy to improve the control performance over its runtime. In the

MPC framework, a finite-horizon optimal control problem is solved at each sampling

time, and only the first control action is performed as the control input $u_k$. The corresponding first control action is $u_{0|k}^* = K e_{0|k} + v_{0|k}^*$. Since $e_{0|k} = 0$, we further have $u_{0|k}^* = 0 + v_{0|k}^* = v_{0|k}^*$. Therefore, the control input is $u_k = v_{0|k}^*$, as in Step 5 of the algorithm.

Algorithm: Online-learning based stochastic MPC algorithm

1: Offline: Given the initial state $x_0$, construct an ambiguity set in (4.17) from historical data, and determine $\mathbb{Z}_l^{(0)}$ using (4.19)-(4.20), $\mathbb{V}_l$ in (4.21), and $\mathbb{Z}_f^{(0)}$ based on Definition 4.6.
2: Online:
3: for k = 0, 1, ... do
4:   Solve the optimal control problem (OL-SMPCk) in (4.22)-(4.27);
5:   Apply the control policy in (4.8) for l = 0, i.e. $u_k = v_{0|k}^*$;
6:   Measure the current system state $x_{k+1}$ and obtain $w_k$ using (4.1);
7:   Run online learning for the DPMM in (4.16) with real-time data $w_k$;
8:   Update the ambiguity set in (4.17), and obtain $\hat{\mathbb{Z}}_l^{(k+1)}$ using (4.19)-(4.20) and $\hat{\mathbb{Z}}_f^{(k+1)}$ based on Definition 4.6;
9:   if $\tilde{z}_{l|k+1} \in \hat{\mathbb{Z}}_l^{(k+1)}$ in (4.29) and $\tilde{z}_{N|k+1} \in \hat{\mathbb{Z}}_f^{(k+1)}$ in (4.30) then
10:    flag = 1;
11:  else
12:    flag = 0;
13:  end
14:  Determine the sets $\mathbb{Z}_l^{(k+1)}$ and $\mathbb{Z}_f^{(k+1)}$ using (4.31)-(4.32);
15: end

Figure 21. The pseudocode of the proposed online-learning based risk-averse

stochastic MPC algorithm.

Remark 4.3 The upper bound on the memory requirement of the online learning algorithm is $\left( \frac{n^2 + 3n}{2} + 1 \right) N_c + n N_s$, where $N_c$ is the number of clumps, $N_s$ is the number of singlets, and $n$ denotes the data dimension. Note that the cost of storing the sufficient statistics for each clump is $\frac{n^2 + 3n}{2} + 1$. Compared with the batch learning algorithm, the computational complexity of the adopted online learning algorithm is only $O\left( K \left( N_c + N_s + 1 \right) \right)$ during the model building phase [347], where K denotes the maximum number of components.
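The memory bound of Remark 4.3 is a simple count of stored statistics. A quick evaluation (numbers illustrative):

```python
# Evaluate the memory bound (n^2 + 3n)/2 + 1 per clump plus n per singlet.
def memory_upper_bound(n: int, n_clumps: int, n_singlets: int) -> int:
    per_clump = (n * n + 3 * n) // 2 + 1   # sufficient statistics per clump
    return per_clump * n_clumps + n * n_singlets

# e.g. 2-D disturbance data, 10 clumps, 5 singlets:
print(memory_upper_bound(n=2, n_clumps=10, n_singlets=5))   # 70
```

The bound grows quadratically in the data dimension n (through the covariance statistics) but only linearly in the number of clumps and singlets, which is why the online scheme scales well over long runtimes.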

4.4 The theoretical properties of the proposed online learning based

risk-averse stochastic MPC

In this section, the properties of the proposed online learning based risk-averse

stochastic MPC algorithm, namely recursive feasibility (formally defined in Definition

4.7) and closed-loop stability, are derived.

Definition 4.7 (Recursive feasibility) If the finite-horizon optimization problem of MPC

is initially feasible, it remains feasible at all subsequent sampling instants.

As pointed out in Section 4.1, the important property of MPC, namely recursive

feasibility, might be compromised by the time-varying probabilistic constraints with

online updated ambiguity set of disturbance distributions. To this end, a novel safe

update scheme for the ambiguity set is developed for the MPC framework, along with

the terminal set or terminal constraints [357].

By employing the safe ambiguity set update scheme developed in Section 4.3.3, recursive feasibility and closed-loop stability are ensured even though the disturbance distribution might be time-varying. Note that the disturbance support $\Xi$ is assumed to be time-invariant for ease of exposition, as indicated in Assumption 4.3, where

matrices E and f are not indexed by time k. For the time-varying support, the derivation

and conclusion of proposed constraint tightening in Theorem 4.1 are still valid. The

only issue with the varying support is that the MRPI set could become empty when the

support becomes sufficiently large, which further leads to the infeasibility issue.

A standard assumption underpinning tube based MPC on the terminal set is made as

follows [312, 313].

Assumption 4.5 There exists a nonempty terminal set $\mathbb{Z}_f^*$ for the tightened constraints based on a worst-case scheme, i.e. $\hat{\mathbb{Z}}_l^* = \mathbb{X}^* \ominus \left( \bigoplus_{i=1}^{l} \Phi^i \mathbb{W} \right)$ with $\mathbb{X}^* = \left\{ z \in \mathbb{R}^n \mid [H]_i z \le [h]_i - \eta_i^0 \right\}$, where $\eta_i^0 = \max_{\xi \in \Xi} [H]_i \xi$.

Proposition 4.1 Under Assumption 4.5, the terminal MRPI set $\mathbb{Z}_f^{(k)}$ is always nonempty based on the constraint tightening in Theorem 4.1.

The proof of Proposition 4.1 is given in Appendix C.

Remark 4.4 Based on the proof in Appendix C, we can see that the set $\mathbb{Z}_f^*$ is a robust positively invariant set, not necessarily the MRPI set, for the updated tightened constraints. Therefore, when a very limited computational budget is imposed, the offline-computed set $\mathbb{Z}_f^*$ can serve as the terminal set to guarantee recursive feasibility and stability without re-computing the MRPI set at each time step.

First, we prove the recursive feasibility of the proposed MPC, as given in the following

theorem.

Theorem 4.2 (Recursive feasibility) Let $\mathcal{F}\left( x_k \right)$ denote the feasible region of the finite horizon optimal control problem (OL-SMPCk) for state $x_k$. If $\mathcal{F}\left( x_0 \right) \neq \emptyset$, then given Assumptions 4.1-4.5, we have $\mathcal{F}\left( x_k \right) \neq \emptyset$, $\forall k$.

The proof of Theorem 4.2 is provided in Appendix D.

Remark 4.5 Since the distribution of stochastic disturbance can be arbitrarily time-

varying, the feasibility of the candidate solution based on the adaptive constraints

(without using the safe update scheme) cannot hold universally. To the best of our

knowledge, the developed safe update scheme presents the first attempt to successfully

address such a recursive feasibility issue for stochastic MPC subject to time-varying

disturbance distributions.

Before proving the stability of the proposed MPC, we provide the definition of minimal

RPI set as follows.

Definition 4.8 (Minimal robust positively invariant set) The minimal robust positively invariant set $R_\infty$ is defined as follows.

$$R_\infty = \lim_{l \to \infty} \bigoplus_{i=0}^{l} \Phi^i \mathbb{W} \quad (4.33)$$
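In the scalar case, the Minkowski sum in (4.33) reduces to a geometric series of interval bounds, so $R_\infty$ can be approximated by truncating the sum (a sketch with illustrative parameters, not the approximation method of [361]):

```python
# Approximate the minimal RPI set R_inf = ⊕_{i>=0} Phi^i W of (4.33) for a
# scalar system x+ = phi*x + w with |w| <= w_bar: each summand Phi^i W is
# the interval [-|phi|^i * w_bar, |phi|^i * w_bar].
def min_rpi_bound(phi: float, w_bar: float, tol: float = 1e-12) -> float:
    assert abs(phi) < 1, "Phi must be Schur stable"
    bound, term = 0.0, w_bar
    while term > tol:
        bound += term              # add the i-th Minkowski summand
        term = abs(phi) * term     # next term: |phi|^(i+1) * w_bar
    return bound                   # approaches w_bar / (1 - |phi|)

print(round(min_rpi_bound(phi=0.5, w_bar=0.3), 6))   # 0.6
```

The truncation error after stopping is at most $\text{tol}/(1-|\phi|)$, mirroring the a priori error bound of the approximation method cited in Section 4.5.1.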

The following theorem establishes the stability of the closed-loop system with the

proposed MPC strategy.

Theorem 4.3 (Closed-loop stability) Given that $\mathcal{F}\left( x_0 \right) \neq \emptyset$, the closed-loop system state asymptotically converges to a neighborhood of the origin under the proposed online-learning based risk-averse stochastic MPC.


The proof of Theorem 4.3 is provided in Appendix E.

4.5 Numerical examples

In this section, we apply the proposed online-learning based stochastic MPC to

numerical examples to illustrate its effectiveness and advantages. We also implement

the risk-averse stochastic MPC using global mean and covariance in the ambiguity set

and the risk-averse stochastic MPC without online learning, in addition to the proposed

MPC approach for the purpose of comparison. The online learning algorithm and the

MPC control strategies are implemented in MATLAB R2018a. The computational

experiments are performed on a computer with an Intel (R) Core (TM) i7-6700 CPU @

3.40 GHz and 32 GB RAM. We use the YALMIP toolbox in MATLAB R2018a [359].

The GUROBI 8.0 solver is adopted to solve the finite-horizon optimal control problem,

and SeDuMi 1.3 is employed to solve the constraint tightening problem (4.20). The

related sets, including terminal sets and the robust positively invariant set, are obtained

via multi-parametric toolbox 3.0 [360]. The prediction horizon N is set to be nine. Note

that the distributionally robust control method (Van Parys et al., 2016) considers a setting similar to ours, i.e., a risk-averse stochastic control setting where the

disturbance distribution is only partially known. This is why we consider the

comparison with this existing distributionally robust control method.

4.5.1 Example with disturbance data distribution having multimodality

In this section, we use a benchmark numerical example, control of the constrained

sampled double integrator [312], to demonstrate the effectiveness of the proposed MPC.

The system dynamics in this benchmark example is defined by


 1 1  0.5 
xk 1    xk    uk  wk (4.34)
 0 1 1 

The state and control constraints in the risk-averse stochastic MPC is given as follows.

  
k
 
sup  -CVaR 0.2  0 1 xl 1 k  2  0 (4.35)


  u u  5  (4.36)

The initial condition of system states x0   5, 2  , and the support set of disturbance
T

 
is   w w   0.6 . In the numerical example, the matrix gain K is decided to be the

1 0
unconstrained LQR solution with matrix Q    and R=0.01. Specifically, the
0 1

 0.6696 0.3370 
matrix gain K   0.6609  1.3261 , so     . Set R∞ can be
 0.6609 0.3261

computed using an approximation method [361], in which an upper bound on the

approximation error can be specified a priori.
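The LQR gain for this benchmark can be recovered from the discrete algebraic Riccati equation. A sketch (not the dissertation's MATLAB code; assumes the standard $u = Kx$ sign convention used in the chapter):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Double integrator of (4.34) with Q = I, R = 0.01.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2)
R = np.array([[0.01]])

P = solve_discrete_are(A, B, Q, R)                   # DARE solution
K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)   # u = K x convention
Phi = A + B @ K                                      # closed-loop matrix
print(np.round(K, 4))
print(max(abs(np.linalg.eigvals(Phi))) < 1)          # Phi is Schur stable
```

With the small input weight R = 0.01, the resulting closed loop is strongly damped, which is why the spectral radius of $\Phi$ is well inside the unit circle.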

To demonstrate control performance and computational time, 100 closed-loop

simulations are performed with a simulation horizon of 20 time steps. The closed-loop

cost function is $J_{\text{cost}} = \sum_{k=1}^{T_s} \left( x_k^T Q x_k + u_k^T R u_k \right)$, where the simulation horizon length is $T_s = 20$.

Compared with the risk-averse stochastic MPC with a mean-covariance ambiguity set,

the proposed MPC method exploits the fine-grained uncertainty information, and is less

conservative, reducing the closed-loop cost by an average of 9.77% over all

simulation runs. To take a closer look at the computational time breakdown of the

proposed MPC, we present the average computational time for online learning,

constraint tightening through solving (4.20), online control via solving (OL-SMPCk),
and computing the MRPI set in Figure 22. The average values of computational time

are calculated over 2,000 (20 × 100) time steps. From the results, we can see that the

average computational time for the online learning is merely 0.344 s. It is

computationally efficient to update the parameter η online, and it takes 0.215 s on average to

solve optimization problem (4.20). Even though additional computation is needed at

each time step, the proposed MPC not only enables the incorporation of updated

distribution information to improve control performance, but also features an acceptable

computational cost. Since the example is a benchmark problem, the computational time

comparison is representative of the computational performance for the proposed

approach.

Remark 4.6 The adopted approach of computing the MRPI set is computationally

tractable, because the MRPI set is guaranteed to be computed in a finite number of

recursions [358]. In the above numerical example, it takes only 0.057s on average to

compute the MRPI set, and the longest time of computing the MRPI set is 0.105s.

Figure 22. The average computational times of the proposed online learning based

risk-averse stochastic MPC method over 2,000 time steps.

4.5.2 Example with a time-varying disturbance distribution

To demonstrate the significance of online learning in the proposed MPC approach, we

consider an example with a time-varying disturbance distribution. Specifically, the

standard deviation used to generate historical disturbance data is 0.005, while the

standard deviation increases to 0.3 for the real-time disturbance data generation. In this

1 0
example, matrix Q    , R=1, and the support set of disturbance is
0 1

 
  w w   0.10 , and the distributionally CVaR constraints is given below.

  
k
 
sup  -CVaR 0.15  0 1 xl 1 k  1.2  0 (4.37)

To take a closer look at constraint violations under time-varying disturbance

distributions, 100 runs of simulations are performed with different realizations of

disturbance sequences. Figure 23 shows a set of state trajectories using the proposed

MPC for a simulation horizon of 20 time steps. Note that the prediction horizon N=9.

For the proposed MPC strategy, the average constraint violation in the first nine steps

over all simulations is 7.2%, even if the disturbance variation becomes significant

because of the larger standard deviation. In contrast, the performance of risk-averse

stochastic MPC using distributionally robust CVaR constraints without online learning

deteriorates, and the corresponding average constraint violation increases to 22.3%.

This constraint violation is higher than the prescribed tolerance of 15.0%, thus

jeopardizing the safety of the control system. Notably, the risk-averse stochastic MPC

without online learning scheme implements constraint tightening offline using historical

disturbance data and its parameter η remains constant over time.

Figure 23. (a): The closed-loop trajectories of system states for the proposed online

learning based risk-averse stochastic MPC with 100 realizations of disturbance

sequences, (b): The zoom-in view of state trajectories near the upper limit of x(2).

By leveraging the online update of the ambiguity set, the time-varying distribution is

well captured by the proposed MPC method, and the constraint tightening is adaptive to

the distribution accordingly. Figure 24 shows the parameter η in the constraint

tightening of the proposed MPC over time in a simulation run. From Figure 24, we can

readily see that the effect of adaptation in the proposed MPC is evident. Specifically, η

increases from 0.016 to 0.10 based on the updated information of the stochastic

disturbance. The values of flag equal to one over the entire horizon, which indicates that

the adaptive solution is activated at each time step.

Figure 24. The online adaption of constraint tightening parameters in the proposed

MPC for time-varying disturbance distribution in a simulation.

4.6 Summary

In this chapter, an online learning-based risk-averse stochastic MPC framework was

developed for the control of linear time-invariant systems affected by additive

disturbance. It incorporated online learning into MPC with desirable theoretical control

guarantees and less conservative control performance. Based on the DPMM, a

systematic approach to construct the ambiguity set from real-time disturbance data was

developed, which leveraged the structural property of multimodality and local moment

information. Additionally, the exact reformulation for the distributionally robust CVaR

constraints was derived as LMI constraints to facilitate constraint tightening. The online

adaptation of ambiguity set with real-time measurements helped to improve control

performance by actively learning disturbance distribution online. We introduced the

safe update scheme to ensure the recursive feasibility and stability of the resulting MPC.

The computational results demonstrated that the control performance of the developed

MPC is less conservative compared with the one using mean and covariance information

of disturbance distribution. Additionally, for the case with time-varying disturbance

distribution, the deterioration of constraint satisfaction was ameliorated in the proposed

online learning based MPC framework.

4.7 Appendix A: The derivation of control objective

Based on the dual-mode prediction dynamics, the nominal system $z_{l+1|k} = A z_{l|k} + B v_{l|k}$, $z_{0|k} = x_k$, can be characterized by the autonomous system shown below [356].

$$y_{l+1|k} = \Psi y_{l|k}, \quad l \ge 0 \quad (4.38)$$

where the initial state is $y_{0|k} = \begin{bmatrix} z_{0|k}^T & c_{0|k}^T & \cdots & c_{N-1|k}^T \end{bmatrix}^T$, the system matrix is $\Psi = \begin{bmatrix} \Phi & B\Gamma \\ 0 & \Pi \end{bmatrix}$, the matrix $\Gamma = \begin{bmatrix} I_m & 0 & \cdots & 0 \end{bmatrix}$, and the matrix $\Pi = \begin{bmatrix} 0 & I_m & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I_m \\ 0 & 0 & \cdots & 0 \end{bmatrix}$ is the block shift matrix.

Based on the autonomous system dynamics, the stage cost of $J_\infty = \sum_{l=0}^{\infty} \left( z_{l|k}^T Q z_{l|k} + v_{l|k}^T R v_{l|k} \right)$ can be written as below.

$$z_{l|k}^T Q z_{l|k} + v_{l|k}^T R v_{l|k} = z_{l|k}^T Q z_{l|k} + \left( K z_{l|k} + c_{l|k} \right)^T R \left( K z_{l|k} + c_{l|k} \right) = y_{l|k}^T \hat{Q} y_{l|k} \quad (4.39)$$

where the matrix $\hat{Q} = \begin{bmatrix} Q + K^T R K & K^T R \Gamma \\ \Gamma^T R K & \Gamma^T R \Gamma \end{bmatrix}$.

Thus, we can rewrite $J_\infty$ as follows.

$$J_\infty = \sum_{l=0}^{\infty} y_{l|k}^T \hat{Q} y_{l|k} = y_{0|k}^T \Theta y_{0|k} \quad (4.40)$$

where $\Theta$ is the positive definite solution of the Lyapunov equation $\Theta = \Psi^T \Theta \Psi + \hat{Q}$.

We express the matrix $\Theta$ as $\Theta = \begin{bmatrix} \Theta_z & \Theta_{zc} \\ \Theta_{cz} & \Theta_c \end{bmatrix}$. By substituting $\Psi$, $\Theta$, and $\hat{Q}$ into the equation $\Theta = \Psi^T \Theta \Psi + \hat{Q}$, we have

$$\Theta_z = \Phi^T \Theta_z \Phi + Q + K^T R K \quad (4.41)$$

$$\Theta_{cz} = \Pi^T \Theta_{cz} \Phi + \Gamma^T \left( B^T \Theta_z \Phi + R K \right) \quad (4.42)$$

$$\Theta_c = \Gamma^T B^T \Theta_z B \Gamma + \Pi^T \Theta_{cz} B \Gamma + \Gamma^T B^T \Theta_{zc} \Pi + \Pi^T \Theta_c \Pi + \Gamma^T R \Gamma \quad (4.43)$$

Since the feedback gain K is the LQR optimal solution and from (4.41), $\Theta_z$ is the unique solution of the Riccati equation, so $\Theta_z = P$. Based on the expression of the LQR solution $K = -\left( B^T \Theta_z B + R \right)^{-1} B^T \Theta_z A$, we can further have

$$B^T \Theta_z \Phi + R K = B^T \Theta_z A + \left( B^T \Theta_z B + R \right) K = B^T \Theta_z A - \left( B^T \Theta_z B + R \right) \left( B^T \Theta_z B + R \right)^{-1} B^T \Theta_z A = 0 \quad (4.44)$$

According to (4.42) and (4.44), we have $\Theta_{cz} = \Pi^T \Theta_{cz} \Phi$, which implies that $\Theta_{cz} = 0$ is the solution. Since the matrix $\Theta$ is symmetric, we further have $\Theta_{zc} = 0$. Therefore, equation (4.43) leads to the following.

$$\Theta_c = \Pi^T \Theta_c \Pi + \Gamma^T \left( R + B^T P B \right) \Gamma \quad (4.45)$$

Let the matrix $\Theta_c = \begin{bmatrix} \Theta_c^{1,1} & \cdots & \Theta_c^{1,N} \\ \vdots & \ddots & \vdots \\ \Theta_c^{N,1} & \cdots & \Theta_c^{N,N} \end{bmatrix}$. By plugging the block matrix into (4.45), we arrive at the following expression:

$$\Theta_c^{i,j} = \begin{cases} R + B^T P B, & i = j \\ 0, & i \ne j \end{cases} \quad (4.46)$$

Based on (4.40) and (4.46), we have $J_\infty = \sum_{l=0}^{N-1} c_{l|k}^T \left( R + B^T P B \right) c_{l|k} + x_k^T P x_k$.

4.8 Appendix B. Proof of Theorem 4.1

Proof. Based on $x_{l|k} = z_{l|k} + e_{l|k}$ and $e_{l+1|k} = \Phi e_{l|k} + w_{l|k}$, we have

$$x_{l+1|k} = z_{l+1|k} + \Phi e_{l|k} + w_{l|k} \quad (4.47)$$

Note that the conditional distributionally robust CVaR constraints need to hold for any reachable $e_{l|k}$. Therefore, we leverage the tube-based method, and the original distributionally robust CVaR constraints can be cast as follows.

$$z_{l+1|k} \in \hat{\mathbb{Z}}_{l+1}^{(k)} = \mathbb{X}^{(k)} \ominus \left( \bigoplus_{i=1}^{l} \Phi^i \mathbb{W} \right) \quad (4.48)$$

where the set $\mathbb{X}^{(k)}$ is explicitly given by

$$\mathbb{X}^{(k)} = \left\{ z \;\middle|\; \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( [H]_i z - [h]_i + [H]_i w_{l|k} \right) \le 0, \; i = 1, \ldots, p \right\} \quad (4.49)$$

We will provide a constraint tightening method to reformulate the following constraint.

$$\sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( [H]_i z - [h]_i + [H]_i w_{l|k} \right) \le 0 \quad (4.50)$$

We can further reformulate the distributionally robust CVaR constraint according to the definition of CVaR and the stochastic minimax theorem [139, 362, 363].

$$\begin{aligned} & \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( [H]_i z - [h]_i + [H]_i w_{l|k} \right) \\ &= \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \inf_{\beta_i} \left\{ \beta_i + \frac{1}{\epsilon_i} \mathbb{E}_{\mathbb{P}}\left[ [H]_i z - [h]_i + [H]_i w_{l|k} - \beta_i \right]^{+} \right\} \\ &= \inf_{\beta_i} \left\{ \beta_i + \frac{1}{\epsilon_i} \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{E}_{\mathbb{P}}\left[ [H]_i z - [h]_i + [H]_i w_{l|k} - \beta_i \right]^{+} \right\} \end{aligned} \quad (4.51)$$

where $\mathbb{E}_{\mathbb{P}}\left[ \cdot \right]$ represents the expectation with respect to the probability distribution $\mathbb{P}$. The first equality holds based on Definition 4.1. The second equality is based on a stochastic minimax theorem.

The worst-case expectation problem $\sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{E}_{\mathbb{P}}\left[ [H]_i z - [h]_i + [H]_i w_{l|k} - \beta_i \right]^{+}$ can be rewritten as the following optimization problem.

$$\begin{aligned} \sup_{\nu_1, \ldots, \nu_m} \;& \sum_{j=1}^{m} \int_{\Xi} \left[ [H]_i z - [h]_i + [H]_i \xi - \beta_i \right]^{+} \nu_j\left( \xi \right) d\xi \\ \text{s.t.} \;& \int_{\Xi} \nu_j\left( \xi \right) d\xi = \pi_j, \quad j = 1, \ldots, m \\ & \int_{\Xi} \xi\, \nu_j\left( \xi \right) d\xi = \pi_j \mu_j \\ & \int_{\Xi} \xi \xi^T \nu_j\left( \xi \right) d\xi = \pi_j \left( \Sigma_j + \mu_j \mu_j^T \right) \end{aligned} \quad (4.52)$$
 

By taking the dual of optimization problem (4.52), we have (4.53)-(4.56).

$$\min_{t_{ij}, \omega_{ij}, \Omega_{ij}} \; \sum_{j=1}^{m} \pi_j \left[ t_{ij} + \mu_j^T \omega_{ij} + \left\langle \Sigma_j + \mu_j \mu_j^T, \Omega_{ij} \right\rangle \right] \quad (4.53)$$

$$\text{s.t.} \quad t_{ij} + \xi^T \omega_{ij} + \xi^T \Omega_{ij} \xi \ge 0, \quad \forall \xi \in \Xi, \; \forall j \quad (4.54)$$

$$t_{ij} + \xi^T \omega_{ij} + \xi^T \Omega_{ij} \xi \ge [H]_i z - [h]_i + [H]_i \xi - \beta_i, \quad \forall \xi \in \Xi, \; \forall j \quad (4.55)$$

$$\Omega_{ij} \succeq 0, \quad \forall j \quad (4.56)$$

where $t_{ij}$, $\omega_{ij}$, and $\Omega_{ij}$ are the dual variables corresponding to the constraints in the ambiguity set. Since constraints (4.54)-(4.55) are semi-infinite constraints, some further reformulations are needed.

The constraint $t_{ij} + \xi^T \omega_{ij} + \xi^T \Omega_{ij} \xi \ge 0$, $\forall \xi \in \Xi$, $\forall j$, can be reformulated as follows.

$$\min_{\xi \in \Xi} \left( t_{ij} + \xi^T \omega_{ij} + \xi^T \Omega_{ij} \xi \right) \ge 0, \quad \forall j \quad (4.57)$$

We can further reformulate constraint (4.57) as the following LMI constraints, based on the duality of convex quadratic programs and the Schur complement [172].

$$\begin{bmatrix} \Omega_{ij} & \frac{1}{2}\left( \omega_{ij} + E^T \lambda_{ij} \right) \\ \frac{1}{2}\left( \omega_{ij} + E^T \lambda_{ij} \right)^T & t_{ij} - f^T \lambda_{ij} \end{bmatrix} \succeq 0, \quad j = 1, \ldots, m \quad (4.58)$$

where $\lambda_{ij} \ge 0$ is a vector of dual variables corresponding to the constraints $E \xi \le f$.

Similarly, constraints (4.55) can be recast as the LMI constraints below.

$$\begin{bmatrix} \Omega_{ij} & \frac{1}{2}\left( \omega_{ij} - [H]_i^T + E^T \varphi_{ij} \right) \\ \frac{1}{2}\left( \omega_{ij} - [H]_i^T + E^T \varphi_{ij} \right)^T & t_{ij} + \beta_i + [h]_i - [H]_i z - f^T \varphi_{ij} \end{bmatrix} \succeq 0, \quad \forall j \quad (4.59)$$

where $\varphi_{ij} \ge 0$ is a vector of dual variables corresponding to the constraints $E \xi \le f$.

Thus, the distributionally robust CVaR constraint $\sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( [H]_i z - [h]_i + [H]_i w_{l|k} \right) \le 0$ over the DPMM-based ambiguity set is reformulated as follows.

$$\begin{cases} \epsilon_i \beta_i + \sum_{j=1}^{m} \pi_j \left[ t_{ij} + \mu_j^T \omega_{ij} + \left\langle \Sigma_j + \mu_j \mu_j^T, \Omega_{ij} \right\rangle \right] \le 0 \\[4pt] \begin{bmatrix} \Omega_{ij} & \frac{1}{2}\left( \omega_{ij} + E^T \lambda_{ij} \right) \\ \frac{1}{2}\left( \omega_{ij} + E^T \lambda_{ij} \right)^T & t_{ij} - f^T \lambda_{ij} \end{bmatrix} \succeq 0, \quad \forall j \\[4pt] \begin{bmatrix} \Omega_{ij} & \frac{1}{2}\left( \omega_{ij} - [H]_i^T + E^T \varphi_{ij} \right) \\ \frac{1}{2}\left( \omega_{ij} - [H]_i^T + E^T \varphi_{ij} \right)^T & t_{ij} + \beta_i + [h]_i - [H]_i z - f^T \varphi_{ij} \end{bmatrix} \succeq 0, \quad \forall j \\[4pt] \lambda_{ij} \ge 0, \quad \varphi_{ij} \ge 0, \quad \Omega_{ij} \succeq 0, \quad \forall j \end{cases} \quad (4.60)$$

We consider the following constraint:

$$[H]_i z \le [h]_i - \eta_i^{(k)} \quad (4.61)$$

where

$$\begin{aligned} \eta_i^{(k)} = \min \;& \eta \\ \text{s.t.} \;& \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( -\eta + [H]_i w_{l|k} \right) \le 0 \end{aligned} \quad (4.62)$$

The constraint in (4.62) can be converted into constraints (4.20). Then, we convert (4.61) and (4.62) as follows.

$$[H]_i z_{l+1|k} \le [h]_i - \eta_i^{(k)}, \quad \eta_i^{(k)} = \min \left\{ \eta_i \;\middle|\; \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( -\eta_i + [H]_i w_{l|k} \right) \le 0 \right\} \quad (4.63)$$

We reformulate constraint (4.63) into (4.60) using the same reformulation technique. This completes the proof. □

4.9 Appendix C: Proof of Proposition 4.1

Proof. First, we introduce the ambiguity set $\mathcal{P}^* = \left\{ \mathbb{P} \;\middle|\; \int_{\Xi} \mathbb{P}\left( d\xi \right) = 1 \right\}$. Consider the constraint tightening based on the ambiguity set $\mathcal{P}^*$ below.

$$\begin{aligned} \eta_i^0 = \min \;& \eta \\ \text{s.t.} \;& \sup_{\mathbb{P} \in \mathcal{P}^*} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( -\eta + [H]_i w_{l|k} \right) \le 0 \end{aligned} \quad (4.64)$$

By using a technique similar to that in Appendix B, we can reformulate (4.64) as follows.

$$\begin{aligned} \eta_i^0 = \min \;& \eta \\ \text{s.t.} \;& \epsilon_i \beta_i + t_i \le 0 \\ & t_i \ge 0 \\ & t_i \ge -\eta - \beta_i + \max_{\xi \in \Xi} [H]_i \xi \end{aligned} \quad (4.65)$$

According to the constraints in (4.65), we further have $\eta \ge \max_{\xi \in \Xi} [H]_i \xi + \left( \frac{1}{\epsilon_i} - 1 \right) t_i$. Since $t_i \ge 0$, we have $\eta_i^0 = \max_{\xi \in \Xi} [H]_i \xi$, $\forall i$.

Since $\mathcal{P}^{(k)} \subseteq \mathcal{P}^*$, it holds that $\eta_i^0 \ge \eta_i^{(k)}$, $\forall i, k$. Given this inequality, we can further have $\hat{\mathbb{Z}}_l^* \subseteq \mathbb{Z}_l^{(k)}$, $\forall k$. Based on $\hat{\mathbb{Z}}_l^* \subseteq \mathbb{Z}_l^{(k)}$, $\forall k$, and Definition 4.6, $\mathbb{Z}_f^*$ is a robust positively invariant set (not necessarily the MRPI set) for the updated tightened constraints. Since $\mathbb{Z}_f^*$ is nonempty under Assumption 4.5, there always exists a nonempty MRPI set according to Definition 4.5. This completes the proof. □

4.10 Appendix D: Proof of Theorem 4.2

Proof. Given that $\mathcal{F}\left( x_k \right) \neq \emptyset$, we want to show $\mathcal{F}\left( x_{k+1} \right) \neq \emptyset$. Suppose $\mathbf{c}_{N|k}^* = \left\{ c_{0|k}^*, c_{1|k}^*, \ldots, c_{N-1|k}^* \right\}$ is the optimal solution to problem (OL-SMPCk) at time k. Hence, the corresponding optimal nominal state evolution is given below.

$$z_{l+1|k}^* = \Phi z_{l|k}^* + B c_{l|k}^*, \quad z_{0|k}^* = x_k \quad (4.66)$$

Based on the optimal solution, an explicit candidate solution can be constructed as $\tilde{\mathbf{c}}_{k+1} = \left\{ c_{1|k}^*, \ldots, c_{N-1|k}^*, 0 \right\}$ using Definition 4.3. The nominal state under the candidate input $\tilde{\mathbf{c}}_{k+1}$ is as follows.

$$\tilde{z}_{0|k+1} = x_{k+1} = A x_k + B u_k + w_k = \left( A + BK \right) x_k + B c_{0|k}^* + w_k = z_{1|k}^* + w_k \quad (4.67)$$

Based on $\tilde{z}_{0|k+1} = z_{1|k}^* + w_k$, we further have the following equalities.

$$\tilde{z}_{1|k+1} = \Phi \tilde{z}_{0|k+1} + B \tilde{c}_{0|k+1} = \Phi \left( z_{1|k}^* + w_k \right) + B c_{1|k}^* = z_{2|k}^* + \Phi w_k \quad (4.68)$$

According to $\tilde{z}_{0|k+1} = z_{1|k}^* + w_k$ and $\tilde{z}_{1|k+1} = z_{2|k}^* + \Phi w_k$, we have the following relation by induction.

$$\tilde{z}_{l|k+1} = z_{l+1|k}^* + \Phi^l w_k, \quad l \in [0, N-1] \quad (4.69)$$

Similarly, we have the relationship between the optimal solution at time k and the candidate at time k+1, as presented by

$$\tilde{v}_{l|k+1} = v_{l+1|k}^* + K \Phi^l w_k, \quad l \in [0, N-1] \quad (4.70)$$

For the optimal control problem (OL-SMPCk+1), we consider two scenarios, namely flag = 0 and flag = 1. For the scenario in which flag = 0, we have $\mathbb{Z}_l^{(k+1)} = \mathbb{Z}_l^{(k)}$ according to (4.31). Based on $\tilde{z}_{l|k+1} = z_{l+1|k}^* + \Phi^l w_k$ and $\tilde{v}_{l|k+1} = v_{l+1|k}^* + K \Phi^l w_k$, we have the following:

$$z_{l+1|k}^* \in \mathbb{Z}_{l+1}^{(k)} \;\Rightarrow\; \tilde{z}_{l|k+1} \in \mathbb{Z}_l^{(k+1)} \quad (4.71)$$

$$v_{l+1|k}^* \in \mathbb{V}_{l+1} \;\Rightarrow\; \tilde{v}_{l|k+1} \in \mathbb{V}_l \quad (4.72)$$

Next, we derive the candidate nominal state at the end of the horizon as follows.

$$\tilde{z}_{N|k+1} = A \tilde{z}_{N-1|k+1} + B \tilde{v}_{N-1|k+1} = A \left( z_{N|k}^* + \Phi^{N-1} w_k \right) + B \left( K z_{N|k}^* + K \Phi^{N-1} w_k \right) = \Phi z_{N|k}^* + \Phi^N w_k \quad (4.73)$$

Note that $\mathbb{Z}_f^{(k+1)} = \mathbb{Z}_f^{(k)}$ because flag = 0. Based on the definition of the terminal set, we have the following.

$$z_{N|k}^* \in \mathbb{Z}_f^{(k)} \;\Rightarrow\; \tilde{z}_{N|k+1} \in \mathbb{Z}_f^{(k+1)} \quad (4.74)$$

Having now checked all the constraints, we conclude that the candidate solution is feasible for problem (OL-SMPCk+1) if flag = 0.

For the scenario where flag = 1, the constructed solution satisfies the constraints in (OL-SMPCk+1) by construction of the safe update scheme in Section 4.3.3, which completes the proof. □

4.11 Appendix E: Proof of Theorem 4.3

Proof. We define the optimal objective value of problem (OL-SMPCk) as $J_k$ below.

$$J_k = V_N^*\left( x_k \right) = \sum_{l=0}^{N-1} \left\| c_{l|k}^* \right\|_{R + B^T P B}^{2} \quad (4.75)$$

Since the candidate solution is feasible and the objective function remains the same for both scenarios flag = 0 and flag = 1, we have

$$J_{k+1} \le \tilde{V}_N\left( x_{k+1} \right) = \sum_{l=0}^{N-1} \left\| \tilde{c}_{l|k+1} \right\|_{R + B^T P B}^{2} = \sum_{l=1}^{N-1} \left\| c_{l|k}^* \right\|_{R + B^T P B}^{2} = J_k - \left\| c_{0|k}^* \right\|_{R + B^T P B}^{2} \quad (4.76)$$

where $\tilde{V}_N\left( x_{k+1} \right)$ represents the objective value corresponding to the candidate solution.

By rearranging (4.76), we have the following inequality.

$$J_{k+1} - J_k \le -\left\| c_{0|k}^* \right\|_{R + B^T P B}^{2} \quad (4.77)$$

By summing this inequality from k = 0, we have the following.

$$\sum_{k=0}^{\infty} \left\| c_{0|k}^* \right\|_{R + B^T P B}^{2} \le J_0 - J_\infty \quad (4.78)$$

Based on (4.78), we have $\lim_{k \to \infty} c_{0|k}^* = 0$ for both scenarios flag = 0 and flag = 1. Additionally, the convergence of the state to a neighborhood of the origin is further established below.

Then, we consider the asymptotic dynamic behavior of the system state as follows.

$$\lim_{k \to \infty} x_k = \lim_{k \to \infty} \left( \Phi^k x_0 + \sum_{i=1}^{k} \Phi^{i-1} B c_{0|k-i}^* + \sum_{i=1}^{k} \Phi^{i-1} w_{k-i} \right) = \lim_{k \to \infty} \sum_{i=1}^{k} \Phi^{i-1} w_{k-i} \quad (4.79)$$

Note that the second equality holds because $\Phi$ is Schur and $\lim_{k \to \infty} c_{0|k}^* = 0$.

According to (4.79), we have the following relation.

$$\lim_{k \to \infty} x_k \in R_\infty = \bigoplus_{i=0}^{\infty} \Phi^i \mathbb{W} \quad (4.80)$$

Based on (4.80), the system state converges to a neighborhood of the origin, namely the minimal RPI set, under the proposed online-learning based risk-averse stochastic MPC strategy. This completes the proof. □

CHAPTER 5
A TRANSFORMATION-PROXIMAL BUNDLE ALGORITHM FOR SOLVING
LARGE-SCALE MULTISTAGE ADAPTIVE ROBUST OPTIMIZATION
PROBLEMS

5.1 Introduction

In recent years, robust optimization has become an increasingly popular methodology

to immunize optimization problems against uncertainty among both control and

optimization communities [98, 334, 364-367]. Robust optimization can be roughly

classified into three categories: static robust optimization, two-stage Adaptive Robust

Optimization (ARO), and multistage ARO. In static robust optimization, all the

decisions are made prior to observing uncertainty realizations [368]. By contrast, two-

stage ARO allows recourse decisions to be adaptive to realized uncertainties [106], thus

typically generating less conservative solutions than static robust optimization [369].

As a result, the two-stage ARO method has a variety of applications [370]. To overcome

the limitation of two-stage structures, multistage ARO emerges as a practical yet more

computationally challenging paradigm for sequential decision making processes under

uncertainty [110]. In the multistage setting, the decision maker can dynamically adjust

decisions based on the observed uncertainty realizations [371]. Multistage ARO

problems are prevalent in various control problems, including constrained robust finite-

horizon optimal control [364], multiperiod portfolio optimization [372-374], and robust

feedback model predictive control problems [375, 376].

Despite its attractiveness in modeling dynamic decision making under uncertainty, ARO

problems in general are notoriously demanding to solve [377]. To this end, extensive

156
research effort has been made toward solution techniques for ARO problems. One

popular approach is the affine control policy (the so called affine decision rule)

approximation [106], in which recourse decisions are restricted to be affine functions of

uncertainty realizations [378, 379]. In this way, the ARO problem reduces to a static (single-stage) robust optimization problem, which can be further addressed efficiently using

duality-based reformulation or constraint generation [112, 380]. However, the affine control policy sacrifices optimality for tractability [182, 381-383]. Instead of relying on

control policies, the K-adaptability method devises K contingency plans beforehand and

picks the best among these preselected plans after observing the uncertainty realization

[384, 385]. Reformulation-approximation methods conservatively express the two-stage ARO problem as a single-level optimization problem [386, 387]. The Benders

decomposition and extreme point enumeration were proposed as two exact solution

techniques exclusively suitable for two-stage ARO problems [103, 388-390]. Despite

the broad application scope of the multistage setting, solution techniques for multistage

ARO problems remain limited in the existing literature, and they usually suffer from an unsatisfactory trade-off between solution quality and computational tractability.

Hence, the research objective of our work is to propose a general algorithmic strategy

for multistage ARO and demonstrate its use in robust optimal control.

This chapter proposes a novel multi-to-two transformation-proximal bundle algorithm

to solve Multistage Adaptive Robust Mixed-Integer Linear Programs (MARMILPs). In

a multistage decision-making setting, decision variables can be partitioned into two

different groups, namely state decision variables and control/local decision variables

[391, 392]. We first propose a novel multi-to-two transformation scheme that converts

the multistage ARO problem into an equivalent two-stage counterpart. Specifically, by

enforcing only state decision variables to be affine functions of uncertainty, the original

MARMILP is reduced into a Two-stage Adaptive Robust Mixed-Integer Linear

Program (TARMILP). The proposed transformation scheme frees control decision

variables from the affine control policy restriction, thereby leading to a higher-quality

robust optimization solution [393]. We perform theoretical analysis to prove that such

transformation is valid if state decisions follow causal control policies, such as affine

and piecewise affine control policy [106, 381]. The multi-to-two transformation scheme

is general enough to be combined with existing two-stage ARO solution algorithms for

solving MARMILPs. Specifically, we adopt a proximal bundle algorithm for the exact

solution of the resulting TARMILP. Since the worst-case recourse function in the two-

stage ARO problem lacks an analytical expression and can be non-smooth, the bundle

method is employed with an oracle evaluating the function value and its sub-gradients

at a query point [394]. The Moreau-Yosida regularization is leveraged to determine the

next iteration in a decomposition framework [395]. Notably, the assumption on stage-

wise independence of uncertainty is not required for the multi-to-two transformation

scheme. As a result, the proposed algorithmic framework can accommodate temporal

dynamics exhibited by uncertainties across different time stages. Convergence analysis

of the proposed algorithm is presented for any types of uncertainty sets. Compared with

existing multistage ARO solution methods, including the affine control policy method

[106] and the piecewise affine control policy approach [396], the proposed algorithm

enjoys a more attractive trade-off between solution quality and computational

tractability. The affine control policy method assumes that both state and control

decision variables are affine functions of uncertainty [106], which is stronger than the

assumption adopted in this work. Chen & Zhang (2009) split the uncertainty into its positive and negative parts and apply affine control policies to the parameterized uncertainties. Thus, the piecewise affine control policy developed by Chen & Zhang (2009) essentially assumes a piecewise affine dependence on the uncertainty for both state and control decision variables. Additionally, these approaches do not require an oracle

for evaluating the function value and its sub-gradients [106, 396], whereas the proposed

method needs such an oracle to obtain cutting planes. To test and evaluate the performance

of the proposed algorithm, an application to constrained robust optimal control of

dynamic inventory systems is presented. Although this chapter only presents the

applications to robust optimal inventory control and process network planning, it

focuses on the development of a general methodology for multistage ARO, which can

be potentially applied to a variety of control problems.

The major contributions of this work are summarized as follows.

• A novel multi-to-two transformation scheme along with its theoretical analysis for

the solution of multistage ARO problems by applying decision rules only to

adjustable state decisions;

• A transformation-proximal bundle algorithm for solving multistage ARO problems

that provides an attractive trade-off between solution quality and tractability;

• An efficient procedure to construct lower bounds of MARMILPs based on a

scenario-tree problem with uncertainty scenarios generated by the cutting-plane

based proximal bundle algorithm;

• Application to the constrained robust optimal control of dynamic inventory systems

under demand uncertainty alongside comparisons with affine and piecewise affine

disturbance-feedback control policies.

5.2 The multi-to-two transformation scheme

In this section, we propose a novel multi-to-two transformation scheme for multistage

ARO problems. By employing the affine control policy only to state decision variables,

the proposed scheme can transform the original multistage ARO problem into its

equivalent two-stage ARO counterpart. First, a general model formulation of an

MARMILP is presented. We then develop the transformation scheme, in which affine

control policies are only applied to state decision variables. Finally, a theoretical

analysis is performed to prove that the MARMILP is converted to an equivalent two-

stage ARO problem via this multi-to-two transformation scheme.

In multistage ARO problems, decisions are made sequentially, and uncertainties are

revealed over time stages. The MARMILP in its general form is shown as follows:

$$
\begin{aligned}
\min_{x,\,s_t(\cdot),\,y_t(\cdot)}\ \max_{u\in U}\quad & c'x+\sum_{t=1}^{T}\left[d_t's_t(u^t)+f_t'y_t(u^t)\right] \\
\text{s.t.}\quad & T_t x+A_t s_t(u^t)+B_t s_{t-1}(u^{t-1})+W_t y_t(u^t)=h_t^0+H_t u_t,\quad \forall u\in U,\ \forall t \\
& L_t x+E_t s_t(u^t)+G_t y_t(u^t)\le m_t^0+M_t u_t,\quad \forall u\in U,\ \forall t
\end{aligned} \tag{5.1}
$$

where T is the total number of time stages, u1, …, uT are uncertainties revealed over T

stages, x is a vector of “here-and-now” decisions made prior to any uncertainty

realizations, s1, …, sT are adjustable state decision variables, and y1, …, yT are adjustable

control decision variables. Note that the “here-and-now” decisions x include continuous

and integer variables, while the adjustable or recourse decisions involve continuous

decision variables. The prime symbol ′ stands for the transpose of a generic vector. Let

vector $u^t=[u_1',\ldots,u_t']'$ denote the concatenation of the past uncertainty realizations, and let $u=[u_1',\ldots,u_T']'$. $c$ is the vector of cost coefficients corresponding to “here-and-now”

decisions. dt and ft are the vectors of cost coefficients corresponding to state decisions

and control decisions made at stage t, respectively. The state decision variables link

optimization problems of successive stages, while control decisions are only involved

in the current time stage [391]. Also note that a large class of multistage ARO problems

can be reformulated in this form through the introduction of additional variables and

constraints [392].

Remark 5.1 Constrained robust optimal control problems of linear systems with

disturbance-feedback control policies can be regarded as special instances of the

multistage ARO problem (5.1) without the “here-and-now” decisions x. The equality constraints in (5.1) describe the state dynamics of a discrete-time linear system subject to

additive disturbance.

Decisions st(·) and yt(·) are general control policies or mappings, enabling the recourse

actions to be fully adaptive to observed uncertainty realizations. The multistage ARO

problem given in (5.1) is computationally intractable due to the infinite dimensionality of the mappings or policies. To this end, the affine control policy is resorted to as a tractable approximation technique that restricts both st(·) and yt(·) to be affine functions of the

uncertainty realizations. However, such computational tractability induced by the conventional decision-rule-based method usually comes at a considerable expense of

solution quality. Note that, in conventional robust optimal control, the control policies

st(·) and yt(·) are restricted to be affine with respect to the uncertainty or disturbance. The key idea of the proposed multi-to-two transformation scheme is to restrict only the state policy st(·) to follow an affine control policy, as shown in (5.2), while endowing the control variables yt(·) with full adjustability to the observed uncertainty realizations.

$$
s_t(u^t)=P_t u^t+q_t \tag{5.2}
$$

where Pt and qt are the coefficients of the affine function and must be determined before

uncertainty realizations. Note that Pt is a matrix, qt is a vector, and they are of

appropriate dimensions. The above control policy is causal or non-anticipative, because

it only depends on the past uncertainty realizations $u^t$ instead of future ones. After plugging the control policy (5.2) into the multistage ARO problem (5.1), the MARMILP under the multi-to-two transformation scheme can be formulated as

follows:

$$
\begin{aligned}
\min_{x,\,P_t,\,q_t,\,y_t(\cdot)}\ \max_{u\in U}\quad & \left(c'x+\sum_{t=1}^{T}d_t'q_t\right)+\sum_{t=1}^{T}\left[d_t'P_t u^t+f_t'y_t(u^t)\right] \\
\text{s.t.}\quad & A_t\left(P_t u^t+q_t\right)+B_t\left(P_{t-1}u^{t-1}+q_{t-1}\right)+W_t y_t(u^t)=h_t^0+H_t u_t-T_t x,\quad \forall u\in U,\ \forall t \\
& E_t\left(P_t u^t+q_t\right)+G_t y_t(u^t)\le m_t^0+M_t u_t-L_t x,\quad \forall u\in U,\ \forall t
\end{aligned} \tag{5.3}
$$
where the control decision yt(·) remains a general function of the uncertainty realizations.
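The causality of the policy (5.2) can be made concrete with a small sketch (plain Python; the dimensions and numbers below are purely illustrative, not data from this chapter). The policy consumes only the realizations observed up to stage t, so anticipativity is ruled out structurally rather than by an extra constraint.

```python
def affine_state_policy(P_t, q_t, u_history):
    """Evaluate the affine state policy s_t(u^t) = P_t u^t + q_t.

    P_t: matrix as a list of rows; q_t: intercept vector;
    u_history: the realizations u_1, ..., u_t observed so far.
    """
    # u^t concatenates only the PAST realizations, so the policy is
    # causal (non-anticipative) by construction: future stages never enter.
    u_t = [v for u in u_history for v in u]
    assert all(len(row) == len(u_t) for row in P_t), "P_t must match dim(u^t)"
    return [sum(p * v for p, v in zip(row, u_t)) + q
            for row, q in zip(P_t, q_t)]

# Illustrative two-stage data: scalar uncertainty per stage, two state variables.
P2 = [[1.0, 0.5],
      [0.0, 2.0]]          # coefficients of s_2 on u^2 = [u_1, u_2]
q2 = [1.0, -1.0]
s2 = affine_state_policy(P2, q2, [[2.0], [4.0]])   # u_1 = 2.0, u_2 = 4.0
print(s2)  # [5.0, 7.0]
```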

For ease of exposition, we present the nested formulation of the multistage ARO problem (5.3) in (5.4).

$$
\min_{\hat{x}\in\Omega_0}\left\{ f_0(\hat{x})+\max_{u_1\in U_1}\ \min_{y_1\in\Omega_1(\hat{x},u^1)}\left[f_1(y_1)+\cdots+\max_{u_T\in U_T}\ \min_{y_T\in\Omega_T(\hat{x},u^T)} f_T(y_T)\right]\right\} \tag{5.4}
$$

where $\hat{x}=(x,\{P_t,q_t\}_{t=1}^{T})$ collects the aggregated “here-and-now” decisions, the set $\Omega_0$ represents its feasible region, and the set $\Omega_t(\hat{x},u^t)$, given in (5.5), is the feasible region of the adjustable control decisions at stage $t$. $U_1,\ldots,U_T$ denote the uncertainty sets of the uncertain parameters at the different time stages.
different time stages.

$$
\Omega_t(\hat{x},u^t)=\left\{ y_t \,\middle|\,
\begin{aligned}
& A_t\left(P_t u^t+q_t\right)+B_t\left(P_{t-1}u^{t-1}+q_{t-1}\right)+W_t y_t=h_t^0+H_t u_t-T_t x \\
& E_t\left(P_t u^t+q_t\right)+G_t y_t\le m_t^0+M_t u_t-L_t x
\end{aligned}\right\} \tag{5.5}
$$
The objective functions in the nested multistage ARO formulation (5.4) at different time

stages are explicitly defined in (5.6).

$$
f_0(\hat{x})=c'x+\sum_{t=1}^{T}d_t'q_t, \qquad f_t(y_t)=f_t'y_t+d_t'P_t u^t,\quad t=1,\ldots,T \tag{5.6}
$$

Since we do not assume uncertainty to be stage-wise independent, uncertainty set U in

the MARMILP can be treated as a “joint” uncertainty set. In this sense, the uncertainty

set for stage t is given by (5.7),

$$
U_t=\operatorname{Proj}_{u_t} U\left(u_1,\ldots,u_{t-1}\right) \tag{5.7}
$$

where Ut is defined as the projection of uncertainty set U onto ut given the values of u1

to ut-1.

By employing affine control policies only for the state decisions, the multi-to-two transformation scheme converts (5.1) into problem (5.4). The following theorem provides a theoretical proof that the multistage ARO problem (5.4) is equivalent to a two-stage ARO problem. Therefore, the multistage ARO problem is reduced to a two-stage ARO problem through the proposed transformation scheme.


Theorem 5.1 If the affine control policy (5.2) is used only for the adjustable state decisions, the multistage ARO problem (5.1) is transformed into the two-stage ARO problem given below.

$$
\min_{\hat{x}\in\Omega_0}\ \left(c'x+\sum_{t=1}^{T}d_t'q_t\right)+\max_{u\in U}\ \min_{\{\,y\,:\,y_t\in\Omega_t(\hat{x},u^t),\,\forall t\,\}}\ \sum_{t=1}^{T}\left(d_t'P_t u^t+f_t'y_t\right) \tag{5.8}
$$

where $y=[y_1',\ldots,y_T']'$ is the vector of concatenated control decisions.

Proof. Since the multistage ARO problem (5.1) is reformulated as (5.4) by applying the affine control policy (5.2), we only need to establish the equivalence between optimization problems (5.4) and (5.8). Considering the max-min optimization problem in (5.4) at $t=T-1$, we have

$$
\begin{aligned}
& \max_{u_{T-1}\in U_{T-1}}\ \min_{y_{T-1}\in\Omega_{T-1}(\hat{x},u^{T-1})}\left[f_{T-1}(y_{T-1})+\max_{u_T\in U_T}\ \min_{y_T\in\Omega_T(\hat{x},u^T)} f_T(y_T)\right] \\
&= \max_{u_{T-1}\in U_{T-1}}\left\{\min_{y_{T-1}\in\Omega_{T-1}(\hat{x},u^{T-1})} f_{T-1}(y_{T-1})+\max_{u_T\in U_T}\ \min_{y_T\in\Omega_T(\hat{x},u^T)} f_T(y_T)\right\} \\
&= \max_{u_{T-1}\in U_{T-1}}\ \max_{u_T\in U_T}\left\{\min_{y_{T-1}\in\Omega_{T-1}(\hat{x},u^{T-1})} f_{T-1}(y_{T-1})+\min_{y_T\in\Omega_T(\hat{x},u^T)} f_T(y_T)\right\} \\
&= \max_{(u_{T-1},u_T)\in\operatorname{Proj}_{u_{T-1},u_T} U(u_1,\ldots,u_{T-2})}\ \sum_{t=T-1}^{T}\ \min_{y_t\in\Omega_t(\hat{x},u^t)} f_t(y_t)
\end{aligned} \tag{5.9}
$$

The first equality in (5.9) is based on the fact that the optimization problem at $t=T$ does not involve control decisions at stage $T-1$. The second equality in (5.9) is valid because the feasible region of $y_{T-1}$ and $f_{T-1}(y_{T-1})$ do not depend on $u_T$. The above derivation can be performed backward until $t=1$, and as a result, the nested formulation collapses. Therefore, we can further rewrite the nested formulation (5.4) as follows.

$$
\begin{aligned}
& \min_{\hat{x}\in\Omega_0}\left\{f_0(\hat{x})+\max_{(u_1,\ldots,u_T)\in\operatorname{Proj}_{u_1,\ldots,u_T} U}\ \sum_{t=1}^{T}\ \min_{y_t\in\Omega_t(\hat{x},u^t)} f_t(y_t)\right\} \\
&= \min_{\hat{x}\in\Omega_0}\left\{f_0(\hat{x})+\max_{u\in U}\ \sum_{t=1}^{T}\ \min_{y_t\in\Omega_t(\hat{x},u^t)} f_t(y_t)\right\} \\
&= \min_{\hat{x}\in\Omega_0}\left\{f_0(\hat{x})+\max_{u\in U}\ \min_{\{\,y\,:\,y_t\in\Omega_t(\hat{x},u^t),\,\forall t\,\}}\ \sum_{t=1}^{T} f_t(y_t)\right\}
\end{aligned} \tag{5.10}
$$

The first equality in (5.10) is due to the definition of projection. The second equality is

valid because the inner minimization problem can be decoupled by stage, given x̂ and

u. According to (5.6) and (5.10), the multistage ARO problem (5.4) is equivalent to a

two-stage ARO problem (5.8), which concludes the proof. □
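The interchange argument behind this proof can be checked numerically on a toy instance with finite uncertainty and decision sets (all data below are made up for illustration). Because the stage-1 cost and feasible set depend only on $u_1$, the nested max-min value coincides with the joint two-stage value, mirroring the collapse in (5.9) and (5.10).

```python
from itertools import product

# Toy finite instance (all data illustrative). Stage-1 objects depend only on
# u_1; stage-2 objects may depend on the full history (u_1, u_2).
U1, U2 = [-1.0, 1.0], [-1.0, 1.0]
Y1, Y2 = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]

def f1(y1, u1):
    return (y1 - u1) ** 2

def f2(y2, u1, u2):
    return abs(y2 - u1 - u2)

# Nested (multistage) value: max_{u1} [min_{y1} f1 + max_{u2} min_{y2} f2].
nested = max(
    min(f1(y1, u1) for y1 in Y1)
    + max(min(f2(y2, u1, u2) for y2 in Y2) for u2 in U2)
    for u1 in U1
)

# Two-stage value: a single max over the joint scenario (u1, u2), with the
# inner minimization decoupled by stage, as in (5.10).
two_stage = max(
    min(f1(y1, u1) for y1 in Y1) + min(f2(y2, u1, u2) for y2 in Y2)
    for u1, u2 in product(U1, U2)
)

print(nested, two_stage)  # 3.0 3.0 -- the two formulations agree
assert nested == two_stage
```

If f1 were allowed to depend on the future realization u2, the pull-out of the inner max would fail, which is exactly why the theorem requires the state policy to be causal.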

Remark 5.2 Following a similar procedure, we can readily prove that such

transformation scheme is still valid if the adjustable state decision variables follow other

types of causal control policies. The proposed scheme is general enough to embrace

more advanced control policies, such as piecewise affine and polynomial control

policies.

Remark 5.3 One highlight of the proposed transformation scheme lies in its capability

of being employed in conjunction with existing two-stage ARO solution algorithms for

solving MARMILPs. Accordingly, the multi-to-two transformation scheme opens a new

avenue for a variety of multistage ARO solution algorithms.

5.3 Transformation-proximal bundle algorithm

In this section, a proximal bundle method is first adopted for solving the resulting two-

stage ARO problem (5.8). We then propose an algorithmic framework for the solution of

multistage ARO problems by combining the proposed multi-to-two transformation

scheme with the proximal bundle method and present the convergence analysis of the

proposed solution algorithm.

5.3.1 A multistage robust optimization solution algorithm

The proximal bundle algorithm has proved to be an efficient solution method in various

optimization areas, such as non-smooth optimization [397], robust optimization [398],

and stochastic programming [399]. In the following, we present the proximal bundle

method for solving the two-stage ARO problem (5.8).

The worst-case recourse function of the two-stage ARO problem, denoted by $Q(\hat{x})$, is given in (5.11).

$$
Q(\hat{x})=\max_{u\in U}\ \min_{\{\,y\,:\,y_t\in\Omega_t(\hat{x},u^t),\,\forall t\,\}}\ \sum_{t=1}^{T}\left(d_t'P_t u^t+f_t'y_t\right) \tag{5.11}
$$

where the “max-min” optimization problem is often referred to as an adversarial

optimization problem.

Based on the definition of the worst-case recourse function, the two-stage ARO problem

(5.8) can be considered as a minimization problem whose objective function is given by

(5.12).

$$
F(\hat{x})=\left(c'x+\sum_{t=1}^{T}d_t'q_t\right)+Q(\hat{x}) \tag{5.12}
$$

where $F(\hat{x})$ is the objective function of the two-stage ARO problem (5.8).

Due to the multi-level optimization structure, the objective function $F(\hat{x})$ does not have an analytical expression and is computationally expensive to evaluate. As a class of regularized cutting-plane methods, the proximal bundle method has proved suitable for addressing this type of optimization setting [400]. In the proximal bundle method, the bundle information includes the past query points $\hat{x}^l$ ($l=1,\ldots,k$), their corresponding function values $F(\hat{x}^l)$, and sub-gradients of the function $F$ at these query points. We need to solve the max-min optimization problem in (5.11) to obtain the function value and a sub-gradient at a query point. To this end, the two-level optimization problem (5.11) is equivalently reformulated into a single-level one by replacing the inner minimization problem in (5.11) with its KKT conditions [401]. The resulting sub-problem, denoted as (SUP), is given by

$$
\begin{aligned}
\max_{u\in U,\; y_t,\,\varphi_t,\,\pi_t}\quad & \sum_{t=1}^{T}\left(d_t'P_t u^t+f_t'y_t\right) \\
\text{s.t.}\quad & W_t'\varphi_t+G_t'\pi_t=f_t,\quad \forall t \\
& \pi_t\ge 0,\quad \forall t \\
& y_t\in\Omega_t(\hat{x},u^t),\quad \forall t \\
& \left[G_t y_t+E_t\left(P_t u^t+q_t\right)-m_t^0-M_t u_t+L_t x\right]_i\cdot\left[\pi_t\right]_i=0,\quad \forall t,\ \forall i
\end{aligned} \tag{SUP}
$$

where φt and πt are the dual variables corresponding to the constraints in (5.5) at stage

t, and i denotes the element index of a vector.

To address bilinear terms in the complementary slackness constraints in (SUP), we

introduce binary variables and linearize these constraints into (5.13) by using the big-M

method, which is a standard technique used in literature [402].

$$
\begin{aligned}
& \left[\pi_t\right]_i\le M\left[w_t\right]_i,\quad \forall t,\ \forall i \\
& \left[m_t^0+M_t u_t-L_t x-E_t\left(P_t u^t+q_t\right)-G_t y_t\right]_i\le M\left[1-w_t\right]_i,\quad \forall t,\ \forall i
\end{aligned} \tag{5.13}
$$

where wt represents a vector of binary decision variables, and M is a large positive

number. If uncertainty set U is a polyhedral set, the reformulation of (SUP) is a Mixed-

Integer Linear Program (MILP).
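As a quick sanity check on this big-M linearization (a standalone sketch, not tied to the model data above), the helper below confirms that, for values bounded by M, the pair of constraints in (5.13) admits a binary w exactly when the complementarity product of the dual variable and the constraint slack is zero.

```python
def bigM_feasible(pi, slack, M=1e4):
    """Return True iff some binary w satisfies pi <= M*w and slack <= M*(1-w)."""
    return any(pi <= M * w and slack <= M * (1 - w) for w in (0, 1))

# Complementary pairs (one of the two quantities is zero) are representable...
assert bigM_feasible(0.0, 5.0)       # pi = 0, slack > 0  -> w = 0 works
assert bigM_feasible(3.0, 0.0)       # pi > 0, slack = 0  -> w = 1 works
# ...while a strictly complementarity-violating pair is cut off.
assert not bigM_feasible(3.0, 5.0)   # pi > 0 and slack > 0 -> infeasible
print("big-M linearization matches pi * slack == 0 for bounded values")
```

The equivalence only holds when M genuinely upper-bounds the dual variables and slacks, which is why the big-M constant must be chosen large enough for the instance at hand.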

With the sub-gradients and function values, we build the optimality cutting-plane model for $F(\hat{x})$ shown in (5.14) [398].

$$
F_k(\hat{x})=\max_{l=1,\ldots,k}\left\{F(\hat{x}^l)+\left\langle g^l,\hat{x}-\hat{x}^l\right\rangle\right\} \tag{5.14}
$$
where Fk  xˆ  is the optimality cutting plane model at the k-th iteration. gl is one sub-

gradient of the objective function F at the l-th query point and can be obtained using

optimal dual variables in the same way as the Benders decomposition [389]. Notably,

Fk  xˆ  is a lower piecewise linear approximation of function F. Note that the two-stage

ARO problem (5.8) may not satisfy the relative complete recourse assumption.

Therefore, for some query point xˆ l , there exist certain uncertainty realizations that

 
render the second-stage optimization problem infeasible. This implies F xˆ l   or

equivalently xˆ l  dom F , where dom represents the domain of a function. In the

proximal bundle method, for a given xˆ l , we either derive a lower linearization of

function F (optimality cut) or obtain a cutting plane that separates xˆ l and dom F

(feasibility cut) [398]. To check whether xˆ l  dom F or not, the following Feasibility

Problem (FP) needs to be solved.

$$
\begin{aligned}
\max_{u\in U}\ \min_{y_t,\,\alpha_t^{+},\,\alpha_t^{-},\,\beta_t}\quad & \sum_{t=1}^{T}\left(\mathbf{1}'\alpha_t^{+}+\mathbf{1}'\alpha_t^{-}+\mathbf{1}'\beta_t\right) \\
\text{s.t.}\quad & A_t\left(P_t u^t+q_t\right)+B_t\left(P_{t-1}u^{t-1}+q_{t-1}\right)+W_t y_t+\alpha_t^{+}-\alpha_t^{-}=h_t^0+H_t u_t-T_t x,\quad \forall t \\
& E_t\left(P_t u^t+q_t\right)+G_t y_t-\beta_t\le m_t^0+M_t u_t-L_t x,\quad \forall t \\
& \alpha_t^{+},\,\alpha_t^{-},\,\beta_t\ge 0,\quad \forall t
\end{aligned} \tag{FP}
$$

where $\alpha_t^{+}$, $\alpha_t^{-}$, and $\beta_t$ are slack variables, and $\mathbf{1}$ is the vector of ones of appropriate dimension. Similar to problem (SUP), we reformulate the max-min problem (FP) into a single-level optimization problem using the KKT conditions and linearize the complementary slackness constraints using the big-M method [402]. Let $\theta(\hat{x}^l)$ denote the optimal value of problem (FP) associated with a query point $\hat{x}^l$. If $\theta(\hat{x}^l)=0$, there exist feasible second-stage decisions for any uncertainty realization in the uncertainty set $U$; thus, we have $\hat{x}^l\in\operatorname{dom}F$ and only need optimality cuts. If $\theta(\hat{x}^l)>0$, the worst-case uncertainty realization can lead to the nonexistence of feasible recourse decisions. As a result, a feasibility cut is required.

To determine the next query point, we consider the Moreau-Yosida regularization of $F_k(\hat{x})$ given by (5.15),

$$
G(\hat{x})=F_k(\hat{x})+\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2 \tag{5.15}
$$

where zk is the stability center for the k-th iteration and tk is the positive proximal

parameter [394, 400]. Note that the stability center represents the best current iterate.

The proximal bundle method uses the regularization term to make sure that the next

iterate is not far away from the stability center. In the proximal bundle algorithm, we

iteratively refine the cutting plane models by adding new query points on the fly. The

optimal solution of the following Master Problem (MP) provides the next query point.

$$
\begin{aligned}
\min_{\hat{x}\in\Omega_0,\,\eta}\quad & \eta+\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2 \\
\text{s.t.}\quad & \eta\ge F(\hat{x}^l)+\left\langle g^l,\hat{x}-\hat{x}^l\right\rangle,\quad \forall l\in L_o \\
& 0\ge \theta(\hat{x}^l)+\left\langle g_f^l,\hat{x}-\hat{x}^l\right\rangle,\quad \forall l\in L_f
\end{aligned} \tag{MP}
$$

where $\eta$ is an auxiliary variable, and $L_o$ and $L_f$ denote the index sets of optimality and feasibility cuts, respectively. Akin to the Benders decomposition, the constraint $\eta\ge F(\hat{x}^l)+\langle g^l,\hat{x}-\hat{x}^l\rangle$ corresponds to an optimality cut, while $0\ge\theta(\hat{x}^l)+\langle g_f^l,\hat{x}-\hat{x}^l\rangle$ is a feasibility cut. Besides the cuts derived in the dual space, optimality cuts in the primal space can be added as well [103]. It is worth noting that the above master problem is a Mixed-Integer Quadratically Constrained Program (MIQCP), which can be solved efficiently using off-the-shelf optimization solvers such as CPLEX and GUROBI.

In the proximal bundle method, the expected decrease $\delta_k$ defined in (5.16) is used to determine whether to update the stability center or to remain at the current one. The expected decrease is also used to check the stopping criterion of the proximal bundle algorithm.

$$
\delta_k=F(z^k)-\left[F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left\|\hat{x}^{k+1}-z^k\right\|^2\right] \tag{5.16}
$$

where $\hat{x}^{k+1}$ is an optimal solution to (MP). To circumvent unnecessary moves, the proximal bundle method updates the stability center only when the objective is sufficiently decreased, i.e., $F(\hat{x}^{k+1})\le F(z^k)-m\,\delta_k$.
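The regularized iteration can be illustrated end to end on a one-dimensional toy problem, minimizing the non-smooth convex function |x − 3| (chosen purely for illustration). The sketch below keeps a bundle of cuts, solves the strictly convex master problem by ternary search rather than by an MIQCP solver, computes the expected decrease as in (5.16), and applies the serious-step test; all tolerances and parameters are illustrative.

```python
def F(x):                       # toy non-smooth convex objective (illustrative)
    return abs(x - 3.0)

def subgrad(x):                 # a subgradient of F at x
    return 1.0 if x >= 3.0 else -1.0

def solve_master(bundle, z, t, lo=-10.0, hi=10.0):
    """Minimize max-of-cuts + (1/(2t))*(x - z)^2 by ternary search.

    The proximal term makes the master strictly convex, so ternary
    search suffices here; the chapter instead solves an MIQCP at this step.
    """
    def model(x):
        return max(fv + g * (x - xq) for xq, fv, g in bundle)
    def obj(x):
        return model(x) + (x - z) ** 2 / (2.0 * t)
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, model(x)

z, t, m, tol = -5.0, 1.0, 0.3, 1e-6          # center, prox parameter, m, tolerance
bundle = [(z, F(z), subgrad(z))]             # cuts: (query point, value, subgradient)
for _ in range(100):
    x_new, model_val = solve_master(bundle, z, t)
    delta = F(z) - (model_val + (x_new - z) ** 2 / (2.0 * t))   # expected decrease
    if delta <= tol:
        break                                # stopping test
    bundle.append((x_new, F(x_new), subgrad(x_new)))            # refine the model
    if F(x_new) <= F(z) - m * delta:         # serious step: move the center
        z = x_new                            # otherwise: null step, keep z
print(round(z, 3), round(F(z), 3))           # 3.0 0.0
```

The stability center only moves on serious steps, exactly as in the full algorithm; null steps merely enrich the cutting-plane model until the next candidate achieves a sufficient decrease.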
The proximal bundle method is adopted for the two-stage ARO problem and designed

in a decomposition framework to transform the original tri-level optimization problem

into a single-level problem. The cutting-plane models are refined gradually in each

iteration, and the stability center is guaranteed to converge to the optimal solution. The

convergence analysis of the proposed transformation-proximal bundle algorithm is

presented in Section 5.3.2.

The pseudocode of the proposed transformation-proximal bundle algorithm for solving

multistage ARO problems is shown in Figure 25. The proposed algorithmic framework

comprises two primary blocks connected in series. The first block is the multi-to-two transformation step that converts the multistage ARO problem into a two-stage ARO

problem. The second block is the proximal bundle method, which is employed to

address the resulting two-stage ARO problem. The proposed algorithm iteratively

solves a master problem, a feasibility problem, and a subproblem, until the expected

decrease reaches its predefined tolerance δtol. The transformation-proximal bundle

algorithm provides an attractive trade-off between solution quality and computational

tractability by organically integrating the multi-to-two transformation scheme with the

regularized cutting-plane machinery.

Algorithm. Transformation-proximal bundle algorithm
1: Step 1 (Initialization)
2: Set $k \leftarrow 0$, $flag \leftarrow 0$, and choose $m$, $t_k$, and $\delta_{tol}$;
3: Step 2 (Transformation step)
4: Substitute the adjustable state decisions with the affine control policy in (5.2);
5: while $flag = 0$
6:   Step 3 (Master problem)
7:   Solve master problem (MP) to obtain $\hat{x}^{k+1}$, $\eta^{k+1}$;
8:   Update $\delta_k \leftarrow F(z^k) - F_k(\hat{x}^{k+1}) - \frac{1}{2t_k}\|\hat{x}^{k+1}-z^k\|^2$;
9:   Step 4 (Stopping test)
10:  if $\delta_k \le \delta_{tol}$ then
11:    $flag \leftarrow 1$ and stop the while loop;
12:  end
13:  Step 5 (Call oracle)
14:  Solve feasibility problem (FP) to obtain $\theta(\hat{x}^{k+1})$ and $g_f^{k+1}$;
15:  Solve subproblem (SUP) to obtain $F(\hat{x}^{k+1})$ and $g^{k+1}$;
16:  Step 6 (Update stability center)
17:  if $F(\hat{x}^{k+1}) \le F(z^k) - m\,\delta_k$ then
18:    Update the stability center: $z^{k+1} \leftarrow \hat{x}^{k+1}$ and $F(z^{k+1}) \leftarrow F(\hat{x}^{k+1})$;
19:  else
20:    Retain the stability center: $z^{k+1} \leftarrow z^k$ and $F(z^{k+1}) \leftarrow F(z^k)$;
21:  end
22:  Step 7 (Update cutting-plane models)
23:  if $\theta(\hat{x}^{k+1}) = 0$ then
24:    Update $F_{k+1}(\hat{x}) \leftarrow \max\left\{F_k(\hat{x}),\ F(\hat{x}^{k+1}) + \langle g^{k+1}, \hat{x}-\hat{x}^{k+1}\rangle\right\}$;
25:  else
26:    Update the feasibility cutting-plane model;
27:  end
28:  $k \leftarrow k+1$;
29: end
30: Return $z^k$ and $F(z^k)$.

Figure 25. The pseudocode of the transformation-proximal bundle algorithm.

5.3.2 Convergence analysis

In this subsection, we present the convergence analysis of the proposed algorithm.

Proposition 5.1 $F(\hat{x})$ in (5.12) is a convex function in $\hat{x}$.

Proof. Based on (5.12), $F(\hat{x})$ is the sum of a linear function in $\hat{x}$ and $Q(\hat{x})$, so we only need to show the convexity of $Q(\hat{x})$. We rewrite the function $Q(\hat{x})$ as (5.17).

$$
Q(\hat{x})=\max_{u\in U}\left\{\sum_{t=1}^{T}d_t'P_t u^t+\min_{\{\,y\,:\,y_t\in\Omega_t(\hat{x},u^t),\,\forall t\,\}}\ \sum_{t=1}^{T}f_t'y_t\right\}
=\max_{u\in U}\left\{\sum_{t=1}^{T}d_t'P_t u^t+R(\hat{x},u)\right\} \tag{5.17}
$$

For any $u\in U$, $\sum_{t=1}^{T}d_t'P_t u^t$ is a linear function in $\hat{x}$. Let $\lambda\in[0,1]$, let $y_{1t}^{*}$ be the optimal solution of the minimization problem involved in $R(\hat{x}_1,u)$, and let $y_{2t}^{*}$ be the optimal solution of the minimization problem involved in $R(\hat{x}_2,u)$. Therefore, we have

$$
R\left(\lambda\hat{x}_1+(1-\lambda)\hat{x}_2,\,u\right)\le\sum_{t=1}^{T}f_t'\left(\lambda y_{1t}^{*}+(1-\lambda)y_{2t}^{*}\right)
=\lambda R(\hat{x}_1,u)+(1-\lambda)R(\hat{x}_2,u) \tag{5.18}
$$

The inequality in (5.18) is based on the fact that $\lambda y_{1t}^{*}+(1-\lambda)y_{2t}^{*}$ is feasible for the minimization problem involved in $R(\lambda\hat{x}_1+(1-\lambda)\hat{x}_2,u)$, because the constraints defining $\Omega_t(\hat{x},u^t)$ are affine in $(\hat{x},y_t)$. Based on (5.18), $R(\hat{x},u)$ is a convex function in $\hat{x}$ for any $u\in U$. Following the pointwise maximum property, $F(\hat{x})$ is a convex function. □

To facilitate the exposition, we define the linearization errors at the stability center $z^k$ in (5.19).

$$
e_l=F(z^k)-\left[F(\hat{x}^l)+\left\langle g^l,z^k-\hat{x}^l\right\rangle\right],\quad \forall l \tag{5.19}
$$

Based on the convexity of $F(\hat{x})$, we have $e_l\ge 0$. With the definition of $e_l$, we can rewrite $F_k(\hat{x})$ as (5.20).

$$
\begin{aligned}
F_k(\hat{x}) &= \max_{l=1,\ldots,k}\left\{F(z^k)-e_l-\left\langle g^l,z^k-\hat{x}^l\right\rangle+\left\langle g^l,\hat{x}-\hat{x}^l\right\rangle\right\} \\
&= F(z^k)+\max_{l=1,\ldots,k}\left\{-e_l+\left\langle g^l,\hat{x}-z^k\right\rangle\right\}
\end{aligned} \tag{5.20}
$$

To prove the convergence of the proposed algorithm, we first present five lemmas and

one proposition as follows [394, 403].

Lemma 5.1 Consider the following Regularized Optimization Problem (ROP):

$$
\min_{\hat{x}}\ F_k(\hat{x})+\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2 \tag{ROP}
$$

Then, its dual problem is shown in (5.21).

$$
\max_{\alpha\in\Delta^k}\ F(z^k)-\frac{t_k}{2}\left\|\sum_{l=1}^{k}\alpha_l g^l\right\|^2-\sum_{l=1}^{k}\alpha_l e_l,
\qquad \Delta^k=\left\{\alpha\in\mathbb{R}_{+}^{k}\ \middle|\ \sum_{l=1}^{k}\alpha_l=1\right\} \tag{5.21}
$$
Proof. By using the epigraph reformulation, we have (5.22).

$$
\begin{aligned}
\min_{\hat{x},\,\eta}\quad & \eta+\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2 \\
\text{s.t.}\quad & \eta\ge F(z^k)-e_l+\left\langle g^l,\hat{x}-z^k\right\rangle,\quad l=1,\ldots,k
\end{aligned} \tag{5.22}
$$

The Lagrangean function is

$$
L=\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2+\eta\left(1-\sum_{l=1}^{k}\alpha_l\right)+\sum_{l=1}^{k}\alpha_l\left(F(z^k)-e_l+\left\langle g^l,\hat{x}-z^k\right\rangle\right)
$$

Based on the KKT conditions of this problem, we have

$$
\frac{\partial L}{\partial \hat{x}}=\frac{1}{t_k}\left(\hat{x}-z^k\right)+\sum_{l=1}^{k}\alpha_l g^l=0,
\qquad \frac{\partial L}{\partial \eta}=1-\sum_{l=1}^{k}\alpha_l=0
$$

Thus, the dual objective function is given in (5.23),

$$
\begin{aligned}
& \frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2+\sum_{l=1}^{k}\alpha_l\left(F(z^k)-e_l+\left\langle g^l,\hat{x}-z^k\right\rangle\right) \\
&= F(z^k)-\sum_{l=1}^{k}\alpha_l e_l+\frac{t_k}{2}\left\|\sum_{l=1}^{k}\alpha_l g^l\right\|^2-t_k\left\|\sum_{l=1}^{k}\alpha_l g^l\right\|^2 \\
&= F(z^k)-\sum_{l=1}^{k}\alpha_l e_l-\frac{t_k}{2}\left\|\sum_{l=1}^{k}\alpha_l g^l\right\|^2
\end{aligned} \tag{5.23}
$$

which completes the proof. □

Lemma 5.2 Suppose $\alpha$ is an optimal solution to the optimization problem in (5.21). Then, we have

(i) $\hat{g}^k\in\partial F_k(\hat{x}^{k+1})$;

(ii) $F_k(\hat{x}^{k+1})=F(z^k)-t_k\left\|\hat{g}^k\right\|^2-\hat{e}_k$;

(iii) $\delta_k=\dfrac{t_k}{2}\left\|\hat{g}^k\right\|^2+\hat{e}_k$;

(iv) $\hat{g}^k\in\partial_{\hat{e}_k}F(z^k)$;

where $\hat{g}^k=\sum_{l=1}^{k}\alpha_l g^l$ and $\hat{e}_k=\sum_{l=1}^{k}\alpha_l e_l$.

Proof. (i) We can obtain

$$
\hat{g}^k=-\frac{1}{t_k}\left(\hat{x}^{k+1}-z^k\right) \tag{5.24}
$$

Since $\hat{x}^{k+1}$ is an optimal solution to (ROP), we have $0\in\partial F_k(\hat{x}^{k+1})+\frac{1}{t_k}\left(\hat{x}^{k+1}-z^k\right)$. Therefore, based on (5.24), we arrive at (i).

(ii) Based on strong duality, we have

$$
F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left\|\hat{x}^{k+1}-z^k\right\|^2=F(z^k)-\hat{e}_k-\frac{t_k}{2}\left\|\hat{g}^k\right\|^2 \tag{5.25}
$$

Based on (5.25) and (5.24), we further have

$$
F_k(\hat{x}^{k+1})=F(z^k)-\hat{e}_k-\frac{t_k}{2}\left\|\hat{g}^k\right\|^2-\frac{1}{2t_k}\left\|\hat{x}^{k+1}-z^k\right\|^2
=F(z^k)-\hat{e}_k-t_k\left\|\hat{g}^k\right\|^2 \tag{5.26}
$$

(iii) According to (5.16) and (5.26), we have (5.27).

$$
\delta_k=F(z^k)-\left[F(z^k)-\hat{e}_k-t_k\left\|\hat{g}^k\right\|^2\right]-\frac{1}{2t_k}\left\|t_k\hat{g}^k\right\|^2
=\hat{e}_k+\frac{t_k}{2}\left\|\hat{g}^k\right\|^2 \tag{5.27}
$$

(iv) Since $F(\hat{x})$ is convex, based on Proposition 5.1 and Lemma 5.2 (i), we have, for any $\hat{x}$,

$$
\begin{aligned}
F(\hat{x}) &\ge F_k(\hat{x}^{k+1})+\left\langle\hat{g}^k,\hat{x}-\hat{x}^{k+1}\right\rangle \\
&= F(z^k)-t_k\left\|\hat{g}^k\right\|^2-\hat{e}_k+\left\langle\hat{g}^k,\hat{x}-z^k\right\rangle+\left\langle\hat{g}^k,z^k-\hat{x}^{k+1}\right\rangle \\
&= F(z^k)+\left\langle\hat{g}^k,\hat{x}-z^k\right\rangle-\hat{e}_k
\end{aligned} \tag{5.28}
$$

The first equality is based on Lemma 5.2 (ii), and the second equality is based on equation (5.24). □

Definition 5.1 (Serious Steps). For the proposed algorithm, serious steps refer to those

steps in which the stability center is changed.

Lemma 5.3 Suppose $F^*$ is the optimal value of $\min F(\hat{x})$ and $F^*>-\infty$. Then, we have inequality (5.29).

$$
\sum_{k\in L_s}\delta_k\le\frac{F(z^0)-F^*}{m} \tag{5.29}
$$

where $L_s$ denotes the set of iterations having serious steps.

Proof. For a serious step, we have

$$
F(z^k)-F(\hat{x}^{k+1})=F(z^k)-F(z^{k+1})\ge m\,\delta_k
$$

By taking a summation over the set of serious steps, we arrive at

$$
F(z^0)-F^*\ge\sum_{k\in L_s}\left[F(z^k)-F(z^{k+1})\right]\ge\sum_{k\in L_s}m\,\delta_k \tag{5.30}
$$

By rearranging (5.30) and noting that $F^*>-\infty$, we have (5.29), which completes the proof. □

Assumption 5.1 For an infinite number of serious steps, i.e., $|L_s|=\infty$, the sequence $\left\{F(z^k)\right\}_{k\in L_s}$ is assumed to converge, with $\lim_{k\in L_s}F(z^k)=F_*>-\infty$.

Note that here we do not assume the converged value $F_*$ is the optimal value of the two-stage ARO problem (5.8), which is denoted by the separate symbol $F^*$ introduced in Lemma 5.3. Theorem 5.2 will prove that $F_*$ is indeed the optimal value.

Lemma 5.4 For an infinite number of serious steps, we have

(i) If $\sum_{k\in L_s}t_k=\infty$, then $\liminf_{k\in L_s}\left\|\hat{g}^k\right\|=0$;

(ii) If $0<t_k\le c$ and $\arg\min_{\hat{x}}F(\hat{x})\neq\emptyset$, then $\left\{z^k\right\}_{k\in L_s}$ is bounded.

Proof. (i) Using Lemma 5.2 (iii) together with Lemma 5.3, we have inequality (5.31).

$$
\sum_{k\in L_s}\frac{t_k}{2}\left\|\hat{g}^k\right\|^2\le\sum_{k\in L_s}\delta_k\le\frac{F(z^0)-F^*}{m} \tag{5.31}
$$

Since $\sum_{k\in L_s}t_k=\infty$, we can conclude that zero is a cluster point of $\left\{\left\|\hat{g}^k\right\|\right\}_{k\in L_s}$.

(ii) Let $\hat{x}^*\in\arg\min_{\hat{x}}F(\hat{x})$. For $k\in L_s$, we have the following:

$$
\begin{aligned}
\left\|\hat{x}^*-z^{k+1}\right\|^2 &= \left\|\hat{x}^*-z^k\right\|^2+\left\|z^k-z^{k+1}\right\|^2+2\left\langle\hat{x}^*-z^k,\,z^k-z^{k+1}\right\rangle \\
&= \left\|\hat{x}^*-z^k\right\|^2+t_k^2\left\|\hat{g}^k\right\|^2+2t_k\left\langle\hat{x}^*-z^k,\,\hat{g}^k\right\rangle \\
&\le \left\|\hat{x}^*-z^k\right\|^2+2t_k\left(F(\hat{x}^*)-F(z^k)+\hat{e}_k+\frac{t_k}{2}\left\|\hat{g}^k\right\|^2\right) \\
&= \left\|\hat{x}^*-z^k\right\|^2+2t_k\left(F(\hat{x}^*)-F(z^k)+\delta_k\right) \\
&\le \left\|\hat{x}^*-z^k\right\|^2+2t_k\,\delta_k
\end{aligned} \tag{5.32}
$$

where the first inequality follows from Lemma 5.2 (iv) and the last inequality holds because $F(\hat{x}^*)\le F(z^k)$. Summing (5.32) over the set $L_s$ leads to the following inequality:

$$
\left\|\hat{x}^*-z^{k+1}\right\|^2\le\left\|\hat{x}^*-z^0\right\|^2+2\sum_{l\in L_s}t_l\,\delta_l\le\left\|\hat{x}^*-z^0\right\|^2+2c\sum_{l\in L_s}\delta_l \tag{5.33}
$$

Based on (5.33) and Lemma 5.3, we then have (ii). □

Definition 5.2 (Null Steps). For the proposed algorithm, null steps are those steps in

which the stability center remains the same.

Lemma 5.5 If there is a finite number of serious steps, i.e., $|L_s|<\infty$, let $k_0$ be the index of the last serious step, $\left\{\hat{x}^k\right\}_{k>k_0}$ be the sequence of null steps, and $z^{k_0}$ be the stability center generated by the last serious step. Then, for $k>k_0$,

$$
F(z^{k_0})-\delta_k+\frac{1}{2t_k}\left\|\hat{x}-\hat{x}^{k+1}\right\|^2
=F_k(\hat{x}^{k+1})+\left\langle\hat{g}^k,\hat{x}-\hat{x}^{k+1}\right\rangle+\frac{1}{2t_k}\left\|\hat{x}-z^{k_0}\right\|^2 \tag{5.34}
$$
Proof. Starting from the left-hand side of (5.34), we have (5.35).

$$
\begin{aligned}
& F(z^{k_0})-\delta_k+\frac{1}{2t_k}\left\|\hat{x}-\hat{x}^{k+1}\right\|^2 \\
&= F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left(\left\|\hat{x}-\hat{x}^{k+1}\right\|^2+\left\|\hat{x}^{k+1}-z^{k_0}\right\|^2\right) \\
&= F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left(\left\|\hat{x}-z^{k_0}\right\|^2-2\left\langle\hat{x}-\hat{x}^{k+1},\,\hat{x}^{k+1}-z^{k_0}\right\rangle\right) \\
&= F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left\|\hat{x}-z^{k_0}\right\|^2+\left\langle\hat{g}^k,\hat{x}-\hat{x}^{k+1}\right\rangle
\end{aligned} \tag{5.35}
$$

The first equality is based on (5.16), while the third equality is based on equation (5.24). □

Lemma 5.6 For the proposed algorithm, the following equality and inequality hold.

(i) $F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle = F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - z^k\rangle - \hat{e}_k$

(ii) $F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle \le F_{k+1}(\hat{x}^{k+2})$
Proof. (i) Starting from the right-hand side of Lemma 5.6 (i), we have (5.36).

$$\begin{aligned}
F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - z^k\rangle - \hat{e}_k &= F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle + \langle \hat{g}^k, \hat{x}^{k+1} - z^k\rangle - \hat{e}_k \\
&= F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle - t_k\|\hat{g}^k\|^2 - \hat{e}_k \\
&= F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle
\end{aligned} \tag{5.36}$$

where the second equality holds according to (5.24), and the third according to Lemma 5.2 (ii).

(ii) Based on the expressions of $\hat{g}^k$ and $\hat{e}_k$ in Lemma 5.2, we can have

$$\begin{aligned}
F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - z^k\rangle - \hat{e}_k &= F(z^k) + \sum_{l=1}^{k} \alpha_l\Big(-e_l + \big\langle g^l, \hat{x}^{k+2} - z^k\big\rangle\Big) \\
&\le F(z^k) + \max_{l=1,\ldots,k}\Big(-e_l + \big\langle g^l, \hat{x}^{k+2} - z^k\big\rangle\Big) \\
&= F_k(\hat{x}^{k+2}) \le F_{k+1}(\hat{x}^{k+2})
\end{aligned} \tag{5.37}$$

The first inequality is based on the fact that $\boldsymbol{\alpha} \in \mathbb{R}_+^k$ and $\sum_{l=1}^{k} \alpha_l = 1$, the second equality holds because of (5.20), and the last inequality is according to (5.14).

Based on Lemma 5.6 (i) and (5.37), we have Lemma 5.6 (ii). □

Assumption 5.2 The sequence $\{t_k\}$ is positive and nonincreasing.

Proposition 5.2 If there is a finite number of serious steps, let $k_0$ be the index of the last serious step, $\{\hat{x}^k\}_{k > k_0}$ be the sequence of null steps, and $z^{k_0}$ be the stability center generated by the last serious step. Then $\delta_k \to 0$.

Proof. Using Lemma 5.5 with $\hat{x} = \hat{x}^{k+2}$, we have

$$\begin{aligned}
F(z^{k_0}) - \delta_k + \frac{1}{2t_k}\|\hat{x}^{k+2} - \hat{x}^{k+1}\|^2 &= F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle + \frac{1}{2t_k}\|\hat{x}^{k+2} - z^{k_0}\|^2 \\
&\le F_{k+1}(\hat{x}^{k+2}) + \frac{1}{2t_k}\|\hat{x}^{k+2} - z^{k_0}\|^2 \\
&\le F_{k+1}(\hat{x}^{k+2}) + \frac{1}{2t_{k+1}}\|\hat{x}^{k+2} - z^{k_0}\|^2 \\
&= F(z^{k_0}) - \delta_{k+1}
\end{aligned} \tag{5.38}$$

where the first equality is based on Lemma 5.5, the first inequality is according to Lemma 5.6, the second inequality is valid because of Assumption 5.2, and the last equality is based on (5.16). By rearranging (5.38), we have $\delta_k \ge \delta_{k+1} + \frac{1}{2t_k}\|\hat{x}^{k+2} - \hat{x}^{k+1}\|^2$, so $\delta_k$ is nonincreasing.

Using Lemma 5.5 one more time with $\hat{x} = z^{k_0}$, we have (5.39).

$$F(z^{k_0}) - \delta_k + \frac{1}{2t_k}\|z^{k_0} - \hat{x}^{k+1}\|^2 = F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, z^{k_0} - \hat{x}^{k+1}\rangle \le F_k(z^{k_0}) \le F(z^{k_0}) \tag{5.39}$$

Therefore, we have $\|z^{k_0} - \hat{x}^{k+1}\|^2 \le 2\delta_k t_k \le 2\delta_{k_0} t_{k_0}$ due to the fact that $\delta_k$ is decreasing and $t_k$ is nonincreasing. Thus, $\{\hat{x}^k\}$ is bounded. Since the serious step test fails for any step beyond $k_0$, we have $m\,\delta_k > F(z^{k_0}) - F(\hat{x}^{k+1})$.

Based on (5.16), we can have $\delta_k \le F(z^{k_0}) - F_k(\hat{x}^{k+1})$. Therefore, we have

$$(1 - m)\,\delta_k \le F(\hat{x}^{k+1}) - F_k(\hat{x}^{k+1}) = \big(F(\hat{x}^{k+1}) - F(\hat{x}^{k})\big) + \big(F_k(\hat{x}^{k}) - F_k(\hat{x}^{k+1})\big) \le 2\Lambda\,\|\hat{x}^{k+1} - \hat{x}^{k}\| \tag{5.40}$$

The equality in (5.40) is based on the fact that $F(\hat{x}^{k}) = F_k(\hat{x}^{k})$, and $\Lambda$ denotes a common Lipschitz constant of $F$ and $F_k$ on a bounded set containing $\{\hat{x}^k\}$. Therefore, we can obtain

$$\delta_k - \delta_{k+1} \ge \frac{1}{2t_k}\|\hat{x}^{k+2} - \hat{x}^{k+1}\|^2 \ge \frac{(1 - m)^2\,\delta_{k+1}^2}{8\Lambda^2 t_k} \ge \frac{(1 - m)^2\,\delta_{k+1}^2}{8\Lambda^2 t_{k_0}} \tag{5.41}$$

By summing (5.41) over $k \ge k_0$, we have

$$\frac{(1 - m)^2}{8\Lambda^2 t_{k_0}} \sum_{k \ge k_0} \delta_{k+1}^2 \le \sum_{k \ge k_0} \big(\delta_k - \delta_{k+1}\big) \le \delta_{k_0}.$$

Thus, $\delta_k \to 0$. □

Theorem 5.2 For $\delta_{tol} = 0$, the proposed transformation-proximal bundle algorithm converges to the globally optimal solution of (5.8) asymptotically; for $\delta_{tol} > 0$, it is guaranteed to converge in a finite number of steps.

Proof. For $\delta_{tol} = 0$, the transformation-proximal bundle algorithm loops forever. There are two exclusive scenarios: (1) the algorithm implements an infinite number of serious steps; (2) after a finite number of serious steps, the algorithm implements only null steps.

If there is an infinite number of serious steps, we have $\hat{e}_k \to 0$. Therefore, we have $\hat{g}^k \in \partial_{\hat{e}_k} F(z^k)$ for all $k \in L_s$. Based on Lemma 5.4, we can further conclude that $0 \in \partial F(z^k)$ in the limit $k \to \infty$, which implies that the algorithm converges to the globally optimal solution of (5.8) asymptotically.

Under scenario 2, we have $\delta_k \to 0$ based on Proposition 5.2. Also, $\delta_k \to 0$ implies $\hat{e}_k \to 0$ according to Lemma 5.2 (iii). Thus, the algorithm still converges to the globally optimal solution of (5.8) asymptotically. For $\delta_{tol} > 0$, suppose the algorithm does not converge in a finite number of iterations; then we have $\delta_k > \delta_{tol} > 0$ for all $k$, which contradicts the fact that $\delta_k \to 0$, established by Lemma 5.3 and Proposition 5.2. □

Remark 5.4 Since $e_l$ defined in (5.19) is the linearization error of a convex function, both $e_l$ and $\hat{e}_k = \sum_{l=1}^{k} \alpha_l e_l$ (with $\alpha_l \ge 0$) are nonnegative. According to Lemma 5.2 (iii), we have $\delta_k \ge 0$ for all $k$. If $\delta_{tol} = 0$, the stopping criterion becomes $\delta_k < 0$. Consequently, the stopping condition will never be met, and the algorithm will loop forever.

Remark 5.5 Since $\delta_k \ge 0$ for all $k$, the partial sums of the series $\sum_{k \in L_s} \delta_k$ are nondecreasing. According to Lemma 5.3, the series is also bounded from above. Therefore, the series has a limit, and its convergence is guaranteed.
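To make the serious-step/null-step mechanics above concrete, the following self-contained sketch runs a toy one-dimensional proximal bundle iteration: it maintains a cutting-plane model, solves the prox subproblem (here crudely, by grid search instead of the LP/QP master problem), computes the expected decrease, and applies the descent test with parameter m. All names are ours and the example is purely illustrative, not the dissertation's implementation.

```python
def proximal_bundle_1d(f, g, z0, t=1.0, m=0.3, tol=1e-6, max_iter=100):
    """Toy 1-D proximal bundle method: build a cutting-plane model of the
    convex function f from subgradients g, take a prox step toward the
    model minimizer, and move the stability center only if the actual
    decrease is at least m times the expected decrease delta."""
    z = z0
    cuts = [(z0, f(z0), g(z0))]  # bundle of (point, value, subgradient)

    def model(x):  # piecewise-linear lower approximation of f
        return max(fv + gv * (x - xv) for xv, fv, gv in cuts)

    for _ in range(max_iter):
        # prox subproblem, solved crudely on a grid around the center
        grid = [z + 0.001 * i for i in range(-5000, 5001)]
        x_new = min(grid, key=lambda x: model(x) + (x - z) ** 2 / (2 * t))
        delta = f(z) - (model(x_new) + (x_new - z) ** 2 / (2 * t))
        if delta < tol:  # expected decrease is negligible: stop
            break
        if f(z) - f(x_new) >= m * delta:
            z = x_new  # serious step: move the stability center
        cuts.append((x_new, f(x_new), g(x_new)))  # both step types enrich the model
    return z
```

On $f(x) = |x - 2|$ started from $z^0 = 0$, the iteration performs a few serious steps to reach the minimizer and then stops once the expected decrease falls below the tolerance.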

5.4 The lower bounding technique

In this section, we devise a lower bounding technique, which serves to assess the solution quality of multistage ARO solution algorithms. Both the affine control policy and the

proposed transformation-proximal bundle algorithm are approximation solution

approaches for solving computationally intractable MARMILPs, and they both yield

upper bounds on the optimal value of the original multistage robust optimization

problem. To measure the loss of optimality, we leverage the proposed solution algorithm

developed in the previous section in conjunction with the scenario-tree based method

[404]. The proposed lower bounding technique is presented in this section.

There are in general two types of lower bounds, namely a priori bounds and a posteriori bounds. A priori lower bounding methods evaluate the worst-case bound for any problem instance of MARMILPs. However, this type of lower bound might be too pessimistic for a specific problem instance. As such, we focus on a posteriori lower bounding techniques, which can provide a lower bound on the optimal value of a specific MARMILP instance. A posteriori results fit our purpose of assessing and comparing the loss of optimality incurred by different multistage solution algorithms in the computational experiments of the next section.

The idea of scenario-tree-based lower bounding approach is to replace the uncertainty

set in MARMILPs with a finite number of uncertainty scenarios. The resulting scenario-

tree problem yields a lower bound, because it is a relaxation of the original MARMILP.

It is worth noting that the quality of lower bounds depends heavily on the choice of the

scenario set. Motivated by this observation, we resort to the uncertainty scenario set

constructed within the transformation-proximal bundle algorithmic framework.


Specifically, uncertainty scenarios are directly constructed from the subproblem (SUP)

and the feasibility problem (FP) during the oracle calls. This yields optimality or

feasibility cuts, which are then fed back to the master problem in each iteration. When

the proposed solution algorithm converges, the scenario set can be obtained by

collecting all uncertainty scenarios. The resulting scenario-tree counterpart is shown as

follows.
$$\begin{aligned}
\min_{x,\,s_t(\cdot),\,y_t(\cdot)}\ \max_{u \in \hat{U}}\ & c^{\mathrm{T}}x + \sum_{t=1}^{T}\Big(d_t^{\mathrm{T}}\,s_t(u^t) + f_t^{\mathrm{T}}\,y_t(u^t)\Big) \\
\text{s.t.}\ & A_t\,s_t(u^t) + B_t\,s_{t-1}(u^{t-1}) + W_t\,y_t(u^t) \le h_t^0 + H_t u_t + T_t x, \quad \forall u \in \hat{U},\ \forall t \\
& E_t\,s_t(u^t) + G_t\,y_t(u^t) \le m_t^0 + M_t u_t + L_t x, \quad \forall u \in \hat{U},\ \forall t \\
& u^{t(i)} = u^{t(j)} \Rightarrow s_t(u^{t(i)}) = s_t(u^{t(j)}),\ y_t(u^{t(i)}) = y_t(u^{t(j)}), \quad \forall i, j, t \\
& \hat{U} = \big\{u^{(1)}, \ldots, u^{(N)}\big\}
\end{aligned} \tag{5.42}$$

where $u^{(i)}$ is an element of the scenario set $\hat{U}$ and $N$ denotes the total number of

uncertainty scenarios. Note that additional constraints are introduced to model the non-

anticipativity restriction in the multistage decision-making setting [45]. To be more

specific, if the trajectories of two uncertainty scenarios are the same up to stage t, the

corresponding recourse decisions cannot be distinguished. Using epigraph

reformulation, we can equivalently transform (5.42) into the following Scenario-Tree

Multistage Adaptive Robust Mixed-Integer Linear Program (STMARMILP), as shown

in (5.43).

$$\begin{aligned}
\min_{x,\,s_t,\,y_t,\,\eta}\ & c^{\mathrm{T}}x + \eta \\
\text{s.t.}\ & \eta \ge \sum_{t=1}^{T}\Big(d_t^{\mathrm{T}}\,s_t(u^{t(i)}) + f_t^{\mathrm{T}}\,y_t(u^{t(i)})\Big), \quad i = 1, \ldots, N \\
& A_t\,s_t(u^{t(i)}) + B_t\,s_{t-1}(u^{(t-1)(i)}) + W_t\,y_t(u^{t(i)}) \le h_t^0 + H_t u_t^{(i)} + T_t x, \quad i = 1, \ldots, N,\ \forall t \\
& E_t\,s_t(u^{t(i)}) + G_t\,y_t(u^{t(i)}) \le m_t^0 + M_t u_t^{(i)} + L_t x, \quad i = 1, \ldots, N,\ \forall t \\
& u^{t(i)} = u^{t(j)} \Rightarrow s_t(u^{t(i)}) = s_t(u^{t(j)}),\ y_t(u^{t(i)}) = y_t(u^{t(j)}), \quad \forall i, j, t
\end{aligned} \tag{5.43}$$

The above scenario-based problem constitutes an MILP problem, which can be solved

to global optimality by employing the branch-and-cut methods implemented in

optimization solvers like CPLEX and GUROBI. In this sense, obtaining the lower

bound boils down to solving a computationally efficient STMARMILP, in which

critical uncertainty realizations are identified through the proposed solution algorithm.

We quantitatively assess the solution quality of different algorithms using the relative optimality gap defined by $\frac{UB - LB}{0.5\,(UB + LB)}$, where $UB$ denotes the upper bound and $LB$

represents the lower bound obtained via the STMARMILP. Note that this gap is an

indication of solution quality: a small gap implies a near-optimal solution, while a large

gap suggests a significant loss of optimality. Before closing this section, we summarize

the inequality relationship between bounds in the following theorem.

Theorem 5.3 For any specific problem instance of MARMILPs, the following inequalities (5.44) hold.

$$\nu_S \le \nu^* \le \nu_{TPB} \le \nu_{ADR} \tag{5.44}$$

where $\nu_S$, $\nu^*$, $\nu_{TPB}$, and $\nu_{ADR}$ denote the optimal values of the STMARMILP, the MARMILP, the TARMILP, and the affinely adjustable robust counterpart, respectively.

Proof. Since the scenario set is a subset of the uncertainty set ($\hat{U} \subseteq U$), the scenario-tree counterpart STMARMILP is a relaxation of the original multistage ARO problem, as it enforces only a subset of the constraints. Hence, the objective value of the STMARMILP provides a lower bound for the original multistage ARO problem ($\nu_S \le \nu^*$).

In the original MARMILP, the recourse decisions are general functions of uncertainty. In both the affine control policy and the proposed transformation-proximal bundle algorithm, all or some of the recourse variables are restricted to a fixed functional form of the uncertainty realizations, thus providing upper bounds on the optimal value of the original multistage ARO problem ($\nu^* \le \nu_{ADR}$ and $\nu^* \le \nu_{TPB}$). Additionally, any feasible solution of the affinely adjustable robust counterpart is also feasible for the TARMILP, because state decisions are restricted to the affine control policy while control decisions remain general functions of uncertainty in the proposed solution algorithm. Therefore, we have $\nu_{TPB} \le \nu_{ADR}$. □
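As a small numerical sanity check (the function names are ours), the relative optimality gap defined earlier and the bound ordering in (5.44) can be evaluated as follows:

```python
def relative_gap(ub, lb):
    """Relative optimality gap: (UB - LB) / (0.5 * (UB + LB))."""
    return (ub - lb) / (0.5 * (ub + lb))

def bounds_ordered(v_s, v_star, v_tpb, v_adr):
    """Check the chain of inequalities (5.44): v_S <= v* <= v_TPB <= v_ADR."""
    return v_s <= v_star <= v_tpb <= v_adr
```

For instance, an upper bound of 105 against a lower bound of 100 gives a gap of 5 / 102.5, i.e. roughly 4.9%.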

Remark 5.6 The proposed algorithm provides an upper bound on $\nu^*$ for the following reasons. First, the upper bound is based on $F(z^k)$, instead of the lower piecewise linear approximation $F_k(z^k)$. Second, although we add the feasibility cuts on-the-fly into problem (MP) to obtain a candidate solution $\hat{x}^k$, this candidate solution must be feasible in order to become a stability center. This is because if it is not feasible, i.e. its corresponding objective value of (FP) is strictly positive, the condition for the serious step (Line 17 of the algorithm pseudocode) cannot be met, due to $F(\hat{x}^k) = +\infty$.

5.5 Applications

5.5.1 Application 1: Robust optimal inventory control

In this section, we apply the proposed transformation-proximal bundle algorithm to

robust finite-horizon optimal inventory control problems. Extensive comparisons

between the affine control policy [106], the piecewise affine control policy [396, 405],

and the proposed multi-to-two transformation-based algorithm are made in terms of

solution quality and computational efficiency. All optimization problems are solved

with CPLEX 12.8.0, implemented on a computer with an Intel (R) Core (TM) i7-6700

CPU @ 3.40 GHz and 32 GB RAM. The optimality tolerance for CPLEX 12.8.0 is set

to be 0. The tolerance for expected decrease δtol is set to be 0.1.

Inventory control plays a critical role in improving customer services as well as in

boosting profits. Due to the market fluctuations, customer demands are inevitably

subject to uncertainty [406-408]. These uncertainties are typically revealed sequentially

over the entire time horizon. In this application, we consider a single-item multiperiod

robust optimal inventory control problem under demand uncertainty [381, 409, 410]. In

such a problem, a decision maker needs to serve customer demand as far as possible at

a minimum cost. There are two types of orders, standard orders and express orders, that

can be placed after knowing uncertainty realization at the beginning of each period. A

standard order of product arrives at the end of the time period, while the costlier express

orders arrive immediately. Any excess inventories are stored in a warehouse and incur

the holding cost. If customer demands are backlogged, the backlog cost should be paid.
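Under the stated timing convention, a standard order placed in one period first becomes available in the next inventory balance, while an express order is available immediately; one demand trajectory can therefore be simulated directly, matching the inventory balance (5.46). The sketch below uses our own names and is only illustrative:

```python
def simulate_inventory(demand, std_orders, exp_orders, I0=0.0):
    """Inventory balance: the standard order from the previous period plus the
    current express order serve current demand; negative inventory is backlog."""
    inventory, path = I0, []
    for t, d in enumerate(demand):
        arriving_std = std_orders[t - 1] if t >= 1 else 0.0
        inventory = inventory + arriving_std + exp_orders[t] - d
        path.append(inventory)
    return path
```

For demands [5, 5] with a standard order of 10 and an express order of 2 in the first period, the inventory path is [-3.0, 2.0]: the first period ends backlogged, and the arriving standard order restores a positive level in the second period.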

The robust finite-horizon optimal inventory control problem under demand uncertainty is shown as follows. The objective is to minimize the total cost, which is given in (5.45). The total cost includes ordering, holding, and backlog costs incurred over all time periods. The constraints can be classified into inventory dynamics constraints (5.46), ordering bound constraints (5.47)-(5.48), and real-valued mapping constraints (5.49). The uncertainty set for demand is given in (5.50).

$$\min_{x_t(\cdot),\,y_t(\cdot),\,I_t(\cdot)}\ \max_{\xi \in U}\ \sum_{t=1}^{T}\Big(c_1\,x_t(\xi^t) + c_2\,y_t(\xi^t) + c_H\big[I_t(\xi^t)\big]_+ + c_B\big[-I_t(\xi^t)\big]_+\Big) \tag{5.45}$$

$$\text{s.t.}\quad I_t(\xi^t) = I_{t-1}(\xi^{t-1}) + x_{t-1}(\xi^{t-1}) + y_t(\xi^t) - \xi_t, \quad \forall \xi \in U,\ \forall t \tag{5.46}$$

$$x_t(\xi^t) \ge 0, \quad \forall \xi \in U,\ \forall t \tag{5.47}$$

$$y_t(\xi^t) \ge 0, \quad \forall \xi \in U,\ \forall t \tag{5.48}$$

$$I_t(\xi^t),\ x_t(\xi^t),\ y_t(\xi^t) \in \mathbb{R}, \quad \forall t \tag{5.49}$$

$$U = \Big\{\boldsymbol{\xi} \,:\, l_t \le \xi_t \le u_t,\ \forall t,\ \sum_{t=1}^{T} \xi_t \le \frac{T\,\xi^{\max}}{2}\Big\} \tag{5.50}$$

where $x_t$ is the decision variable for the standard order of the product at the beginning of time period $t$, $y_t$ denotes the decision on the express order of the product at the beginning of time period $t$, and $I_t$ is the inventory level at time period $t$. Moreover, $\xi_t$ denotes the uncertain demand at time period $t$, and $\xi^t = [\xi_1, \ldots, \xi_t]'$ represents the uncertainty realizations available up to time period $t$. $T$ denotes the total length of the time horizon. $c_1$ and $c_2$ represent the unit costs of standard and express orders, respectively. $c_H$ and $c_B$ are the unit holding and unit backlogging costs, respectively. In the uncertainty set, the lower and upper bounds of the uncertain product demand are denoted by $l_t$ and $u_t$, respectively. The constant $\xi^{\max}$ represents the highest possible level of product demand for each time period. The operator $[\cdot]_+$ in (5.45) represents $\max(\cdot, 0)$ and can be tackled by the following epigraph reformulations (5.51) and (5.52).

$$\eta_t^H(\xi^t) \ge I_t(\xi^t), \quad \eta_t^H(\xi^t) \ge 0 \tag{5.51}$$

$$\eta_t^B(\xi^t) \ge -I_t(\xi^t), \quad \eta_t^B(\xi^t) \ge 0 \tag{5.52}$$
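For intuition, the per-period holding/backlog term $c_H[I_t]_+ + c_B[-I_t]_+$ that the epigraph variables represent can be evaluated directly (a minimal sketch; the function name is ours):

```python
def stage_cost(inventory, c_h, c_b):
    """Holding cost on positive inventory, backlog cost on shortage;
    at most one of [I]_+ and [-I]_+ is nonzero."""
    holding = max(inventory, 0.0)   # [I]_+
    backlog = max(-inventory, 0.0)  # [-I]_+
    return c_h * holding + c_b * backlog
```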

By employing the variable substitution $\hat{I}_t = I_t + x_t$, the above robust finite-horizon optimal inventory control problem assumes the same formulation as the multistage ARO problem. As a result, $\hat{I}_t$ is a state decision variable, while $x_t$, $y_t$, $\eta_t^H$, and $\eta_t^B$ are control variables. The multi-to-two transformation scheme is utilized by applying the affine control policy to the state decision variables.
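As an illustration, restricting a state variable to an affine control policy means evaluating a rule of the form $\pi_{t,0} + \sum_{s \le t} \pi_{t,s}\,\xi_s$ once the demand history is revealed. The coefficient containers below (`pi0`, `Pi`) are our own hypothetical names:

```python
def affine_state(pi0, Pi, xi, t):
    """Evaluate an affine decision rule for the state at stage t, given the
    revealed demand history xi[0..t] (coefficient names are illustrative)."""
    return pi0[t] + sum(Pi[t][s] * xi[s] for s in range(t + 1))
```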

In this application, 25 instances are randomly generated to compare the performances

of different control strategies. The number of time periods T is set to be 5. The initial

inventory of the product is assumed to be zero. The unit costs for standard order, express

order, backlog, and holding are chosen randomly following the uniform distributions:

c1~Unif(0, 5), c2~Unif(5, 10), cB ~Unif(0, 10), and cH ~Unif(0, 5). Lower and upper

bounds of the product demand are generated according to the following distributions: lt

~Unif(0, 15) and ut ~Unif(75, 100). Note that Unif denotes the uniform distribution. The highest value of product demand $\xi^{\max}$ is set to be 100.
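One way to draw such a random instance, following the distributions stated above (the function name and dictionary keys are our illustrative choices):

```python
import random

def sample_instance(T=5, seed=0):
    """Draw one random inventory-control instance with the stated distributions."""
    rng = random.Random(seed)
    return {
        "c1": rng.uniform(0, 5),    # unit cost, standard order
        "c2": rng.uniform(5, 10),   # unit cost, express order
        "cB": rng.uniform(0, 10),   # unit backlog cost
        "cH": rng.uniform(0, 5),    # unit holding cost
        "l": [rng.uniform(0, 15) for _ in range(T)],    # demand lower bounds
        "u": [rng.uniform(75, 100) for _ in range(T)],  # demand upper bounds
        "xi_max": 100.0,            # highest possible demand level
    }
```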

The computational results are summarized in Appendix A. For each problem instance, the relative gap is calculated as $\frac{UB - LB}{0.5\,(UB + LB)}$. Note that $LB$ is the lower bound
obtained using the proposed scenario-tree-based lower bounding technique, so it is the

same for different control policies in a specific instance. Accordingly, a large value of

the relative gap implies a high value of UB, which means a large loss of optimality

incurred by the corresponding control policy. In the application, the affine control policy

suffers from severe suboptimality. Its largest relative gap can reach as high as 53.43%,

and the average relative gap is 25.72%. By contrast, the control policy determined by

the proposed transformation-proximal bundle algorithm consistently outperforms both the affine control policy and the piecewise affine control policy across all the

problem instances. More specifically, the control policy resulting from the proposed

algorithm has a relative gap of 1.33% on average, while its highest relative gap is merely

4.27%. Additionally, it can yield near-optimal control strategies for Instances 13, 16, 17

and 21 with relative gaps below 0.02%. In terms of computational time, the robust

optimal inventory control problems using affine control policy and piecewise affine

control policy are more efficient to solve compared to the proposed approach, since they

involve solving only one linear programming problem. However, the proposed

approach solves the robust optimal inventory control problem instances within only 20.8

seconds on average. Note that the average computational times for solving the

reformulated (SUP) and (FP) are 0.41 seconds and 0.25 seconds, respectively. It is worth

noting that the inventory plan is typically made in a large time scale of days and weeks

[408]. Therefore, the computational time difference between the affine control policy and the proposed approach is insignificant. The solution quality in terms of the optimality gap is paramount, as long as the computational time is within a reasonable range. In this sense, the proposed approach provides an attractive trade-off between solution quality and computational tractability.
To better understand the inventory management decisions, we present the results of a

single problem instance (Instance 13) determined by the affine control policy and the

control policy determined by the proposed algorithm in Figure 26. In this particular

instance, we show the inventory profiles over the entire time horizon. From the figure,

we can observe that the affine control policy tends to keep much higher inventory levels

of the product than the proposed transformation-proximal bundle method does.

Specifically, the inventory levels at period 3 and period 4 determined by the affine

control policy are more than double those of the proposed control policy, respectively.

As a result, the excessive inventory incurs additional costs, rendering the induced robust

optimal inventory control policy over-conservative.

Figure 26. Inventory profiles determined by different control policies under the worst-

case uncertainty realization.

We present the cost breakdowns determined by the affine control policy and the control

policy determined by the proposed algorithm in Figure 27. From the pie charts, we can

observe that a major part of the total cost comes from ordering standard delivery of

products for both control policies. Although express orders can more promptly serve the

customer demands, they are too expensive to be adopted by either control policy. Notably,

the percentage of holding cost determined by the affine control policy is 14% higher

than that of the proposed one due to their different inventory levels.

We further compare the proposed algorithmic framework with a data-driven approach

that samples uncertainty scenarios from the uncertainty set following the uniform

distribution [404, 411]. It is worth noting that the data-driven approach, which relies on

scenario sampling, only provides a lower bound of the original multistage ARO problem

due its relaxation. To guarantee a fair comparison, we employ the STMARMILP in the

proposed framework, since it also provides a lower bound. Additionally, the same

number of uncertainty scenarios is used in the data-driven approach as that of

STMARMILP. We present the computational results in Figure 28, where the X-axis

denotes the index of instances and the Y-axis represents the lower bounds of total cost

in multiperiod inventory control. As can be observed from the figure, the proposed

approach compares favorably against the data-driven approach by generating tighter

lower bounds in each instance.

Figure 27. Cost breakdowns determined by (a) the affine control policy, (b) the

proposed control policy.

Figure 28. Lower bounds of multi-period inventory cost determined by the proposed

method and the data-driven approach.

To investigate the impacts of the number of uncertainty scenarios on the data-driven

approach, we select Instance 1 and plot lower bound ratios and computational times

under different numbers of scenarios in Figure 29. Note that the lower bound ratio is

defined as the ratio between the lower bounds generated by the data-driven approach

and STMARMILP. From the figure, we can see that the computational time of the data-

driven approach increases significantly as the total number of scenarios increases.

Although its corresponding lower bound becomes tighter when using more uncertainty

scenarios, the data-driven approach consumes 27.1 times more computational time than

the proposed method and still generates a less tight lower bound (lower bound ratio is

0.932) when the total number of scenarios is 10,000.

Figure 29. The impacts of the number of uncertainty scenarios on the generated lower

bound of the original multistage ARO problem and computational time in the data-

driven approach.

To further investigate the performance of the proposed algorithm under different

number of time stages, we implement computational experiments with T=10 and T=15.

For each value of T, 25 randomly generated robust optimal inventory control instances

are used to evaluate and compare different control policies as before. The computational

results for each problem instance with T=10 and T=15 are presented in Table A2 and

Table A3 of Appendix A, respectively. From these tables, we can see that the solution

qualities of both the affine control policy and piecewise affine control policy deteriorate

remarkably as the number of time stages increases. Specifically, their average relative

gaps soar significantly from 25.72% to 34.88% when the value of T changes from 5 to

15, while the largest relative gap changes from 53.43% to 111.20%. In stark contrast,

the average gap of the proposed control policy is increased by only 0.35%, which

demonstrates its consistent performance across different numbers of time periods.

Notably, the largest relative gap of the proposed solution algorithm increases from 4.27% to 6.29% when the value of T increases from 5 to 15. It is worth mentioning that the

proposed control policy compares favorably against the other two control policies in all

problem instances. Moreover, the average computational time of the proposed algorithm

increases from 20.8s to 493.2s, which is still a reasonable amount of time for inventory

control problems. Since the solution obtained from the proposed algorithm is a stability center, the resulting inventory control policy is guaranteed to be feasible according to

Remark 5.6.

5.5.2 Application 2: Process network planning

In this section, a multi-period strategic planning problem of process networks is

presented to demonstrate the applicability of the proposed multi-to-two transformation-

based algorithm. Chemical manufacturers often build integrated chemical complexes

that consist of interconnected processes and various chemicals [412]. The objective of

the process network planning is to maximize the net present value (NPV) over the

strategic planning horizon. The considered chemical process network, which is shown

in Figure 30, consists of five chemicals (A-E) and three processes (P1, P2, and P3). In

the figure, chemicals A-C represent raw materials, which can be either purchased from

suppliers or produced by certain processes. For example, Chemical C can be either

manufactured by process P3 or purchased from a supplier. Chemicals D and E are final

products, which are sold to the markets. In this application, we consider five time

periods over the 10-year planning horizon, and the duration of each period is two years.

It is assumed that all the processes do not have initial capacities, and they can be

installed at the beginning of the planning horizon. For the demand uncertainty set,

d 0jt  100 ,  jt  0.85 , and   0.6 . The mass balance relationships are given in Table

6.

Figure 30. The schematic of a small-scale process network.

Table 6. Mass balance relationships for different processes.

Process Mass balance relationship

Process 1 0.63 A + 0.58 C → E

Process 2 0.64 A → D

Process 3 1.25 B → 0.90 C + E
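The stoichiometry of Table 6 can be encoded as signed coefficients (negative for consumption, positive for production, per unit of main product); the sketch below, with our own names, computes the net chemical flows for given process operating levels:

```python
# Mass-balance coefficients from Table 6 (negative = consumed, positive = produced).
PROCESSES = {
    "P1": {"A": -0.63, "C": -0.58, "E": 1.0},
    "P2": {"A": -0.64, "D": 1.0},
    "P3": {"B": -1.25, "C": 0.90, "E": 1.0},
}

def net_flows(levels):
    """Net production (+) or consumption (-) of each chemical for the given
    process operating levels, e.g. levels = {"P1": 10.0, "P3": 5.0}."""
    flows = {}
    for proc, lvl in levels.items():
        for chem, coeff in PROCESSES[proc].items():
            flows[chem] = flows.get(chem, 0.0) + coeff * lvl
    return flows
```

Running Process 1 at level 10 and Process 3 at level 5, for example, yields 15 units of E while consuming A, B, and a net amount of C, since Process 3's C output partially offsets Process 1's C demand.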

The process network planning determines the purchase levels of feedstocks, sales of

final products, capacity expansion, and production profiles of processes at each time

period, in order to maximize the NPV over the strategic planning horizon. The

multistage ARO model for process network planning under demand uncertainty is

formulated as follows. The objective is to maximize the NPV, which is given in (5.53).

The constraints can be classified into capacity expansion constraints (5.54)-(5.55), mass

balance constraints (5.56), production level constraints (5.57), supply and demand

constraints (5.58)-(5.59), non-negativity constraints (5.60)-(5.64), and integrity

constraints (5.65). The data-driven uncertainty set of demand is defined in (5.66)

following the literature [413]. The “here-and-now” decision is binary decision Yit, while

all the other continuous decisions constitute the “wait-and-see” decisions. Based on

definitions of state and control decision variables, Qit is the adjustable state decision,

while QEit, Wit, Pjt, and Sjt are the adjustable control decisions. A list of indices/sets,

parameters and variables is given in the Nomenclature section.


$$\begin{aligned}
\max_{\substack{QE_{it},\,Q_{it},\,Y_{it},\\ W_{it},\,P_{jt},\,S_{jt}}}\ \min_{d \in U}\ & \sum_{j}\sum_{t}\gamma_{jt}\,S_{jt}(d^t) - \sum_{i}\sum_{t}c1_{it}\,QE_{it}(d^t) - \sum_{i}\sum_{t}c2_{it}\,Y_{it} \\
& - \sum_{i}\sum_{t}c3_{it}\,W_{it}(d^t) - \sum_{j}\sum_{t}c4_{jt}\,P_{jt}(d^t)
\end{aligned} \tag{5.53}$$

$$\text{s.t.}\quad qe_{it}^{L}\,Y_{it} \le QE_{it}(d^t) \le qe_{it}^{U}\,Y_{it}, \quad \forall d \in U,\ \forall i, t \tag{5.54}$$

$$Q_{it}(d^t) = Q_{i,t-1}(d^{t-1}) + QE_{it}(d^t), \quad \forall d \in U,\ \forall i, t \tag{5.55}$$

$$P_{jt}(d^t) + \sum_{i}\mu_{ij}\,W_{it}(d^t) - S_{jt}(d^t) = 0, \quad \forall d \in U,\ \forall j, t \tag{5.56}$$

$$W_{it}(d^t) \le Q_{it}(d^t), \quad \forall d \in U,\ \forall i, t \tag{5.57}$$

$$P_{jt}(d^t) \le su_{jt}, \quad \forall d \in U,\ \forall j, t \tag{5.58}$$

$$S_{jt}(d^t) \le d_{jt}, \quad \forall d \in U,\ \forall j, t \tag{5.59}$$

$$QE_{it}(d^t) \ge 0, \quad \forall d \in U,\ \forall i, t \tag{5.60}$$

$$Q_{it}(d^t) \ge 0, \quad \forall d \in U,\ \forall i, t \tag{5.61}$$

$$P_{jt}(d^t) \ge 0, \quad \forall d \in U,\ \forall j, t \tag{5.62}$$

$$S_{jt}(d^t) \ge 0, \quad \forall d \in U,\ \forall j, t \tag{5.63}$$

$$W_{it}(d^t) \ge 0, \quad \forall d \in U,\ \forall i, t \tag{5.64}$$

$$Y_{it} \in \{0, 1\}, \quad \forall i, t \tag{5.65}$$

$$U = \Big\{d \,:\, (1 - \varepsilon_{jt})\,d_{jt}^{0} \le d_{jt} \le (1 + \varepsilon_{jt})\,d_{jt}^{0},\ \forall j, t;\ \sum_{j}\sum_{t} d_{jt} \le (1 + \Delta)\sum_{j}\sum_{t} d_{jt}^{0}\Big\} \tag{5.66}$$

The above multistage adaptive robust process network planning problem is

computationally intractable because all the “wait-and-see” decisions are expressed as

general functions of demand uncertainty. To this end, we employ the multi-to-two

transformation scheme and restrict only state decision Qit to follow affine control policy.

As a result, the above multistage robust process network planning problem is

transformed into a two-stage ARO problem. In contrast, the affine and piecewise affine

control policies restrict all the adjustable decisions Qit, QEit, Wit, Pjt, and Sjt to be affine

and piecewise affine functions of demand uncertainty realizations, respectively.

The computational results are provided in Table 7. In this application, the proposed

solution algorithm increases the NPV by 6.27% (from $121.2MM to $128.8MM)

compared with the affine control policy. In terms of solution quality, the proposed solution algorithm demonstrates superior performance relative to the other two approaches and

generates a high-quality solution with a relative gap of 3.36%. Notably, the proposed

computational algorithm can solve this multistage ARO problem within merely 24.2

seconds, which is a reasonable amount of time given its high solution quality. It can be

concluded that the proposed multi-to-two transformation-based algorithm can provide

an attractive trade-off between solution quality and computational tractability. The

optimal design and planning decisions at time period 5 determined by the affine decision

rule method and the proposed solution method are shown in Figure 31 (a) and Figure

31 (b), respectively. In Figure 31, the optimal total capacities are displayed under

operating processes.

Table 7. Computational results of different methods in the process network planning application.

                     Affine control  Piecewise affine  Transformation-proximal       Scenario-tree
                     policy method   control policy    bundle algorithm              problem
                                     method            Master problem   Subproblem
Binary decisions     15              15                15               130          15
Cont. decisions      12,213          14,588            1,952            181          2,282
Constraints          6,957           13,682            3,079            467          4,367
Max. NPV ($MM)       121.2           121.2             128.8                         133.2
CPU time (s)         0.8             1.6               24.2                          0.4

To illustrate the optimal capacity expansion activities, we present the capacity profiles

during the entire planning horizon determined by the affine decision rule method and

the developed solution algorithm in Figure 32 (a) and Figure 32 (b), respectively. As

can be observed from Figure 32 (a), Process 3 is expanded at the beginning of time

period 1, and it is further expanded at the second time period in the solution determined

by the affine decision rule approach. By contrast, Process 3 is not selected to be

expanded from time period 2 in the solution determined by the transformation-proximal

bundle algorithm. Additionally, the optimal total capacities of some processes

determined by the two solution methods are different. For example, the optimal total

capacity of Process 2 is 123.4 kt/y at the end of planning horizon determined by the

affine decision rule approach, while the corresponding capacity is 30.6 kt/y larger for

the developed solution method.

Figure 31. The optimal design and planning decisions at the end of the planning

horizon determined by (a) the affine decision rule method, and (b) the transformation-

proximal bundle algorithm.

Figure 32. Optimal capacity expansion decisions over the entire planning horizon

determined by (a) the affine decision rule method, and (b) the transformation-proximal

bundle algorithm.

The details on revenues and cost breakdown, including fixed investment cost, variable

investment cost, operating cost, and purchase cost, are shown in Figure 33. As can be

observed from the bar charts, the proposed approach generates $27.53MM higher

revenues than the conventional affine decision rule method, which demonstrates that the

proposed approach is less conservative by allowing the control decisions to be fully

adjustable to demand uncertainty realizations. From the pie charts in Figure 33, we can

see that more than 40% of the total cost comes from purchasing feedstock for both

network planning solution methods. In addition, the percentage of the variable

investment cost for the developed solution algorithm is 5% higher than that determined

by the affine decision rule method, because optimal process capacities determined by

the proposed transformation-proximal bundle method are larger in its optimal network

structure.

Figure 33. Revenues and cost breakdown determined by the affine decision rule

method and the transformation-proximal bundle algorithm.

Next, we consider a larger-scale petrochemical process network, which includes 10

chemicals, six processes, four suppliers, and six markets [412]. The detailed network

schematic is depicted in Figure 34, where specific chemical names are listed. Chemicals

A-D represent feedstock, which can be purchased from suppliers or manufactured by

certain processes. For instance, Chemical D (Nitric Acid) can be either purchased from

a supplier or produced by Process 3. Chemicals E-J are products, which are sold to

markets to earn revenue. This complex process network offers flexibility in that

many manufacturing options are available. For example, Chemical F is a type of product

and can be manufactured by Process 1 and Process 3. Chemical B serves as a feedstock

to Process 3, Process 4, and Process 6. In this case study, we consider four time periods

over the planning horizon, and the duration of each time period is two years. It is

assumed that all processes can be installed at the beginning of time period 1. For the

demand uncertainty set, d⁰jt = 200, δjt = 0.8, and Γ = 0.6.
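For intuition, membership in such a budget-style demand uncertainty set can be checked numerically. The sketch below is illustrative only: the function name, the three-chemical example, and the budget scaling (budget times dimension) are assumptions, not necessarily the exact formulation used in this work.

```python
import numpy as np

def in_budget_set(d, d0, delta, gamma):
    """Membership test for a box-plus-budget uncertainty set: each demand
    stays within +/- delta*d0 of its nominal value, and the total scaled
    deviation is capped by the budget gamma times the dimension."""
    z = np.abs(d - d0) / (delta * d0)   # scaled deviations, in [0, 1] if feasible
    return bool(np.all(z <= 1.0) and z.sum() <= gamma * d.size)

d0 = np.full(3, 200.0)     # nominal demands, d0_jt = 200
delta = np.full(3, 0.8)    # maximum relative deviations, delta_jt = 0.8
print(in_budget_set(np.array([230.0, 200.0, 180.0]), d0, delta, 0.6))  # True
```

A larger budget gamma admits more simultaneous deviations and hence a more conservative robust solution.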

Figure 34. The schematic of a large-scale petrochemical process network where

chemical names are listed.

Unlike the affine decision rule method, which restricts all adaptive decisions to be affine

functions of demand uncertainty realizations, the proposed solution algorithm allows

for full adjustability in the local decisions, thus boosting the NPV by 4.43% (an increase

from $646.2MM to $674.8MM) for nominal demand uncertainty realizations. This

computational result illustrates that the developed solution algorithm is capable of

generating superior planning decisions compared with the conventional decision rule

method. Figure 35 shows more details on NPV results, including the revenue and cost

breakdown in each time period, for the affine decision rule method (represented by ADR for

notation brevity in the figure) and the transformation-proximal bundle algorithm

(denoted by TPB in the figure), respectively. We can observe from Figure 35 that

investment costs occupy 48.6% of total costs in the first time period for both solution

methods. This result can be well explained by the fact that most chemical processes are

expanded or built within the first time period. During the last three time periods, the

majority of costs come from process operation and feedstock purchase. Notably, the

revenues determined by the optimal planning decisions of the proposed solution method

are 3.27%, 5.02%, and 4.48% higher, compared with the affine decision rule approach,

in Period 2, Period 3, and Period 4, respectively.

Figure 35. Revenues and cost breakdown at each time period determined by the affine

decision rule method (denoted by ADR in the figure) and the transformation-proximal

bundle algorithm (denoted by TPB in the figure).

To illustrate the optimal capacity expansion activity, we present the capacity profiles

during the entire planning horizon for the proposed approach in Figure 36. From Figure

36, we can see that a total of five processes are selected to be built at time period 1 in

the optimal process network determined by the proposed approach. By comparing

capacity expansions of different processes, we can conclude that the optimal expansion

frequency of Process 4 is the highest among all processes. This is partially ascribed to

the fact that a total of three products (Chemical J, Chemical H, and Chemical I) are

closely connected to Process 4. Additionally, the optimal total capacity of Process 4

reaches 180.0 kt/y at the end of the planning horizon.

Figure 36. Optimal capacity expansion decisions over the entire planning horizon

determined by the transformation-proximal bundle algorithm.

In Figure 37 (a) and Figure 37 (b), we further present the optimal purchase levels of

feedstock determined by the conventional affine decision rule method and the proposed

solution algorithm, respectively. From the bar charts, we can observe a similar trend for

both solution methods that the purchase level of each feedstock increases as time

evolves. This is because feedstock purchases grow to accommodate

manufacturing needs as process capacities expand over the planning horizon. By

comparing Figure 37 (a) and Figure 37 (b), a notable difference is that the total

purchase amount of Chemical B determined by the proposed approach is 6.99% larger

than that of the affine decision rule method. In addition, the proposed solution algorithm

increases the total purchase amount of Chemical A during the entire planning horizon

by 2.43% compared with the conventional decision rule method.

Figure 37. Optimal feedstock purchase at each time stage determined by (a) the affine

decision rule method, and (b) the transformation-proximal bundle algorithm.

To take a closer look at the optimal adjustable decisions on product sale, we present the

results for the affine decision rule method and the proposed approach as spider charts

shown in Figure 38 (a) and Figure 38 (b), respectively. Among all the products

(Chemicals E-J), there are significant increases in the sale amount of Chemical G at Period

2, Period 3, and Period 4. The sale level of Chemical G could reasonably be expected

to rise when the corresponding feedstock to Process 6 increases (as shown in Figure 37).

Compared with the optimal sale level of Chemical E in Figure 38 (a), the optimal sale

amount of Chemical E determined by the proposed approach decreases slightly by

2.01% at Period 2, while it increases markedly by 7.55% at Period 3 and by 7.55% at

Period 4.
Figure 38. Spider charts showing optimal sale quantities (kt/y) of final products at

each time stage determined by (a) the affine decision rule method, and (b) the

transformation-proximal bundle algorithm.

5.6 Summary

In this work, a novel transformation-proximal bundle algorithmic framework was

proposed for solving a broad class of MARMILP problems efficiently. We first

proposed a multi-to-two transformation scheme, in which only state decision variables

were restricted to be affine functions. By employing the proposed scheme, the original

multi-stage ARO problem was proven to be equivalent to a two-stage

ARO problem. The proximal bundle algorithm was further developed as an efficient

global optimization algorithm for the resulting two-stage ARO problem. Since the local

decisions were exempt from the affine decision rule restriction, the proposed solution

algorithm sacrificed less optimality for computational tractability than

conventional decision rule methods. The computational results showed that the

proposed transformation-proximal bundle algorithm significantly outperformed the

conventional solution methods in terms of solution quality.
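To make the proximal bundle idea concrete, the following one-dimensional sketch minimizes a generic nonsmooth convex function via a cutting-plane model plus a proximal term around a stability center, with serious/null steps. It is a minimal, assumption-laden illustration (the function name, toy objective, and step parameters are made up), not the mixed-integer master problem solved in this chapter.

```python
from scipy.optimize import minimize_scalar

def proximal_bundle_1d(f, subgrad, x0, t=1.0, tol=1e-6, max_iter=200):
    """Minimal 1-D proximal bundle sketch: maintain a cutting-plane model
    of a convex f, minimize model + proximal term, and take a serious
    step only when the actual decrease matches the predicted one."""
    center, f_center = x0, f(x0)
    cuts = [(x0, f_center, subgrad(x0))]
    for _ in range(max_iter):
        model = lambda x: max(fi + gi * (x - xi) for xi, fi, gi in cuts)
        res = minimize_scalar(lambda x: model(x) + (x - center) ** 2 / (2 * t),
                              bounds=(center - 100, center + 100), method="bounded")
        x_new = res.x
        predicted = f_center - model(x_new)          # model-predicted decrease
        if predicted < tol:                          # model is accurate: stop
            break
        f_new = f(x_new)
        cuts.append((x_new, f_new, subgrad(x_new)))  # add a cut at the trial point
        if f_center - f_new >= 0.5 * predicted:      # serious step
            center, f_center = x_new, f_new
        # otherwise a null step: keep the center; the model just got richer
    return center

# Toy problem: minimize the nonsmooth convex f(x) = |x - 3|
x_star = proximal_bundle_1d(lambda x: abs(x - 3.0),
                            lambda x: 1.0 if x >= 3.0 else -1.0, x0=0.0)
```

Here the master problem is a scalar minimization; in the setting of this chapter the master is a mixed-integer program over design decisions, so this is only a structural analogy.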

5.7 Appendix: Tables of computational results in Application 1

Table 8. Computational performances of different solution algorithms in the

multistage robust inventory control problem under demand uncertainty for T=5.

Instance   Affine control            Piecewise affine           Transformation-proximal
No.        policy method             control policy method      bundle algorithm
           Time (s)   Rel. Gap (%)   Time (s)   Rel. Gap (%)    Time (s)   Rel. Gap (%)
1 0.2 53.43 0.2 53.43 14.4 4.14
2 0.2 12.67 0.2 12.67 24 0.25
3 0.2 20.22 0.2 20.22 12.2 0.37
4 0.2 29.98 0.2 29.98 18.8 4.27
5 0.2 32.70 0.2 32.70 18 2.65
6 0.2 14.63 0.2 14.63 17.9 2.54
7 0.2 35.95 0.2 35.95 71.1 0.26
8 0.2 6.84 0.2 6.84 20.7 0.04
9 0.2 7.56 0.2 7.56 16.9 1.14
10 0.2 37.58 0.2 37.58 14.7 3.49
11 0.2 33.69 0.2 33.69 15.4 0.33
12 0.2 32.72 0.2 32.72 19.5 0.26
13 0.2 18.88 0.2 18.88 17.7 0.01
14 0.2 36.89 0.2 36.89 17.9 0.67
15 0.3 26.60 0.2 26.60 11.8 1.96
16 0.2 9.27 0.2 9.27 13.1 0.02
17 0.2 42.24 0.2 42.24 20.6 0.02
18 0.2 9.27 0.2 9.27 42.7 0.07
19 0.2 35.00 0.2 35.00 19.1 0.28
20 0.3 27.00 0.2 27.00 13.4 3.70
21 0.2 36.82 0.2 36.82 24.1 0.02
22 0.2 18.57 0.3 18.57 15.8 3.63
23 0.2 25.71 0.2 25.71 21.5 0.46
24 0.2 2.50 0.2 2.50 17.6 0.34
25 0.2 36.27 0.2 36.27 21.3 2.33

Table 9. Computational performances of different solution algorithms in the

multistage robust inventory control problem under demand uncertainty for T=10.

Instance   Affine control            Piecewise affine           Transformation-proximal
No.        policy method             control policy method      bundle algorithm
           Time (s)   Rel. Gap (%)   Time (s)   Rel. Gap (%)    Time (s)   Rel. Gap (%)
1 0.3 12.21 0.3 12.21 106.9 0.00
2 0.2 94.38 0.3 94.38 49.2 5.61
3 0.3 74.34 0.3 74.34 64.2 2.14
4 0.3 27.81 0.3 27.81 45.4 0.16
5 0.3 64.86 0.3 64.86 28.4 3.28
6 0.4 2.29 0.4 2.29 95.6 0.16
7 0.3 45.81 0.3 45.81 275.5 4.03
8 0.3 16.86 0.3 16.86 42.6 0.55
9 0.2 12.83 0.3 12.83 109.3 3.17
10 0.3 59.16 0.4 59.16 78.9 1.88
11 0.2 55.61 0.3 55.61 55.7 2.32
12 0.3 50.81 0.3 50.81 107.3 1.12
13 0.3 99.36 0.4 99.36 31.6 4.31
14 0.3 12.52 0.3 12.52 112.7 2.22
15 0.3 12.94 0.4 12.94 21.7 0.17
16 0.3 14.36 0.4 14.36 52.6 0.15
17 0.3 67.94 0.3 67.94 261.7 4.42
18 0.3 19.73 0.3 19.73 52.4 1.15
19 0.3 49.75 0.3 49.75 92.9 2.64
20 0.3 14.82 0.3 14.82 33.7 0.24
21 0.3 2.28 0.3 2.28 26.0 0.22
22 0.3 17.29 0.3 17.29 73.9 0.18
23 0.3 72.57 0.4 72.57 55.1 0.73
24 0.3 20.11 0.3 20.11 188.1 0.67
25 0.3 1.02 0.3 1.02 21.6 0.03

Table 10. Computational performances of different solution algorithms in the

multistage robust inventory control problem under demand uncertainty for T=15.

Instance   Affine control            Piecewise affine           Transformation-proximal
No.        policy method             control policy method      bundle algorithm
           CPU (s)    Rel. Gap (%)   CPU (s)    Rel. Gap (%)    CPU (s)    Rel. Gap (%)
1 0.6 33.46 1.0 33.46 188.8 1.30
2 0.7 85.11 1.1 85.11 70.5 4.01
3 0.7 111.20 0.9 111.20 36.8 6.04
4 0.6 8.85 0.8 8.85 383.8 0.21
5 0.6 18.29 1.1 18.29 47.2 0.19
6 0.6 4.98 1.9 4.98 56.7 1.87
7 0.6 69.21 1.0 69.21 127.7 2.92
8 0.6 6.22 0.7 6.22 457.3 1.29
9 0.6 19.06 0.9 19.06 32.7 0.21
10 0.6 42.42 0.7 42.42 103.9 1.45
11 0.8 27.12 1.0 27.12 4450.5 1.34
12 0.7 62.52 1.3 62.52 503.9 4.06
13 0.6 26.68 1.0 26.68 1061.6 0.84
14 0.8 39.73 0.9 39.73 57.3 1.03
15 0.6 43.04 0.9 43.04 137.5 6.29
16 0.7 26.40 1.0 26.40 569.4 1.67
17 0.8 43.50 1.0 43.50 84.7 1.87
18 0.6 17.75 1.0 17.75 42.5 0.08
19 0.6 25.49 0.7 25.49 959.6 1.31
20 0.7 24.17 1.0 24.17 1853.7 2.18
21 0.6 35.44 0.8 35.44 64.9 0.43
22 0.8 50.92 1.0 50.92 203.4 0.53
23 0.7 4.68 1.1 4.68 639.2 0.60
24 0.6 9.10 1.0 9.10 117.3 0.22
25 0.7 36.62 1.1 36.62 78.7 0.06

5.8 Nomenclature

Robust optimal inventory control

Sets/indices

t index of time periods

Parameters

c1 unit cost of standard order

c2 unit cost of express order

cB unit backlog cost

cH unit holding cost

lt lower bound of product demand at the beginning of time period t

ut upper bound of product demand at the beginning of time period t

ξt uncertain product demand at the beginning of time period t

ξmax maximum level of product demand

Continuous variables

It inventory of products at time period t

xt standard order of product at the beginning of time period t

yt express order of product at the beginning of time period t

Process network planning

Sets/indices

I set of processes indexed by i

J set of chemicals indexed by j or k

T set of time periods indexed by t or n

Parameters

c1it variable investment cost for process i in time period t

c2it fixed investment cost for process i in time period t

c3it unit operating cost for process i in time period t

c4jt purchase price of chemical j in time period t

djt demand of chemical j in time period t

qeitL lower bound for capacity expansion of process i in time period t

qeitU upper bound for capacity expansion of process i in time period t


sujt supply of chemical j in time period t

vjt sale price of chemical j in time period t

κij mass balance coefficient for chemical j in process i

Binary variables

Yit binary variable that indicates whether process i is chosen for expansion

in time period t

Continuous variables

Pjt purchase amount of chemical j in time period t

Qit total capacity of process i in time period t

QEit capacity expansion of process i in time period t

Sjt sale amount of chemical j in time period t

Wit operation level of process i in time period t

CHAPTER 6
CONCLUSIONS

Data-driven optimization under uncertainty has been investigated with emphasis on

four main aspects, namely the two-stage adaptive distributionally robust optimization,

deep-learning-based ambiguous chance constrained optimization, a learning-while-

optimizing framework, and the algorithm design for large-scale multistage robust

optimization problems in this dissertation. A series of novel contributions are made to

data-driven optimization modeling frameworks, efficient solution algorithms, along

with applications. We believe that the research in this dissertation lays a solid foundation

for future studies in this area. Additionally, the proposed frameworks and solution

algorithms are general enough to deal with a variety of applications of optimization

under uncertainty, such as those for supply chain management, energy systems, and

process control. A summary of the dissertation as well as future research directions

are provided in the following.

We propose a novel data-driven Wasserstein distributionally robust optimization model

for hedging against uncertainty in the optimal biomass with agricultural waste-to-energy

network design. Instead of assuming perfect knowledge of the probability

distribution for uncertain parameters, we construct a data-driven ambiguity set of

candidate distributions based on the Wasserstein metric, which is utilized to quantify

their distances from the data-based empirical distribution. Equipped with this ambiguity

set, the two-stage distributionally robust optimization model not only accommodates

the sequential decision making at design and operational stages, but also hedges against

the distributional ambiguity arising from a finite amount of uncertainty data. A solution

algorithm is further developed to solve the resulting two-stage distributionally robust

mixed-integer nonlinear program. To demonstrate the effectiveness of the proposed

approach, we present a case study of a biomass with agricultural waste-to-energy

network including 216 technologies and 172 compounds. Computational results show

that the data-driven Wasserstein distributionally robust optimization approach has a

better out-of-sample performance in terms of a 5.7% lower average cost and a 37.1%

smaller cost standard deviation compared with the conventional stochastic

programming method.
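The Wasserstein distance that underlies the ambiguity ball can be computed directly for one-dimensional empirical samples. The sketch below is purely illustrative: the synthetic price data, the half-and-half split, and the radius floor are assumptions, and the actual ambiguity radius is chosen via concentration results rather than this heuristic.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
prices = rng.normal(50.0, 5.0, size=200)   # synthetic feedstock price data

# Distance between two halves of the sample gives a rough feel for how far
# plausible distributions sit from the data-based empirical distribution.
d = wasserstein_distance(prices[:100], prices[100:])
radius = max(d, 1e-3)   # illustrative ambiguity-ball radius with a small floor
```

As the sample size grows, such empirical distances shrink, which is why data-driven radii are typically decreasing functions of the number of samples.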

We propose a novel deep learning based ambiguous joint chance constrained ED

framework for a high penetration of renewable energy. By leveraging a deep GAN, an f-

divergence-based ambiguity set of wind power distributions is constructed as a ball in

the probability space centered at the distribution induced by a generator network.

Specifically, wind power data are utilized to train the f-GAN, whose discriminator

network criticizes the performance of the generator network in terms of f-divergence.

Consequently, the proposed framework closely links the training objective of deep

learning with the characterization of ambiguity set via the same type of divergence.

Additionally, the GAN is well suited for capturing the complicated temporal and spatial

correlations among renewable energy sources. Based upon this ambiguity set, a data-

driven joint chance constrained ED model is developed to hedge against distributional

uncertainty present in multiple constraints regarding wind power utilization. To

facilitate its solution process, the resulting distributionally robust chance constraints are

equivalently reformulated as ambiguity-free chance constraints, which are further

tackled using a scenario approach. This scenario approach leverages the sampling

efficiency of the generator network due to the feedforward nature of neural networks.

A theoretical a priori bound on the required number of synthetic wind power data

generated by f-GAN is explicitly derived for the multi-period ED problem to guarantee

a predefined risk level. By exploiting the ED problem structure, a prescreening technique

is employed to greatly boost both computational and memory efficiencies. The

effectiveness and scalability of the proposed approach are demonstrated through the six-

bus and IEEE 118-bus systems. Computational results show that the proposed approach

is more cost-effective compared with the conventional distributionally robust chance

constrained optimization method.
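In the generic scenario-approach literature, a priori sample counts take the familiar form N ≥ (2/ε)(ln(1/β) + d) for violation level ε, confidence 1 − β, and d decision variables. The helper below computes this classical bound; the dissertation derives a bound tailored to the multi-period ED problem, so treat this as a stand-in, not the exact expression used here.

```python
import math

def scenario_sample_bound(eps, beta, n_vars):
    """Classical scenario-approach sample size: with at least
    (2/eps) * (ln(1/beta) + n_vars) sampled scenarios, the scenario
    solution violates the chance constraint with probability at most
    eps, with confidence at least 1 - beta."""
    return math.ceil((2.0 / eps) * (math.log(1.0 / beta) + n_vars))

print(scenario_sample_bound(0.05, 1e-3, 10))  # -> 677
```

Note the mild (logarithmic) dependence on the confidence parameter β, which is why very high confidence levels remain affordable when sampling from a fast feedforward generator.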

We investigate the problem of designing data-driven stochastic MPC for linear time-

invariant systems under additive stochastic disturbance, whose probability distribution

is unknown but can be partially inferred from data. We propose a novel online learning-

based risk-averse stochastic MPC framework in which CVaR constraints on system

states are required to hold for a family of distributions called an ambiguity set. The

ambiguity set is constructed from disturbance data by leveraging a Dirichlet process

mixture model that is self-adaptive to the underlying data structure and complexity.

Specifically, the structural property of multimodality is exploited, so that the first and

second-order moment information of each mixture component is incorporated into the

ambiguity set. A novel constraint tightening strategy is then developed based on an

equivalent reformulation of distributionally robust CVaR constraints over the proposed

ambiguity set. As more data are gathered during the runtime of the controller, the ambiguity

set is updated online using real-time disturbance data, which enables the risk-averse

stochastic MPC to cope with time-varying disturbance distributions. The employed

online variational inference algorithm obviates learning all collected data from scratch,

and therefore the proposed MPC is endowed with the guaranteed computational

complexity of online learning. The guarantees on recursive feasibility and closed-loop

stability of the proposed MPC are established via a safe update scheme. Numerical

examples are used to illustrate the effectiveness and advantages of the proposed MPC.
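The CVaR constraints above are enforced through a distributionally robust reformulation over the moment-based ambiguity set; purely for intuition, the empirical CVaR of a loss sample, i.e., the average of the worst (1 − α) tail, can be estimated as follows (the function name and data are illustrative, not part of the proposed MPC).

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Empirical CVaR_alpha: the mean of the worst (1 - alpha) fraction
    of losses (larger losses are worse)."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = max(1, int(np.ceil((1.0 - alpha) * losses.size)))
    return losses[-k:].mean()

samples = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(empirical_cvar(samples, alpha=0.8))  # worst 20% of 5 samples -> 100.0
```

Because CVaR averages over the tail rather than thresholding it, constraining CVaR penalizes the magnitude of rare constraint violations, which is what makes it attractive for risk-averse control.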

We develop a novel transformation-proximal bundle algorithm for MARMILPs. By

partitioning recourse decisions into state and control decisions, the proposed algorithm

applies affine control policy only to state decisions and allows control decisions to be

fully adaptive to uncertainty. In this way, the MARMILP is proven to transform

into an equivalent two-stage ARO problem. The proposed multi-to-two transformation

remains valid for other types of causal control policies besides the affine one.

Importantly, this transformation scheme is general enough to be employed with any

two-stage ARO solution algorithms for MARMILPs, thus opening a new avenue for a

variety of multistage ARO solution algorithms. The proximal bundle method is

developed for the resulting two-stage problem. We theoretically show finite

convergence of the proposed algorithm with any positive tolerance. To quantitatively

assess solution quality, we develop a scenario-tree-based lower bounding technique.

The proposed generic approach can be applied to a variety of control problems, such as

constrained robust optimal control. A robust optimal inventory control application is

presented to demonstrate its effectiveness and computational scalability. In this

application, the affine disturbance-feedback control policy suffers from severe

suboptimality with an average gap of 34.88%, while the proposed algorithm generates

near-optimal solutions with an average gap of merely 1.68%.

The future research directions include closed-loop data-driven optimization and the

data-driven optimization framework incorporating “prior” knowledge. The framework

of data-driven optimization under uncertainty could be considered as a “hybrid” system

that integrates the data-driven system based on machine learning to extract useful and

relevant information from data, and the model-based system based on mathematical

programming to derive the optimal decisions from the information. Existing data-driven

optimization approaches adopt a sequential and open-loop scheme, which could be

further improved by introducing feedback steps from the model-based system to the

data-driven system. A "closed-loop" data-driven optimization paradigm that explores the

information feedback to fully couple upstream machine learning and downstream

mathematical programming could be a more effective and rigorous approach. In

addition to uncertainty data, some available domain-specific knowledge or “prior”

knowledge could serve as another informative input to the data-driven system. Relying

solely on the data to develop the uncertainty model could unfavorably influence the

downstream mathematical programming. The prior knowledge describes what the

decision maker knows about the uncertainty, and it can come in different forms. For

example, the prior knowledge could be the structural information of probability

distributions, upper and lower bounds of uncertain parameters or certain correlation

relationships among uncertainties. Incorporating such "prior" knowledge into the data-

driven optimization framework could be substantially useful and provide more reliable

results in the face of messy data.

Another future research direction on data augmentation driven optimization under

uncertainty deserves more effort. The imbalanced volume of different uncertainty data

sources and a small data regime pose new challenges to the existing data-driven

decision making under uncertainty frameworks. The imbalance of datasets would lead

to deteriorating performance of data-driven optimization. Specifically, this data

imbalance has two main adverse effects. First, decision makers would lose much

information embedded within the majority data class if they synthesize both minority

and majority uncertainty data through down-sampling. Based on the research in

this thesis, the inefficient use of uncertainty data information would negatively

influence the solution quality of data-driven optimization under uncertainty. Second, if

one builds data-driven uncertainty models for the minority and majority datasets

separately, correlation information between uncertainty data sources from different

subsystems is discarded. Additionally, the existing data-driven robust optimization

methods tend to suffer severely from the issue of “small data”, which has a direct impact

on uncertainty set construction. The small data regime could under-fit machine learning

models, thereby comprising the performance of data-driven optimization. Given more

and more brand new systems employed, this type of small data regime can be frequently

encountered in data-driven decision making under uncertainty.

The integration of data-driven robust optimization with data augmentation is a

promising general framework for coping with the limited amount of uncertainty data

and could potentially improve the generalization performance of machine learning in

data-driven decision making. The key idea of a tentative methodology is to generate

uncertainty data from the existing uncertainty data. As these newly generated data are

totally unseen, an uncertainty model built from augmented data is more likely to have a

superior generalization property. Moreover, the data augmentation techniques can

increase the volume of the minority data class, and therefore are well suited for addressing

the imbalanced uncertainty data issue. In the literature of machine learning, data

augmentation is becoming increasingly popular, and has witnessed various successful

applications [414, 415]. First, data augmentation is useful for data imbalance during

training. Second, real data could convey private information, so using artificially

generated data helps protect data privacy. One potential method for

data augmentation is resampling. The resampled uncertainty data are

used to augment the minority data class and to make the overall dataset balanced.

Another promising way is employing deep learning, especially deep generative models,

to generate synthetic uncertainty data for the purpose of data augmentation. The

complicated and unseen data patterns can be potentially captured by the powerful deep

learning techniques, and seamlessly incorporated into robust optimization. To be

more specific, a data-driven uncertainty set would be constructed from a hybrid use of

the majority dataset and the augmented minority dataset. Then, this data-driven

uncertainty set could be further integrated into dynamic robust optimization, which

holds great promise for various applications in data-driven design, operation,

and control for uncertain systems.
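The resampling route discussed above can be sketched in a few lines: bootstrap the minority uncertainty-data class with replacement until it matches the majority class. All sizes and distributions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
minority = rng.normal(10.0, 1.0, size=30)      # scarce uncertainty data class
majority_size = 300                            # size of the abundant class

# Bootstrap: sample the minority class with replacement up to the majority
# size, yielding a balanced dataset for uncertainty-set construction.
augmented = rng.choice(minority, size=majority_size, replace=True)
balanced = np.concatenate([minority, augmented])
```

A deep generative model would replace the `rng.choice` step with draws from a trained generator, producing genuinely unseen samples rather than repetitions of the observed ones.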

REFERENCES

[1] L. T. Biegler and I. E. Grossmann, "Retrospective on optimization," Comput.


Chem. Eng., vol. 28, no. 8, pp. 1169-1192, 2004, doi:
https://doi.org/10.1016/j.compchemeng.2003.11.003.
[2] I. E. Grossmann and L. T. Biegler, "Part II. Future perspective on optimization,"
Comput. Chem. Eng., vol. 28, no. 8, pp. 1193-1218, 2004, doi:
https://doi.org/10.1016/j.compchemeng.2003.11.006.
[3] V. Sakizlis, J. D. Perkins, and E. N. Pistikopoulos, "Recent advances in
optimization-based simultaneous process and control design," Comput. Chem.
Eng., vol. 28, no. 10, pp. 2069-2086, 2004, doi:
https://doi.org/10.1016/j.compchemeng.2004.03.018.
[4] N. V. Sahinidis, "Optimization under uncertainty: State-of-the-art and
opportunities," Comput. Chem. Eng., vol. 28, no. 6–7, pp. 971-983, 2004.
[Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0098135403002369.
[5] M. L. Liu and N. V. Sahinidis, "Optimization in Process Planning under
Uncertainty," Ind. Eng. Chem. Res., vol. 35, no. 11, pp. 4154-4165, 1996/01/01
1996, doi: 10.1021/ie9504516.
[6] J. Acevedo and E. N. Pistikopoulos, "Stochastic optimization based algorithms
for process synthesis under uncertainty," Comput. Chem. Eng., vol. 22, no. 4, pp.
647-671, 1998, doi: http://dx.doi.org/10.1016/S0098-1354(97)00234-2.
[7] Z. Li and M. G. Ierapetritou, "Process scheduling under uncertainty: Review and
challenges," Comput. Chem. Eng., vol. 32, no. 4–5, pp. 715-727, 2008, doi:
http://dx.doi.org/10.1016/j.compchemeng.2007.03.001.
[8] A. Ben-Tal and A. Nemirovski, "Robust optimization – methodology and
applications," Math. Program., journal article vol. 92, no. 3, pp. 453-480, 2002,
doi: 10.1007/s101070100286.
[9] I. E. Grossmann, R. M. Apap, B. A. Calfa, P. García-Herreros, and Q. Zhang,
"Recent advances in mathematical programming techniques for the optimization
of process systems under uncertainty," Comput. Chem. Eng., vol. 91, pp. 3-14,
2016, doi: http://dx.doi.org/10.1016/j.compchemeng.2016.03.002.
[10] E. N. Pistikopoulos, "Uncertainty in process design and operations," Comput.
Chem. Eng., vol. 19, pp. 553-563, 1995, doi: http://dx.doi.org/10.1016/0098-
1354(95)87094-6.
[11] Y. Chen, Z. H. Yuan, and B. Z. Chen, "Process optimization with consideration
of uncertainties-An overview," Chin. J. Chem. Eng., vol. 26, no. 8, pp. 1700-
1706, Aug 2018, doi: 10.1016/j.cjche.2017.09.010.
[12] S. John Walker, "Big data: A revolution that will transform how we live, work,
and think," ed: Taylor & Francis, 2014.
[13] S. Yin and O. Kaynak, "Big Data for Modern Industry: Challenges and Trends,"
Proceedings of the IEEE, vol. 103, no. 2, pp. 143-146, 2015, doi:
10.1109/JPROC.2015.2388958.

[14] S. J. Qin, "Process data analytics in the era of big data," AIChE J., vol. 60, no.
9, pp. 3092-3100, 2014. [Online]. Available:
http://dx.doi.org/10.1002/aic.14523.
[15] V. Venkatasubramanian, "DROWNING IN DATA: Informatics and modeling
challenges in a data-rich networked world," AIChE J., vol. 55, no. 1, pp. 2-8,
2009. [Online]. Available: http://dx.doi.org/10.1002/aic.11756.
[16] J. Li et al., "Data-driven mathematical modeling and global optimization
framework for entire petrochemical planning operations," AIChE J., vol. 62, no.
9, pp. 3020-3040, 2016, doi: 10.1002/aic.15220.
[17] S. Yin, X. Li, H. Gao, and O. Kaynak, "Data-Based Techniques Focused on
Modern Industry: An Overview," IEEE Transactions on Industrial Electronics,
vol. 62, no. 1, pp. 657-667, 2015, doi: 10.1109/TIE.2014.2308133.
[18] V. Venkatasubramanian, "The promise of artificial intelligence in chemical
engineering: Is it here, finally?," AIChE J., vol. 65, no. 2, pp. 466-478, 2019, doi:
doi:10.1002/aic.16489.
[19] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning. MIT
press Cambridge, 2016.
[20] I. E. Grossmann, "Advances in mathematical programming models for
enterprise-wide optimization," Comput. Chem. Eng., vol. 47, pp. 2-18, 2012.
[Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0098135412002220.
[21] M. I. Jordan and T. M. Mitchell, "Machine learning: Trends, perspectives, and
prospects," Science, vol. 349, no. 6245, pp. 255-260, 2015. [Online]. Available:
http://science.sciencemag.org/content/sci/349/6245/255.full.pdf.
[22] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, p. 436,
2015, doi: 10.1038/nature14539.
[23] D. Bertsimas, V. Gupta, and N. Kallus, "Data-driven robust optimization," arXiv
preprint arXiv:1401.0212, 2013.
[24] D. Bertsimas and A. Thiele, "Robust and data-driven optimization: Modern
decision-making under uncertainty," INFORMS tutorials in operations research:
models, methods, and applications for innovative decision making, pp. 95-122,
2006.
[25] B. A. Calfa, A. Agarwal, S. J. Bury, J. M. Wassick, and I. E. Grossmann, "Data-
Driven Simulation and Optimization Approaches To Incorporate Production
Variability in Sales and Operations Planning," Ind. Eng. Chem. Res., vol. 54, no.
29, pp. 7261-7272, 2015. [Online]. Available:
http://dx.doi.org/10.1021/acs.iecr.5b01273.
[26] B. A. Calfa, A. Agarwal, I. E. Grossmann, and J. M. Wassick, "Data-driven
multi-stage scenario tree generation via statistical property and distribution
matching," Comput. Chem. Eng., vol. 68, pp. 7-23, 2014. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S009813541400129X.
[27] B. A. Calfa, I. E. Grossmann, A. Agarwal, S. J. Bury, and J. M. Wassick, "Data-
driven individual and joint chance-constrained optimization via kernel
smoothing," Comput. Chem. Eng., vol. 78, pp. 51-69, Jul 2015, doi:
10.1016/j.compchemeng.2015.04.012.

[28] T. Campbell and J. P. How, "Bayesian nonparametric set construction for robust
optimization," in American Control Conference (ACC), 2015, 1-3 July 2015
2015, pp. 4216-4221, doi: 10.1109/ACC.2015.7171991.
[29] R. Jiang and Y. Guan, "Data-driven chance constrained stochastic program,"
Mathematical Programming, journal article vol. 158, no. 1, pp. 291-327, 2015,
doi: 10.1007/s10107-015-0929-7.
[30] R. Levi, G. Perakis, and J. Uichanco, "The data-driven newsvendor problem:
new bounds and insights," Operations Research, vol. 63, no. 6, pp. 1294-1306,
2015.
[31] Y. Zhang, Y. Feng, and G. Rong, "Data-driven chance constrained and robust
optimization under matrix uncertainty," Ind. Eng. Chem. Res., vol. 55, no. 21,
pp. 6145-6160, 2016. [Online]. Available:
http://dx.doi.org/10.1021/acs.iecr.5b04973.
[32] C. Ning and F. You, "Data-driven adaptive nested robust optimization: General
modeling framework and efficient computational algorithm for decision making
under uncertainty," AIChE J., vol. 63, no. 9, pp. 3790-3817, 2017, doi:
10.1002/aic.15717.
[33] C. Ning and F. You, "Data-Driven Adaptive Robust Unit Commitment under
Wind Power Uncertainty: A Bayesian Nonparametric Approach," IEEE Trans.
Power Syst., 2019, doi: 10.1109/TPWRS.2019.2891057.
[34] C. Ning and F. You, "Data-driven decision making under uncertainty integrating
robust optimization with principal component analysis and kernel smoothing
methods," Comput. Chem. Eng., vol. 112, pp. 190-210, 2018, doi:
https://doi.org/10.1016/j.compchemeng.2018.02.007.
[35] C. Shang and F. You, "A data-driven robust optimization approach to stochastic
model predictive control," Journal of Process Control, vol. 75, pp. 24-39, 2019.
[36] C. Shang, W.-H. Chen, A. D. Stroock, and F. You, "Robust Model Predictive
Control of Irrigation Systems with Active Uncertainty Learning and Data
Analytics," arXiv preprint arXiv:1810.05947, 2018.
[37] W. C. Rooney and L. T. Biegler, "Optimal process design with model parameter
uncertainty and process variability," AIChE J., vol. 49, no. 2, pp. 438-449, Feb
2003, doi: 10.1002/aic.690490214.
[38] P. M. Verderame, J. A. Elia, J. Li, and C. A. Floudas, "Planning and Scheduling
under Uncertainty: A Review Across Multiple Sectors," Ind. Eng. Chem. Res.,
vol. 49, no. 9, pp. 3993-4017, May 2010, doi: 10.1021/ie902009k.
[39] A. Mesbah, "Stochastic model predictive control: An overview and
perspectives for future research," IEEE Control Systems
Magazine, vol. 36, no. 6, pp. 30-44, Dec 2016, doi: 10.1109/mcs.2016.2602087.
[40] A. Krieger and E. N. Pistikopoulos, "Model predictive control of anesthesia
under uncertainty," Comput. Chem. Eng., vol. 71, pp. 699-707, Dec 2014, doi:
10.1016/j.compchemeng.2014.07.025.
[41] D. W. Griffith, V. M. Zavala, and L. T. Biegler, "Robustly stable economic
NMPC for non-dissipative stage costs," Journal of Process Control, vol. 57, pp.
116-126, Sep 2017, doi: 10.1016/j.jprocont.2017.06.016.

[42] T. Y. Chiu and P. D. Christofides, "Robust control of particulate processes using
uncertain population balances," AIChE J., vol. 46, no. 2, pp. 266-280, Feb 2000,
doi: 10.1002/aic.690460207.
[43] N. V. Sahinidis, "Optimization under uncertainty: state-of-the-art and
opportunities," Comput. Chem. Eng., vol. 28, no. 6-7, pp. 971-983, Jun 2004,
doi: 10.1016/j.compchemeng.2003.09.017.
[44] I. E. Grossmann, R. M. Apap, B. A. Calfa, P. Garcia-Herreros, and Q. Zhang,
"Recent advances in mathematical programming techniques for the optimization
of process systems under uncertainty," Comput. Chem. Eng., vol. 91, pp. 3-14,
Aug 2016, doi: 10.1016/j.compchemeng.2016.03.002.
[45] J. R. Birge and F. Louveaux, Introduction to stochastic programming. Springer
Science & Business Media, 2011.
[46] J. R. Birge, "State-of-the-Art-Survey—Stochastic Programming: Computation
and Applications," INFORMS J. Comput., vol. 9, no. 2, pp. 111-133, 1997, doi:
10.1287/ijoc.9.2.111.
[47] A. Gupta and C. D. Maranas, "Managing demand uncertainty in supply chain
planning," Comput. Chem. Eng., vol. 27, no. 8, pp. 1219-1227, 2003, doi:
http://dx.doi.org/10.1016/S0098-1354(03)00048-6.
[48] R. M. Van Slyke and R. Wets, "L-shaped linear programs with applications
to optimal control and stochastic programming," SIAM Journal on Applied
Mathematics, vol. 17, no. 4, pp. 638-663, 1969, doi: 10.1137/0117061.
[49] G. Laporte and F. V. Louveaux, "The integer L-shaped method for stochastic
integer programs with complete recourse," Oper. Res. Lett., vol. 13, no. 3,
pp. 133-142, Apr 1993, doi: 10.1016/0167-6377(93)90002-x.
[50] F. Oliveira, V. Gupta, S. Hamacher, and I. E. Grossmann, "A Lagrangean
decomposition approach for oil supply chain investment planning under
uncertainty with risk considerations," Comput. Chem. Eng., vol. 50, pp. 184-195,
Mar 2013, doi: 10.1016/j.compchemeng.2012.10.012.
[51] S. Küçükyavuz and S. Sen, "An introduction to two-stage stochastic mixed-
integer programming," in Leading Developments from INFORMS Communities:
INFORMS, 2017, pp. 1-27.
[52] C. C. Caroe and R. Schultz, "Dual decomposition in stochastic integer
programming," Oper. Res. Lett., vol. 24, no. 1-2, pp. 37-45, Feb-Mar 1999, doi:
10.1016/s0167-6377(98)00050-9.
[53] S. Ahmed, M. Tawarmalani, and N. V. Sahinidis, "A finite branch-and-bound
algorithm for two-stage stochastic integer programs," Math. Program., vol. 100,
no. 2, pp. 355-377, Jun 2004, doi: 10.1007/s10107-003-0475-6.
[54] C. Li and I. E. Grossmann, "An improved L-shaped method for two-stage
convex 0–1 mixed integer nonlinear stochastic programs," Comput. Chem. Eng.,
vol. 112, pp. 165-179, 2018, doi:
https://doi.org/10.1016/j.compchemeng.2018.01.017.

[55] M. G. Ierapetritou and E. N. Pistikopoulos, "Design of multiproduct batch
plants with uncertain demands," Comput. Chem. Eng., vol. 19, pp. S627-S632,
1995, doi: 10.1016/0098-1354(95)00130-t.
[56] A. Bonfill, M. Bagajewicz, A. Espuña, and L. Puigjaner, "Risk Management in
the Scheduling of Batch Plants under Uncertain Market Demand," Ind. Eng.
Chem. Res., vol. 43, no. 3, pp. 741-750, 2004, doi: 10.1021/ie030529f.
[57] A. Bonfill, A. Espuña, and L. Puigjaner, "Addressing Robustness in Scheduling
Batch Processes with Uncertain Operation Times," Ind. Eng. Chem. Res., vol.
44, no. 5, pp. 1524-1534, 2005, doi: 10.1021/ie049732g.
[58] J. Steimel and S. Engell, "Conceptual design and optimization of chemical
processes under uncertainty by two-stage programming," Comput. Chem. Eng.,
vol. 81, pp. 200-217, Oct 2015, doi: 10.1016/j.compchemeng.2015.05.016.
[59] P. Liu, E. N. Pistikopoulos, and Z. Li, "Decomposition Based Stochastic
Programming Approach for Polygeneration Energy Systems Design under
Uncertainty," Ind. Eng. Chem. Res., vol. 49, no. 7, pp. 3295-3305, 2010, doi:
10.1021/ie901490g.
[60] X. Peng, T. W. Root, and C. T. Maravelias, "Optimization-based process
synthesis under seasonal and daily variability: Application to concentrating solar
power," AIChE J., vol. (doi:10.1002/aic.16458), no. 0, doi:
doi:10.1002/aic.16458.
[61] J. Y. Gao and F. Q. You, "Deciphering and handling uncertainty in shale gas
supply chain design and optimization: Novel modeling framework and
computationally efficient solution algorithm," AIChE J., vol. 61, no. 11, pp.
3739-3755, Nov 2015, doi: 10.1002/aic.15032.
[62] F. Q. You, J. M. Wassick, and I. E. Grossmann, "Risk Management for a Global
Supply Chain Planning Under Uncertainty: Models and Algorithms," AIChE J.,
vol. 55, no. 4, pp. 931-946, Apr 2009, doi: 10.1002/aic.11721.
[63] B. H. Gebreslassie, Y. Yao, and F. You, "Design under uncertainty of
hydrocarbon biorefinery supply chains: Multiobjective stochastic programming
models, decomposition algorithm, and a Comparison between CVaR and
downside risk," AIChE J., vol. 58, no. 7, pp. 2155-2179, 2012, doi:
10.1002/aic.13844.
[64] L. J. Zeballos, C. A. Méndez, and A. P. Barbosa-Povoa, "Design and Planning
of Closed-Loop Supply Chains: A Risk-Averse Multistage Stochastic
Approach," Ind. Eng. Chem. Res., vol. 55, no. 21, pp. 6236-6249, 2016, doi:
10.1021/acs.iecr.5b03647.
[65] X. Li, A. Tomasgard, and P. I. Barton, "Nonconvex Generalized Benders
Decomposition for Stochastic Separable Mixed-Integer Nonlinear Programs," J.
Optim. Theory Appl., vol. 151, no. 3, pp. 425-454, Dec 2011, doi:
10.1007/s10957-011-9888-1.
[66] V. Gupta and I. E. Grossmann, "A new decomposition algorithm for multistage
stochastic programs with endogenous uncertainties," Comput. Chem. Eng., vol.
62, pp. 62-79, Mar 2014, doi: 10.1016/j.compchemeng.2013.11.011.

[67] V. Goel and I. E. Grossmann, "A Class of stochastic programs with decision
dependent uncertainty," Math. Program., vol. 108, no. 2-3, pp. 355-394, Jan
2007, doi: 10.1007/s10107-006-0715-7.
[68] A. Prékopa, Stochastic Programming (Mathematics and its Applications, vol.
324). Dordrecht: Kluwer Academic Publishers, 1995.
[69] A. Charnes and W. W. Cooper, "Chance-constrained programming," Manage.
Sci., vol. 6, no. 1, pp. 73-79, 1959, doi: 10.1287/mnsc.6.1.73.
[70] P. Li, H. Arellano-Garcia, and G. Wozny, "Chance constrained programming
approach to process optimization under uncertainty," Comput. Chem. Eng., vol.
32, no. 1, pp. 25-45, 2008, doi:
https://doi.org/10.1016/j.compchemeng.2007.05.009.
[71] B. L. Miller and H. M. Wagner, "Chance constrained programming with joint
constraints," Oper. Res., vol. 13, no. 6, pp. 930-945, 1965, doi:
10.1287/opre.13.6.930.
[72] X. Liu, S. Kucukyavuz, and J. Luedtke, "Decomposition algorithms for two-
stage chance-constrained programs," Math. Program., vol. 157, no. 1, pp. 219-
243, May 2016, doi: 10.1007/s10107-014-0832-7.
[73] M. A. Quddus, S. Chowdhury, M. Marufuzzaman, F. Yu, and L. K. Bian, "A
two-stage chance-constrained stochastic programming model for a bio-fuel
supply chain network," International Journal of Production Economics, vol. 195,
pp. 27-44, Jan 2018, doi: 10.1016/j.ijpe.2017.09.019.
[74] J. Luedtke and S. Ahmed, "A sample approximation approach for optimization
with probabilistic constraints," SIAM J. Optim., vol. 19, no. 2, pp. 674-699,
2008, doi: 10.1137/070702928.
[75] L. J. Hong, Y. Yang, and L. W. Zhang, "Sequential Convex Approximations to
Joint Chance Constrained Programs: A Monte Carlo Approach," Oper. Res., vol.
59, no. 3, pp. 617-630, May-Jun 2011, doi: 10.1287/opre.1100.0910.
[76] F. E. Curtis, A. Wachter, and V. M. Zavala, "A sequential algorithm for
solving nonlinear optimization problems with chance constraints," SIAM J.
Optim., vol. 28, no. 1, pp. 930-958, 2018, doi: 10.1137/16m109003x.
[77] A. Nemirovski and A. Shapiro, "Convex approximations of chance constrained
programs," SIAM J. Optim., vol. 17, no. 4, pp. 969-996, 2006, doi:
10.1137/050622328.
[78] C. D. Maranas, "Optimization accounting for property prediction uncertainty in
polymer design," Comput. Chem. Eng., vol. 21, pp. S1019-S1024, 1997.
[79] A. Gupta, C. D. Maranas, and C. M. McDonald, "Mid-term supply chain
planning under demand uncertainty: customer demand satisfaction and
inventory management," Comput. Chem. Eng., vol. 24, no. 12, pp. 2613-2621,
Dec 2000, doi: 10.1016/s0098-1354(00)00617-7.
[80] F. Q. You and I. E. Grossmann, "Stochastic Inventory Management for Tactical
Process Planning Under Uncertainties: MINLP Models and Algorithms," AIChE
J., vol. 57, no. 5, pp. 1250-1277, May 2011, doi: 10.1002/aic.12338.

[81] D. J. Yue and F. Q. You, "Planning and Scheduling of Flexible Process
Networks Under Uncertainty with Stochastic Inventory: MINLP Models and
Algorithm," AIChE J., vol. 59, no. 5, pp. 1511-1532, May 2013, doi:
10.1002/aic.13924.
[82] W. Shen, Z. Li, B. Huang, and N. M. Jan, "Chance-Constrained Model
Predictive Control for SAGD Process Using Robust Optimization
Approximation," Ind. Eng. Chem. Res., 2018/11/01 2018, doi:
10.1021/acs.iecr.8b03207.
[83] M. Cannon, B. Kouvaritakis, and X. J. Wu, "Probabilistic Constrained MPC for
Multiplicative and Additive Stochastic Uncertainty," IEEE Trans. Autom.
Control., vol. 54, no. 7, pp. 1626-1632, Jul 2009, doi: 10.1109/tac.2009.2017970.
[84] P. Li, H. Arellano-Garcia, and G. Wozny, "Chance constrained programming
approach to process optimization under uncertainty," Comput. Chem. Eng., vol.
32, no. 1-2, pp. 25-45, Jan-Feb 2008, doi: 10.1016/j.compchemeng.2007.05.009.
[85] Y. Yang, P. Vayanos, and P. I. Barton, "Chance-Constrained Optimization for
Refinery Blend Planning under Uncertainty," Ind. Eng. Chem. Res., vol. 56, no.
42, pp. 12139-12150, Oct 2017, doi: 10.1021/acs.iecr.7b02434.
[86] S. S. Liu, S. S. Farid, and L. G. Papageorgiou, "Integrated Optimization of
Upstream and Downstream Processing in Biopharmaceutical Manufacturing
under Uncertainty: A Chance Constrained Programming Approach," Ind. Eng.
Chem. Res., vol. 55, no. 16, pp. 4599-4612, Apr 2016, doi:
10.1021/acs.iecr.5b04403.
[87] K. Mitra, R. D. Gudi, S. C. Patwardhan, and G. Sardar, "Midterm supply chain
planning under uncertainty: A multiobjective chance constrained programming
framework," Ind. Eng. Chem. Res., vol. 47, no. 15, pp. 5501-5511, Aug 2008,
doi: 10.1021/ie0710364.
[88] J. Yang, H. Gu, and G. Rong, "Supply Chain Optimization for Refinery with
Considerations of Operation Mode Changeover and Yield Fluctuations," Ind.
Eng. Chem. Res., vol. 49, no. 1, pp. 276-287, Jan 2010, doi: 10.1021/ie900968x.
[89] F. Q. You and I. E. Grossmann, "Balancing Responsiveness and Economics in
Process Supply Chain Design with Multi-Echelon Stochastic Inventory," AIChE
J., vol. 57, no. 1, pp. 178-192, Jan 2011, doi: 10.1002/aic.12244.
[90] F. Q. You and I. E. Grossmann, "Mixed-Integer Nonlinear Programming Models
and Algorithms for Large-Scale Supply Chain Design with Stochastic Inventory
Management," Ind. Eng. Chem. Res., vol. 47, no. 20, pp. 7802-7817, Oct 2008,
doi: 10.1021/ie800257x.
[91] Y. Yuan, Z. Li, and B. Huang, "Robust optimization under correlated uncertainty:
Formulations and computational study," Comput. Chem. Eng., vol. 85, pp. 58-
71, 2016. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0098135415003464.
[92] A. Ben-Tal and A. Nemirovski, "Robust solutions of Linear Programming
problems contaminated with uncertain data," Math. Program., vol. 88, pp.
411-424, 2000.
[93] D. Bertsimas and M. Sim, "The price of robustness," Oper. Res., vol. 52, no. 1,
pp. 35-53, 2004.

[94] A. Ben-Tal, L. E. Ghaoui, and A. Nemirovski, Robust Optimization. Princeton
University Press, 2009.
[95] C. Gregory, K. Darby-Dowman, and G. Mitra, "Robust optimization and
portfolio selection: The cost of robustness," Eur. J. Oper. Res., vol. 212, no. 2,
pp. 417-428, 2011, doi: http://dx.doi.org/10.1016/j.ejor.2011.02.015.
[96] T. Assavapokee, M. J. Realff, and J. C. Ammons, "Min-Max Regret Robust
Optimization Approach on Interval Data Uncertainty," J. Optim. Theory Appl.,
vol. 137, no. 2, pp. 297-316, 2008, doi: 10.1007/s10957-007-
9334-6.
[97] A. L. Soyster, "Technical Note—Convex Programming with Set-Inclusive
Constraints and Applications to Inexact Linear Programming," Oper. Res., vol.
21, no. 5, pp. 1154-1157, 1973, doi: doi:10.1287/opre.21.5.1154.
[98] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of
robust optimization," SIAM Rev., vol. 53, no. 3, pp. 464-501, 2011.
[99] Á. Lorca, X. A. Sun, E. Litvinov, and T. Zheng, "Multistage adaptive robust
optimization for the unit commitment problem," Oper. Res., vol. 64, no. 1,
pp. 32-51, 2016.
[100] A. Lorca and X. A. Sun, "Adaptive robust optimization with dynamic
uncertainty sets for multi-period economic dispatch under significant wind,"
IEEE Trans. Power Syst., vol. 30, no. 4, pp. 1702-1713, 2015.
[101] A. Atamtürk and M. Zhang, "Two-stage robust network flow and design under
demand uncertainty," Oper. Res., vol. 55, no. 4, pp. 662-673, 2007.
[102] D. Bertsimas, E. Litvinov, X. A. Sun, J. Zhao, and T. Zheng, "Adaptive Robust
Optimization for the Security Constrained Unit Commitment Problem," IEEE
Trans. Power Syst., vol. 28, no. 1, pp. 52-63, 2013.
[103] B. Zeng and L. Zhao, "Solving two-stage robust optimization problems using a
column-and-constraint generation method," Oper. Res. Lett., vol. 41, no. 5, pp.
457-461, 2013. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0167637713000618.
[104] Q. Zhang, M. F. Morari, I. E. Grossmann, A. Sundaramoorthy, and J. M. Pinto,
"An adjustable robust optimization approach to scheduling of continuous
industrial processes providing interruptible load," Comput. Chem. Eng., vol. 86,
pp. 106-119, 2016, doi: http://dx.doi.org/10.1016/j.compchemeng.2015.12.018.
[105] N. H. Lappas and C. E. Gounaris, "Multi-stage Adjustable Robust Optimization
for Process Scheduling under Uncertainty," AIChE Journal, vol. 62, no. 5, pp.
1646-1667, 2016, doi: 10.1002/aic.15183.
[106] A. Ben-Tal, A. Goryashko, E. Guslitzer, and A. Nemirovski, "Adjustable robust
solutions of uncertain linear programs," Math. Program., vol. 99, no. 2, pp. 351-
376, 2004, doi: 10.1007/s10107-003-0454-y.
[107] H. Shi and F. You, "A computational framework and solution algorithms for
two-stage adaptive robust scheduling of batch manufacturing processes under
uncertainty," AIChE J., vol. 62, no. 3, pp. 687-703, 2016. [Online]. Available:
http://dx.doi.org/10.1002/aic.15067.
[108] J. Gong, D. J. Garcia, and F. You, "Unraveling optimal biomass processing
routes from bioconversion product and process networks under uncertainty: An
adaptive robust optimization approach," ACS Sustain. Chem. Eng., vol. 4, no. 6,
pp. 3160-3173, 2016, doi: 10.1021/acssuschemeng.6b00188.
[109] J. Gong and F. You, "Optimal processing network design under uncertainty for
producing fuels and value-added bioproducts from microalgae: Two-stage
adaptive robust mixed integer fractional programming model and
computationally efficient solution algorithm," AIChE J., vol. 63, no. 2, pp. 582-
600, 2017, doi: 10.1002/aic.15370.
[110] E. Delage and D. A. Iancu, "Robust multistage decision making." Catonsville,
MD: INFORMS Tutorials in Operations Research, 2015, pp. 20-46.
[111] C. Ning and F. You, "A Transformation-Proximal Bundle Algorithm for Solving
Large-Scale Multistage Adaptive Robust Optimization Problems," arXiv
preprint arXiv:1810.05931, 2018.
[112] C. Ning and F. You, "A Data-Driven Multistage Adaptive Robust Optimization
Framework for Planning and Scheduling under Uncertainty," AIChE J., vol. 63,
no. 10, pp. 4343–4369, 2017, doi: 10.1002/aic.15792.
[113] K. McLean and X. Li, "Robust Scenario Formulations for Strategic Supply
Chain Optimization under Uncertainty," Ind. Eng. Chem. Res., vol. 52, no. 16,
pp. 5721-5734, 2013, doi: 10.1021/ie303114r.
[114] D. Yue and F. You, "Optimal supply chain design and operations under multi-
scale uncertainties: Nested stochastic robust optimization modeling framework
and solution algorithm," AIChE J., vol. 62, no. 9, pp. 3041-3055, 2016, doi:
10.1002/aic.15255.
[115] C. Liu, C. Lee, H. Chen, and S. Mehrotra, "Stochastic Robust Mathematical
Programming Model for Power System Optimization," IEEE Trans. Power Syst.,
vol. 31, no. 1, pp. 821-822, 2016, doi: 10.1109/TPWRS.2015.2394320.
[116] L. Baringo and A. Baringo, "A Stochastic Adaptive Robust Optimization
Approach for the Generation and Transmission Expansion Planning," IEEE
Trans. Power Syst., vol. 33, no. 1, pp. 792-802, Jan 2018, doi:
10.1109/tpwrs.2017.2713486.
[117] C. Y. Zhao and Y. P. Guan, "Unified Stochastic and Robust Unit Commitment,"
IEEE Trans. Power Syst., vol. 28, no. 3, pp. 3353-3361, Aug 2013, doi:
10.1109/tpwrs.2013.2251916.
[118] G. D. Liu, Y. Xu, and K. Tomsovic, "Bidding Strategy for Microgrid in Day-
Ahead Market Based on Hybrid Stochastic/Robust Optimization," IEEE
Transactions on Smart Grid, vol. 7, no. 1, pp. 227-237, Jan 2016, doi:
10.1109/tsg.2015.2476669.
[119] E. Keyvanshokooh, S. M. Ryan, and E. Kabir, "Hybrid robust and stochastic
optimization for closed-loop supply chain network design using accelerated
Benders decomposition," Eur. J. Oper. Res., vol. 249, no. 1, pp. 76-92, Feb 2016,
doi: 10.1016/j.ejor.2015.08.028.
[120] P. Parpas, B. Rustem, and E. Pistikopoulos, "Global optimization of robust
chance constrained problems," Journal of Global Optimization, vol. 43, no. 2-3,
pp. 231-247, Mar 2009, doi: 10.1007/s10898-007-9244-z.

[121] J. E. Smith and R. L. Winkler, "The optimizer's curse: Skepticism and
postdecision surprise in decision analysis," Manage. Sci., vol. 52, no. 3, pp. 311-
322, 2006, doi: 10.1287/mnsc.1050.0451.
[122] E. Delage and Y. Y. Ye, "Distributionally Robust Optimization Under Moment
Uncertainty with Application to Data-Driven Problems," Oper. Res., vol. 58, no.
3, pp. 595-612, May-Jun 2010, doi: 10.1287/opre.1090.0741.
[123] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann, "A distributionally
robust perspective on uncertainty quantification and chance constrained
programming," Math. Program., vol. 151, no. 1, pp. 35-62, Jun 2015, doi:
10.1007/s10107-015-0896-z.
[124] P. M. Esfahani and D. Kuhn, "Data-driven distributionally robust optimization
using the Wasserstein metric: performance guarantees and tractable
reformulations," Math. Program., vol. 171, no. 1-2, pp. 115-166, 2018, doi:
10.1007/s10107-017-1172-1.
[125] C. Shang and F. Q. You, "Distributionally robust optimization for planning and
scheduling under uncertainty," Comput. Chem. Eng., vol. 110, pp. 53-68, Feb
2018, doi: 10.1016/j.compchemeng.2017.12.002.
[126] G. C. Calafiore and L. El Ghaoui, "On distributionally robust chance-
constrained linear programs," J. Optim. Theory Appl., vol. 130, no. 1, pp. 1-22,
Jul 2006, doi: 10.1007/s10957-006-9084-x.
[127] J. Gao, C. Ning, and F. You, "Data-driven distributionally robust optimization
of shale gas supply chains under uncertainty," AIChE J., vol.
(doi:10.1002/aic.16488), no. 0, 2018, doi: doi:10.1002/aic.16488.
[128] Z. Hu and L. J. Hong, "Kullback-Leibler divergence constrained distributionally
robust optimization," Available at Optimization Online, 2013.
[129] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust Stochastic Lot-Sizing by
Means of Histograms," Production and Operations Management, vol. 22, no. 3,
pp. 691-710, May-Jun 2013, doi: 10.1111/j.1937-5956.2012.01420.x.
[130] G. Bayraksan and D. K. Love, "Data-Driven Stochastic Programming Using
Phi-Divergences," in The Operations Research Revolution, 2015, pp. 1-19.
[131] G. A. Hanasusanto and D. Kuhn, "Conic Programming Reformulations of Two-
Stage Distributionally Robust Linear Programs over Wasserstein Balls," Oper.
Res., vol. 66, no. 3, pp. 849-869, May-Jun 2018, doi: 10.1287/opre.2017.1698.
[132] D. Bertsimas, M. Sim, and M. Zhang, "Adaptive Distributionally Robust
Optimization," Manage. Sci., vol. 0, no. 0, p. null, doi: 10.1287/mnsc.2017.2952.
[133] P. Xiong, P. Jirutitijaroen, and C. Singh, "A Distributionally Robust
Optimization Model for Unit Commitment Considering Uncertain Wind Power
Generation," IEEE Trans. Power Syst., vol. 32, no. 1, pp. 39-49, Jan 2017, doi:
10.1109/tpwrs.2016.2544795.
[134] Y. W. Chen, Q. L. Guo, H. B. Sun, Z. S. Li, W. C. Wu, and Z. H. Li, "A
Distributionally Robust Optimization Model for Unit Commitment Based on
Kullback-Leibler Divergence," IEEE Trans. Power Syst., vol. 33, no. 5, pp.
5147-5160, Sep 2018, doi: 10.1109/tpwrs.2018.2797069.

[135] C. Duan, L. Jiang, W. L. Fang, and J. Liu, "Data-Driven Affinely Adjustable
Distributionally Robust Unit Commitment," IEEE Trans. Power Syst., vol. 33,
no. 2, pp. 1385-1398, Mar 2018, doi: 10.1109/tpwrs.2017.2741506.
[136] C. Y. Zhao and Y. P. Guan, "Data-Driven Stochastic Unit Commitment for
Integrating Wind Generation," IEEE Trans. Power Syst., vol. 31, no. 4, pp. 2587-
2596, Jul 2016, doi: 10.1109/tpwrs.2015.2477311.
[137] C. Wang, R. Gao, F. Qiu, J. Wang, and L. Xin, "Risk-Based Distributionally
Robust Optimal Power Flow With Dynamic Line Rating," IEEE Trans. Power
Syst., vol. 33, no. 6, pp. 6074-6086, 2018, doi: 10.1109/TPWRS.2018.2844356.
[138] Y. Guo, K. Baker, E. Dall'Anese, Z. Hu, and T. Summers, "Stochastic Optimal
Power Flow Based on Data-Driven Distributionally Robust Optimization," in
2018 Annual American Control Conference (ACC), Jun 2018, pp. 3840-3846,
doi: 10.23919/ACC.2018.8431542.
[139] S. Zymler, D. Kuhn, and B. Rustem, "Distributionally robust joint chance
constraints with second-order moment information," Math. Program., vol. 137,
no. 1-2, pp. 167-198, 2013.
[140] B. Li, R. Jiang, and J. L. Mathieu, "Ambiguous risk constraints with moment
and unimodality information," Math. Program., Nov 2017, doi:
10.1007/s10107-017-1212-x.
[141] Z. Chen, S. Peng, and J. Liu, "Data-Driven Robust Chance Constrained
Problems: A Mixture Model Approach," J. Optim. Theory Appl., vol. 179, no. 3,
pp. 1065-1085, Dec 2018, doi: 10.1007/s10957-018-
1376-4.
[142] L. El Ghaoui, M. Oks, and F. Oustry, "Worst-case Value-at-Risk and robust
portfolio optimization: A conic programming approach," Oper. Res., vol. 51, no.
4, pp. 543-556, Jul-Aug 2003.
[143] J. Cheng, E. Delage, and A. Lisser, "Distributionally Robust Stochastic
Knapsack Problem," SIAM J. Optim., vol. 24, no. 3, pp. 1485-1506, 2014, doi:
10.1137/130915315.
[144] Y. Zhang, R. Jiang, and S. Shen, "Ambiguous Chance-Constrained Binary
Programs under Mean-Covariance Information," SIAM J. Optim., vol. 28, no. 4,
pp. 2922-2944, 2018, doi: 10.1137/17m1158707.
[145] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann, "Ambiguous Joint
Chance Constraints Under Mean and Dispersion Information," Oper. Res., vol.
65, no. 3, pp. 751-767, May-Jun 2017, doi: 10.1287/opre.2016.1583.
[146] W. Wiesemann, D. Kuhn, and M. Sim, "Distributionally Robust Convex
Optimization," Oper. Res., vol. 62, no. 6, pp. 1358-1376, Nov-Dec 2014, doi:
10.1287/opre.2014.1314.
[147] W. Z. Yang and H. Xu, "Distributionally robust chance constraints for non-linear
uncertainties," Math. Program., vol. 155, no. 1-2, pp. 231-265, Jan 2016, doi:
10.1007/s10107-014-0842-5.
[148] W. J. Xie and S. Ahmed, "On Deterministic Reformulations of Distributionally
Robust Joint Chance Constrained Optimization Problems," SIAM J. Optim., vol.
28, no. 2, pp. 1151-1182, 2018, doi: 10.1137/16m1094725.

[149] K. Postek, A. Ben-Tal, D. den Hertog, and B. Melenberg, "Robust Optimization
with Ambiguous Stochastic Constraints Under Mean and Dispersion
Information," Oper. Res., vol. 66, no. 3, pp. 814-833, May-Jun 2018, doi:
10.1287/opre.2017.1688.
[150] J. Lasserre and T. Weisser, "Distributionally robust polynomial chance-
constraints under mixture ambiguity sets," 2018.
[151] E. Erdogan and G. Iyengar, "Ambiguous chance constrained problems and
robust optimization," Math. Program., vol. 107, no. 1-2, pp. 37-61, Jun 2006,
doi: 10.1007/s10107-005-0678-0.
[152] R. W. Jiang and Y. P. Guan, "Data-driven chance constrained stochastic
program," Math. Program., vol. 158, no. 1-2, pp. 291-327, Jul 2016, doi:
10.1007/s10107-015-0929-7.
[153] Z. Chen, D. Kuhn, and W. Wiesemann, "Data-Driven Chance Constrained
Programs over Wasserstein Balls," arXiv preprint arXiv:1809.00210, 2018.
[154] R. Ji and M. Lejeune, "Data-Driven Distributionally Robust Chance-
Constrained Programming with Wasserstein Metric," 2018.
[155] R. Gao and A. J. Kleywegt, "Distributionally robust stochastic optimization with
Wasserstein distance," arXiv preprint arXiv:1604.02199, 2016.
[156] W. Xie, "On Distributionally Robust Chance Constrained Program with
Wasserstein Distance," arXiv preprint arXiv:1806.07418, 2018.
[157] A. R. Hota, A. Cherukuri, and J. Lygeros, "Data-Driven Chance Constrained
Optimization under Wasserstein Ambiguity Sets," arXiv preprint
arXiv:1805.06729, 2018.
[158] W. Xie and S. Ahmed, "Distributionally Robust Chance Constrained Optimal
Power Flow with Renewables: A Conic Reformulation," IEEE Trans. Power
Syst., vol. 33, no. 2, pp. 1860-1867, 2018, doi: 10.1109/TPWRS.2017.2725581.
[159] B. P. G. Van Parys, D. Kuhn, P. J. Goulart, and M. Morari, "Distributionally
Robust Control of Constrained Stochastic Systems," IEEE Trans. Autom.
Control., vol. 61, no. 2, pp. 430-442, Feb 2016, doi: 10.1109/tac.2015.2444134.
[160] S. Ghosal and W. Wiesemann, "The Distributionally Robust Chance
Constrained Vehicle Routing Problem," Available on Optimization Online, 2018.
[161] D. E. Bell, "Regret in Decision Making under Uncertainty," Oper. Res., vol. 30,
no. 5, pp. 961-981, 1982, doi: 10.1287/opre.30.5.961.
[162] C. Ning and F. You, "Adaptive robust optimization with minimax regret
criterion: Multiobjective optimization framework and computational algorithm
for planning and scheduling under uncertainty," Comput. Chem. Eng., vol. 108,
no. Supplement C, pp. 425-447, 2018, doi:
https://doi.org/10.1016/j.compchemeng.2017.09.026.
[163] C. Ning and F. You, "Data-driven stochastic robust optimization: General
computational framework and algorithm leveraging machine learning for
optimization under uncertainty in the big data era," Comput. Chem. Eng., vol.
111, pp. 115-133, 2018, doi:
https://doi.org/10.1016/j.compchemeng.2017.12.015.

[164] C. Shang, X. Huang, and F. You, "Data-driven robust optimization based on
kernel learning," Comput. Chem. Eng., vol. 106, pp. 464-479, 2017, doi:
https://doi.org/10.1016/j.compchemeng.2017.07.004.
[165] D. Bertsimas, V. Gupta, and N. Kallus, "Data-driven robust optimization," Math.
Program., vol. 167, no. 2, pp. 235-292, Feb 2018, doi:
10.1007/s10107-017-1125-8.
[166] Y. Zhang, X. Z. Jin, Y. P. Feng, and G. Rong, "Data-driven robust optimization
under correlated uncertainty: A case study of production scheduling in ethylene
plant (Reprinted from computers and Chemical Engineering, vol 109, pg 48-67,
2017)," Comput. Chem. Eng., vol. 116, pp. 17-36, Aug 2018, doi:
10.1016/j.compchemeng.2017.10.039.
[167] Y. Zhang, Y. P. Feng, and G. Rong, "Data-driven rolling-horizon robust
optimization for petrochemical scheduling using probability density contours,"
Comput. Chem. Eng., vol. 115, pp. 342-360, Jul 2018, doi:
10.1016/j.compchemeng.2018.04.013.
[168] L. Zhao, C. Ning, and F. You, "Operational optimization of industrial steam
systems under uncertainty using data-Driven adaptive robust optimization,"
AIChE J., doi: 10.1002/aic.16500.
[169] F. Miao et al., "Data-Driven Robust Taxi Dispatch Under Demand
Uncertainties," IEEE Transactions on Control Systems Technology, vol. 27, no.
1, pp. 175-191, Jan 2019, doi: 10.1109/tcst.2017.2766042.
[170] G. Calafiore and M. C. Campi, "Uncertain convex programs: randomized
solutions and confidence levels," Math. Program., vol. 102, no. 1, pp. 25-46,
Jan 2005, doi: 10.1007/s10107-003-0499-y.
[171] M. C. Campi, S. Garatti, and M. Prandini, "The scenario approach for systems
and control design," Annual Reviews in Control, vol. 33, no. 2, pp. 149-157, Dec
2009, doi: 10.1016/j.arcontrol.2009.07.001.
[172] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press,
2004.
[173] M. C. Campi and S. Garatti, "The Exact Feasibility of Randomized Solutions of
Uncertain Convex Programs," SIAM J. Optim., vol. 19, no. 3, pp. 1211-1230,
2008, doi: 10.1137/07069821x.
[174] X. J. Zhang, S. Grammatico, G. Schildbach, P. Goulart, and J. Lygeros, "On the
sample size of random convex programs with structured dependence on the
uncertainty," Automatica, vol. 60, pp. 182-188, Oct 2015, doi:
10.1016/j.automatica.2015.07.013.
[175] T. Kanamori and A. Takeda, "Worst-Case Violation of Sampled Convex
Programs for Optimization with Uncertainty," J. Optim. Theory Appl., vol. 152,
no. 1, pp. 171-197, Jan 2012, doi: 10.1007/s10957-011-9923-2.
[176] G. Calafiore, "On the Expected Probability of Constraint Violation in Sampled
Convex Programs," J. Optim. Theory Appl., vol. 143, no. 2, pp. 405-412, Nov
2009, doi: 10.1007/s10957-009-9579-3.
[177] P. M. Esfahani, T. Sutter, and J. Lygeros, "Performance Bounds for the Scenario
Approach and an Extension to a Class of Non-Convex Programs," IEEE Trans.
Autom. Control., vol. 60, no. 1, pp. 46-58, Jan 2015, doi:
10.1109/tac.2014.2330702.
[178] G. C. Calafiore, "Random convex programs," SIAM J. Optim., vol. 20,
no. 6, pp. 3427-3464, 2010, doi: 10.1137/090773490.
[179] M. C. Campi and S. Garatti, "A Sampling-and-Discarding Approach to Chance-
Constrained Optimization: Feasibility and Optimality," J. Optim. Theory Appl.,
vol. 148, no. 2, pp. 257-280, Feb 2011, doi: 10.1007/s10957-010-9754-6.
[180] M. C. Campi and S. Garatti, "Wait-and-judge scenario optimization," Math.
Program., vol. 167, no. 1, pp. 155-189, Jan 2018, doi: 10.1007/s10107-016-
1056-9.
[181] N. Kariotoglou, K. Margellos, and J. Lygeros, "On the computational
complexity and generalization properties of multi-stage and stage-wise coupled
scenario programs," Systems & Control Letters, vol. 94, pp. 63-69, Aug 2016,
doi: 10.1016/j.sysconle.2016.05.009.
[182] P. Vayanos, D. Kuhn, and B. Rustem, "A constraint sampling approach for
multi-stage robust optimization," Automatica, vol. 48, no. 3, pp. 459-471,
2012, doi: 10.1016/j.automatica.2011.12.002.
[183] G. Calafiore, D. Lyons, and L. Fagiano, "On mixed-integer random convex
programs," in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC),
10-13 Dec. 2012 2012, pp. 3508-3513, doi: 10.1109/CDC.2012.6426905.
[184] J. A. De Loera, R. N. La Haye, D. Oliveros, and E. Roldan-Pensado, "Chance-
Constrained Convex Mixed-Integer Optimization and Beyond: Two Sampling
Algorithms within S-Optimization," Journal of Convex Analysis, vol. 25, no. 1,
pp. 201-218, 2018.
[185] M. Chamanbaz, F. Dabbene, R. Tempo, V. Venkataramanan, and Q. G. Wang,
"Sequential Randomized Algorithms for Convex Optimization in the Presence
of Uncertainty," IEEE Trans. Autom. Control., vol. 61, no. 9, pp. 2565-2571,
Sep 2016, doi: 10.1109/tac.2015.2494875.
[186] T. Alamo, R. Tempo, A. Luque, and D. R. Ramirez, "Randomized methods for
design of uncertain systems: Sample complexity and sequential algorithms,"
Automatica, vol. 52, pp. 160-172, Feb 2015, doi:
10.1016/j.automatica.2014.11.004.
[187] G. Calafiore, "Repetitive Scenario Design," IEEE Trans. Autom. Control., vol.
62, no. 3, pp. 1125-1137, Mar 2017, doi: 10.1109/tac.2016.2575859.
[188] K. You, R. Tempo, and P. Xie, "Distributed Algorithms for Robust Convex
Optimization via the Scenario Approach," IEEE Trans. Autom. Control., pp. 1-
1, 2018, doi: 10.1109/TAC.2018.2828093.
[189] K. Margellos, A. Falsone, S. Garatti, and M. Prandini, "Distributed Constrained
Optimization and Consensus in Uncertain Networks via Proximal
Minimization," IEEE Trans. Autom. Control., vol. 63, no. 5, pp. 1372-1387,
May 2018, doi: 10.1109/tac.2017.2747505.
[190] L. Carlone, V. Srivastava, F. Bullo, and G. C. Calafiore, "Distributed Random
Convex Programming via Constraints Consensus," SIAM Journal on Control
and Optimization, vol. 52, no. 1, pp. 629-662, 2014, doi: 10.1137/120885796.
[191] A. Care, S. Garatti, and M. C. Campi, "FAST-Fast Algorithm for the Scenario
Technique," Oper. Res., vol. 62, no. 3, pp. 662-671, May-Jun 2014, doi:
10.1287/opre.2014.1257.
[192] M. C. Campi, S. Garatti, and F. A. Ramponi, "A General Scenario Theory for
Nonconvex Optimization and Decision Making," IEEE Trans. Autom. Control.,
vol. 63, no. 12, pp. 4067-4078, 2018, doi: 10.1109/TAC.2018.2808446.
[193] T. Alamo, R. Tempo, and E. F. Camacho, "Randomized Strategies for
Probabilistic Solutions of Uncertain Feasibility and Optimization Problems,"
IEEE Trans. Autom. Control., vol. 54, no. 11, pp. 2545-2559, Nov 2009, doi:
10.1109/tac.2009.2031207.
[194] G. Calafiore, F. Dabbene, and R. Tempo, "Research on probabilistic methods
for control system design," Automatica, vol. 47, no. 7, pp. 1279-1293, Jul 2011,
doi: 10.1016/j.automatica.2011.02.029.
[195] S. Grammatico, X. J. Zhang, K. Margellos, P. Goulart, and J. Lygeros, "A
Scenario Approach for Non-Convex Control Design," IEEE Trans. Autom.
Control., vol. 61, no. 2, pp. 334-345, Feb 2016, doi: 10.1109/tac.2015.2433591.
[196] A. R. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic Modeling Using Deep
Belief Networks," IEEE Transactions on Audio, Speech, and Language Processing,
vol. 20, no. 1, pp. 14-22, Jan 2012, doi: 10.1109/tasl.2011.2109382.
[197] Z. P. Zhang and J. S. Zhao, "A deep belief network based fault diagnosis model
for complex chemical processes," Comput. Chem. Eng., vol. 107, pp. 395-407,
Dec 2017, doi: 10.1016/j.compchemeng.2017.02.041.
[198] C. Shang, F. Yang, D. X. Huang, and W. X. Lyu, "Data-driven soft sensor
development based on deep learning technique," Journal of Process Control,
vol. 24, no. 3, pp. 223-233, Mar 2014, doi: 10.1016/j.jprocont.2014.01.012.
[199] E. Gawehn, J. A. Hiss, and G. Schneider, "Deep learning in drug discovery,"
Molecular informatics, vol. 35, no. 1, pp. 3-14, 2016.
[200] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with
Deep Convolutional Neural Networks," Communications of the Acm, vol. 60, no.
6, pp. 84-90, Jun 2017, doi: 10.1145/3065386.
[201] Y. Wu, H. Tan, L. Qin, B. Ran, and Z. Jiang, "A hybrid deep learning based
traffic flow prediction method and its understanding," Transportation Research
Part C: Emerging Technologies, vol. 90, pp. 166-180, 2018, doi:
https://doi.org/10.1016/j.trc.2018.03.001.
[202] A. Graves, A. R. Mohamed, and G. Hinton, "Speech recognition with deep
recurrent neural networks," in 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), 2013, pp. 6645-6649.
[203] J. Vermaak and E. C. Botha, "Recurrent neural networks for short-term load
forecasting," IEEE Trans. Power Syst., vol. 13, no. 1, pp. 126-132, 1998, doi:
10.1109/59.651623.
[204] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural
computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[205] J. Potočnik, "Renewable Energy Sources and the Realities of Setting an Energy
Agenda," Science, vol. 315, no. 5813, pp. 810-811, 2007, doi:
10.1126/science.1139086.
[206] H. Kopetz, "Build a biomass energy market," Nature, vol. 494, pp. 29-31,
2013, doi: 10.1038/494029a.
[207] D. Yue, F. You, and S. W. Snyder, "Biomass-to-bioenergy and biofuel supply
chain optimization: Overview, key issues and challenges," Comput. Chem. Eng.,
vol. 66, pp. 36-56, 2014, doi:
http://dx.doi.org/10.1016/j.compchemeng.2013.11.016.
[208] Z. Hu, Y. Wang, and Z. Wen, "Alkali (NaOH) pretreatment of switchgrass by
radio frequency-based dielectric heating," Appl. Biochem. Biotechnol., vol. 148,
no. 1-3, pp. 71-81, 2008, doi: 10.1007/s12010-007-8083-1.
[209] M. Safar et al., "Catalytic effects of potassium on biomass pyrolysis, combustion
and torrefaction," Applied Energy, vol. 235, pp. 346-355, 2019, doi:
https://doi.org/10.1016/j.apenergy.2018.10.065.
[210] V. Benedetti, F. Patuzzi, and M. Baratieri, "Characterization of char from
biomass gasification and its similarities with activated carbon in adsorption
applications," Applied Energy, vol. 227, pp. 92-99, 2018, doi:
https://doi.org/10.1016/j.apenergy.2017.08.076.
[211] W. Zhang, J. R. Barone, and S. Renneckar, "Biomass Fractionation after
Denaturing Cell Walls by Glycerol Thermal Processing," ACS Sustain. Chem.
Eng., vol. 3, no. 3, pp. 413-420, 2015, doi: 10.1021/sc500564g.
[212] T. Damartzis and A. Zabaniotou, "Thermochemical conversion of biomass to
second generation biofuels through integrated process design—A review,"
Renewable and Sustainable Energy Reviews, vol. 15, no. 1, pp. 366-378, 2011,
doi: https://doi.org/10.1016/j.rser.2010.08.003.
[213] K. Dutta, A. Daverey, and J.-G. Lin, "Evolution retrospective for alternative
fuels: First to fourth generation," Renewable Energy, vol. 69, pp. 114-122, 2014,
doi: https://doi.org/10.1016/j.renene.2014.02.044.
[214] R. A. Lee and J.-M. Lavoie, "From first- to third-generation biofuels: Challenges
of producing a commodity from a biomass of increasing complexity," Animal
Frontiers, vol. 3, no. 2, pp. 6-11, 2013, doi: 10.2527/af.2013-0010.
[215] L. Gil-Carrera, J. D. Browne, I. Kilgallon, and J. D. Murphy, "Feasibility study
of an off-grid biomethane mobile solution for agri-waste," Applied Energy, vol.
239, pp. 471-481, 2019, doi: https://doi.org/10.1016/j.apenergy.2019.01.141.
[216] J. Lee et al., "Pyrolysis process of agricultural waste using CO2 for waste
management, energy recovery, and biochar fabrication," Applied Energy, vol.
185, pp. 214-222, 2017, doi: https://doi.org/10.1016/j.apenergy.2016.10.092.
[217] M. Rajinipriya, M. Nagalakshmaiah, M. Robert, and S. Elkoun, "Importance of
Agricultural and Industrial Waste in the Field of Nanocellulose and Recent
Industrial Developments of Wood Based Nanocellulose: A Review," ACS
Sustain. Chem. Eng., vol. 6, no. 3, pp. 2807-2828, 2018, doi:
10.1021/acssuschemeng.7b03437.
[218] W. H. Chen et al., "A comprehensive analysis of food waste derived liquefaction
bio-oil properties for industrial application," Applied Energy, vol. 237, pp. 283-
291, 2019, doi: 10.1016/j.apenergy.2018.12.084.
[219] D. J. Garcia and F. You, "Multiobjective optimization of product and process
networks: General modeling framework, efficient global optimization algorithm,
and case studies on bioconversion," AIChE J., vol. 61, no. 2, pp. 530-554, 2015,
doi: 10.1002/aic.14666.
[220] A. Soroudi and T. Amraee, "Decision making under uncertainty in energy
systems: State of the art," Renewable and Sustainable Energy Reviews, vol. 28,
pp. 376-384, 2013, doi: https://doi.org/10.1016/j.rser.2013.08.039.
[221] C. Ning and F. You, "Optimization under uncertainty in the era of big data and
deep learning: When machine learning meets mathematical programming,"
Comput. Chem. Eng., vol. 125, pp. 434-448, 2019, doi:
https://doi.org/10.1016/j.compchemeng.2019.03.034.
[222] C. Ning and F. You, "A data-driven multistage adaptive robust optimization
framework for planning and scheduling under uncertainty," AIChE J., vol. 63,
no. 10, pp. 4343-4369, 2017, doi: 10.1002/aic.15792.
[223] P. Daoutidis, W. A. Marvin, S. Rangarajan, and A. I. Torres, "Engineering
Biomass Conversion Processes: A Systems Perspective," AIChE J., vol. 59, no.
1, pp. 3-18, 2013, doi: 10.1002/aic.13978.
[224] S. Rangarajan, A. Bhan, and P. Daoutidis, "Rule-Based Generation of
Thermochemical Routes to Biomass Conversion," Ind. Eng. Chem. Res., vol. 49,
no. 21, pp. 10459-10470, 2010, doi: 10.1021/ie100546t.
[225] J. Kim, S. M. Sen, and C. T. Maravelias, "An optimization-based assessment
framework for biomass-to-fuel conversion strategies," Energy & Environmental
Science, vol. 6, no. 4, pp. 1093-1104, 2013, doi: 10.1039/c3ee24243a.
[226] J. Gong and F. You, "Global Optimization for Sustainable Design and Synthesis
of Algae Processing Network for CO2 Mitigation and Biofuel Production Using
Life Cycle Optimization," AIChE J., vol. 60, no. 9, pp. 3195-3210, 2014, doi:
10.1002/aic.14504.
[227] J. Gong and F. You, "Sustainable design and synthesis of energy systems,"
Current Opinion in Chemical Engineering, vol. 10, pp. 77-86, 2015, doi:
10.1016/j.coche.2015.09.001.
[228] S. Bairamzadeh, M. Saidi-Mehrabad, and M. S. Pishvaee, "Modelling different
types of uncertainty in biofuel supply network design and planning: A robust
optimization approach," Renewable Energy, vol. 116, pp. 500-517, 2018, doi:
https://doi.org/10.1016/j.renene.2017.09.020.
[229] C. Caldeira, O. Swei, F. Freire, L. C. Dias, E. A. Olivetti, and R. Kirchain,
"Planning strategies to address operational and price uncertainty in biodiesel
production," Applied Energy, vol. 238, pp. 1573-1581, 2019, doi:
10.1016/j.apenergy.2019.01.195.
[230] K. Tong, J. Gong, D. Yue, and F. You, "Stochastic Programming Approach to
Optimal Design and Operations of Integrated Hydrocarbon Biofuel and
Petroleum Supply Chains," ACS Sustain. Chem. Eng., vol. 2, no. 1, pp. 49-61,
2014, doi: 10.1021/sc4002671.
[231] A. Osmani and J. Zhang, "Economic and environmental optimization of a large
scale sustainable dual feedstock lignocellulosic-based bioethanol supply chain
in a stochastic environment," Applied Energy, vol. 114, pp. 572-587, 2014, doi:
10.1016/j.apenergy.2013.10.024.
[232] D. Bertsimas, S. Shtern, and B. Sturt, "A Data-Driven Approach for Multi-Stage
Linear Optimization," optimization-online.org, 2019.
[233] Z. Chen, M. Sim, and P. Xiong, "Robust Stochastic Optimization: The Synergy
of Robust Optimization and Stochastic Programming," optimization-online.org,
2019.
[234] W. Xie, "On Distributionally Robust Chance Constrained Programs with
Wasserstein Distance," arXiv preprint arXiv:1806.07418, 2018.
[235] E. Delage and Y. Ye, "Distributionally Robust Optimization Under Moment
Uncertainty with Application to Data-Driven Problems," Oper. Res., vol. 58, no.
3, pp. 595-612, 2010, doi: 10.1287/opre.1090.0741.
[236] K. L. Hoffman, "A method for globally minimizing concave functions over
convex sets," Math. Program., vol. 20, no. 1, pp. 22-32, 1981, doi:
10.1007/bf01589330.
[237] S. S. Toor, L. Rosendahl, and A. Rudolf, "Hydrothermal liquefaction of biomass:
A review of subcritical water technologies," Energy, vol. 36, no. 5, pp. 2328-
2342, 2011, doi: https://doi.org/10.1016/j.energy.2011.03.013.
[238] D. J. Garcia and F. You, "Systems engineering opportunities for agricultural and
organic waste management in the food-water-energy nexus," Current Opinion
in Chemical Engineering, vol. 18, pp. 23-31, 2017, doi:
10.1016/j.coche.2017.08.004.
[239] P. Morone, A. Koutinas, N. Gathergood, M. Arshadi, and A. Matharu, "Food
waste: Challenges and opportunities for enhancing the emerging bio-economy,"
Journal of Cleaner Production, vol. 221, pp. 10-16, 2019, doi:
https://doi.org/10.1016/j.jclepro.2019.02.258.
[240] J. Nicoletti, C. Ning, and F. You, "Incorporating Agricultural Waste-to-Energy
Pathways into Biomass Product and Process Network through Data-Driven
Nonlinear Adaptive Robust Optimization," Energy, vol. 180, pp. 556-571, 2019.
[241] R. Hakawati, B. M. Smyth, G. McCullough, F. De Rosa, and D. Rooney, "What
is the most energy efficient route for biogas utilization: Heat, electricity or
transport?," Applied Energy, vol. 206, pp. 1076-1087, 2017, doi:
10.1016/j.apenergy.2017.08.068.
[242] Y. Y. Jin, T. Chen, X. Chen, and Z. X. Yu, "Life-cycle assessment of energy
consumption and environmental impact of an integrated food waste-based
biogas plant," Applied Energy, vol. 151, pp. 227-236, 2015, doi:
10.1016/j.apenergy.2015.04.058.
[243] R. Campuzano and S. González-Martínez, "Characteristics of the organic
fraction of municipal solid waste and methane production: A review," Waste
Manage. (Oxford), vol. 54, pp. 3-12, 2016, doi:
https://doi.org/10.1016/j.wasman.2016.05.016.
[244] C. Villani, Optimal transport: old and new. Springer Science & Business Media,
2008.
[245] C. Zhao and Y. Guan, "Data-driven risk-averse stochastic optimization with
Wasserstein metric," Oper. Res. Lett., vol. 46, no. 2, pp. 262-267, 2018, doi:
https://doi.org/10.1016/j.orl.2018.01.011.
[246] T. A. Reddy, Applied data analysis and modeling for energy engineers and
scientists. Springer Science & Business Media, 2011.
[247] M. Rizwan, J. H. Lee, and R. Gani, "Optimal design of microalgae-based
biorefinery: Economics, opportunities and challenges," Applied Energy, vol. 150,
pp. 69-79, 2015, doi: https://doi.org/10.1016/j.apenergy.2015.04.018.
[248] Z. Zheng et al., "Effect of dairy manure to switchgrass co-digestion ratio on
methane production and the bacterial community in batch anaerobic digestion,"
Applied Energy, vol. 151, pp. 249-257, 2015, doi:
https://doi.org/10.1016/j.apenergy.2015.04.078.
[249] C. G. Gutierrez-Arriaga, M. Serna-Gonzalez, J. M. Ponce-Ortega, and M. M. El-
Halwagi, "Sustainable Integration of Algal Biodiesel Production with Steam
Electric Power Plants for Greenhouse Gas Mitigation," ACS Sustain. Chem. Eng.,
vol. 2, no. 6, pp. 1388-1403, 2014, doi: 10.1021/sc400436a.
[250] J. Seader, W. D. Seider, and D. R. Lewin, Product and process design principles:
synthesis, analysis and evaluation. Wiley, 2004.
[251] D. Bertsimas, M. Sim, and M. Zhang, "Adaptive Distributionally Robust
Optimization," Manage. Sci., vol. 65, no. 2, pp. 604-618, 2019, doi:
10.1287/mnsc.2017.2952.
[252] D. J. Garcia and F. You, "Network-Based Life Cycle Optimization of the Net
Atmospheric CO2-eq Ratio (NACR) of Fuels and Chemicals Production from
Biomass," ACS Sustain. Chem. Eng., vol. 3, no. 8, pp. 1732-1744, 2015, doi:
10.1021/acssuschemeng.5b00262.
[253] IndexMundi, "Commodities," 2019. [Online]. Available:
https://www.indexmundi.com/commodities/
[254] R. E. Rosenthal, GAMS: A User’s Guide. Washington, DC: GAMS Development
Corporation, 2008.
[255] J. Remon, P. Arcelus-Arrillaga, L. Garcia, and J. Arauzo, "Simultaneous
production of gaseous and liquid biofuels from the synergetic co-valorisation of
bio-oil and crude glycerol in supercritical water," Applied Energy, vol. 228, pp.
2275-2287, 2018, doi: 10.1016/j.apenergy.2018.07.093.
[256] I. Ullah Khan et al., "Biogas as a renewable energy fuel – A review of biogas
upgrading, utilisation and storage," Energy Convers. Manage., vol. 150, pp. 277-
294, 2017, doi: https://doi.org/10.1016/j.enconman.2017.08.035.
[257] A. Shapiro, "On Duality Theory of Conic Linear Problems," in Semi-Infinite
Programming: Recent Advances, M. Á. Goberna and M. A. López Eds. Boston,
MA: Springer US, 2001, pp. 135-165.
[258] A. J. Conejo and L. Baringo, Power system operations. Springer, 2018.
[259] X. Xia and A. M. Elaiw, "Optimal dynamic economic dispatch of generation: A
review," Electric Power Systems Research, vol. 80, no. 8, pp. 975-986, 2010,
doi: https://doi.org/10.1016/j.epsr.2009.12.012.
[260] J. Hetzer, D. C. Yu, and K. Bhattarai, "An Economic Dispatch Model
Incorporating Wind Power," IEEE Transactions on Energy Conversion, vol. 23,
no. 2, pp. 603-611, 2008, doi: 10.1109/TEC.2007.914171.
[261] A. Alqurashi, A. H. Etemadi, and A. Khodaei, "Treatment of uncertainty for next
generation power systems: State-of-the-art in stochastic optimization," Electric
Power Systems Research, vol. 141, pp. 233-245, 2016, doi:
https://doi.org/10.1016/j.epsr.2016.08.009.
[262] Á. Lorca and X. A. Sun, "Adaptive Robust Optimization With Dynamic
Uncertainty Sets for Multi-Period Economic Dispatch Under Significant Wind,"
IEEE Trans. Power Syst., vol. 30, no. 4, pp. 1702-1713, 2015, doi:
10.1109/TPWRS.2014.2357714.
[263] J. Zhao, T. Zheng, and E. Litvinov, "Variable Resource Dispatch Through Do-
Not-Exceed Limit," IEEE Trans. Power Syst., vol. 30, no. 2, pp. 820-828, 2015,
doi: 10.1109/TPWRS.2014.2333367.
[264] R. A. Jabr, S. Karaki, and J. A. Korbane, "Robust Multi-Period OPF With
Storage and Renewables," IEEE Trans. Power Syst., vol. 30, no. 5, pp. 2790-
2799, 2015, doi: 10.1109/TPWRS.2014.2365835.
[265] H. Qiu, B. Zhao, W. Gu, and R. Bo, "Bi-Level Two-Stage Robust Optimal
Scheduling for AC/DC Hybrid Multi-Microgrids," IEEE Transactions on Smart
Grid, vol. 9, no. 5, pp. 5455-5466, 2018, doi: 10.1109/TSG.2018.2806973.
[266] W. Wu, J. Chen, B. Zhang, and H. Sun, "A Robust Wind Power Optimization
Method for Look-Ahead Power Dispatch," IEEE Transactions on Sustainable
Energy, vol. 5, no. 2, pp. 507-515, 2014, doi: 10.1109/TSTE.2013.2294467.
[267] Z. Li, W. Wu, B. Zhang, and B. Wang, "Adjustable Robust Real-Time Power
Dispatch With Large-Scale Wind Power Integration," IEEE Transactions on
Sustainable Energy, vol. 6, no. 2, pp. 357-368, 2015, doi:
10.1109/TSTE.2014.2377752.
[268] Z. Lin, H. Chen, Q. Wu, W. Li, M. Li, and T. Ji, "Mean-tracking model based
stochastic economic dispatch for power systems with high penetration of wind
power," Energy, vol. 193, p. 116826, 2020, doi:
https://doi.org/10.1016/j.energy.2019.116826.
[269] R. Lu, T. Ding, B. Qin, J. Ma, X. Fang, and Z. Y. Dong, "Multi-Stage Stochastic
Programming to Joint Economic Dispatch for Energy and Reserve with
Uncertain Renewable Energy," IEEE Transactions on Sustainable Energy, pp.
1-1, 2019, doi: 10.1109/TSTE.2019.2918269.
[270] F. Qiu and J. Wang, "Chance-Constrained Transmission Switching With
Guaranteed Wind Power Utilization," IEEE Trans. Power Syst., vol. 30, no. 3,
pp. 1270-1278, 2015, doi: 10.1109/TPWRS.2014.2346987.
[271] Z. Zhang, Y. Sun, D. W. Gao, J. Lin, and L. Cheng, "A Versatile Probability
Distribution Model for Wind Power Forecast Errors and Its Application in
Economic Dispatch," IEEE Trans. Power Syst., vol. 28, no. 3, pp. 3114-3125,
2013, doi: 10.1109/TPWRS.2013.2249596.
[272] C. Tang et al., "Look-Ahead Economic Dispatch With Adjustable Confidence
Interval Based on a Truncated Versatile Distribution Model for Wind Power,"
IEEE Trans. Power Syst., vol. 33, no. 2, pp. 1755-1767, 2018, doi:
10.1109/TPWRS.2017.2715852.
[273] Z. Wang, C. Shen, F. Liu, X. Wu, C. Liu, and F. Gao, "Chance-Constrained
Economic Dispatch With Non-Gaussian Correlated Wind Power Uncertainty,"
IEEE Trans. Power Syst., vol. 32, no. 6, pp. 4880-4893, 2017, doi:
10.1109/TPWRS.2017.2672750.
[274] Y. Yang, W. Wu, B. Wang, and M. Li, "Analytical Reformulation for Stochastic
Unit Commitment Considering Wind Power Uncertainty with Gaussian Mixture
Model," IEEE Trans. Power Syst., pp. 1-1, 2019, doi:
10.1109/TPWRS.2019.2960389.
[275] B. Khorramdel, A. Zare, C. Y. Chung, and P. Gavriliadis, "A Generic Convex
Model for a Chance-Constrained Look-Ahead Economic Dispatch Problem
Incorporating an Efficient Wind Power Distribution Modeling," IEEE Trans.
Power Syst., vol. 35, no. 2, pp. 873-886, 2020, doi:
10.1109/TPWRS.2019.2940288.
[276] K. Baker and A. Bernstein, "Joint Chance Constraints in AC Optimal Power
Flow: Improving Bounds Through Learning," IEEE Transactions on Smart Grid,
vol. 10, no. 6, pp. 6376-6385, 2019, doi: 10.1109/TSG.2019.2903767.
[277] M. S. Modarresi et al., "Scenario-Based Economic Dispatch With Tunable Risk
Levels in High-Renewable Power Systems," IEEE Trans. Power Syst., vol. 34,
no. 6, pp. 5103-5114, 2019, doi: 10.1109/TPWRS.2018.2874464.
[278] H. Ming, L. Xie, M. C. Campi, S. Garatti, and P. R. Kumar, "Scenario-Based
Economic Dispatch With Uncertain Demand Response," IEEE Transactions on
Smart Grid, vol. 10, no. 2, pp. 1858-1868, 2019, doi:
10.1109/TSG.2017.2778688.
[279] X. Geng and L. Xie, "Data-driven decision making in power systems with
probabilistic guarantees: Theory and applications of chance-constrained
optimization," Annual Reviews in Control, vol. 47, pp. 341-363, 2019, doi:
https://doi.org/10.1016/j.arcontrol.2019.05.005.
[280] O. Ciftci, M. Mehrtash, and A. Kargarian, "Data-Driven Nonparametric Chance-
Constrained Optimization for Microgrid Energy Management," IEEE
Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2447-2457, 2020, doi:
10.1109/TII.2019.2932078.
[281] W. Sun, M. Zamani, M. R. Hesamzadeh, and H. Zhang, "Data-Driven
Probabilistic Optimal Power Flow With Nonparametric Bayesian Modeling and
Inference," IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1077-1090,
2020, doi: 10.1109/TSG.2019.2931160.
[282] C. Ning and F. You, "Data-Driven Adaptive Robust Unit Commitment Under
Wind Power Uncertainty: A Bayesian Nonparametric Approach," IEEE Trans.
Power Syst., vol. 34, no. 3, pp. 2409-2418, 2019, doi:
10.1109/TPWRS.2019.2891057.
[283] W. Wei, F. Liu, and S. Mei, "Distributionally Robust Co-Optimization of
Energy and Reserve Dispatch," IEEE Transactions on Sustainable Energy, vol.
7, no. 1, pp. 289-300, 2016, doi: 10.1109/TSTE.2015.2494010.
[284] Y. L. Zhang, S. Q. Shen, and J. L. Mathieu, "Distributionally Robust Chance-
Constrained Optimal Power Flow With Uncertain Renewables and Uncertain
Reserves Provided by Loads," IEEE Trans. Power Syst., vol. 32, no. 2, pp. 1378-
1388, Mar 2017, doi: 10.1109/tpwrs.2016.2572104.
[285] M. Shahidehpour, Y. Zhou, Z. Wei, S. Chen, Z. Li, and G. Sun, "Distributionally
Robust Co-optimization of Energy and Reserve for Combined Distribution
Networks of Power and District Heating," IEEE Trans. Power Syst., pp. 1-1,
2019, doi: 10.1109/TPWRS.2019.2954710.
[286] Z. Shi, H. Liang, S. Huang, and V. Dinavahi, "Distributionally Robust Chance-
Constrained Energy Management for Islanded Microgrids," IEEE Transactions
on Smart Grid, vol. 10, no. 2, pp. 2234-2244, 2019, doi:
10.1109/TSG.2018.2792322.
[287] X. Lu, K. W. Chan, S. Xia, B. Zhou, and X. Luo, "Security-Constrained
Multiperiod Economic Dispatch With Renewable Energy Utilizing
Distributionally Robust Optimization," IEEE Transactions on Sustainable
Energy, vol. 10, no. 2, pp. 768-779, 2019, doi: 10.1109/TSTE.2018.2847419.
[288] B. Li, R. Jiang, and J. L. Mathieu, "Distributionally Robust Chance-Constrained
Optimal Power Flow Assuming Unimodal Distributions With Misspecified
Modes," IEEE Transactions on Control of Network Systems, vol. 6, no. 3, pp.
1223-1234, 2019, doi: 10.1109/TCNS.2019.2930872.
[289] Y. Chen, W. Wei, F. Liu, and S. Mei, "Distributionally robust hydro-thermal-
wind economic dispatch," Applied Energy, vol. 173, pp. 511-519, 2016, doi:
https://doi.org/10.1016/j.apenergy.2016.04.060.
[290] H. Ma, R. Jiang, and Z. Yan, "Distributionally Robust Co-Optimization of
Power Dispatch and Do-Not-Exceed Limits," IEEE Trans. Power Syst., vol. 35,
no. 2, pp. 887-897, 2020, doi: 10.1109/TPWRS.2019.2941635.
[291] W. J. Xie and S. Ahmed, "Distributionally Robust Chance Constrained Optimal
Power Flow with Renewables: A Conic Reformulation," IEEE Trans. Power
Syst., vol. 33, no. 2, pp. 1860-1867, Mar 2018, doi:
10.1109/tpwrs.2017.2725581.
[292] M. Lubin, Y. Dvorkin, and S. Backhaus, "A Robust Approach to Chance
Constrained Optimal Power Flow With Renewable Generation," IEEE Trans.
Power Syst., vol. 31, no. 5, pp. 3840-3849, 2016, doi:
10.1109/TPWRS.2015.2499753.
[293] C. Duan, L. Jiang, W. Fang, J. Liu, and S. Liu, "Data-Driven Distributionally
Robust Energy-Reserve-Storage Dispatch," IEEE Transactions on Industrial
Informatics, vol. 14, no. 7, pp. 2826-2836, 2018, doi: 10.1109/TII.2017.2771355.
[294] H. Zhang, Z. Hu, E. Munsing, S. J. Moura, and Y. Song, "Data-Driven Chance-
Constrained Regulation Capacity Offering for Distributed Energy Resources,"
IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2713-2725, 2019, doi:
10.1109/TSG.2018.2809046.
[295] Y. Guo, K. Baker, E. Dall'Anese, Z. Hu, and T. Summers, "Data-based
distributionally robust stochastic optimal power flow, Part I: Methodologies,"
IEEE Trans. Power Syst., pp. 1-1, 2018, doi: 10.1109/TPWRS.2018.2878385.
[296] C. Ordoudis, V. A. Nguyen, D. Kuhn, and P. Pinson, "Energy and Reserve
Dispatch with Distributionally Robust Joint Chance Constraints," 2018.
[297] Y. Chen, Q. Guo, H. Sun, Z. Li, W. Wu, and Z. Li, "A Distributionally Robust
Optimization Model for Unit Commitment Based on Kullback–Leibler
Divergence," IEEE Trans. Power Syst., vol. 33, no. 5, pp. 5147-5160, 2018, doi:
10.1109/TPWRS.2018.2797069.
[298] Y. Z. Chen, Y. S. Wang, D. Kirschen, and B. S. Zhang, "Model-Free Renewable
Scenario Generation Using Generative Adversarial Networks," IEEE Trans.
Power Syst., vol. 33, no. 3, pp. 3265-3275, May 2018, doi:
10.1109/tpwrs.2018.2794541.
[299] S. Zhao and F. You, "Distributionally Robust Chance Constrained Programming
with Generative Adversarial Networks (GANs)," AIChE J., vol. 66, no. 6, p.
e16963, 2020.
[300] S. Nowozin, B. Cseke, and R. Tomioka, "f-GAN: Training generative neural
samplers using variational divergence minimization," in Advances in Neural
Information Processing Systems, 2016, pp. 271-279.
[301] X. Nguyen, M. J. Wainwright, and M. I. Jordan, "Estimating Divergence
Functionals and the Likelihood Ratio by Convex Risk Minimization," IEEE
Transactions on Information Theory, vol. 56, no. 11, pp. 5847-5861, 2010, doi:
10.1109/TIT.2010.2068870.
[302] I. J. Goodfellow et al., "Generative Adversarial Nets," in Advances in
Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes,
N. D. Lawrence, and K. Q. Weinberger, Eds., 2014, pp. 2672-2680.
[303] G. Schildbach, L. Fagiano, and M. Morari, "Randomized Solutions to Convex
Programs with Multiple Chance Constraints," SIAM J. Optim., vol. 23, no. 4, pp.
2479-2501, 2013, doi: 10.1137/120878719.
[304] C. Draxl, A. Clifton, B.-M. Hodge, and J. McCaa, "The Wind Integration
National Dataset (WIND) Toolkit," Applied Energy, vol. 151, pp. 355-366, 2015,
doi: https://doi.org/10.1016/j.apenergy.2015.03.121.
[305] G. Morales-España, "Unit commitment: computational performance, system
representation and wind uncertainty management," Ph.D. dissertation, Comillas
Pontifical University, 2014.
[306] D. Q. Mayne, "Model predictive control: Recent developments and future
promise," Automatica, vol. 50, no. 12, pp. 2967-2986, 2014, doi:
https://doi.org/10.1016/j.automatica.2014.10.128.
[307] M. Morari and J. Lee, "Model predictive control: past, present and future,"
Comput. Chem. Eng., vol. 23, no. 4, pp. 667-682, 1999, doi:
https://doi.org/10.1016/S0098-1354(98)00301-9.
[308] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert, "Constrained
model predictive control: Stability and optimality," Automatica, vol. 36, no. 6,
pp. 789-814, 2000, doi: https://doi.org/10.1016/S0005-1098(99)00214-9.
[309] S. J. Qin and T. A. Badgwell, "A survey of industrial model predictive control
technology," Control Engineering Practice, vol. 11, no. 7, pp. 733-764, 2003,
doi: https://doi.org/10.1016/S0967-0661(02)00186-7.
[310] J. B. Rawlings and D. Q. Mayne, Model predictive control: Theory and design.
Nob Hill Pub., 2009.
[311] A. Bemporad and M. Morari, "Robust model predictive control: A survey," in
Robustness in identification and control: Springer, 1999, pp. 207-226.
[312] D. Q. Mayne, M. M. Seron, and S. V. Raković, "Robust model predictive control
of constrained linear systems with bounded disturbances," Automatica, vol. 41,
no. 2, pp. 219-224, 2005, doi:
https://doi.org/10.1016/j.automatica.2004.08.019.
[313] W. Langson, I. Chryssochoos, S. V. Raković, and D. Q. Mayne, "Robust model
predictive control using tubes," Automatica, vol. 40, no. 1, pp. 125-133,
2004, doi: https://doi.org/10.1016/j.automatica.2003.08.009.
[314] L. Chisci, J. A. Rossiter, and G. Zappa, "Systems with persistent disturbances:
predictive control with restricted constraints," Automatica, vol. 37, no. 7, pp.
1019-1028, 2001, doi: https://doi.org/10.1016/S0005-1098(01)00051-6.
[315] A. Mesbah, "Stochastic Model Predictive Control: An Overview and
Perspectives for Future Research," IEEE Control Systems Magazine, vol. 36, no.
6, pp. 30-44, 2016, doi: 10.1109/MCS.2016.2602087.
[316] M. Farina, L. Giulioni, and R. Scattolini, "Stochastic linear Model Predictive
Control with chance constraints – A review," Journal of Process Control, vol.
44, pp. 53-67, 2016, doi: https://doi.org/10.1016/j.jprocont.2016.03.005.
[317] D. Mayne, "Robust and stochastic model predictive control: Are we going in the
right direction?," Annual Reviews in Control, vol. 41, pp. 184-192, 2016, doi:
https://doi.org/10.1016/j.arcontrol.2016.04.006.
[318] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgöwer, "Constraint-Tightening
and Stability in Stochastic Model Predictive Control," IEEE Trans. Autom.
Control., vol. 62, no. 7, pp. 3165-3177, 2017, doi: 10.1109/TAC.2016.2625048.
[319] D. Muñoz-Carpintero, G. Hu, and C. J. Spanos, "Stochastic Model Predictive
Control with adaptive constraint tightening for non-conservative chance
constraints satisfaction," Automatica, vol. 96, pp. 32-39, 2018, doi:
https://doi.org/10.1016/j.automatica.2018.06.026.
[320] B. Kouvaritakis, M. Cannon, S. V. Raković, and Q. Cheng, "Explicit use of
probabilistic distributions in linear predictive control," Automatica, vol. 46, no.
10, pp. 1719-1724, 2010, doi: https://doi.org/10.1016/j.automatica.2010.06.034.
[321] M. Cannon, B. Kouvaritakis, S. V. Raković, and Q. Cheng, "Stochastic Tubes
in Model Predictive Control With Probabilistic Constraints," IEEE Trans.
Autom. Control., vol. 56, no. 1, pp. 194-200, 2011, doi:
10.1109/TAC.2010.2086553.
[322] L. Dai, Y. Xia, Y. Gao, B. Kouvaritakis, and M. Cannon, "Cooperative
distributed stochastic MPC for systems with state estimation and coupled
probabilistic constraints," Automatica, vol. 61, pp. 89-96, 2015, doi:
https://doi.org/10.1016/j.automatica.2015.07.025.
[323] M. Korda, R. Gondhalekar, F. Oldewurtel, and C. N. Jones, "Stochastic MPC
Framework for Controlling the Average Constraint Violation," IEEE Trans.
Autom. Control., vol. 59, no. 7, pp. 1706-1721, 2014, doi:
10.1109/TAC.2014.2310066.
[324] D. Chatterjee and J. Lygeros, "On Stability and Performance of Stochastic
Predictive Control Techniques," IEEE Trans. Autom. Control., vol. 60, no. 2, pp.
509-514, 2015, doi: 10.1109/TAC.2014.2335274.
[325] D. Chatterjee, P. Hokayem, and J. Lygeros, "Stochastic Receding Horizon
Control With Bounded Control Inputs: A Vector Space Approach," IEEE Trans.
Autom. Control., vol. 56, no. 11, pp. 2704-2710, 2011, doi:
10.1109/TAC.2011.2159422.
[326] G. Schildbach, L. Fagiano, C. Frei, and M. Morari, "The scenario approach for
Stochastic Model Predictive Control with bounds on closed-loop constraint
violations," Automatica, vol. 50, no. 12, pp. 3009-3018, 2014, doi:
https://doi.org/10.1016/j.automatica.2014.10.035.
[327] G. C. Calafiore and L. Fagiano, "Stochastic model predictive control of LPV
systems via scenario optimization," Automatica, vol. 49, no. 6, pp. 1861-1866,
2013, doi: https://doi.org/10.1016/j.automatica.2013.02.060.
[328] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgöwer, "Stochastic MPC with
offline uncertainty sampling," Automatica, vol. 81, pp. 176-183, 2017, doi:
https://doi.org/10.1016/j.automatica.2017.03.031.
[329] M. Farina, L. Giulioni, L. Magni, and R. Scattolini, "An approach to output-
feedback MPC of stochastic linear discrete-time systems," Automatica, vol. 55,
pp. 140-149, 2015, doi: https://doi.org/10.1016/j.automatica.2015.02.039.
[330] M. Farina and R. Scattolini, "Model predictive control of linear systems with
multiplicative unbounded uncertainty and chance constraints," Automatica, vol.
70, pp. 258-265, 2016, doi: https://doi.org/10.1016/j.automatica.2016.04.008.
[331] J. A. Paulson and A. Mesbah, "An efficient method for stochastic optimal
control with joint chance constraints for nonlinear systems," International
Journal of Robust and Nonlinear Control, 2017.
[332] P. Sopasakis, D. Herceg, A. Bemporad, and P. Patrinos, "Risk-averse model
predictive control," Automatica, vol. 100, pp. 281-288, 2019, doi:
https://doi.org/10.1016/j.automatica.2018.11.022.
[333] S. Singh, Y. Chow, A. Majumdar, and M. Pavone, "A Framework for Time-
Consistent, Risk-Sensitive Model Predictive Control: Theory and Algorithms,"
IEEE Trans. Autom. Control., vol. 64, no. 7, pp. 2905-2912, 2019, doi:
10.1109/TAC.2018.2874704.
[334] I. Yang, "A dynamic game approach to distributionally robust safety
specifications for stochastic systems," Automatica, vol. 94, pp. 94-101, Aug
2018, doi: 10.1016/j.automatica.2018.04.022.
[335] A. Aswani, H. Gonzalez, S. S. Sastry, and C. Tomlin, "Provably safe and robust
learning-based model predictive control," Automatica, vol. 49, no. 5, pp. 1216-
1226, 2013, doi: https://doi.org/10.1016/j.automatica.2013.02.003.
[336] U. Rosolia and F. Borrelli, "Learning Model Predictive Control for Iterative
Tasks. A Data-Driven Control Framework," IEEE Trans. Autom. Control., vol.
63, no. 7, pp. 1883-1896, 2018, doi: 10.1109/TAC.2017.2753460.
[337] U. Rosolia, X. Zhang, and F. Borrelli, "Data-Driven Predictive Control for
Autonomous Systems," Annual Review of Control, Robotics, and Autonomous
Systems, vol. 1, no. 1, pp. 259-286, 2018, doi: 10.1146/annurev-control-060117-
105215.
[338] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, "Learning-Based
Model Predictive Control for Safe Exploration," in 2018 IEEE Conference on
Decision and Control (CDC), 2018, pp. 6059-6066, doi:
10.1109/CDC.2018.8619572.
[339] D. Limon, J. Calliess, and J. M. Maciejowski, "Learning-based Nonlinear Model
Predictive Control," IFAC-PapersOnLine, vol. 50, no. 1, pp. 7769-7776, 2017,
doi: https://doi.org/10.1016/j.ifacol.2017.08.1050.
[340] E. Terzi, L. Fagiano, M. Farina, and R. Scattolini, "Learning-based predictive
control for linear systems: A unitary approach," Automatica, vol. 108, p. 108473,
2019, doi: https://doi.org/10.1016/j.automatica.2019.06.025.
[341] T. A. N. Heirung, B. E. Ydstie, and B. Foss, "Dual adaptive model predictive
control," Automatica, vol. 80, pp. 340-348, 2017, doi:
https://doi.org/10.1016/j.automatica.2017.01.030.
[342] N. M. Filatov and H. Unbehauen, "Survey of adaptive dual control methods,"
IEE Proceedings Control Theory and Applications, vol. 147, no. 1, pp. 118-128,
2000.
[343] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, "Learning-Based
Model Predictive Control: Toward Safe Learning in Control," Annual Review of
Control, Robotics, and Autonomous Systems, vol. 3, no. 1, pp. 269-296, 2020,
doi: 10.1146/annurev-control-090419-075625.
[344] L. Hewing, J. Kabzan, and M. N. Zeilinger, "Cautious model predictive control
using Gaussian process regression," arXiv preprint arXiv:1705.10702, 2017.
[345] R. Soloperto, M. A. Müller, S. Trimpe, and F. Allgöwer, "Learning-Based
Robust Model Predictive Control with State-Dependent Uncertainty," IFAC-
PapersOnLine, vol. 51, no. 20, pp. 442-447, 2018, doi:
https://doi.org/10.1016/j.ifacol.2018.11.052.
[346] Z. Wu, D. Rincon, and P. D. Christofides, "Real-Time Adaptive Machine-
Learning-Based Predictive Control of Nonlinear Processes," Ind. Eng. Chem.
Res., 2019, doi: 10.1021/acs.iecr.9b03055.
[347] R. Gomes, M. Welling, and P. Perona, "Incremental learning of nonparametric
Bayesian mixture models," in 2008 IEEE Conference on Computer Vision and
Pattern Recognition, 2008, pp. 1-8, doi: 10.1109/CVPR.2008.4587370.
[348] J. A. Paulson, T. L. M. Santos, and A. Mesbah, "Mixed stochastic-deterministic
tube MPC for offset-free tracking in the presence of plant-model mismatch,"
Journal of Process Control, 2018, doi:
https://doi.org/10.1016/j.jprocont.2018.04.010.
[349] M. Korda, R. Gondhalekar, J. Cigler, and F. Oldewurtel, "Strongly feasible
stochastic model predictive control," in 2011 50th IEEE Conference on Decision
and Control and European Control Conference, 2011, pp. 1245-1251, doi:
10.1109/CDC.2011.6161250.
[350] S. Samuelson and I. Yang, "Safety-Aware Optimal Control of Stochastic
Systems Using Conditional Value-at-Risk," in 2018 Annual American Control
Conference (ACC), 2018, pp. 6285-6290, doi: 10.23919/ACC.2018.8430957.
[351] J. Sethuraman, "A Constructive Definition of Dirichlet Priors," Statistica
Sinica, vol. 4, no. 2, pp. 639-650, 1994.
[352] T. Campbell and J. P. How, "Bayesian Nonparametric Set Construction for
Robust Optimization," in 2015 American Control Conference (ACC), 2015, pp.
4216-4221.
[353] D. M. Blei and M. I. Jordan, "Variational Inference for Dirichlet Process
Mixtures," Bayesian Anal., vol. 1, no. 1, pp. 121-143, 2006, doi: 10.1214/06-
ba104.
[354] K. Kurihara, M. Welling, and N. Vlassis, "Accelerated variational Dirichlet
process mixtures," in Advances in neural information processing systems, 2007,
pp. 761-768.
[355] F. D. Brunner, W. Heemels, and F. Allgöwer, "Robust event-triggered MPC
with guaranteed asymptotic bound and average sampling rate," IEEE Trans.
Autom. Control., vol. 62, no. 11, pp. 5694-5709, 2017.
[356] B. Kouvaritakis and M. Cannon, Model Predictive Control: Classical, Robust
and Stochastic. Springer: New York, NY, USA, 2016.
[357] F. Blanchini, "Set invariance in control," Automatica, vol. 35, no. 11, pp. 1747-
1767, 1999, doi: https://doi.org/10.1016/S0005-1098(99)00113-2.
[358] I. Kolmanovsky and E. G. Gilbert, "Theory and computation of disturbance
invariant sets for discrete-time linear systems," Mathematical Problems in
Engineering, vol. 4, no. 4, pp. 317-367, 1998.
[359] J. Lofberg, "YALMIP: A toolbox for modeling and optimization in MATLAB,"
in 2004 IEEE international conference on robotics and automation (IEEE Cat.
No. 04CH37508), 2004: IEEE, pp. 284-289.
[360] M. Herceg, M. Kvasnica, C. N. Jones, and M. Morari, "Multi-parametric
toolbox 3.0," in 2013 European control conference (ECC), 2013: IEEE, pp. 502-
510.
[361] S. V. Rakovic, E. C. Kerrigan, K. I. Kouramas, and D. Q. Mayne, "Invariant
approximations of the minimal robust positively invariant set," IEEE Trans.
Autom. Control., vol. 50, no. 3, pp. 406-410, 2005.
[362] A. Shapiro and A. Kleywegt, "Minimax analysis of stochastic problems,"
Optimization Methods and Software, vol. 17, no. 3, pp. 523-542, 2002.
[363] G. A. Hanasusanto, D. Kuhn, S. W. Wallace, and S. Zymler, "Distributionally
robust multi-item newsvendor problems with multimodal demand
distributions," Math. Program., vol. 152, no. 1-2, pp. 1-32, 2015.
[364] X. J. Zhang, M. Kamgarpour, A. Georghiou, P. Goulart, and J. Lygeros, "Robust
optimal control with adjustable uncertainty sets," Automatica, vol. 75, pp. 249-
259, Jan 2017, doi: 10.1016/j.automatica.2016.09.016.
[365] I. R. Petersen and R. Tempo, "Robust control of uncertain systems: Classical
results and recent developments," Automatica, vol. 50, no. 5, pp. 1315-1335,
May 2014, doi: 10.1016/j.automatica.2014.02.042.
[366] C. Z. Wu, K. L. Teo, and S. Y. Wu, "Min-max optimal control of linear systems
with uncertainty and terminal state constraints," Automatica, vol. 49, no. 6, pp.
1809-1815, Jun 2013, doi: 10.1016/j.automatica.2013.02.052.
[367] M. E. Villanueva, R. Quirynen, M. Diehl, B. Chachuat, and B. Houska, "Robust
MPC via min-max differential inequalities," Automatica, vol. 77, pp. 311-321,
Mar 2017, doi: 10.1016/j.automatica.2016.11.022.
[368] D. Bertsimas and M. Sim, "The price of robustness," Oper. Res., vol. 52, no. 1,
pp. 35-53, Jan-Feb 2004, doi: 10.1287/opre.1030.0065.
[369] C. Ning and F. You, "Data-driven adaptive nested robust optimization: General
modeling framework and efficient computational algorithm for decision making
under uncertainty," AIChE J., vol. 63, no. 9, pp. 3790-3817, 2017, doi:
10.1002/aic.15717.
[370] İ. Yanıkoğlu, B. L. Gorissen, and D. den Hertog, "A Survey of Adjustable
Robust Optimization," Eur. J. Oper. Res., 2018, doi:
10.1016/j.ejor.2018.08.031.
[371] D. Bertsimas and I. Dunning, "Multistage Robust Mixed-Integer Optimization
with Adaptive Partitions," Oper. Res., vol. 64, no. 4, pp. 980-998, 2016, doi:
10.1287/opre.2016.1515.
[372] G. C. Calafiore, "Multi-period portfolio optimization with linear control
policies," Automatica, vol. 44, no. 10, pp. 2463-2473, Oct 2008, doi:
10.1016/j.automatica.2008.02.007.
[373] H. Bannister, B. Goldys, S. Penev, and W. Wu, "Multiperiod mean-standard-
deviation time consistent portfolio selection," Automatica, vol. 73, pp. 15-26,
Nov 2016, doi: 10.1016/j.automatica.2016.06.021.
[374] G. C. Calafiore, "Direct data-driven portfolio optimization with guaranteed
shortfall probability," Automatica, vol. 49, no. 2, pp. 370-380, Feb 2013, doi:
10.1016/j.automatica.2012.11.012.
[375] F. Oldewurtel, R. Gondhalekar, C. N. Jones, and M. Morari, "Blocking
Parameterizations for Improving the Computational Tractability of Affine
Disturbance Feedback MPC Problems," in Proceedings of the 48th IEEE
Conference on Decision and Control, held jointly with the 2009 28th Chinese
Control Conference, 2009, pp. 7381-7386.
[376] P. J. Goulart, E. C. Kerrigan, and J. A. Maciejowski, "Optimization over state
feedback policies for robust control with constraints," Automatica, vol. 42, no.
4, pp. 523-533, Apr 2006, doi: 10.1016/j.automatica.2005.08.023.
[377] K. Postek and D. den Hertog, "Multistage Adjustable Robust Mixed-Integer
Optimization via Iterative Splitting of the Uncertainty Set," INFORMS J.
Comput., vol. 28, no. 3, pp. 553-574, 2016, doi: 10.1287/ijoc.2016.0696.
[378] D. Bertsimas and F. de Ruiter, "Duality in Two-Stage Adaptive Linear
Optimization: Faster Computation and Stronger Bounds," INFORMS J. Comput.,
vol. 28, no. 3, pp. 500-511, 2016, doi: 10.1287/ijoc.2016.0689.
[379] G. C. Calafiore, "An affine control method for optimal dynamic asset allocation
with transaction costs," SIAM Journal on Control and Optimization, vol. 48, no.
4, pp. 2254-2274, 2009, doi: 10.1137/080723776.
[380] A. Lorca, X. A. Sun, E. Litvinov, and T. Zheng, "Multistage adaptive robust
optimization for the unit commitment problem," Oper. Res., vol. 64, no. 1, pp.
32-51, 2016.
[381] D. Bertsimas and A. Georghiou, "Design of near optimal decision rules in
multistage adaptive mixed-integer optimization," Oper. Res., vol. 63, no. 3, pp.
610-627, 2015, doi: 10.1287/opre.2015.1365.
[382] D. Bertsimas and V. Goyal, "On the power and limitations of affine policies in
two-stage adaptive optimization," Math. Program., vol. 134, no. 2, pp. 491-531,
2012, doi: 10.1007/s10107-011-0444-4.
[383] D. Bertsimas, D. A. Iancu, and P. A. Parrilo, "A Hierarchy of Near-Optimal
Policies for Multistage Adaptive Optimization," IEEE Trans. Autom. Control.,
vol. 56, no. 12, pp. 2803-2818, Dec 2011, doi: 10.1109/tac.2011.2162878.
[384] D. Bertsimas and C. Caramanis, "Finite Adaptability in Multistage Linear
Optimization," IEEE Trans. Autom. Control., vol. 55, no. 12, pp. 2751-2766,
2010, doi: 10.1109/TAC.2010.2049764.
[385] G. A. Hanasusanto, D. Kuhn, and W. Wiesemann, "K-Adaptability in Two-
Stage Robust Binary Programming," Oper. Res., vol. 63, no. 4, pp. 877-891,
2015, doi: 10.1287/opre.2015.1392.
[386] A. Ardestani-Jaafari and E. Delage, "Linearized robust counterparts of two-stage
robust optimization problem with applications in operations management,"
Manuscript, HEC Montreal, 2016.
[387] G. Xu and S. Burer, "A copositive approach for two-stage adjustable robust
optimization with uncertain right-hand sides," Comput. Optim. Appl., vol. 70,
no. 1, pp. 33-59, 2018, doi: 10.1007/s10589-017-9974-x.
[388] A. Takeda, S. Taguchi, and R. H. Tütüncü, "Adjustable Robust Optimization
Models for a Nonlinear Two-Period System," J. Optim. Theory Appl., vol. 136,
no. 2, pp. 275-295, 2008, doi: 10.1007/s10957-007-9288-8.
[389] A. Thiele, T. Terry, and M. Epelman, "Robust linear optimization with
recourse," Tech. Rep., pp. 4-37, 2009.
[390] A. Georghiou, A. Tsoukalas, and W. Wiesemann, "A Primal-Dual Lifting
Scheme for Two-Stage Robust Optimization," Optimization Online, 2017.
[391] M. Bodur and J. Luedtke, "Two-stage Linear Decision Rules for Multi-stage
Stochastic Programming," arXiv preprint arXiv:1701.04102, 2017.
[392] J. Zou, S. Ahmed, and X. A. Sun, "Stochastic dual dynamic integer
programming," Math. Program., 2018, doi: 10.1007/s10107-018-1249-5.
[393] C. Ning and F. You, "A Transformation-Proximal Bundle Algorithm for
Solving Multistage Adaptive Robust Optimization Problems," in 2018 IEEE
57th Conference on Decision and Control (CDC), Miami Beach, FL, USA,
2018, pp. 2439-2444.
[394] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization
algorithms I: Fundamentals. Springer science & business media, 2013.
[395] C. Lemarechal and C. Sagastizabal, "Practical aspects of the Moreau-Yosida
regularization: Theoretical preliminaries," SIAM J. Optim., vol. 7, no. 2, pp. 367-
385, May 1997, doi: 10.1137/s1052623494267127.
[396] X. Chen and Y. Zhang, "Uncertain Linear Programs: Extended Affinely
Adjustable Robust Counterparts," Oper. Res., vol. 57, no. 6, pp. 1469-1482,
Nov-Dec 2009, doi: 10.1287/opre.1080.0605.
[397] K. C. Kiwiel, "An Inexact Bundle Approach to Cutting-Stock Problems,"
INFORMS J. Comput., vol. 22, no. 1, pp. 131-143, 2010, doi:
10.1287/ijoc.1090.0326.
[398] W. van Ackooij, N. Lebbe, and J. Malick, "Regularized decomposition of large
scale block-structured robust optimization problems," Comput. Manag. Sci., vol.
14, no. 3, pp. 393-421, 2017, doi: 10.1007/s10287-017-0281-x.
[399] A. Ruszczynski and A. Swietanowski, "Accelerating the regularized
decomposition method for two stage stochastic linear problems," Eur. J. Oper.
Res., vol. 101, no. 2, pp. 328-342, Sep 1997, doi: 10.1016/s0377-
2217(96)00401-8.
[400] K. C. Kiwiel, "A Proximal Bundle Method with Approximate Subgradient
Linearizations," SIAM J. Optim., vol. 16, no. 4, pp. 1007-1023, 2006, doi:
10.1137/040603929.
[401] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust optimization. Princeton
University Press, 2009.
[402] L. A. Wolsey, Integer programming. Wiley, 1998.
[403] A. Belloni, "Lecture Notes for IAP 2005 Course Introduction to Bundle
Methods," Operations Research Center, MIT, version of February 2005.
[404] M. J. Hadjiyiannis, P. J. Goulart, and D. Kuhn, "A scenario approach for
estimating the suboptimality of linear decision rules in two-stage robust
optimization," in 2011 50th IEEE Conference on Decision and Control and
European Control Conference, 12-15 Dec. 2011 2011, pp. 7386-7391, doi:
10.1109/CDC.2011.6161342.
[405] A. Ardestani-Jaafari and E. Delage, "The Value of Flexibility in Robust
Location-Transportation Problems," Transp. Sci., vol. 52, no. 1, pp. 189-209,
Jan-Feb 2018, doi: 10.1287/trsc.2016.0728.
[406] A. Ben-Tal, B. Golany, and S. Shtern, "Robust multi-echelon multi-period
inventory control," Eur. J. Oper. Res., vol. 199, no. 3, pp. 922-935, Dec 2009,
doi: 10.1016/j.ejor.2009.01.058.
[407] D. Bertsimas and A. Thiele, "A robust optimization approach to inventory
theory," Oper. Res., vol. 54, no. 1, pp. 150-168, Jan-Feb 2006, doi:
10.1287/opre.1050.0238.
[408] J. D. Schwartz, W. L. Wang, and D. E. Rivera, "Simulation-based optimization
of process control policies for inventory management in supply chains,"
Automatica, vol. 42, no. 8, pp. 1311-1320, Aug 2006, doi:
10.1016/j.automatica.2006.03.019.
[409] A. Georghiou, A. Tsoukalas, and W. Wiesemann, "Robust Dual Dynamic
Programming," Available on Optimization Online, 2016.
[410] C.-T. See and M. Sim, "Robust Approximation to Multiperiod Inventory
Management," Oper. Res., vol. 58, no. 3, pp. 583-594, 2010, doi:
10.1287/opre.1090.0746.
[411] F. Maggioni, M. Bertocchi, F. Dabbene, and R. Tempo, "Sampling methods for
multistage robust convex optimization problems," arXiv preprint
251
[412] F. You and I. E. Grossmann, "Stochastic inventory management for tactical
process planning under uncertainties: MINLP models and algorithms," AIChE
J., vol. 57, no. 5, pp. 1250-1277, 2011, doi: 10.1002/aic.12338.
[413] C. Ning and F. You, "A Data-Driven Multistage Adaptive Robust Optimization
Framework for Planning and Scheduling under Uncertainty," AIChE J., vol. 63,
no. 10, pp. 4343-4369, 2017, doi: 10.1002/aic.15792.
[414] D. A. Van Dyk and X.-L. Meng, "The art of data augmentation," Journal of
Computational and Graphical Statistics, vol. 10, no. 1, pp. 1-50, 2001.
[415] S. Hauberg, O. Freifeld, A. B. L. Larsen, J. Fisher, and L. Hansen, "Dreaming
more data: Class-dependent distributions over diffeomorphisms for learned data
augmentation," in Artificial Intelligence and Statistics, 2016, pp. 342-350.