
DATA-DRIVEN OPTIMIZATION UNDER UNCERTAINTY IN THE ERA OF BIG

DATA AND DEEP LEARNING: GENERAL FRAMEWORKS, ALGORITHMS,

AND APPLICATIONS

A Dissertation

Presented to the Faculty of the Graduate School

of Cornell University

In Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

by

Chao Ning

August 2020
© 2020 Chao Ning
DATA-DRIVEN OPTIMIZATION UNDER UNCERTAINTY IN THE ERA OF BIG

DATA AND DEEP LEARNING: GENERAL FRAMEWORKS, ALGORITHMS,

AND APPLICATIONS

Chao Ning, Ph.D.


Cornell University 2020

This dissertation deals with the development of fundamental data-driven optimization

under uncertainty, including its modeling frameworks, solution algorithms, and a wide

variety of applications. Specifically, three research aims are proposed, including data-

driven distributionally robust optimization for hedging against distributional

uncertainties in energy systems, online learning based receding-horizon optimization

that accommodates real-time uncertainty data, and an efficient solution algorithm for

solving large-scale data-driven multistage robust optimization problems.

There are two distinct research projects under the first research aim. In the first related

project, we propose a novel data-driven Wasserstein distributionally robust mixed-

integer nonlinear programming model for the optimal biomass with agricultural waste-
to-energy network design under uncertainty. A data-driven uncertainty set of feedstock

price distributions is devised using the Wasserstein metric. To address computational

challenges, we propose a reformulation-based branch-and-refine algorithm. In the

second related project, we develop a novel deep learning based distributionally robust

joint chance constrained economic dispatch optimization framework for a high

penetration of renewable energy. By leveraging a deep generative adversarial network

(GAN), an f-divergence-based ambiguity set of wind power distributions is constructed

as a ball in the probability space centered at the distribution induced by a generator


neural network. To facilitate its solution process, the resulting distributionally robust

chance constraints are equivalently reformulated as ambiguity-free chance constraints,

which are further tackled using a scenario approach. Additionally, we derive an a priori

bound on the required number of synthetic wind power data generated by f-GAN to

guarantee a predefined risk level. To facilitate large-scale applications, we further

develop a prescreening technique to increase computational and memory efficiencies

by exploiting problem structure.

The second research aim addresses the online learning of real-time uncertainty data for

receding-horizon optimization-based control. In the related project, data-driven

stochastic model predictive control is proposed for linear time-invariant systems under

additive stochastic disturbance, whose probability distribution is unknown but can be

partially inferred from real-time disturbance data. The conditional value-at-risk

constraints on system states are required to hold for an ambiguity set of disturbance

distributions. By leveraging a Dirichlet process mixture model, the first- and second-order moment information of each mixture component is incorporated into the ambiguity set. As more data are gathered during the runtime of the controller, the ambiguity

set is updated based on real-time data. We then develop a novel constraint tightening

strategy based on an equivalent reformulation of distributionally robust constraints over

the proposed ambiguity set. Additionally, we establish theoretical guarantees on

recursive feasibility and closed-loop stability of the proposed model predictive control.

The third research aim focuses on algorithm development for data-driven multistage

adaptive robust mixed-integer linear programs. In the related project, we propose a

multi-to-two transformation theory and develop a novel transformation-proximal

bundle algorithm. By partitioning recourse decisions into state and control decisions,

affine decision rules are applied exclusively on the state decisions. In this way, the

original multistage robust optimization problem is shown to be transformed into an


equivalent two-stage robust optimization problem, which is further addressed using a

proximal bundle method. The finite convergence of the proposed solution algorithm is

guaranteed for the multistage robust optimization problem with a generic uncertainty

set. To quantitatively assess solution quality, we further develop a scenario-tree-based

lower bounding technique. The effectiveness and advantages of the proposed algorithm

are fully demonstrated in inventory control and process network planning.


BIOGRAPHICAL SKETCH

Chao Ning grew up in Shanxi, China. He graduated from the University of Electronic

Science and Technology of China in 2012 with a Bachelor’s degree in

Automation. He received the M.S. degree in Control Science and Engineering from

Tsinghua University, China, in 2015. He joined Professor Fengqi You’s research group

in late 2015 at Northwestern University to pursue a Ph.D. degree. In the summer of 2016, he

transferred to Cornell University with Professor You to continue his Ph.D. program. His

research interests include data-driven optimization under uncertainty, learning for

dynamics and control, big data analytics and machine learning, power systems

operations, and renewable energy systems.

ACKNOWLEDGMENTS

First and foremost, I would like to express my sincerest thanks to my advisor, Professor

Fengqi You, for his kind help, constant support, and heartfelt encouragement.

Throughout my PhD study, he spent great effort helping me learn how to do high-impact

research, having fruitful discussions with me on research ideas, and encouraging me to

work through many research challenges. Without his kind help and constant support of

my research, I would never have completed this PhD. His vision for research directions, broad

knowledge, unbounded energy, and great enthusiasm for research have always been a true

inspiration to me and will have a great impact on my future career. I feel greatly proud

and honored to be his student.

I am also thankful to my committee members, Professor Lindsay Anderson and

Professor Oliver Gao, for their kind guidance and help. They have offered me very

constructive comments and valuable feedback to make this dissertation possible.

My thanks also go to all my colleagues and friends in the PEESE group, who have made

my Ph.D. study life wonderful and enjoyable. Dr. Dajun Yue helped me a lot with using

some software tools, drawing high-quality figures, as well as my group presentations. He

also helped me get into the background of batch process scheduling. Dr. Jian Gong was

always the go-to person when I encountered any type of question in the lab. He helped

me a lot in learning robust optimization and in providing valuable suggestions on

manuscript writing. Dr. Jiyao Gao was always willing to help me and gave constructive

comments on my presentations. Dr. Daniel Garcia helped me a lot by kindly providing

critical comments on my presentations and editing manuscripts. I really learned a lot

from him on biomass process network. Dr. Karson Leperi helped me a lot by kindly

providing critical comments on my presentations. Dr. Chao Shang and I had great

discussions on data-driven optimization. Dr. Inkyu Lee kindly taught me how to draw

fantastic figures with PowerPoint and had great discussions with me on energy systems.

Xueyu Tian, Wei-Han Chen, Yanqiu Tao, Ning Zhao, Jiwei Yao, Jack Nicoletti, Raaj

Bora, Akshay Ajagekar, Abdulelah Alshehri, and Xiang Zhao, thank you all for

your kind help and the wonderful time we shared. Most importantly, you are amazing

friends, and I will never forget the happy hours we spent together. It was a pleasure to

discuss electric power systems with Haifeng Qiu, and I learned a lot from our discussions.

Many thanks to Natalia Lujan Juncua for her kind help in the unit commitment project.

Visiting scholars including Dr. Minbo Yang, Dr. Yuting Tang, Dr. Hua Zhou, Dr. Na

Luo, Dr. Zuwei Liao, Dr. Liang Zhao, Dr. Li Sun, and Dr. Runda Jia helped me with

both life and research, and provided me with valuable guidance and suggestions on

future career.

Last but not least, I want to express my deepest gratitude to my father and mother

for their unconditional love, support, and encouragement along the way.

TABLE OF CONTENTS

BIOGRAPHICAL SKETCH ......................................................................................... vi 


ACKNOWLEDGMENTS ............................................................................................ vii 
TABLE OF CONTENTS .............................................................................................. ix 
LIST OF FIGURES ...................................................................................................... xii 
LIST OF TABLES ...................................................................................................... xvi 
INTRODUCTION .......................................................................................................... 1 
1.1  Background on optimization under uncertainty............................................... 2 

1.2  Existing methods for data-driven optimization under uncertainty ................ 10 

1.3  Various types of deep learning techniques and their potentials..................... 29 

1.4  Outline of the dissertation .............................................................................. 32 

DATA-DRIVEN WASSERSTEIN DISTRIBUTIONALLY ROBUST


OPTIMIZATION FOR BIOMASS WITH AGRICULTURAL WASTE-TO-ENERGY
NETWORK DESIGN UNDER UNCERTAINTY ...................................................... 36 
2.1  Introduction .................................................................................................... 36 

2.2  Problem statement.......................................................................................... 42 

2.3  Mathematical formulation.............................................................................. 47 

2.4  Solution methodology .................................................................................... 55 

2.5  Case studies.................................................................................................... 60 

2.6  Summary ........................................................................................................ 75 

2.7  Appendix: Derivation of Wasserstein distributionally robust counterpart .... 77 

2.8  Nomenclature ................................................................................................. 80 

DEEP LEARNING BASED AMBIGUOUS JOINT CHANCE CONSTRAINED


ECONOMIC DISPATCH UNDER SPATIAL-TEMPORAL CORRELATED WIND
POWER UNCERTAINTY ........................................................................................... 82 
3.1  Introduction .................................................................................................... 82 

3.2  Mathematical formulation.............................................................................. 87 

3.3  Deep learning based ambiguous joint chance constrained economic dispatch

optimization .............................................................................................................. 90 

3.4  Solution methodology .................................................................................... 96 

3.5  Computational experiments ......................................................................... 100 

3.6  Summary ...................................................................................................... 109 

3.7  Nomenclature ............................................................................................... 110 

ONLINE LEARNING BASED RISK-AVERSE STOCHASTIC MODEL


PREDICTIVE CONTROL OF CONSTRAINED LINEAR UNCERTAIN SYSTEMS
.................................................................................................................................... 112 
4.1  Introduction .................................................................................................. 112 

4.2  Problem setup and preliminaries.................................................................. 118 

4.3  Online learning based risk-averse stochastic MPC...................................... 123 

4.4  The theoretical properties of the proposed online learning based risk-averse

stochastic MPC ....................................................................................................... 135 

4.5  Numerical examples .................................................................................... 138 

4.6  Summary ...................................................................................................... 144 

4.7  Appendix A: The derivation of control objective ........................................ 145 

4.8  Appendix B. Proof of Theorem 4.1 ............................................................. 147 

4.9  Appendix C: Proof of Proposition 4.1 ......................................................... 151 

4.10  Appendix D: Proof of Theorem 4.2 ......................................................... 152 

4.11  Appendix E: Proof of Theorem 4.3 .......................................................... 154 

A TRANSFORMATION-PROXIMAL BUNDLE ALGORITHM FOR SOLVING
LARGE-SCALE MULTISTAGE ADAPTIVE ROBUST OPTIMIZATION
PROBLEMS ............................................................................................................... 156 
5.1  Introduction .................................................................................................. 156 

5.2  The multi-to-two transformation scheme .................................................... 160 

5.3  Transformation-proximal bundle algorithm ................................................ 165 

5.4  The lower bounding technique .................................................................... 183 

5.5  Applications ................................................................................................. 187 

5.6  Summary ...................................................................................................... 209 

5.7  Appendix: Tables of computational results in Application 1 ...................... 210 

5.8  Nomenclature ............................................................................................... 212 

CONCLUSIONS ........................................................................................................ 215 


REFERENCES ........................................................................................................... 222 

LIST OF FIGURES

Figure 1. The data-driven uncertainty model based on the Dirichlet process mixture

model. ........................................................................................................................... 22 

Figure 2. The structure of the biomass with agricultural waste-to-energy network

considered in this work. ................................................................................................ 44 

Figure 3. Illustrative figure on the biomass with agricultural waste-to-energy network

with the corresponding data-driven Wasserstein DRO model. .................................... 45 

Figure 4. The pseudocode of the proposed reformulation-based branch-and-refine

algorithm for solving (WDRO) problem. ..................................................................... 59 

Figure 5. The empirical probability distributions of total cost for (a) the stochastic

programming method, (b) the proposed data-driven Wasserstein DRO approach. ..... 64 

Figure 6. The optimal bioconversion network design determined by the stochastic

programming approach. The optimal production capacity is displayed under processes.

...................................................................................................................................... 66 

Figure 7. The optimal bioconversion network design determined by the data-driven

Wasserstein DRO approach. The optimal production capacity is displayed under

processes. ...................................................................................................................... 67 

Figure 8. Cost breakdowns determined by (a) the stochastic programming method, (b)

the proposed data-driven Wasserstein DRO approach. ................................................ 68 

Figure 9. Capital cost distributions determined by (a) the stochastic programming

method, (b) the proposed data-driven Wasserstein DRO approach. ............................ 69 

Figure 10. Sensitivity analysis of discount rate for the data-driven Wasserstein DRO

approach. ...................................................................................................................... 70 

Figure 11. Sensitivity analysis of the in-sample objective value, out-of-sample average

cost, and computational time with different radii of Wasserstein balls. ...................... 71 

Figure 12. Upper and lower bounds in each iteration of the reformulation-based branch-

and-refine algorithm for global optimization of the (WDRO) problem in the case study.

...................................................................................................................................... 72 

Figure 13. Out-of-sample performance of stochastic programming and the proposed

Wasserstein DRO method based on the testing of 100 uncertainty scenarios. ............ 74 

Figure 14. The dependences of the average cost reduction and standard deviation

reduction on the number of testing samples. ................................................................ 75 

Figure 15. The schematic of the six-bus system........................................................ 101 

Figure 16. The training process of f-GAN................................................................. 102 

Figure 17. The cost breakdown of (a) the DRCCED method with moment information,

and (b) the proposed ED approach. ............................................................................ 104 

Figure 18. The power dispatch of each conventional generator determined by the

proposed ED approach. .............................................................................................. 105 

Figure 19. The spatial correlations of the ten wind farm energy outputs for (a) real wind

power data, and (b) wind power data generated by f-GAN. The color darkness of one

single cell represents the level of spatial correlation coefficient for corresponding two

wind farms. Comparison of spatial correlations can be made by focusing on the darkness

patterns of heat maps. The temporal correlations of WF10 for (c) real wind power data,

and (d) wind power data generated by f-GAN. The level of auto-correlation coefficient

is depicted by bar height. Comparison of temporal correlations can be made by considering

the height of each bar for every time lag. ................................................................... 106 

Figure 20. The empirical distribution of the wind power utilization efficiency for (a)

DRCCED with moment information, and (b) the proposed approach. ...................... 108 

Figure 21. The pseudocode of the proposed online-learning based risk-averse stochastic

MPC algorithm. .......................................................................................................... 134 

Figure 22. The average computational times of the proposed online learning based

risk-averse stochastic MPC method over 2,000 time steps. ....................................... 141 

Figure 23. (a): The closed-loop trajectories of system states for the proposed online

learning based risk-averse stochastic MPC with 100 realizations of disturbance

sequences, (b): The zoom-in view of state trajectories near the upper limit of x(2). . 143 

Figure 24. The online adaption of constraint tightening parameters in the proposed MPC

for time-varying disturbance distribution in a simulation. ......................................... 144 

Figure 25. The pseudocode of the transformation-proximal bundle algorithm. .............. 172

Figure 26. Inventory profiles determined by different control policies under the worst-

case uncertainty realization. ....................................................................................... 191 

Figure 27. Cost breakdowns determined by (a) the affine control policy, (b) the

proposed control policy. ............................................................................................. 193 

Figure 28. Lower bounds of multi-period inventory cost determined by the proposed

method and the data-driven approach......................................................................... 193 

Figure 29. The impacts of the number of uncertainty scenarios on the generated lower

bound of the original multistage ARO problem and computational time in the data-

driven approach. ......................................................................................................... 194 

Figure 30. The schematic of a small-scale process network. ..................................... 196 

Figure 31. The optimal design and planning decisions at the end of the planning horizon

determined by (a) the affine decision rule method, and (b) the transformation-proximal

bundle algorithm. ........................................................................................................ 201 

Figure 32. Optimal capacity expansion decisions over the entire planning horizon

determined by (a) the affine decision rule method, and (b) the transformation-proximal

bundle algorithm. ........................................................................................................ 202 

Figure 33. Revenues and cost breakdown determined by the affine decision rule method

and the transformation-proximal bundle algorithm. ................................................... 203 

Figure 34. The schematic of a large-scale petrochemical process network where

chemical names are listed. .......................................................................................... 204 

Figure 35. Revenues and cost breakdown at each time period determined by the affine

decision rule method (denoted by ADR in the figure) and the transformation-proximal

bundle algorithm (denoted by TPB in the figure). ..................................................... 206 

Figure 36. Optimal capacity expansion decisions over the entire planning horizon

determined by the transformation-proximal bundle algorithm. ...................... 207

Figure 37. Optimal feedstock purchase at each time stage determined by (a) the affine

decision rule method, and (b) the transformation-proximal bundle algorithm. ......... 208 

Figure 38. Spider charts showing optimal sale quantities (kt/y) of final products at each

time stage determined by (a) the affine decision rule method, and (b) the transformation-

proximal bundle algorithm. ........................................................................................ 209 

LIST OF TABLES

Table 1. Comparisons of problem sizes and computational results of the deterministic

optimization, the conventional stochastic programming method and the data-driven

Wasserstein DRO approach. ......................................................................................... 63 

Table 2. The out-of-sample performance of the deterministic optimization, the

conventional stochastic programming method and the data-driven Wasserstein DRO

approach. ...................................................................................................................... 64 

Table 3. The out-of-sample performance of the deterministic optimization, the

conventional stochastic programming method and the data-driven WDRO approach

when the number of training data N=100. .................................................................... 74 

Table 4. Comparisons of problem sizes and computational results for the DRCCED

method with moment information and the proposed ED method with/without

prescreening in six-bus test system. ........................................................................... 103 

Table 5. Comparisons of problem sizes and computational results for the DRCCED

method with moment information and the proposed ED method with/without

prescreening in IEEE 118 bus system. ....................................................................... 108 

Table 6. Mass balance relationships for different processes. ..................................... 197 

Table 7. Computational results of different methods in the process network planning

application. ................................................................................................................. 200 

Table 8. Computational performances of different solution algorithms in the multistage

robust inventory control problem under demand uncertainty for T=5. ...................... 210 

Table 9. Computational performances of different solution algorithms in the multistage

robust inventory control problem under demand uncertainty for T=10. .................... 211 

Table 10. Computational performances of different solution algorithms in the multistage

robust inventory control problem under demand uncertainty for T=15. .................... 212 

CHAPTER 1

INTRODUCTION

Optimization applications abound in many areas of science and engineering [1-3]. In

real practice, some parameters involved in optimization problems are subject to

uncertainty due to a variety of reasons, including estimation errors and unexpected

disturbance [4]. Such uncertain parameters can be product demands in process planning

[5], kinetic constants in reaction-separation-recycling system design [6], and task

durations in batch process scheduling [7], among others. The issue of uncertainty could

unfortunately render the solution of a deterministic optimization problem (i.e. the one

disregarding uncertainty) suboptimal or even infeasible [8]. Infeasibility, i.e., the

violation of constraints in an optimization problem, can have disastrous consequences for

solution quality. Motivated by this practical concern, optimization under uncertainty has

attracted tremendous attention from both academia and industry [4, 9-11].

In the era of big data and deep learning, intelligent use of data has a great potential to

benefit many areas. Although there is no rigorous definition of big data [12], people

typically characterize big data with five Vs, namely, volume, velocity, variety, veracity

and value [13]. Torrents of data are routinely collected and archived in process

industries, and these data are becoming an increasingly important asset in process

control, operations and design [14-18]. Nowadays, a wide array of emerging machine

learning tools can be leveraged to analyze data and extract accurate, relevant, and useful

information to facilitate knowledge discovery and decision-making. Deep learning, one

of the most rapidly growing machine learning subfields, demonstrates remarkable

power in deciphering multiple layers of representations from raw data without any

domain expertise in designing feature extractors [19]. More recently, dramatic progress

of mathematical programming [20], coupled with recent advances in machine learning

[21], especially in deep learning over the past decade [22], sparks a flurry of interest in

data-driven optimization [23-36]. In the data-driven optimization paradigm, the uncertainty

model is formulated based on data, thus allowing uncertainty data to “speak” for

themselves in the optimization algorithm. In this way, rich information underlying

uncertainty data can be harnessed in an automatic manner for smart and data-driven

decision making.

In this chapter, we summarize and classify the existing contributions of data-driven

optimization under uncertainty, highlight the current research trends, point out the

research challenges, and introduce promising methodologies that can be used to tackle

these challenges. We briefly review conventional mathematical programming

techniques for hedging against uncertainty, alongside their wide spectrum of

applications in Process Systems Engineering (PSE). We then summarize the existing

research papers on data-driven optimization under uncertainty and classify them into

four categories according to their unique approach for uncertainty modeling and distinct

optimization structures. Based on the literature survey, we identify three promising

research directions on optimization under uncertainty in the era of big data and deep

learning and highlight respective research challenges and potential methodologies.

1.1 Background on optimization under uncertainty

In recent years, mathematical programming techniques for decision making under

uncertainty have gained tremendous popularity among the PSE community, as

witnessed by various successful applications in process synthesis and design [10, 37],

production scheduling and planning [7, 38], and process control [35, 39-42]. In this

section, we present some background knowledge of methodologies for optimization

under uncertainty, along with computational algorithms and applications in PSE.

Specifically, we briefly review three leading modeling paradigms for optimization

under uncertainty, namely stochastic programming, chance-constrained programming,

and robust optimization. For extensive and detailed surveys in the field of conventional

optimization under uncertainty methods, we refer the reader to the previous reviews on

this subject [43, 44].

1.1.1 Stochastic programming

Stochastic programming is a powerful modeling paradigm for decision making under

uncertainty that aims to optimize the expected objective value across all the uncertainty

realizations [45]. The key idea of the stochastic programming approach is to model the

randomness in uncertain parameters with probability distributions [46]. For instance,

product demands are assumed to follow normal distributions in stochastic

programming-based supply chain model [47]. In general, the stochastic programming

approach can effectively accommodate decision making processes with various time

stages. In single-stage stochastic programs, there are no recourse variables and all the

decisions must be made before knowing uncertainty realizations. By contrast, stochastic

programming with recourse can take corrective actions after uncertainty is revealed.

Among the stochastic programming approach with recourse, the most widely used one

is the two-stage stochastic program, in which decisions are partitioned into “here-and-

now” decisions and “wait-and-see” decisions.

The general mathematical formulation of a two-stage stochastic programming problem

is given as follows [45].

$$\min_{x \in X} \; c^{\mathrm{T}} x + \mathbb{E}_{\omega}\!\left[ Q(x, \omega) \right] \quad \text{s.t.} \;\; Ax \le d \tag{1.1}$$

The recourse function Q(x, ω) is defined by

$$Q(x, \omega) = \min_{y(\omega) \in Y} \; b(\omega)^{\mathrm{T}} y(\omega) \quad \text{s.t.} \;\; W(\omega)\, y(\omega) \le h(\omega) - T(\omega)\, x \tag{1.2}$$

where x represents first-stage decisions made “here-and-now” before the uncertainty ω

is realized, while the second-stage decisions y are postponed in a “wait-and-see” manner

after observing the uncertainty realization. The objective of the two-stage stochastic

programming model includes two parts: the first-stage objective $c^{T}x$ and the expectation of the second-stage objective $b(\omega)^{T}y(\omega)$. The constraints associated with the first-stage decisions are $Ax \le d$, $x \in X$, and the constraints on the second-stage decisions are $W(\omega)y(\omega) \le h(\omega) - T(\omega)x$ and $y(\omega) \in Y$. Sets X and Y can include nonnegativity, continuity or integrality restrictions.
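For a finite set of scenarios, problem (1.1)–(1.2) can be written as a single deterministic equivalent (the so-called extensive form) and handed to an LP solver. The sketch below is only a minimal illustration, not part of the original formulation: a toy capacity problem with two equally likely demand scenarios, solved with SciPy; all cost values and scenario data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Toy two-stage stochastic LP in extensive form (hypothetical data).
# First stage: choose capacity x at unit cost 1, with 0 <= x <= 10.
# Second stage: cover shortfall y_s >= demand_s - x at unit cost 3.
demands = np.array([2.0, 4.0])
probs = np.array([0.5, 0.5])

# Decision vector: [x, y_1, y_2]; objective = x + sum_s p_s * 3 * y_s.
c = np.concatenate(([1.0], 3.0 * probs))

# Shortfall constraints -x - y_s <= -demand_s  (i.e., x + y_s >= demand_s).
A_ub = np.array([[-1.0, -1.0, 0.0],
                 [-1.0, 0.0, -1.0]])
b_ub = -demands

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0.0, 10.0), (0.0, None), (0.0, None)])
print(res.x[0], res.fun)  # optimal capacity and expected total cost
```

Because expected recourse here is more expensive than first-stage capacity, the solver builds enough capacity to cover the larger demand scenario. The extensive form grows linearly in the number of scenarios, which is exactly why the decomposition algorithms discussed next become necessary at scale.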

The resulting two-stage stochastic programming problem is computationally expensive

to solve because of the growth of computational time with the number of scenarios. To

this end, decomposition based algorithms have been developed in the existing literature,

including Benders decomposition or the L-shaped method [48, 49], and Lagrangean

decomposition [50]. The location of binary decision variables is critical for the design

of computational algorithms. For stochastic programs with integer recourse, the


expected recourse function is no longer convex, and even discontinuous, thus hindering

the employment of the conventional L-shaped method. As a result, research efforts have been made on computational algorithms for efficient solution of two-stage stochastic mixed-

integer programs [51], such as Lagrangian relaxation [52], branch-and-bound scheme

[53], and an improved L-shaped method [54].

Stochastic programming has demonstrated various applications in PSE, such as design

and operation of batch processes [55-57], optimization of flow sheets [58], energy

systems [59, 60], and supply chain management [61-64]. Due to its wide applicability,

immense research efforts have been made on the variants of stochastic programming

approach. For instance, the two-stage formulation in (1.1) can be readily extended to a

multi-stage stochastic programming setup by utilizing scenario trees. Other extensions

include stochastic nonlinear programming [65], and stochastic programs with

endogenous uncertainties [66, 67].

1.1.2 Chance constrained optimization

As another powerful paradigm for optimization under uncertainty, chance constrained

programming aims to optimize an objective while ensuring constraints to be satisfied

with a specified probability in an uncertain environment [68]. As in the stochastic

programming approach, probability distribution is the key uncertainty model to capture

the randomness of uncertain parameters in chance constrained optimization. The chance

constrained program was first introduced in the seminal work of [69], and has attracted

considerable attention ever since. Such chance constraints or probabilistic constraints

are flexible enough to quantify the trade-off between objective performance and system

reliability [70].

The generic formulation of a chance constrained optimization problem is presented as

follows,

\[
\begin{aligned}
\min_{x \in X} \quad & f(x) \\
\text{s.t.} \quad & \mathbb{P}\left\{ \xi \in \Xi : G(x,\xi) \le 0 \right\} \ge 1 - \varepsilon
\end{aligned}
\tag{1.3}
\]

where x represents the vector of decision variables, X denotes the deterministic feasible

region, f is the objective function to be minimized, ξ is a random vector following a

known probability distribution $\mathbb{P}$ with the support set Ξ, $G = (g_1, \ldots, g_m)$ represents a

constraint mapping, 0 is a vector of all zeros, and parameter ε is a pre-specified risk

level.

The chance constraint $\mathbb{P}\left\{ \xi \in \Xi : G(x,\xi) \le 0 \right\} \ge 1 - \varepsilon$ guarantees that decision x satisfies constraints with a probability of at least 1−ε. Note that when the number of constraints

m=1, the above optimization model is an individual chance constrained program; for

m>1, it is called a joint chance constrained program [71]. A salient merit of chance constrained programs is that they allow decision makers to choose their own risk levels in exchange for improvement in objectives. To model sequential decision-making processes, two-

stage chance constrained optimization with recourse was recently studied and has found

various applications [72, 73].

Despite its promising modeling power, the resulting chance constrained program is

generally computationally intractable for the following two main reasons. First,

calculating the probability of constraint satisfaction for a given x involves a multivariate

integral, which is believed to be computationally prohibitive. Second, the feasible

region is not convex even if set X is convex and G(x, ξ) is convex in x for any

realizations of uncertain vector ξ [68]. In light of these computational challenges, a large

body of related literature is devoted to the development of solution algorithms for

chance constrained optimization problems, such as sample average approximation [74],

sequential approximation [75, 76], and convex conservative approximation schemes

[77]. Note that chance constrained programs admit convex reformulation for some very

special cases. For example, individual chance constrained programs are endowed with

tractable convex reformulations for normal distributions [45]. Chance constraints with

right-hand-side uncertainty are convex if uncertain parameters are independent and

follow log-concave distributions [68].
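As a concrete instance of such a tractable special case, an individual linear chance constraint $\mathbb{P}\{\xi^{T}x \le b\} \ge 1-\varepsilon$ with $\xi \sim N(\mu, \Sigma)$ is equivalent to the deterministic constraint $\mu^{T}x + z_{1-\varepsilon}\sqrt{x^{T}\Sigma x} \le b$, where $z_{1-\varepsilon}$ is the standard normal quantile. The minimal sketch below checks this condition; the numerical data are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def normal_chance_constraint_holds(x, mu, Sigma, b, eps):
    """Check P{xi^T x <= b} >= 1 - eps for xi ~ N(mu, Sigma) via the
    equivalent deterministic constraint mu^T x + z_{1-eps}*sqrt(x^T Sigma x) <= b."""
    x, mu, Sigma = np.asarray(x), np.asarray(mu), np.asarray(Sigma)
    lhs = mu @ x + norm.ppf(1.0 - eps) * np.sqrt(x @ Sigma @ x)
    return lhs <= b

# Hypothetical data: two jointly normal uncertain coefficients.
mu = np.array([1.0, 1.0])
Sigma = np.array([[0.1, 0.0], [0.0, 0.1]])
x = np.array([1.0, 1.0])

print(normal_chance_constraint_holds(x, mu, Sigma, b=3.0, eps=0.05))  # True
print(normal_chance_constraint_holds(x, mu, Sigma, b=2.5, eps=0.05))  # False
```

Since the left-hand side is a second-order cone function of x, embedding this condition in an optimization model yields a convex program, consistent with the convex reformulations cited above.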

In the PSE community, chance constraints are usually employed for customer demand

satisfaction, product quality specification, service level, and reliability level of chemical

processes [78-81]. Due to its practical relevance, chance constrained optimization has

been applied in numerous applications, including model predictive control [82, 83],

process design and operation [84], refinery blend planning [85], biopharmaceutical

manufacturing [86], and supply chain planning problem [87-90].

1.1.3 Robust optimization

As a promising alternative paradigm, robust optimization does not require accurate

knowledge of the probability distributions of uncertain parameters [91-94]. Instead, it models uncertain parameters using an uncertainty set, which includes possible uncertainty realizations. It is worth noting that the uncertainty set is a paramount ingredient

in robust optimization framework [94]. Given a specific uncertainty set, the idea of

robust optimization is to hedge against the worst case within the uncertainty set. The

worst-case uncertainty realization is defined based on different contexts: it could be the

realization giving rise to the largest constraint violation, the realization leading to the

lowest asset return [95] or the one resulting in the highest regret [96].

The conventional box uncertainty set, defined as follows [97], is often a poor choice since it includes the unlikely-to-happen scenario where all uncertain parameters simultaneously increase to their highest values.

\[
U_{\text{box}} = \left\{ u \,\middle|\, u_i^{L} \le u_i \le u_i^{U}, \; \forall i \right\}
\tag{1.4}
\]

where $U_{\text{box}}$ is a box uncertainty set, u is a vector of uncertain parameters, and $u_i$ is the i-th component of uncertainty vector u. $u_i^{L}$ and $u_i^{U}$ represent the lower bound and the upper bound of uncertain parameter $u_i$, respectively. The box uncertainty set simply defines the range of each uncertain parameter in vector u, so its size cannot easily be tuned to match the decision maker's risk attitude. To this end, researchers proposed the following budgeted uncertainty set [93].

\[
U_{\text{budget}} = \left\{ u \,\middle|\, u_i = \bar{u}_i + \hat{u}_i z_i , \; -1 \le z_i \le 1, \; \sum_{i} \left| z_i \right| \le \Gamma , \; \forall i \right\}
\tag{1.5}
\]

where $U_{\text{budget}}$ denotes a budgeted uncertainty set, u and $u_i$ have the same definitions as in (1.4), $\bar{u}_i$ is the nominal value of $u_i$, $\hat{u}_i$ is the largest possible deviation of uncertain parameter $u_i$, $z_i$ denotes the extent and direction of the parameter deviation, and Γ is an uncertainty budget.
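For a linear constraint $a^{T}u \le \beta$, the worst case over the budgeted set (1.5) can be computed greedily: start from the nominal value $a^{T}\bar{u}$ and spend the budget Γ on the deviations with the largest impact $|a_i|\hat{u}_i$. A minimal sketch of this computation, with hypothetical numbers:

```python
import numpy as np

def worst_case_lhs(a, u_bar, u_hat, gamma):
    """Worst-case value of a^T u over the budgeted set (1.5):
    allocate the budget gamma greedily to the largest deviations |a_i|*u_hat_i."""
    a, u_bar, u_hat = map(np.asarray, (a, u_bar, u_hat))
    impact = np.sort(np.abs(a) * u_hat)[::-1]  # largest impact first
    total, budget = float(a @ u_bar), float(gamma)
    for d in impact:
        take = min(1.0, budget)  # each z_i is limited to [-1, 1]
        total += take * d
        budget -= take
        if budget <= 0.0:
            break
    return total

# Hypothetical data: nominal value a^T u_bar = 6; deviation impacts 0.5, 1.0, 1.5.
a = [1.0, 2.0, 3.0]
u_bar = [1.0, 1.0, 1.0]
u_hat = [0.5, 0.5, 0.5]
print(worst_case_lhs(a, u_bar, u_hat, gamma=0.0))  # nominal only: 6.0
print(worst_case_lhs(a, u_bar, u_hat, gamma=1.5))  # 6 + 1.5 + 0.5*1.0 = 8.0
```

With Γ = 0 the set collapses to the nominal point, and with Γ equal to the number of uncertain parameters it recovers the box set, which is precisely how the budget controls conservatism.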

Traditional robust optimization approaches, also known as static robust optimization

[98], make all the decisions at once. This modeling framework cannot adequately represent sequential decision-making problems [5, 99-105]. Adaptive robust optimization (ARO)

was proposed to offer a new paradigm for optimization under uncertainty by

incorporating recourse decisions [106]. Due to the flexibility of adjusting recourse

decisions after observing uncertainty realizations, ARO typically generates less

conservative solutions than static robust optimization [105, 107-109]. The general form

of a two-stage adaptive robust mixed-integer programming model is given as follows:

\[
\begin{aligned}
\min_{x} \quad & c^{T}x + \max_{u \in U} \min_{y \in \Omega(x,u)} b^{T}y \\
\text{s.t.} \quad & Ax \le d, \; x \in \mathbb{R}^{n_1} \times \mathbb{Z}^{n_2} \\
& \Omega(x,u) = \left\{ y \in \mathbb{R}^{n_3} : Wy \le h - Tx - Mu \right\}
\end{aligned}
\tag{1.6}
\]

where x is the first-stage decision made before uncertainty u is realized, while the

second-stage decision y is postponed in a “wait-and-see” manner. x includes both

continuous and integer variables, while y only includes continuous variables. c and b

are the vectors of the cost coefficients. U is an uncertainty set that characterizes the

region of uncertainty realizations.
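For a fixed first-stage decision x and a box uncertainty set, the inner max–min in (1.6) can be evaluated exactly by enumerating the vertices of the box and solving one second-stage LP per vertex, since for right-hand-side uncertainty the LP optimal value is convex in u and thus attains its maximum at a vertex. The sketch below is a brute-force illustration for intuition only; practical algorithms (e.g., column-and-constraint generation) avoid full enumeration, and all problem data are hypothetical.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def worst_case_recourse(x, b, W, h, T, M, u_lo, u_hi):
    """Evaluate max_{u in box} min_{y >= 0 : Wy <= h - Tx - Mu} b^T y by
    enumerating the vertices of the box [u_lo, u_hi]."""
    worst = -np.inf
    for vertex in itertools.product(*zip(u_lo, u_hi)):
        u = np.array(vertex)
        rhs = h - T @ x - M @ u
        res = linprog(b, A_ub=W, b_ub=rhs, bounds=[(0.0, None)] * len(b))
        if res.status == 0:
            worst = max(worst, res.fun)
    return worst

# Hypothetical data: second stage covers the shortfall y >= u - x at unit
# cost 1, with uncertain demand u in the interval [2, 4].
x = np.array([1.0])
b = np.array([1.0])     # second-stage cost vector
W = np.array([[-1.0]])  # -y <= h - Tx - Mu  <=>  y >= u - x
h = np.array([0.0])
T = np.array([[-1.0]])
M = np.array([[1.0]])
print(worst_case_recourse(x, b, W, h, T, M, u_lo=[2.0], u_hi=[4.0]))  # 3.0
```

The cost of this enumeration grows exponentially in the uncertainty dimension, which is why the decomposition algorithms developed for ARO in the literature matter in practice.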

Besides the two-stage ARO framework, the multistage ARO method has attracted

immense attention due to its unique feature in reflecting sequential realizations of

uncertainties over time [110, 111]. In multistage ARO, decisions are made sequentially,

and uncertainties are revealed gradually over stages. Note that the additional value

delivered by ARO over static robust optimization is its adjustability of recourse

decisions based on uncertainty realizations [106]. Accordingly, the multistage ARO

method has demonstrated applications in process scheduling and planning [104, 105,

112].

Despite the popularity of the above three leading paradigms for optimization under

uncertainty, these approaches have their own limitations and specific application

scopes. To this end, research efforts have been made on “hybrid” methods that leverage

the synergy of different optimization approaches to inherit their corresponding strengths

and complement respective weaknesses [113-120]. For instance, stochastic

programming was integrated with robust optimization for supply chain design and

operation under multi-scale uncertainties [114]. Robust chance constrained optimization

along with global solution algorithms were developed and applied to process design

under price and demand uncertainties [120].

1.2 Existing methods for data-driven optimization under uncertainty

In this section, we review the recent advances in optimization under uncertainty in the

era of big data and deep learning. Recent years have witnessed a rapidly growing

number of publications on data-driven optimization under uncertainty, an active area

integrating machine learning and mathematical programming. These publications cover

various topics and can be roughly classified into four categories, namely data-driven

stochastic program, data-driven chance constrained program, data-driven robust

optimization, and data-driven scenario-based optimization. Unlike the conventional

mathematical programming techniques, these data-driven approaches do not presume

the uncertainty model is perfectly given a priori; rather, they all focus on the practical

setting where only uncertainty data are available.

1.2.1 Data-driven stochastic program and distributionally robust

optimization

The literature review of data-driven stochastic program, also known as distributionally

robust optimization (DRO), is presented in detail in this subsection. The motivation of

this emerging paradigm on data-driven optimization under uncertainty is first presented,

followed by its model formulation. In this modeling paradigm, the uncertainty is

modeled via a family of probability distributions that well capture uncertainty data on

hand. This set of probability distributions is referred to as ambiguity set. We then present

and analyze various types of ambiguity sets alongside their corresponding strengths and

weaknesses. Finally, the extension of DRO to the multistage decision-making setting is

also discussed, as well as their recent applications in PSE.

In the stochastic programming approach, it is assumed that the probability distribution

of uncertain parameters is perfectly known. However, such precise information of the

uncertainty distribution is rarely available in practice. Instead, what the decision maker

has is a set of historical and/or real-time uncertainty data and possibly some prior

structure knowledge of the probability. Moreover, the assumed probability in

conventional stochastic programming might deviate from the true distribution.

Therefore, relying on a single probability distribution could result in sub-optimal

solutions, or even lead to the deterioration in out-of-sample performance [121].

Motivated by these weaknesses of stochastic programming, DRO emerges as a new

data-driven optimization paradigm which hedges against the worst-case distribution in

an ambiguity set. Rather than assuming a single uncertainty distribution, the DRO

approach constructs an uncertainty set of probability distributions from uncertainty data


through statistical inference and big data analytics. In this way, DRO is capable of

hedging against the distribution errors, and accounts for the input of uncertainty data.

The general model formulation of data-driven stochastic programming is presented as

follows [122].

\[
\min_{x \in X} \max_{\mathbb{P} \in \mathcal{P}} \; \mathbb{E}_{\mathbb{P}}\left[ l(x,\xi) \right]
\tag{1.7}
\]

where x is the vector of decision variables, X is the feasible set, l is the objective function, and ξ represents a random vector whose probability distribution $\mathbb{P}$ is only known to reside in an ambiguity set $\mathcal{P}$. The DRO approach aims for optimal decisions under the worst-case distribution, and as a result offers a performance guarantee over the entire family of distributions.

The DRO or data-driven stochastic optimization framework enjoys two salient merits

compared with the conventional stochastic programming approach. First, it allows the

decision maker to incorporate partial distribution information learned from uncertainty

data into the optimization. As a result, the data-driven stochastic programming approach

greatly mitigates the issue of optimizer’s curse and improves the out-of-sample

performance. Second, data-driven stochastic programming inherits the computational

tractability from robust optimization and some resulting problems can be solved exactly

in polynomial time without resorting to the approximation scheme via sampling or

discretization. For example, optimization problem (1.7) for a convex program with

continuous variables and a moment-based ambiguity set is proved to be solvable in

polynomial time [122].

The choice of ambiguity sets plays a critical role in the performance of DRO. When

choosing an ambiguity set, the decision maker needs to consider the following three factors,
namely tractability, statistical meaning, and performance [123]. First, the data-driven

stochastic programming problem with the ambiguity set should be computationally

tractable, meaning the resulting optimization could be formulated as linear, conic

quadratic or semidefinite programs. Second, the derived ambiguity set should have clear

statistical meaning. Therefore, various ways of constructing ambiguity sets based on

uncertainty data were extensively studied [122, 124, 125]. Third, the devised ambiguity

set should be tight to increase the performance of resulting decisions.

One commonly used approach to constructing ambiguity sets is the moment-based approach, in which first- and second-order moment information is extracted from uncertainty data using statistical inference [126]. The ambiguity set that specifies the support, first and second moment information is shown as follows,

\[
\mathcal{P} = \left\{ \mathbb{P} \in \mathcal{M}_{+} \,\middle|\,
\begin{array}{l}
\mathbb{P}\left( \xi \in \Xi \right) = 1 \\[2pt]
\mathbb{E}_{\mathbb{P}}\left[ \xi \right] = \mu \\[2pt]
\mathbb{E}_{\mathbb{P}}\left[ (\xi - \mu)(\xi - \mu)^{T} \right] = \Sigma
\end{array}
\right\}
\tag{1.8}
\]

where ξ represents the uncertainty vector, Ξ is the support, $\mathbb{P}$ represents the probability distribution of ξ, $\mathcal{M}_{+}$ denotes the set of all probability measures, and $\mathbb{E}_{\mathbb{P}}$ denotes the expectation with respect to distribution $\mathbb{P}$. Parameters μ and Σ represent the mean vector and covariance matrix estimated from uncertainty data, respectively.
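In practice, the nominal moments μ and Σ in (1.8) are not given but must be estimated from uncertainty data. A minimal sketch of this estimation step on a synthetic dataset (the data-generating parameters below are invented purely for illustration):

```python
import numpy as np

# Synthetic uncertainty data: 500 observations of a 2-dimensional uncertain
# parameter (in a real application these would be historical records).
rng = np.random.default_rng(seed=0)
data = rng.normal(loc=[1.0, 2.0], scale=0.5, size=(500, 2))

# Point estimates of the mean vector and covariance matrix used in (1.8).
mu = data.mean(axis=0)
Sigma = np.cov(data, rowvar=False)

print(mu.round(2))     # close to the true mean [1.0, 2.0]
print(Sigma.round(2))  # close to 0.25 * identity
```

Because such estimates carry sampling error, the ambiguity set in (1.9) replaces the exact moment equalities with confidence regions around these estimates.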

The ambiguity set in (1.8) fails to account for the fact that the mean and covariance

matrix are also subject to uncertainty. To this end, an ambiguity set was proposed based

on the distribution’s support information as well as the confidence regions for the mean

and second-moment matrix in the work of [122]. The resulting DRO problem could be

solved efficiently in polynomial time.


\[
\mathcal{P} = \left\{ \mathbb{P} \in \mathcal{M}_{+} \,\middle|\,
\begin{array}{l}
\mathbb{P}\left( \xi \in \Xi \right) = 1 \\[2pt]
\left( \mathbb{E}_{\mathbb{P}}\left[ \xi \right] - \mu \right)^{T} \Sigma^{-1} \left( \mathbb{E}_{\mathbb{P}}\left[ \xi \right] - \mu \right) \le \psi_{1} \\[2pt]
\mathbb{E}_{\mathbb{P}}\left[ (\xi - \mu)(\xi - \mu)^{T} \right] \preceq \psi_{2} \Sigma
\end{array}
\right\}
\tag{1.9}
\]

where ξ represents the uncertainty vector, Ξ is the support, and $\mathbb{P}$ represents the probability distribution of ξ. The constraint $\mathbb{P}\left( \xi \in \Xi \right) = 1$ enforces that all uncertainty realizations reside in the support set Ξ. Parameters $\psi_1$ and $\psi_2$ are used to define the sizes of the confidence regions for the first and second moment information, respectively.

The moment-based ambiguity sets typically enjoy the advantage of computational

tractability. For example, DRO with the ambiguity set based on principal component

analysis and first-order deviation functions was developed [125]. Additionally, the

computational effectiveness of this data-driven DRO method was demonstrated via

process network planning and batch production scheduling [125]. Recently, a data-

driven DRO model was developed for the optimal design and operations of shale gas

supply chains to hedge against uncertainties associated with shale well estimated

ultimate recovery and product demand [127]. However, the moment-based ambiguity

set is not guaranteed to converge to the true probability distribution as the number of

uncertainty data goes to infinity. Consequently, this type of ambiguity set suffers from conservatism even with a moderate amount of uncertainty data. To address the above issue with

moment-based methods, ambiguity sets based on statistical distance between

probability distributions were developed, as shown below,


     d  , 0     (1.10)

where $\mathbb{P}$ is the probability distribution of uncertain parameters, $\mathbb{P}_{0}$ represents the reference distribution such as the empirical distribution, d denotes some statistical distance between two distributions, and θ stands for the confidence level.
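As a concrete one-dimensional illustration of (1.10), the sketch below checks whether an empirical distribution lies within a Wasserstein ball of radius θ around a reference sample, using SciPy's `wasserstein_distance`; the sample values are hypothetical:

```python
from scipy.stats import wasserstein_distance

def in_wasserstein_ball(samples, reference_samples, theta):
    """Membership test for the ambiguity set (1.10) with d taken as the
    1-D Wasserstein distance between the two empirical distributions."""
    return wasserstein_distance(samples, reference_samples) <= theta

reference = [0.0, 1.0, 2.0, 3.0]  # reference (empirical) distribution P_0
print(in_wasserstein_ball([0.1, 1.1, 2.1, 3.1], reference, theta=0.5))  # True
print(in_wasserstein_ball([5.0, 6.0, 7.0, 8.0], reference, theta=0.5))  # False
```

In the full DRO setting one optimizes over all distributions in this ball rather than merely testing membership, but the radius θ plays the same role: it trades off robustness against conservatism.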

The ambiguity set in (1.10) can be further classified based on the adopted distance metric,

such as Kullback-Leibler divergence [128] and Wasserstein distance [124]. For

example, a DRO model was proposed for the lot-sizing problem, in which the chi-square

goodness-of-fit test and robust optimization were combined. The ambiguity set of

demand was constructed from uncertainty data by using a hypothesis test in statistics,

called the chi-square goodness-of-fit test [129]. This set is well defined by linear

constraints and second-order cone constraints. It is worth noting that the input of their model is histograms, which makes it possible to use a finite-dimensional probability vector to characterize the distribution. The adopted statistic belongs to the phi-divergences, which motivated researchers to construct distributional uncertainty sets by using phi-divergences [130].

To account for the sequential decision-making process, researchers recently developed

the adaptive DRO method by incorporating recourse decision variables [131, 132]. A

general two-stage data-driven stochastic programming model is presented in the

following form:

\[
\begin{aligned}
\min_{x \in X} \quad & c^{T}x + \max_{\mathbb{P} \in \mathcal{P}} \mathbb{E}_{\mathbb{P}}\left[ Q(x,\xi) \right] \\
\text{s.t.} \quad & Ax \le d \\
& Q(x,\xi) = \min_{y \in Y} \left\{ b(\xi)^{T} y \;:\; T(\xi)x + W(\xi)y \le h(\xi) \right\}
\end{aligned}
\tag{1.11}
\]
where x represents the vector of first-stage decision variables that need to be determined before observing uncertainty realizations, y denotes the vector of second-stage decision variables that are adjusted based on the realized uncertain parameters ξ, sets X and

Y can include nonnegativity, continuity or integrality restrictions, and Q represents the

recourse function. The objective of the above data-driven stochastic program is to

minimize the worst-case expected cost with respect to all possible uncertainty

distributions $\mathbb{P}$ within the ambiguity set $\mathcal{P}$. Based on the literature, multistage data-

driven DRO is becoming a rapidly evolving research direction.

Data-driven stochastic programming has several salient merits over the conventional

stochastic programming approach. However, based on the existing literature, there are

few papers on its PSE applications [125, 127]. In real world applications, the trend of

big data has fueled the increasing popularity of data-driven stochastic programming in

many areas, especially in power systems operation, where DRO has found various applications such as unit commitment problems [133-136] and optimal power flow [137, 138].

1.2.2 Data-driven chance constrained program

In contrast to the data-driven stochastic programming approach, data-driven chance

constrained programming is another paradigm focusing on chance constraint

satisfaction under the worst-case probability instead of optimizing the worst-case

expected objective. Although both data-driven chance constrained program and DRO

adopt ambiguity sets in the uncertainty models, they have distinct model structures.

Specifically, data-driven chance constrained program features constraints subject to

uncertainty in probability distributions, while DRO typically only involves the worst-

case expectation of an objective function with respect to a family of probability

distributions. The chance constrained programming approach assumes the complete

distribution information is perfectly known. However, the decision maker only has

access to a finite number of uncertainty realizations or uncertainty data. On one hand,

such complete knowledge of the distribution is usually estimated from a limited number of

uncertainty data or obtained from expert knowledge. On the other hand, even if the

probability distribution is available, the chance constrained program is computationally

cumbersome. In practice, one can only have partial information on the probability

distribution of uncertainty. Therefore, data-driven chance constrained optimization

emerges as another paradigm for hedging against uncertainty in the era of big data.

The general form of a data-driven chance constrained program is given by,

\[
\begin{aligned}
\min_{x \in X} \quad & f(x) \\
\text{s.t.} \quad & \min_{\mathbb{P} \in \mathcal{P}} \; \mathbb{P}\left\{ \xi \in \Xi : G(x,\xi) \le 0 \right\} \ge 1 - \varepsilon
\end{aligned}
\tag{1.12}
\]


where x represents the vector of decision variables, X denotes the deterministic feasible

region, f is the objective function, ξ is a random vector following a probability

distribution $\mathbb{P}$ that belongs to an ambiguity set $\mathcal{P}$, $G = (g_1, \ldots, g_m)$ represents a

constraint mapping, 0 is a vector of all zeros, and parameter ε is a pre-specified risk

level. The data-driven chance constraints enforce classical chance constraints to be

satisfied for every probability distribution within the ambiguity set.

The computational tractability of the resulting data-driven chance constrained program

can vary depending on both the ambiguity sets and the structure of the optimization
problem. In the following, we summarize the relevant papers according to the adopted

uncertainty set of distributions and optimization structures.

Distributionally robust individual linear chance constraints under the ambiguity set

comprised of all distributions sharing the same known mean and covariance were

reformulated as convex second-order cone constraints [126]. The deterministic convex

conditions to enforce distributionally robust chance constraints were provided under

distribution families of (a) independent random variables with box-type support and (b)

radially symmetric non-increasing distributions over the orthotope support. The worst-

case conditional value-at-risk (CVaR) approximation for distributionally robust joint

chance constraints was studied assuming known first and second moments [139], and the

resulting conservative approximation can be cast as semidefinite program. In addition

to moment information, a specific structural information of distributions called

unimodality was incorporated into the ambiguity set, and the corresponding ambiguous

risk constraints were reformulated as a set of second-order cone constraints

[140]. Instead of assuming unimodality of distributions, data-driven robust individual

chance constrained programs along with convex approximations were recently

developed using a mixture distribution-based ambiguity set with fixed component

distribution and uncertain mixture weights [141].

In real world applications, exact moment information can be challenging to obtain, and

can only be estimated through confidence intervals from uncertainty realizations [122].

To accommodate this moment uncertainty, attempts were made in the context of

distributionally robust chance constraints, including constructing convex moment

ambiguity set [142], employing Chebyshev ambiguity set with bounds on second-order

moment [143], characterizing a family of distributions with upper bounds on both mean

and covariance [144]. Ambiguous joint chance constraints were studied where the

ambiguity set was characterized by the mean, convex support, and an upper bound on

the dispersion [145], and the resulting constraints were conic representable for right-

hand-side uncertainty. In addition to generalized moment bounds [146], structural

properties of distributions, such as symmetry, unimodality, multimodality and

independence, were further integrated into distributionally robust chance constrained

programs leveraging a Choquet representation [123]. Nonlinear extensions of

distributionally robust chance constraints were made under the ambiguity sets defined

by mean and variance [147], convex moment constraints [148], mean absolute deviation

[149], and a mixture of distributions [150].

Although moment-based ambiguity sets achieve certain success, they do not converge

to the true probability distribution as the number of available uncertainty data increases.

Consequently, the resulting data-driven chance-constrained programs tend to generate

conservative solutions. To this end, data-driven chance-constrained programs with

distance-based ambiguity set were proposed to alleviate the undesirable consequence of

moment-based data-driven chance-constrained programs. The ambiguity set defined by

the Prohorov metric was introduced into the distributionally robust chance constraints,

and the resulting optimization problem was approximated by using robust sampled

problem [151]. Distributionally robust chance constraints with the ambiguity set

containing all distributions close to a reference distribution in terms of Kullback-Leibler

divergence were cast as classical chance constraints with an adjusted risk level [128].

Data-driven chance constrained programs with ϕ-divergence based ambiguity set were

proposed [152], and further extensions were made using the kernel smoothing method

[27, 31]. Recently, data-driven chance constraints over Wasserstein balls were exactly

reformulated as mixed-integer conic constraints [153, 154]. Leveraging the strong

duality result [155], distributionally robust chance constrained programs with

Wasserstein ambiguity set were studied for linear constraints with both right and left

hand uncertainty [156], as well as for general nonlinear constraints [157].

Data-driven chance constrained programs have successful applications in a number of

areas, such as power system [158], stochastic control [159], and vehicle routing problem

[160].

1.2.3 Data-driven robust optimization

As a paramount ingredient in robust optimization, uncertainty sets endogenously

determine robust optimal solutions and therefore should be devised with special care.

However, uncertainty sets in the conventional robust optimization methodology are

typically set a priori using a fixed shape and/or model without providing sufficient

flexibility to capture the structure and complexity of uncertainty data. For example, the

geometric shapes of uncertainty sets in (1.4) and (1.5) do not change with the intrinsic

structure and complexity of uncertainty data. Furthermore, these uncertainty sets are

specified by finite number of parameters, thereby having limited modeling flexibility.

Motivated by this knowledge gap, data-driven robust optimization emerges as a

powerful paradigm for addressing uncertainty in decision making.

A data-driven ARO framework that leverages the power of Dirichlet process mixture

model was proposed [32]. The data-driven approach for defining uncertainty set was

developed based on Bayesian machine learning. This machine learning model was then

integrated with the ARO method through a four-level optimization framework. This

developed framework effectively accounted for the correlation, asymmetry and

multimode of uncertainty data, so it generated less conservative solutions. Its salient

feature is that multiple basic uncertainty sets are used to provide a high-fidelity

description of uncertainties. Although the data-driven ARO has a number of attractive

features, it does not account for an important evaluation metric, known as regret, in

decision-making [161]. Motivated by this knowledge gap, a data-driven bi-criterion

ARO framework was developed that effectively accounted for the conventional

robustness as well as minimax regret [162].

In some applications, uncertainty data in large datasets are usually collected under

multiple conditions. A data-driven stochastic robust optimization framework was

proposed for optimization under uncertainty leveraging labeled multi-class uncertainty

data [163]. Machine learning methods including Dirichlet process mixture model and

maximum likelihood estimation were employed for uncertainty modeling, which is

illustrated in Figure 1. This framework was then formulated based on the data-driven

uncertainty model through a bi-level optimization structure. The outer optimization

problem followed the two-stage stochastic programming approach, while ARO was

nested as the inner problem for maintaining computational tractability.

Figure 1. The data-driven uncertainty model based on the Dirichlet process mixture

model.

To mitigate computational burden, research effort has been made on convex polyhedral

data-driven uncertainty set based on machine learning techniques, such as principal

component analysis and support vector clustering. A data-driven robust optimization

framework that leveraged the power of principal component analysis and kernel

smoothing for decision-making under uncertainty was studied [34]. In this approach,

correlations between uncertain parameters were effectively captured, and latent

uncertainty sources were identified by principal component analysis. To account for

asymmetric distributions, forward and backward deviation vectors were utilized in the

uncertainty set, which was further integrated with robust optimization models. A data-

driven static robust optimization framework based on support vector clustering that aims

to find the hypersphere with minimal volume to enclose uncertainty data was proposed

[164]. The adopted piecewise linear kernel incorporates the covariance information,

thus effectively capturing the correlation among uncertainties. These two data-driven

robust optimization approaches utilized polyhedral uncertainty sets learned from data, and thus enjoyed computational efficiency. Various types of data-driven uncertainty sets

were developed for static robust optimization based on statistical hypothesis tests [165],

copula [166], and probability density contours [167].

To address multistage decision making under uncertainty, a data-driven approach for

optimization under uncertainty based on multistage ARO and nonparametric kernel

density M-estimation was developed [112]. The salient feature of the framework was

its incorporation of distributional information to address the issue of over-conservatism.

Robust kernel density estimation was employed to extract probability distributions from

data. This data-driven multistage ARO framework exploited robust statistics to be

immunized to data outliers. An exact robust counterpart was developed for solving the

resulting data-driven ARO problem.

In recent years, data-driven robust optimization has been applied to a variety of areas,

such as power systems [33], industrial steam systems [168], planning and scheduling

[112, 166], process control [35], and transportation systems [169].

1.2.4 Scenario optimization approach for chance constrained programs

A salient feature of scenario-based optimization is that it does not require the explicit

knowledge of probability distribution as in the stochastic programming approach.

Additionally, scenario-based optimization uses uncertainty scenarios to seek an optimal

solution having a high probabilistic guarantee of constraint satisfaction instead of

utilizing scenarios or samples to approximate the expectation term as in stochastic

programming. Although scenario-based optimization can be regarded as a special

type of robust optimization with a discrete uncertainty set consisting of the uncertainty

data, it can provide a probabilistic guarantee for unobserved uncertainty data in the

testing data set. Note that the scenario-based optimization approach provides a viable

and data-driven route to achieving approximate solutions of chance-constrained

programs. The scenario-based optimization approach is a general data-driven

optimization under uncertainty framework in which uncertainty data or random samples

are utilized in a more direct manner compared with other data-driven optimization

methods. This data-driven optimization framework was first introduced in [170], and

has gained great popularity within the systems and control community [171]. As in data-

driven chance constrained programs, scenario optimization does not require knowledge

of the true underlying uncertainty distribution, only a finite number of uncertainty

realizations. Specifically, the scenario approach enforces constraint satisfaction with

N independent and identically distributed uncertainty samples u(1), …, u(N). The resulting

scenario optimization problem is given by,

$$\min_{x \in X} \; c^{T} x \quad \text{s.t.} \quad f\left(x, u^{(i)}\right) \le 0, \quad i = 1, \ldots, N \qquad (1.13)$$
where x is the vector of decision variables, X represents a deterministic convex and

closed set unaffected by uncertainty, c is the vector of cost coefficients, and f denotes

the constraint function affected by uncertainty u. Note that function f is typically

assumed to be convex in x, and can have arbitrarily nonlinear dependence on u, as

opposed to data-driven nonlinear chance constrained programs, which assume the

constraint function is quasi-convex in u [147]. Additionally, scenario-based optimization

can be considered as a special case of data-driven robust optimization when the

uncertainty set is constructed as a union of u(1), …, u(N).
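To make problem (1.13) concrete, the following sketch (with hypothetical cost vector, uncertainty distribution, and sample size of our choosing) solves a small scenario linear program by simply imposing one linear constraint per uncertainty sample:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N = 200  # number of i.i.d. uncertainty samples u^(1), ..., u^(N)
# Hypothetical uncertain constraint coefficients, u ~ Uniform[0.5, 1.5]^2
U = rng.uniform(0.5, 1.5, size=(N, 2))

# Scenario program (1.13): min c^T x  s.t.  f(x, u^(i)) = u^(i)^T x - 1 <= 0
# for every sample i, with x >= 0.
res = linprog(c=[-1.0, -1.0], A_ub=U, b_ub=np.ones(N), bounds=[(0, None)] * 2)
x_star = res.x  # the (random) scenario solution x*(omega)
```

Since the scenario program is an ordinary linear program, it inherits the tractability of its deterministic counterpart; only the number of constraints grows with N.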

 
In the scenario optimization literature, ω = (u(1), …, u(N)) is referred to as the multi-

sample or scenario that is drawn from the product probability space. Due to the random

nature of the multi-sample, the optimal solution of the scenario optimization problem

(1.13), denoted as x*(ω), is also random. One key merit of the scenario approach is that

the scenario optimization problem admits the same problem type as its deterministic

counterpart, so that it can be solved efficiently by convex optimization algorithms when

f(x, u) is convex in x [172]. Moreover, the optimal solution x*(ω) is guaranteed to satisfy

the constraints with other unseen uncertainty realizations with a high probability [173].

For the sake of clarity, we revisit the following definition and theorem [173].

Definition 1.1 (Violation probability) The violation probability of a given decision x is

defined as follows:

 
V  x    u   f  x, u   0 (1.14)

where V(x) denotes the probability of violation for a given x, and Ξ represents the

support of uncertainty u. We say a decision x is ε-feasible if V(x) ≤ ε.

Theorem 1.1 Assume that x*(ω) is the unique optimal solution of the scenario

optimization problem (1.13). Then, for all ε ∈ (0, 1), it holds that

$$\mathbb{P}^{N}\left\{ V\left(x^{*}(\omega)\right) > \varepsilon \right\} \le \sum_{i=0}^{n-1} \binom{N}{i} \varepsilon^{i} \left(1-\varepsilon\right)^{N-i} \qquad (1.15)$$

where n is the number of decision variables, N denotes the number of uncertainty data,

and $\mathbb{P}^{N}$ is the product probability measure governing the sample generation.

The above theorem implies that the optimal solution x*(ω) satisfies the corresponding

chance constraint with a certain confidence level. The proof of this theorem depends on

the fundamental fact that the number of support constraints, the removal of which

changes the optimal solution, is upper bounded by the number of decision variables

[170]. Note that (1.15) holds with equality for the fully-supported convex optimization

problem [173], meaning that the probability bound is tight. Additionally, the result holds

true irrespective of probability distribution information or even its support set.
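The bound in (1.15) is straightforward to evaluate numerically. The helper below (the function names are ours) computes the right-hand side of (1.15) and inverts it to find the smallest sample size N that guarantees a violation level ε with confidence 1 − β:

```python
from math import comb

def violation_bound(n, N, eps):
    # Right-hand side of (1.15): sum_{i=0}^{n-1} C(N, i) eps^i (1 - eps)^(N - i)
    return sum(comb(N, i) * eps**i * (1 - eps)**(N - i) for i in range(n))

def required_samples(n, eps, beta):
    # Smallest N such that the bound does not exceed beta (confidence 1 - beta)
    N = n
    while violation_bound(n, N, eps) > beta:
        N += 1
    return N

# e.g., n = 5 decision variables, eps = 0.1: the bound decays as N grows
b_100, b_200 = violation_bound(5, 100, 0.1), violation_bound(5, 200, 0.1)
```

For n = 1 the bound reduces to (1 − ε)^N, which makes the rapid decay in N explicit.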

By exploiting the structured dependence on uncertainty, the sample size required by the

scenario optimization problem was reduced through a tighter bound on Helly’s

dimension [174]. Rather than focusing on the constraint violation probability,

considerable research efforts have been made on the degree of violation [175], expected

probability of constraint violation [176], and the performance bounds for objective

values [177]. To make a trade-off between feasibility and performance, the case was

studied where some of the sampled constraints were allowed to be violated for

improving the performance of the objective [178]. Subsequent work along this direction

includes a sampling-and-discarding method [179]. A wait-and-judge scenario

optimization framework was proposed in which the level of robustness was assessed a

posteriori after the optimal solution was obtained [180]. Recently, the extension of

scenario-based optimization to the multistage decision making setting was made [181,

182].

While scenario optimization problems with continuous decision variables have been

extensively studied [171], mixed-integer scenario optimization is less developed.

An attempt to extend the scenario theory to random convex programs with mixed-

integer decision variables was made [183], and the Helly dimension in the mixed-integer

scenario program was proved to depend geometrically on the number of integer

variables. This result suggests that the required sample size can be prohibitively large

for scenario programs with many discrete variables. Along this research direction, two

sampling algorithms within the framework of S-optimization were recently developed

for solving mixed-integer convex scenario programs [184].

In some real-world applications, the required sample size can be very large, resulting in

a great computational burden for scenario optimization problems with a huge number

of sampled constraints. One way to circumvent this difficulty is to devise sequential

solution algorithms. Along this direction, sequential randomized algorithms were

developed for convex scenario optimization problems [185], and fell into the

framework of Sequential Probabilistic Validation (SPV) [186]. The motivation behind

these sequential algorithms is that validating a given solution with a large number of

samples is less computationally expensive than solving the corresponding scenario

optimization problem. Recently, a repetitive scenario design approach was proposed by

iterating between reduced-size scenario optimization problems and the probabilistic

feasibility check [187]. The trade-off between the sample size and the expected number

of repetitions was also revealed in the repetitive scenario design [187]. Note that the

classical scenario-based approach is an extreme situation in the trade-off curve, where

one seeks to find the solution at one step. Another effective way to reduce the

computation cost of large-scale scenario optimization is to employ distributed

algorithms [188-190]. Particularly, the sampled constraints were distributed among

multiple processors of a network, and the large-scale scenario optimization problems

can be efficiently solved via constraint consensus schemes [190]. Along this direction,

a distributed computing framework was developed for the scenario convex program

with multiple processors connected by a graph [188]. The major advantage of this

approach is that the computational cost for each processor becomes lower and the

original scenario optimization problem can be solved collaboratively. Another

contribution to reducing the computational cost is based on a non-iterative two-step

procedure, i.e., an optimization step and a detuning step [191]. As a consequence, the total

sample complexity was greatly decreased.
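The iterate-and-validate idea behind these sequential and repetitive schemes can be illustrated on a toy one-dimensional problem (all numbers below are hypothetical): solve a small scenario program, cheaply check its empirical violation probability on a large validation sample, and redraw scenarios only if the check fails:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.05                  # target violation level
N_small, N_val = 40, 20000  # small design sample vs. large validation sample

# Toy uncertain program: min x  s.t.  x >= u with u ~ N(0, 1);
# the scenario solution for N samples is simply x* = max_i u^(i).
for trial in range(50):
    x_star = rng.standard_normal(N_small).max()           # 1) solve small scenario program
    v_hat = (rng.standard_normal(N_val) > x_star).mean()  # 2) cheap probabilistic validation
    if v_hat <= eps:                                      # 3) accept; otherwise redraw
        break
```

Validation is just counting constraint violations, which is far cheaper than re-solving an optimization problem with many more constraints; this is exactly the trade-off exploited by the repetitive scenario design.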

Traditionally, the field of scenario optimization has focused on convex optimization

problems, in which the number of support constraints is upper bounded by the number

of decision variables. However, such upper bounds are no longer available in nonconvex

scenario optimization problems, giving rise to research challenges of extending the

scenario theory to the nonconvex setting. To date, few works have considered

nonconvex uncertain programs using the scenario approach. One contribution is that of

[192], in which the generalization of the optimal solution was assessed in a wait-and-judge

manner through the concept of support sub-samples. The proposed

approach can be applied to general nonconvex setups, including mixed-integer

scenario optimization problems. Another attempt to address nonconvex scenario

optimization made use of the statistical learning theory for bounding the violation

probability, and devised a randomized solution algorithm [193]. The statistical learning

theory-based method provided the probabilistic guarantee for all feasible solutions, as

opposed to the convex scenario approach where such guarantee is valid only for the

optimal solution. This unique feature regarding probabilistic guarantees for all feasible

solutions granted by the statistical learning based method is of practical relevance [194],

since it is computationally challenging to solve nonconvex optimization problems to

global optimality. A class of non-convex scenario optimization problems, which have

non-convex objective functions and convex constraints, was recently studied [195]. Since

Helly's dimension for the optimal solution of such a non-convex scenario program

can be unbounded, the direct application of scenario approaches based on Helly's

theorem is not possible. To overcome this challenge, the feasible region was

restricted to the convex hull of a few optimizers, thus enabling the application of sample

complexity results [173].

1.3 Various types of deep learning techniques and their potentials

In this subsection, we present three types of deep learning techniques, including deep

belief networks, convolutional neural networks, and recurrent neural networks, and

explore their potential applications in data-driven optimization under uncertainty.

 Deep belief networks

Among deep learning techniques, deep belief networks (DBNs) are becoming

increasingly popular, primarily because of their unique capability of capturing a hierarchy of

latent features [196]. DBNs essentially belong to probabilistic graphical models and are

structured by stacking a series of restricted Boltzmann machines (RBMs). This specific

network structure is designed based on the fact that a single RBM with only one hidden

layer falls short of capturing the intrinsic complexity of high-dimensional data. As the

building blocks of DBNs, RBMs comprise two layers of neurons, namely a

hidden layer and a visible layer. Note that the hidden layer can be regarded as an abstract

representation of the visible layer. There are undirected connections between these two

layers, while there exist no intra-connections within each layer. The training process of

DBNs typically involves the pre-training and fine-tuning procedures in a layer-wise

scheme. Armed with multiple layers of hidden variables, DBNs enjoy unique power in

extracting a hierarchy of latent features automatically, which is desirable in many

practical applications. As a result, DBNs have been applied in a wide spectrum of areas,

including fault diagnosis [197], soft sensor [198], and drug discovery [199]. DBNs can

decipher complicated nonlinear correlations among uncertain parameters. Recently, the

deep Gaussian process model was proposed as a special type of DBN based on Gaussian

process mappings. Due to its unique advantage in nonlinear regression, the deep Gaussian

process model could be used to characterize the relationship between uncertain

parameters, such as product price and demand.
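As an illustration of this layer-wise idea (a minimal sketch, not a production DBN; all layer sizes, learning rates, and data are arbitrary), two binary RBMs trained with one-step contrastive divergence can be stacked so that the second learns from the hidden representation of the first:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(V, n_hidden, lr=0.1, epochs=30):
    """Train a binary RBM with one-step contrastive divergence (CD-1)."""
    n_visible = V.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        ph = sigmoid(V @ W + b_h)                      # P(h = 1 | v)
        h = (rng.random(ph.shape) < ph).astype(float)  # sample hidden units
        pv = sigmoid(h @ W.T + b_v)                    # reconstruct visible layer
        ph2 = sigmoid(pv @ W + b_h)
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)     # positive minus negative phase
        b_v += lr * (V - pv).mean(axis=0)
        b_h += lr * (ph - ph2).mean(axis=0)
    return W, b_h

V = rng.integers(0, 2, size=(100, 12)).astype(float)     # toy binary "uncertainty" data
W1, c1 = train_rbm(V, n_hidden=8)                        # first RBM: visible -> hidden 1
H1 = sigmoid(V @ W1 + c1)                                # abstract representation of V
W2, c2 = train_rbm((H1 > 0.5).astype(float), n_hidden=4) # stacked second RBM
```

Each stacked layer is a more abstract representation of the one below it, mirroring the greedy layer-wise pre-training described above; in practice the stack would then be fine-tuned end to end.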

 Convolutional neural networks

Convolutional neural networks (CNNs) are one specialized version of deep neural

networks [200], and they have become increasingly popular in areas such as image

classification, speech recognition, and robotics. Inspired by the visual neuroscience,

CNNs are designed to fully exploit the three main ideas, namely sparse connectivity,

weight sharing, and equivariant representations [19]. This kind of neural network is

suited for processing data in the form of multiple arrays, particularly two-dimensional

image data. The architecture of a CNN typically consists of convolution layers,

nonlinear layers, and pooling layers. In convolution layers, feature maps are extracted

by performing convolutions between local patches of data and filters. The filters share the

same weights when moving across the input, leading to a reduced number of parameters

in networks. The obtained results are further passed through a nonlinear activation

function, such as rectified linear unit (ReLU). After that, pooling layers, such as max

pooling and average pooling, are applied to aggregate semantically similar features.

These different types of layers are alternately connected to extract hierarchical features

with various abstractions. For the purpose of classification, a fully connected layer is

stacked after extracting the high-level features. Although CNNs are mainly used for

image classification, they have been used to learn spatial features of traffic flow data at

nearby locations which exhibit strong spatial correlations [201]. Given its unique power

in spatial data modeling, CNNs hold the potential to model uncertainty data with large

spatial correlations, such as demand data in different adjacent market locations. In

addition, CNNs can be trained on labeled multi-class uncertainty data to perform

classification. Therefore, the output of the CNN can potentially act as the

probability weights used in the data-driven stochastic robust optimization framework.
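The conv → ReLU → pool pipeline described above can be sketched in a few lines of NumPy (a didactic toy, not an efficient implementation; the 8×8 "image" and 3×3 filter are arbitrary), which makes weight sharing explicit: the same nine filter weights are reused at every spatial position:

```python
import numpy as np

def conv2d(x, k):
    """Valid 2-D convolution; the single filter k is shared across all positions."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def max_pool(x, s=2):
    """Non-overlapping s-by-s max pooling aggregates nearby features."""
    H, W = x.shape
    return x[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))    # toy two-dimensional input
kernel = rng.standard_normal((3, 3))   # one shared filter: only 9 weights
feature_map = max_pool(np.maximum(conv2d(image, kernel), 0.0))  # conv -> ReLU -> pool
```

Stacking several such conv/ReLU/pool stages, followed by a fully connected layer, yields the standard CNN classifier architecture discussed above.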

 Recurrent neural networks

Besides the aforementioned models for spatial data, recurrent neural networks (RNNs)

are widely recognized as the state-of-the-art deep learning technique for processing time

series data, especially those from language and speech [202]. RNNs can be considered

as feedforward neural networks if they are unfolded along the time dimension. The

architecture of an RNN possesses a unique structure of directed cycles among hidden

units. In addition, the inputs of the hidden unit come from both the hidden unit of

the previous time step and the input unit at the current time step. Accordingly, these hidden units in the

architecture of RNNs constitute the state vectors and store the historical information of

past input data. With this special architecture, RNNs are well-suited for feature learning

for sequential data and demonstrate successful applications in various areas, including

natural speech recognition [202], and load forecasting [203]. However, one drawback

of RNNs is their weakness in storing long-term memory due to the vanishing and

exploding gradient problems. To address this issue, research efforts have been made on variants

of RNNs, such as long short-term memory (LSTM) and gated recurrent unit (GRU)

[204]. By explicitly incorporating input, output and forget gates, LSTM enhances the

capability of memorizing the long-term dependency among sequential data. In

sequential mathematical programming under uncertainty, massive time series of

uncertain parameters are collected. Uncertainty data realized at different time stages

often exhibit temporal dynamics. To this end, deep learning techniques, such as deep

RNNs and LSTM, could be leveraged to decipher the temporal dynamics and

trajectories of uncertainty over time stages.
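The recurrence at the heart of an RNN, in which the state vector carries past information forward, fits in a few lines (a bare vanilla-RNN sketch with arbitrary random weights; training, gates, and the LSTM machinery are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, T = 3, 5, 20
W_xh = 0.3 * rng.standard_normal((n_in, n_hid))   # input-to-hidden weights
W_hh = 0.3 * rng.standard_normal((n_hid, n_hid))  # hidden-to-hidden: the directed cycle

u = rng.standard_normal((T, n_in))  # a toy time series of uncertainty realizations
h = np.zeros(n_hid)                 # state vector storing historical information
states = []
for t in range(T):                  # unfolding the network in time
    h = np.tanh(u[t] @ W_xh + h @ W_hh)  # current input plus previous hidden state
    states.append(h)
states = np.array(states)           # hidden trajectory, shape (T, n_hid)
```

Because the same W_xh and W_hh are reused at every time step, the unfolded network is a deep feedforward network with tied weights, which is also why gradients can vanish or explode over long horizons.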

1.4 Outline of the dissertation

This dissertation focuses on the data-driven optimization under uncertainty. The

roadmap of the dissertation is provided as follows.

In Chapter 2, we propose a novel data-driven Wasserstein distributionally robust

optimization model for hedging against uncertainty in the optimal biomass with

agricultural waste-to-energy network design under uncertainty. Instead of assuming

perfect knowledge of probability distribution for uncertain parameters, we construct a

data-driven ambiguity set of candidate distributions based on the Wasserstein metric,

which is utilized to quantify their distances from the data-based empirical distribution.

Equipped with this ambiguity set, the two-stage distributionally robust optimization

model not only accommodates the sequential decision making at design and operational

stages, but also hedges against the distributional ambiguity arising from finite amount

of uncertainty data. A solution algorithm is further developed to solve the resulting two-

stage distributionally robust mixed-integer nonlinear program.

In Chapter 3, we propose a novel deep learning based ambiguous joint chance

constrained economic dispatch (ED) framework for high penetration of renewable

energy. By leveraging a deep generative adversarial network (GAN), an f-divergence-

based ambiguity set of wind power distributions is constructed as a ball in the

probability space centered at the distribution induced by a generator network.

Specifically, wind power data are utilized to train f-GAN, in which its discriminator

network criticizes the performance of the generator network in terms of f-divergence.

Based upon this ambiguity set, a data-driven joint chance constrained ED model is

developed to hedge against distributional uncertainty present in multiple constraints

regarding wind power utilization. To facilitate its solution process, the resulting

distributionally robust chance constraints are equivalently reformulated as ambiguity-

free chance constraints, which are further tackled using a scenario approach. Theoretical

a priori bound on the required number of synthetic wind power data generated by f-

GAN is explicitly derived for the multi-period ED problem to guarantee a predefined

risk level. By exploiting the ED problem structure, a prescreening technique is employed

to greatly boost both computational and memory efficiencies.

In Chapter 4, we investigate the problem of designing data-driven stochastic Model

Predictive Control (MPC) for linear time-invariant systems under additive stochastic

disturbance, whose probability distribution is unknown but can be partially inferred

from data. We propose a novel online learning-based risk-averse stochastic MPC

framework in which Conditional Value-at-Risk (CVaR) constraints on system states are

required to hold for a family of distributions called an ambiguity set. The ambiguity set

is constructed from disturbance data by leveraging a Dirichlet process mixture model

that is self-adaptive to the underlying data structure and complexity. Specifically, the

structural property of multimodality is exploited, so that the first and second-order

moment information of each mixture component is incorporated into the ambiguity set.

A novel constraint tightening strategy is then developed based on an equivalent

reformulation of distributionally robust CVaR constraints over the proposed ambiguity

set. As more data are gathered during the runtime of the controller, the ambiguity set is

updated online using real-time disturbance data, which enables the risk-averse

stochastic MPC to cope with time-varying disturbance distributions. The guarantees on

recursive feasibility and closed-loop stability of the proposed MPC are established via

a safe update scheme.

In Chapter 5, we develop a novel transformation-proximal bundle algorithm for

Multistage Adaptive Robust Mixed-Integer Linear Programs (MARMILPs). By

partitioning recourse decisions into state and control decisions, the proposed algorithm

applies affine control policy only to state decisions and allows control decisions to be

fully adaptive to uncertainty. In this way, the MARMILP is proved to be transformed

into an equivalent two-stage Adaptive Robust Optimization (ARO) problem. The

proposed multi-to-two transformation remains valid for other types of causal control

policies besides the affine one. The proximal bundle method is developed for the

resulting two-stage problem. We theoretically show finite convergence of the proposed

algorithm with any positive tolerance. To quantitatively assess solution quality, we

develop a scenario-tree-based lower bounding technique.

The dissertation concludes in Chapter 6.

CHAPTER 2

DATA-DRIVEN WASSERSTEIN DISTRIBUTIONALLY ROBUST


OPTIMIZATION FOR BIOMASS WITH AGRICULTURAL WASTE-TO-ENERGY
NETWORK DESIGN UNDER UNCERTAINTY

2.1 Introduction

With growing concerns over energy crisis and global warming, the utilization of

renewable energy sources is growing rapidly around the globe [205]. As a renewable

energy source, biomass can be easily stored until needed and has the potential to be

converted into a plethora of biofuels and bioproducts [206]. Biomass feedstock has

gained tremendous popularity in industries [207], and the associated processing

technologies, such as pretreatment [208], pyrolysis [209], gasification [210], and

fractionation [211], have advanced significantly in recent years [212]. There are

different generations of biofuels in the existing literature [213]. The first-generation

biofuels typically feature the feedstocks of edible energy crops, such as corn and

sugarcane, and lead to the competition between food and fuel. To address this issue, the

second-generation biofuels utilize lignocellulosic biomass as feedstocks. The third

generation is known to be produced from algae and can reduce land use compared with

the second-generation biofuels [214]. As a result, renewable energy and renewable

chemicals/materials produced from biomass hold promise to replace their non-

renewable, petroleum-based counterparts to mitigate the issues of energy and climate.

Additionally, agricultural and organic waste sources, like animal manure and slurry

[215], can be converted to energy or value-added products through biorefinery

technologies [216]. From the perspective of sustainability, the conversion of agricultural


waste serves as an environmentally friendly avenue to satisfy the ever-increasing

renewable energy demand [217]. For instance, food waste is considered as a valuable

resource of sustainable energy because of its higher biodegradability, moisture and

organic content [218]. Given a myriad of possible feedstocks and technologies, unraveling

the optimal biomass processing routes from a process and product network in a

systematic way is of paramount importance for the economic competitiveness as well

as environmental sustainability [219]. Meanwhile, uncertain parameters involved in

network design add more complexity in the decision-making process [220]. The

historical realizations of uncertain parameters are usually available in the energy

industry, and these data hold huge potential to support the network design. Recently,

data-driven optimization has become an emerging paradigm to address uncertainty by

employing the power of machine learning techniques [221] that include, but are not

limited to, Bayesian nonparametric models [32], kernel learning [164], principal

component analysis [34], and robust kernel density estimation [222]. Nowadays, a wide

array of machine learning techniques can be leveraged to excavate useful uncertainty

information for better decisions in the bioconversion network design and operation.

Therefore, it is crucial to design biomass with agricultural waste-to-energy network with

the explicit consideration of uncertainty by exploiting the organic integration of

machine learning and mathematical programming.

Due to the significance of energy systems design, a growing body of literature leverages

systematic mathematical programming models to address such a problem. Some

research studies focused on deterministic models without the consideration of

uncertainty [223], including rule-based method [224], superstructure optimization

[225], and life cycle optimization [226]. Nevertheless, the issue of uncertainty could

render the solution of a deterministic optimization problem suboptimal or even

infeasible [8]. To this end, the bioenergy system design subject to uncertainty has been

extensively investigated in the existing literature [227]. There are various types of

uncertainties in the biomass network design problem, including biomass supply,

feedstock prices, bioproduct demand, technological conversion rates, policies, and

environmental impacts [228]. One such method is robust optimization, in which

uncertainty is modeled with an uncertainty set [98]. By introducing recourse decisions

[106], adaptive robust optimization based network design method was proposed to

identify economical and efficient biofuel and bioproduct production pathways [108].

While robust optimization has achieved success in various applications, this method

typically generates over-conservative solutions, because it always hedges against the

worst-case uncertainty realization. The stochastic programming approach has gained

popularity due to the fact that it can incorporate the probability distribution information

to alleviate the conservatism, yet it generally scales poorly in the problem dimensions

[45]. Recently, the biodiesel production model considering diversified raw materials

subject to feedstock composition uncertainty was formulated as a chance-constrained

stochastic program to guarantee the technical performance on biodiesel [229]. To

account for risk aversion, stochastic programming models based on conditional value at

risk and downside risk were proposed for the optimal network design of hydrocarbon

biorefinery under supply and demand uncertainties [63]. The stochastic programming

approach was applied to an integrated hydrocarbon biofuel and petroleum network

design problem, and the resulting optimization model aimed to minimize the

expectation of costs under a number of scenarios associated with biomass availability,

fuel demand, and technology evolution [230]. To address the design of sustainable

biomass conversion network, a stochastic mixed-integer linear programming (MILP)

model was presented, in which uncertain purchase prices were assumed to follow

normal distributions [231].

The employment of the stochastic programming method is widespread in this area, and

most existing studies typically use the Monte Carlo method to generate uncertainty data

or scenarios based on a predefined probability distribution. In practice, such perfect

information about the true probability distribution of uncertain parameters is rarely

known, and it can only be observable through a finite number of uncertainty data. Due

to such limited amount of uncertainty data, the assumed probability distribution could

significantly deviate from the underlying true distribution. If the stochastic program for

biomass with agricultural waste-to-energy network design is calibrated to a given

uncertainty dataset, the resulting out-of-sample performance tends to be disappointing

when evaluating its optimal solution with a testing dataset [121]. The out-of-sample

performance is the actual performance, in terms of objective values, of a given optimal

solution evaluated at some uncertainty scenarios, which are different from the ones used

to obtain that solution. Consequently, the conventional stochastic programming

approach could have poor out-of-sample performance. The aforementioned issue

prompts the development of data-driven distributionally robust optimization (DRO).

The moment-based ambiguity set in DRO is not guaranteed to converge to the true

probability distribution, as the number of uncertainty data increases. Therefore, this type

of ambiguity set suffers from the conservatism issue [124]. Thus, it is imperative to

develop a novel optimization method for biomass network design that can (a) effectively

hedge against the distributional ambiguity; (b) leverage the value of uncertainty data via

statistical machine learning; (c) lead to tractable model formulations that are amenable

for applications; and (d) provide optimal solutions with better out-of-sample

performance in terms of lower average cost and lower variance compared with

conventional stochastic programming.

To fill this knowledge gap, we propose a novel data-driven two-stage Wasserstein

distributionally robust network design model, in which technology selection and sizing

are made at the first stage, while operation decisions are made at the second stage.

Rather than assuming perfect knowledge of probability distribution, we consider a more

realistic setting where the true probability distribution can be inferred from a set of

historical uncertainty data. Based on the Wasserstein metric, we construct the data-

driven ambiguity set as a ball (a.k.a. Wasserstein ball) in the probability space centered

at the uniform distribution on uncertainty data. Although the Lévy-Prokhorov metric

can be used to measure the distance between probability distributions, our research work

adopts the Wasserstein metric rather than using the Lévy-Prokhorov metric in the DRO

framework following the literature [38]. The ambiguity set based on the Wasserstein

metric has a better out-of-sample performance compared with other moment-based

methods [124], as well as better computational tractability [131]. Recently, the

Wasserstein ambiguity set has gained increasing popularity, and is widely adopted in

multistage adaptive DRO [232], adaptive robust stochastic optimization [233], and

distributionally robust chance-constrained optimization [153, 234]. The two-stage

stochastic programming model can be considered as a special case of the proposed DRO

model when the “radius” of the Wasserstein ball is tuned to be zero. Nonlinear scaling

functions are introduced in the objective function to accurately calculate each

technology’s capital cost associated with the corresponding capacity. Notably, this

research work involves all the three generations of biofuels. According to the taxonomy

of uncertainty types [228], uncertainty can be classified into three categories, namely

randomness, epistemic, and deep uncertainty. In the studied problem, the uncertainty

belongs to deep uncertainty that is characterized by insufficient knowledge of the

underlying probability distribution. The data-driven DRO approach is suitable to

address this type of uncertainty, because it hedges against the ambiguity of distribution

resulted from such lack of probability information by using a set of plausible

distributions. The data-driven Wasserstein DRO model harnesses the advantages of both

robust optimization and stochastic programming. Specially, adopting a worst-case

orientated approach regularizes the optimization problem and effectively hedges against

the worst-case distribution within the ambiguity set [235], thereby remedying the

drawback of the stochastic programming method. To the best of our knowledge, the

proposed model represents the first attempt to employ the data-driven Wasserstein DRO

to address the biomass network design problem under uncertainty. The resulting

problem is formulated as a multi-level mixed-integer nonlinear program (MINLP),

which cannot be solved directly by any off-the-shelf optimization solvers. The “multi-

level” means that the resulting optimization problem has a “min-max-min” optimization

structure. To address this computational challenge, a solution strategy is further

developed by integrating the reformulation of the worst-case expectation [124], and a

branch-and-refine algorithm [236]. In case studies, we consider the feedstock price

uncertainty to demonstrate the effectiveness of the proposed approach. The better out-

of-sample performance of the data-driven Wasserstein DRO in terms of lower average

cost and lower variance is validated in a case study of a biomass with agricultural waste-

to-energy network involving 216 technologies and 172 materials/compounds. A

sensitivity analysis is also performed to evaluate the impact of the ambiguity set’s size

on its corresponding DRO solution.

2.2 Problem statement

In this section, we formally state the problem of biomass with agricultural waste-to-

energy network design considering uncertain parameters as follows. As depicted in

Figure 2, we consider a comprehensive biomass with agricultural waste-to-energy

network. This network has various conversion pathways featuring a diversified portfolio

of biomass feedstocks, organic and agricultural waste feedstocks, biofuel, and

bioproducts. The purpose of this comprehensive network is to convert a variety of raw

materials or feedstocks into sustainable energy and useful bioproducts such as biofuels

and biogas. Accordingly, the network holds great value for not only producing clean

energy, but also for managing agricultural waste. There is a total of 216 processing and

upgrading technologies, as well as 172 materials/compounds in this network. The

feedstocks include soybean, corn, sugarcane, hard wood, soft wood, switchgrass, algae,

cassava, brown grease, corn stover, tomato peels, potato peels, orange peels, olive

waste, municipal solid waste, dairy manure, poultry litter, and swine manure. These

various types of feedstocks in the network are converted to energy and bioproducts in

the following way. First, feedstocks are decomposed into some basic chemical

compounds via processing technologies, such as hydrothermal liquefaction [237]. Those

chemical compounds are then used for producing biofuels or bioproducts through

upgrading technologies. The final products are sold to the market. Note that some of the

potential pathways in this network are “waste-to-energy” pathways, meaning that they

convert waste materials into energy-rich bioproducts and biofuels [238]. One main

component in waste feedstocks is the agricultural waste, including food waste and

animal manure [239]. Specifically, tomato peels, potato peels, and orange peels can

serve as feedstocks to produce chemical materials, like beta carotene, chlorogenic acid,

caffeic acid, and pectin. Different types of anaerobic digesters (ADs), such as the mixed

plug AD and the horizontal plug flow AD, serve as technologies that convert dairy

animal manure, poultry litter, and swine manure into biogas [240]. As a fuel source, the

biogas can be further used to produce heat and electricity [241], thus providing immense

environmental benefits [242]. Meanwhile, methane can be extracted from municipal

landfills using the methane extraction technology [243].

Figure 2. The structure of the biomass with agricultural waste-to-energy network

considered in this work.

The most recognized type of uncertainty is the volatility in purchasing prices of biomass

resources. In this research work, the feedstock price uncertainty is considered for the

following reasons. On one hand, biomass feedstock prices typically fluctuate

due to policy changes and energy markets. Given the lifetime of equipment, the

uncertainty of feedstock prices can significantly affect the economic performance of

the optimal bioconversion network design. On the other hand, real feedstock price data

are well documented and can be easily acquired to validate the effectiveness of the

proposed data-driven approach. Within the proposed Wasserstein DRO model, useful

statistical information embedded in the uncertainty data is leveraged, and then the

ambiguity set based on the Wasserstein metric is constructed. Note that policy changes

and energy markets could lead to time-variant price distributions, which further cause

the ambiguity of probability distributions. For these two sources of uncertainty, the

DRO approach works because it uses an ambiguity set to hedge against the distributional

uncertainty. For those uncertainties whose probability distributions are time-invariant,

the DRO method works, since their underlying true distributions can be partially known

due to the limited number of uncertainty data. If the probability distribution can be

perfectly known to the decision maker, adding the distributional robustness is not

necessary.

Figure 3. Illustrative figure on the biomass with agricultural waste-to-energy network

with the corresponding data-driven Wasserstein DRO model.

In this problem, we aim to identify cost-effective processing pathways from the biomass

with agricultural waste-to-energy network by minimizing the worst-case expected total

annualized cost. This worst-case expected cost is taken with respect to all feedstock

price distributions within the Wasserstein ambiguity set. This data-driven ambiguity set

is defined as a ball in the probability space centered at the uniform probability

distribution on biomass feedstock price data. An illustrative figure on the biomass with

agricultural waste-to-energy network along with the data-driven Wasserstein DRO

framework is provided in Figure 3. In the two-stage optimization structure of this data-


driven Wasserstein DRO model, the first-stage decisions are the design decisions made

prior to the feedstock price uncertainty realization. The second-stage decisions are

operational decisions that are postponed in a “wait-and-see” manner after knowing the

uncertainty realization. The selected technologies in the network are assumed to be

ready for operation at the second stage. Details of these decisions are summarized as

follows:

First-stage design decisions:

• Technology pathway selection;

• Capacity of each technology in pathways;

Second-stage operation decisions:

• Operation level of each technology in pathways;

• Quantities of biomass feedstock to use;

• Bioproduct and/or biofuel sale amount.

These design and operation decisions are optimized based on the following given

parameters:

• The upper and lower bounds of the capacity of each processing and upgrading technology;

• Conversion coefficients for each technology;

• The availability of each biomass feedstock;

• A base capacity of each technology;

• An initial capital cost corresponding to the base capacity for each technology;

• Expected life span in years of the processing pathway;

• Discount rate;

• The fixed operating expense (OPEX) for each technology;

• An initial, variable OPEX corresponding to the base capacity for each technology;

• Prices of bioproducts and biofuels;

• Feedstock price data.

2.3 Mathematical formulation

In this section, we first present a data-driven approach to construct the Wasserstein

ambiguity set for the feedstock price uncertainty. With this ambiguity set, a two-stage

adaptive distributionally robust mixed-integer nonlinear optimization model is then

proposed for the biomass with agricultural waste-to-energy network design. Finally, a

solution strategy integrating the reformulation of worst-case expectation and the branch-

and-refine algorithm is developed for solving the resulting non-convex optimization

problem.

2.3.1 Data-driven ambiguity set using Wasserstein metric

As mentioned in the problem statement, feedstock prices are subject to uncertainty and

the decision maker has access to the price dataset $D_{\text{train}} = \{\xi^{(1)}, \ldots, \xi^{(N)}\}$, where $\xi^{(n)} = \big[c_{3,1}^{(n)}, \ldots, c_{3,I}^{(n)}\big]^{T}$ denotes the $n$-th data vector of feedstock prices, and $N$ represents the

number of data samples. For the stochastic programming-based network design, the

assumed probability distribution might deviate from the underlying true distribution due

to the finite amount of feedstock price data. In addition, relying on a single probability

distribution could lead to the deterioration in out-of-sample performance [121]. To


address this issue, we construct a family of probability distributions, also referred to as

ambiguity set, by using historical feedstock price data on hand. To quantitatively

measure the distance between feedstock price distributions, we define the Wasserstein

metric or Wasserstein distance as follows [244].

For any probability distributions $\mathbb{P}_1, \mathbb{P}_2 \in \mathcal{M}(\Xi)$, the Wasserstein metric or distance between these two distributions is given by,

$$
\begin{aligned}
d_{W}\left(\mathbb{P}_1, \mathbb{P}_2\right) = \min_{\Pi} \;& \int_{\Xi^{2}} \left\| \xi_1 - \xi_2 \right\| \, \Pi\left(d\xi_1, d\xi_2\right) \\
\text{s.t.} \;& \Pi \text{ is a joint distribution of } \xi_1 \text{ and } \xi_2 \\
& \text{with marginal distributions } \mathbb{P}_1 \text{ and } \mathbb{P}_2
\end{aligned}
\tag{2.1}
$$

where $\mathcal{M}(\Xi)$ represents the set of all probability distributions with support set $\Xi$, and $\|\cdot\|$ denotes the norm of a vector. We adopt the $l_1$ norm in this work due to its computational

benefits in DRO [124].

From the definition, we can see that the Wasserstein metric is defined through an

optimization problem where the decision variable is a probability distribution. By

considering the decision variable Π as a transportation plan, the Wasserstein metric is

essentially the minimum transportation cost of moving probability mass from

distribution $\mathbb{P}_1$ to distribution $\mathbb{P}_2$.
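The transportation-plan interpretation can be made concrete with a small linear program. The sketch below is purely illustrative (it is not part of this work's GAMS implementation, and it assumes SciPy is available): it computes the Wasserstein distance of definition (2.1) between two discrete distributions under the l1 ground metric by solving the transportation LP directly.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_l1(p, q, supp_p, supp_q):
    """1-Wasserstein distance between two discrete distributions with an
    l1 ground metric, solved as the transportation LP in definition (2.1)."""
    n, m = len(p), len(q)
    # cost[i, j] = ||xi_i - xi_j||_1; the transport plan Pi is the LP variable
    cost = np.array([[np.sum(np.abs(np.asarray(a) - np.asarray(b)))
                      for b in supp_q] for a in supp_p])
    A_eq, b_eq = [], []
    for i in range(n):                       # row marginals: sum_j Pi[i, j] = p[i]
        row = np.zeros(n * m); row[i * m:(i + 1) * m] = 1.0
        A_eq.append(row); b_eq.append(p[i])
    for j in range(m):                       # column marginals: sum_i Pi[i, j] = q[j]
        col = np.zeros(n * m); col[j::m] = 1.0
        A_eq.append(col); b_eq.append(q[j])
    res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=(0, None), method="highs")
    return res.fun

# Both distributions keep mass 0.5 at 1; moving the other 0.5 from 0 to 2 costs 0.5*2
d = wasserstein_l1([0.5, 0.5], [0.5, 0.5], supp_p=[[0.0], [1.0]], supp_q=[[2.0], [1.0]])
```

Solving this LP exactly becomes expensive for large supports; the reformulation developed in Section 2.4 is precisely what lets the proposed model avoid computing the metric directly.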

Based on the Wasserstein metric in (2.1), the data-driven ambiguity set for feedstock

prices is represented by,

$$
\mathcal{P} = \left\{ \mathbb{P} \in \mathcal{M}(\Xi) \;\middle|\; d_{W}\!\left(\mathbb{P}, \hat{\mathbb{P}}_N\right) \le \theta \right\} \tag{2.2}
$$

where $\hat{\mathbb{P}}_N$ denotes the empirical distribution, namely the uniform distribution on the $N$ available feedstock price data, i.e. $\hat{\mathbb{P}}_N = \frac{1}{N} \sum_{n=1}^{N} \delta_{\xi^{(n)}}$, where $\delta_{\xi^{(n)}}$ represents the Dirac measure at the price data point $\xi^{(n)}$. Note that $\hat{\mathbb{P}}_N$ is a discrete distribution and serves as an estimation of the underlying true distribution $\mathbb{P}_{\text{true}}$. $\theta$ is a parameter used for controlling the size of the data-driven ambiguity set $\mathcal{P}$. The support set $\Xi$ can be specified via the upper and lower bounds of uncertain parameters and is shown as follows.

$$
\Xi = \left\{ \xi \;\middle|\; \xi_i^{\min} \le \xi_i \le \xi_i^{\max}, \; \forall i \right\} \tag{2.3}
$$

The data-driven ambiguity set $\mathcal{P}$ contains all probability distributions whose

Wasserstein distances from the empirical distribution are no larger than $\theta$. Therefore, the

ambiguity set $\mathcal{P}$ can be interpreted as a Wasserstein ball of radius $\theta$ centered at the

empirical distribution $\hat{\mathbb{P}}_N$. Note that the size of the data-driven ambiguity set can be

adjusted by using the tuning parameter θ. Specifically, decreasing the value of parameter

θ reduces the size of the Wasserstein ambiguity set. The decision maker can utilize the

radius of the Wasserstein ball to control the level of conservatism.

The radius θ of the Wasserstein ball can be calculated by [245],

$$
\theta = \begin{cases} \left( \dfrac{\log\left(C_1 \beta^{-1}\right)}{C_2 N} \right)^{1/m}, & \text{if } \left( \dfrac{\log\left(C_1 \beta^{-1}\right)}{C_2 N} \right)^{1/m} \le 1 \\[3mm] \left( \dfrac{\log\left(C_1 \beta^{-1}\right)}{C_2 N} \right)^{1/\alpha}, & \text{else} \end{cases} \tag{2.4}
$$

where $\beta$ denotes the confidence level, $m$ ($m>2$) represents the dimension of the uncertainty vector, and $N$ is the number of feedstock price data. Here it is assumed that there exist $\alpha>1$ and $\rho>0$ satisfying $\int_{\Xi} e^{\rho \|x\|^{\alpha}} \, \mathbb{P}_{\text{true}}(dx) < \infty$. Note that $C_1$ and $C_2$ are positive constant

numbers. In general, Equation (2.4) is not a practicable way to obtain the Wasserstein

radius, because the constants C1 and C2 are difficult to estimate and could give loose

bounds. For this reason, cross-validation can be used as an empirical way to tune the

Wasserstein radius [124].

The steps of k-fold cross validation to tune the Wasserstein radius are given as follows

[246]. First, the price data $\xi^{(1)}, \ldots, \xi^{(N)}$ are partitioned into $k$ subsets. For each holdout run, only one

validation dataset. Second, the Wasserstein radius is tuned such that the corresponding

average cost for the validation dataset is minimized. Lastly, the optimal Wasserstein

radius from cross-validation is set to be the average of the optimal radii determined in

the k holdout runs.
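The three steps above can be sketched in code. The solver and cost evaluator below are hypothetical stand-ins (a closed-form ordering rule on a toy newsvendor-style problem, not an actual (WDRO) solve), so the sketch only illustrates the data-splitting and averaging mechanics of the tuning procedure.

```python
import numpy as np

def tune_radius_kfold(data, radii, k, solve_dro, avg_cost):
    """k-fold tuning of the Wasserstein radius: each fold in turn serves as the
    training set, the remaining folds as the validation set, and the per-fold
    best radii are averaged."""
    folds = np.array_split(np.asarray(data), k)
    best = []
    for i in range(k):
        train = folds[i]
        valid = np.concatenate([folds[j] for j in range(k) if j != i])
        costs = [avg_cost(solve_dro(train, r), valid) for r in radii]
        best.append(radii[int(np.argmin(costs))])
    return float(np.mean(best))

# Hypothetical stand-ins: a larger radius yields a more conservative decision
rng = np.random.default_rng(0)
demand = rng.uniform(20.0, 80.0, size=60)
solve_dro = lambda train, r: np.mean(train) + r * np.std(train)   # stand-in for a (WDRO) solve
avg_cost = lambda x, valid: np.mean(9.0 * np.maximum(valid - x, 0.0)
                                    + 1.0 * np.maximum(x - valid, 0.0))

theta = tune_radius_kfold(demand, radii=[0.0, 0.25, 0.5, 1.0, 2.0], k=5,
                          solve_dro=solve_dro, avg_cost=avg_cost)
```

In the case study, the inner call is the full (WDRC) solve, so each candidate radius costs one MILP solve per holdout run.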

The data-driven ambiguity set for the feedstock price is cast as a set of possible

probability distributions that are “close” to the empirical distribution in the sense of the

Wasserstein metric. There are several merits of the Wasserstein ambiguity set. First, this

ambiguity set directly leverages the uncertainty data information via the empirical

distribution, while at the same time effectively hedging against the distributional

uncertainty based upon the Wasserstein metric. This feature is useful in the network

design problem where the distribution of uncertain feedstock prices is only observable

through a finite amount of price data. Second, there exists a statistical guarantee that the

Wasserstein ambiguity set contains the unknown true distribution with a certain

confidence level [245]. Specifically, with the Wasserstein radius in (2.4), it can be

guaranteed that

$$
\mathbb{P}\left[ d_{W}\!\left(\mathbb{P}_{\text{true}}, \hat{\mathbb{P}}_N\right) \le \theta \right] \ge \begin{cases} 1 - C_1 e^{-C_2 N \theta^{m}}, & \text{if } \theta \le 1 \\ 1 - C_1 e^{-C_2 N \theta^{\alpha}}, & \text{else} \end{cases}
$$

This favorable feature
equips the resulting DRO solution with better out-of-sample performance in terms of

lower average cost and lower variance. Such out-of-sample performance is of practical

relevance, since price data different from the training dataset Dtrain are used to test the

data-driven network design decision. Third, the decision maker can readily adjust the

level of conservatism by tuning the radius θ of the Wasserstein ball. Lastly, the DRO

problem with the Wasserstein ambiguity set admits a tractable reformulation, which

endows the resulting biomass with agricultural waste-to-energy network design problem

with computational efficiency.
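The coverage guarantee rests on the empirical distribution concentrating around the true one as N grows, which is easy to check numerically in one dimension. The experiment below is illustrative only (uniform synthetic data, with a large sample standing in as a proxy for the true distribution); it is not part of the case study.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(1)
ref = rng.uniform(0.0, 1.0, size=20000)   # large-sample proxy for the true distribution

def avg_emp_distance(n, reps=200):
    """Average 1-Wasserstein distance between the (proxy) true distribution and
    an empirical distribution built from n samples."""
    return float(np.mean([wasserstein_distance(rng.uniform(0.0, 1.0, size=n), ref)
                          for _ in range(reps)]))

d_small, d_large = avg_emp_distance(10), avg_emp_distance(200)
```

The distance shrinks with the sample size, which is why the radius θ delivering a fixed confidence level β in (2.4) decreases as more price data become available.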

2.3.2 Data-driven Wasserstein distributionally robust network design

model

In this section, we present a novel data-driven Wasserstein distributionally robust

biomass with agricultural waste-to-energy network design model using the data-driven

ambiguity set presented in the previous section. In a biomass with agricultural waste-

to-energy network, biomass feedstocks, such as microalgae [247], and dairy manure

[248], are converted into a variety of biofuels and bioproducts via different processing

and upgrading technologies [219]. One needs to make decisions on the selection of

technology pathway, capacity and operating level of each technology, purchase amounts

of feedstocks and quantities of products to sell. The objective is to minimize the worst-

case expected total annualized cost with regard to the Wasserstein ambiguity set. Since

the proposed model aggregates yearly operations, the issue of biomass feedstock

seasonality is not considered in this work. In this research work, we focus on the

selection of technologies for the biomass network design, and do not consider the issue

of inventory. Therefore, the assumption on yearly operation is reasonable. Additionally,

this assumption is adopted in the existing literature [219, 225].

The data-driven Wasserstein DRO model for the network design under uncertainty can

be cast as a multi-level MINLP. The two-stage optimization structure allows for

selection and capacity decisions to be made before uncertainty realizations, while also

allowing for production, purchasing, and sale decisions to be made after uncertainty has

been realized. Specifically, the first-stage decision variables are decisions on the

selection and capacity of technology. The second-stage decisions include production

levels, quantity of biomass to use, and amounts of products to sell. The objective

function of the biomass with agricultural waste-to-energy network design is shown in

(2.5). The constraints include technology capacity constraint (2.6), production level

constraint (2.7), mass balance constraint (2.8), biomass feedstock availability constraint

(2.9), biofuel/bioproduct demand satisfaction constraint (2.10), non-negativity and

integrality constraints (2.11)-(2.12). The data-driven ambiguity set of feedstock prices is

shown in (2.13). A list of indices/sets, parameters and variables is given in the

Nomenclature section, where all parameters are denoted in lower-case symbols, and all

variables are denoted in upper-case symbols. The two-stage Wasserstein DRO (WDRO)

model formulation is presented as follows:

$$
\text{(WDRO)} \quad \min_{Y, Q} \; \sum_{j \in J} c_{1,j} Q_j^{sf_j} + \max_{\mathbb{P} \in \mathcal{P}} \; \mathbb{E}_{\mathbb{P}} \left[ \min_{W, P, S} \left( \sum_{j \in J} c_{2,j} W_j + \sum_{i \in I} c_{3,i} P_i - \sum_{i \in I} c_{4,i} S_i \right) \right] \tag{2.5}
$$

$$
\text{s.t.} \quad a_{1,j} Y_j \le Q_j \le a_{2,j} Y_j, \; \forall j \in J \tag{2.6}
$$

$$
W_j \le Q_j, \; \forall j \in J \tag{2.7}
$$

$$
P_i + \sum_{j} \varphi_{ij} W_j - S_i = 0, \; \forall i \in I \tag{2.8}
$$

$$
P_i \le b_i, \; \forall i \in I \tag{2.9}
$$

$$
S_i \ge d_i, \; \forall i \in I \tag{2.10}
$$

$$
Q_j, P_i, S_i, W_j \ge 0, \; \forall i \in I, \; j \in J \tag{2.11}
$$

$$
Y_j \in \{0, 1\}, \; \forall j \in J \tag{2.12}
$$

$$
\mathcal{P} = \left\{ \mathbb{P} \in \mathcal{M}(\Xi) \;\middle|\; d_{W}\!\left(\mathbb{P}, \hat{\mathbb{P}}_N\right) \le \theta, \; \hat{\mathbb{P}}_N = \frac{1}{N} \sum_{n=1}^{N} \delta_{c_{3}^{(n)}} \right\} \tag{2.13}
$$

where c1,j, c2,j, c3,i, and c4,i respectively represent economic evaluation parameters for

the capital cost associated with technology j, the operating cost associated with

technology j, the purchase cost of biomass feedstock i, and the selling price of

bioproduct i. At the first stage (a.k.a. the design stage), “here-and-now” decisions

involve binary variables Yj indicating the selection of technology j as well as continuous

variables Qj representing the capacity of technology j. These decisions are termed as

“here-and-now”, since they should be made prior to any uncertain feedstock price

realizations. At the second stage or the operational stage, the decision variables,

including the operating level of each technology Wj, the amount of feedstock purchased

Pi, and the amount of product sold Si, can be postponed in a “wait-and-see” manner after

knowing the uncertainty realization.

The objective function can be roughly divided into two terms. The first term represents

the first-stage cost, namely the total capital cost. The nonlinearity arises within the
relation between the facility’s capital cost and its capacity [249]. Specifically, the

nonlinear functions, namely the power functions $Q_j^{sf_j}$, are employed to evaluate technology

capital costs in the (WDRO) model. Following the literature [250], sfj is typically set to

be 0.6. The second term is the worst-case expectation of the second-stage costs, and as

a result, the proposed optimization model is capable of hedging against the worst-case

feedstock price distribution within the Wasserstein ball. Based on Constraint (2.7), it

becomes clear that the decision variable for operating level Wj can be adjusted in the

range from zero to the total capacity of technology j. Therefore, the (WDRO) model

appropriately accommodates the fact that real facilities do not always operate at the

maximum capacity.
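For a fixed first-stage design, the inner "wait-and-see" problem in the model is an ordinary linear program in W, P, and S. The toy instance below (hypothetical numbers, one feedstock converted into one product at a 0.9 yield; it is not drawn from the 216-technology network) solves one such recourse problem for a single realized feedstock price.

```python
import numpy as np
from scipy.optimize import linprog

# Fixed first-stage design: one selected technology with capacity Q
Q, conv, b, d = 50.0, 0.9, 100.0, 10.0   # capacity, yield, feed availability, product demand
c2, c4, xi = 1.0, 10.0, 2.0              # operating cost, product price, realized feed price

# Second-stage variables x = [W, P, S]: operating level, feed purchased, product sold
c = np.array([c2, xi, -c4])              # min c2*W + xi*P - c4*S
A_eq = np.array([[-1.0, 1.0, 0.0],       # feed balance:    P - W = 0
                 [conv, 0.0, -1.0]])     # product balance: conv*W - S = 0
b_eq = np.zeros(2)
A_ub = np.array([[1.0, 0.0, 0.0],        # W <= Q   (capacity, cf. (2.7))
                 [0.0, 1.0, 0.0],        # P <= b   (availability, cf. (2.9))
                 [0.0, 0.0, -1.0]])      # S >= d   (demand, cf. (2.10))
b_ub = np.array([Q, b, -d])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=(0, None), method="highs")
# Each unit of operation nets 0.9*10 - 1 - 2 = 6, so the plant runs at full capacity
```

The worst-case expectation in (2.5) averages the optimal values of many such LPs, one per price realization, under the least favorable distribution in the ball.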

It is worth noting that the proposed (WDRO) model reduces to the stochastic

programming when the value of parameter θ is set to be 0, since the induced ambiguity

set changes to the singleton set $\{\hat{\mathbb{P}}_N\}$. In summary, we develop a data-driven two-stage

Wasserstein distributionally robust MINLP model where the distribution of biomass

feedstock price can only be inferred from a finite training dataset. To effectively hedge

against the distributional uncertainty, the (WDRO) model employs the objective

function of worst-case expected cost with respect to the data-driven Wasserstein

ambiguity set. The proposed biomass with agricultural waste-to-energy network design

model has the following merits. First, it directly incorporates uncertain feedstock price

data into the optimization model. Second, the (WDRO) model effectively accounts for

the ambiguity of the feedstock price distribution, thereby enjoying a better out-of-

sample performance in terms of lower average cost and lower variance compared with

conventional stochastic programming.

However, the multi-level optimization structure, coupled with nonconvex terms in the

objective function, makes the resulting optimization problem computationally

challenging. To address this computational challenge, we further develop a solution

method that works in solving the resulting Wasserstein distributionally robust MINLP

problem in the sequel.

2.4 Solution methodology

In this section, we develop a tailored solution method to globally optimize the (WDRO)

problem, which involves a nonconvex objective function. Problem (WDRO) cannot be

solved directly by any off-the-shelf optimization solvers due to the multilevel

optimization structure, as well as the infinite number of probability distributions

involved in the ambiguity set. The concave function $Q_j^{sf_j}$ renders the optimization

problem non-convex, which leads to increased computational difficulty. Furthermore,

existing solution methods for two-stage DRO problems cannot handle a mixed-integer

nonconvex objective function [251].

To tackle this computational challenge, we present a tailored solution algorithm based

on the special structure of (WDRO) problem. Specifically, we employ a combination of

the reformulation of the worst-case expectation and a branch-and-refine algorithm to

globally optimize the (WDRO) problem. First, we reformulate the worst-case

expectation problem and obtain a single-level MINLP problem. The branch-and-refine

algorithm is then adopted to solve the resulting single-level optimization problem by

leveraging a successive piecewise linear approximation technique.

To facilitate its implementation, we present the explicit form of the two-stage

Wasserstein distributionally robust counterpart (WDRC) for the biomass with

agricultural waste-to-energy network design with uncertain feedstock price as follows.

The step-by-step derivation is provided in Appendix A. Note that the derivation to

reformulate the (WDRO) problem into (WDRC) comes from the literature [124].

$$
\text{(WDRC)} \quad \min \; \sum_{j \in J} c_{1,j} Q_j^{sf_j} + \lambda \theta + \frac{1}{N} \sum_{n=1}^{N} s_n \tag{2.14}
$$

$$
\text{s.t.} \quad a_{1,j} Y_j \le Q_j \le a_{2,j} Y_j, \; \forall j \in J \tag{2.15}
$$

$$
W_{nj} \le Q_j, \; \forall j \in J, \; n \in N_d \tag{2.16}
$$

$$
P_{ni} + \sum_{j} \varphi_{ij} W_{nj} - S_{ni} = 0, \; \forall i \in I, \; n \in N_d \tag{2.17}
$$

$$
P_{ni} \le b_i, \; \forall i \in I, \; n \in N_d \tag{2.18}
$$

$$
S_{ni} \ge d_i, \; \forall i \in I, \; n \in N_d \tag{2.19}
$$

$$
Q_j, P_{ni}, S_{ni}, W_{nj} \ge 0, \; \forall i \in I, \; j \in J, \; n \in N_d \tag{2.20}
$$

$$
Y_j \in \{0, 1\}, \; \forall j \in J \tag{2.21}
$$

$$
\sum_{j \in J} c_{2,j} W_{nj} + \sum_{i \in I} c_{3,i}^{(n)} P_{ni} - \sum_{i \in I} c_{4,i} S_{ni} + \sum_{i \in I} \left( c_{3,i}^{\max} - c_{3,i}^{(n)} \right) \gamma_{ni}^{1} + \sum_{i \in I} \left( c_{3,i}^{(n)} - c_{3,i}^{\min} \right) \gamma_{ni}^{2} \le s_n, \; \forall n \in N_d \tag{2.22}
$$

$$
\left| P_{ni} - \gamma_{ni}^{1} + \gamma_{ni}^{2} \right| \le \lambda, \; \forall i \in I, \; n \in N_d \tag{2.23}
$$

$$
\gamma_{ni}^{1}, \gamma_{ni}^{2} \ge 0, \; \forall i \in I, \; n \in N_d \tag{2.24}
$$

where $\gamma_n = \left[ \begin{smallmatrix} \gamma_n^1 \\ \gamma_n^2 \end{smallmatrix} \right]$, and $\gamma_{ni}^{1}$ and $\gamma_{ni}^{2}$ are the $i$-th entries of vectors $\gamma_n^1$ and $\gamma_n^2$, respectively. $c_{3,i}^{\max}$ and $c_{3,i}^{\min}$ represent the upper and lower bounds for the price of feedstock $i$, respectively. Here $\lambda$ and $s_n$ are auxiliary variables introduced by the reformulation of the worst-case expectation, and $N_d$ denotes the index set of price data samples.

By employing the reformulation techniques, we obtain the (WDRC) model which is

equivalent to the (WDRO) model. The (WDRC) for the biomass with agricultural waste-

to-energy network design is formulated as a single-level MINLP problem. One salient

feature of (WDRC) is that its model size, namely the number of decision variables and

the number of constraints, scales linearly with the number of price data N. Moreover,

the uncertain price data are directly incorporated into the proposed (WDRC) model as

witnessed in (2.22).
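The reformulation can be sanity-checked on a one-dimensional instance where the worst-case expectation has a closed form. For a fixed purchase quantity P, the loss P·ξ on the support [ξmin, ξmax] attains its worst-case expectation by shifting probability mass toward ξmax, giving Ê[P·ξ] + P·θ whenever θ is small enough. The LP below is an illustration with made-up numbers (it mirrors the structure of (2.14) and (2.22)-(2.24), not the full (WDRC) model) and recovers exactly that value.

```python
import numpy as np
from scipy.optimize import linprog

P, theta = 2.0, 0.5                  # fixed purchase quantity, Wasserstein radius
xi = np.array([1.0, 3.0, 5.0])       # price data samples
xi_min, xi_max, N = 0.0, 10.0, len(xi)

# Variables x = [lam, s_1..s_N, g1_1..g1_N, g2_1..g2_N]
nv = 1 + 3 * N
c = np.zeros(nv); c[0] = theta; c[1:1 + N] = 1.0 / N   # min lam*theta + (1/N) sum s_n

A_ub, b_ub = [], []
for n in range(N):
    # P*xi_n + g1_n*(xi_max - xi_n) + g2_n*(xi_n - xi_min) <= s_n   (cf. (2.22))
    row = np.zeros(nv); row[1 + n] = -1.0
    row[1 + N + n] = xi_max - xi[n]; row[1 + 2 * N + n] = xi[n] - xi_min
    A_ub.append(row); b_ub.append(-P * xi[n])
    # |P - g1_n + g2_n| <= lam, split into two linear rows            (cf. (2.23))
    for sgn in (1.0, -1.0):
        row = np.zeros(nv); row[0] = -1.0
        row[1 + N + n] = -sgn; row[1 + 2 * N + n] = sgn
        A_ub.append(row); b_ub.append(-sgn * P)

bounds = [(0, None)] + [(None, None)] * N + [(0, None)] * (2 * N)
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds, method="highs")

closed_form = P * xi.mean() + P * theta   # mass shifted toward xi_max
```

At the optimum, λ equals the slope P of the loss, consistent with its role as the dual price of transporting probability mass.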

The resulting (WDRC) problem turns out to be a nonconvex MINLP with separable

concave terms in its objective function (2.14). These concave terms appear due to the

calculation of capital cost based on “six-tenths rule” scaling with technology capacity

[250]. Although this single-level MINLP can be solved directly using some off-the-shelf

optimization solvers such as BARON, it turns out to be computationally expensive when

applied to network design problems [252]. By leveraging the structure of separable

concave terms, we adopt a branch-and-refine algorithm based on a successive piecewise

linear approximation to solve the (WDRC) problem to its global optimality [236]. The

key idea is to approximate the concave capital cost using a series of piecewise linear

underestimates that are formulated via special ordered sets of type 1 (SOS1) variables.

The piecewise linear under-estimator for the capital cost of technology j, denoted by Ej,

is formulated in (2.25)-(2.27).
$$
E_j = \sum_{p=1}^{NP} fe_{jp} \cdot PW_{jp}, \; \forall j \in J \tag{2.25}
$$

$$
Q_j = \sum_{p=1}^{NP} fx_{jp} \cdot PW_{jp}, \; \forall j \in J \tag{2.26}
$$

$$
fe_{jp} = fx_{jp}^{0.6}, \; \forall j \in J, \; p \in P \tag{2.27}
$$

where $fx_{jp}$ is the predefined partition point value, $fe_{jp}$ denotes the corresponding power function value, $p$ is the index of partition points, $NP$ is the total number of partition points, and $PW_{jp}$ is a weight for the corresponding partition point.

Constraints on weighting factor PWjp and position indicator PEjp are defined in (2.28)-

(2.33).

$$
\sum_{p=1}^{NP} PW_{jp} = 1, \; \forall j \in J \tag{2.28}
$$

$$
\sum_{p=1}^{NP-1} PE_{jp} = 1, \; \forall j \in J \tag{2.29}
$$

$$
PW_{j1} \le PE_{j1}, \; \forall j \in J \tag{2.30}
$$

$$
PW_{jp} \le PE_{jp} + PE_{j,p-1}, \; \forall j \in J, \; 2 \le p \le NP-1 \tag{2.31}
$$

$$
PW_{j,NP} \le PE_{j,NP-1}, \; \forall j \in J \tag{2.32}
$$

$$
PW_{jp} \ge 0, \; PE_{jp} \in \text{SOS1}, \; \forall j \in J \tag{2.33}
$$

where $PE_{jp}$ is defined as a SOS1 variable such that only one interval is selected.
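The validity of this construction follows from concavity: every chord of a concave function lies below its graph, so the interpolant through the partition points under-estimates the capital cost everywhere, and refining the partition tightens the approximation. A quick numerical check (illustrative only) on the six-tenths-rule function:

```python
import numpy as np

def pwl_under(x, bp, f):
    """Evaluate the piecewise linear interpolant of f through breakpoints bp at x."""
    return np.interp(x, bp, f(bp))

f = lambda q: q ** 0.6                   # concave "six-tenths rule" capital-cost scaling
x = np.linspace(1.0, 100.0, 2001)

bp_coarse = np.linspace(1.0, 100.0, 3)   # coarse partition
gap_coarse = float(np.max(f(x) - pwl_under(x, bp_coarse, f)))

bp_fine = np.linspace(1.0, 100.0, 9)     # refined partition
gap_fine = float(np.max(f(x) - pwl_under(x, bp_fine, f)))
```

The interpolant never exceeds the true function, and the maximum approximation gap strictly decreases as partition points are added, which is the property the branch-and-refine iterations exploit.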

Algorithm. The proposed solution algorithm
1: Set LB ← −∞, UB ← +∞, iter ← 0, and tolerance ζ;
2: Reformulate problem (WDRO) into problem (WDRC);
3: While UB − LB > ζ
4:   iter ← iter + 1;
5:   Solve problem (WDRC) with the piecewise linear objective function, and obtain the solution Q*_iter and objective value OBJ*;
6:   Update LB ← max{LB, OBJ*};
7:   Evaluate the original nonlinear objective value OBJ∆ using Q*_iter;
8:   Update UB ← min{UB, OBJ∆};
9:   Add a new partition point at the candidate solution Q*_iter;
10: End
11: Return the optimal solution

Figure 4. The pseudocode of the proposed reformulation-based branch-and-refine

algorithm for solving (WDRO) problem.

Since the capital cost is underestimated when substituting $Q_j^{sf_j}$ with $E_j$, the resulting

MILP problem provides a valid lower bound. Note that the optimal solution of the MILP

problem, which can be regarded as a candidate solution, is feasible to the original

(WDRC) problem. Accordingly, a valid upper bound for the total annualized cost can

be obtained by calculating the original nonconvex objective value with the candidate

solution. The gap is then computed as the difference between the upper and lower

bounds and is utilized for determining whether a new partition point is needed to further

refine the piecewise linear approximation. Partition points are added at the candidate

solutions iteratively until the gap between the upper and lower bounds reaches below a

predefined tolerance. In Figure 4, we present the pseudocode of the developed solution

method for the global optimization of the (WDRO) problem in detailed steps. Note that

the tolerance for the optimality gap is denoted by ζ. The number of partition points

increases by one only for those selected technologies in each iteration of the solution

algorithm.
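The LB/UB bookkeeping of the algorithm can be expressed generically. The skeleton below sketches only the loop: the callables stand in for the piecewise linear MILP solve, the evaluation of the original nonconvex objective, and the partition refinement (here instantiated with a trivial one-dimensional toy, not the GAMS/CPLEX implementation used in the case study).

```python
import numpy as np

def branch_and_refine(solve_pwl, true_obj, refine, tol=1e-6, max_iter=100):
    """LB/UB loop of the algorithm above. `solve_pwl` returns a candidate
    solution and the under-estimated objective (a valid lower bound);
    `true_obj` evaluates the original nonconvex objective at the candidate
    (a valid upper bound, since the candidate is feasible); `refine` adds a
    partition point at the candidate."""
    LB, UB, best = -np.inf, np.inf, None
    for _ in range(max_iter):
        q, obj_lp = solve_pwl()
        LB = max(LB, obj_lp)
        obj_true = true_obj(q)
        if obj_true < UB:
            UB, best = obj_true, q
        if UB - LB <= tol:
            break
        refine(q)
    return best, LB, UB

# Toy instantiation: minimize f(Q) = Q**0.6 over [1, 100] via successive
# piecewise linear under-estimation evaluated on a dense grid.
f = lambda q: q ** 0.6
bps = [1.0, 100.0]
grid = np.linspace(1.0, 100.0, 4001)

def solve_pwl():
    vals = np.interp(grid, np.sort(bps), f(np.sort(bps)))
    k = int(np.argmin(vals))
    return float(grid[k]), float(vals[k])

def refine(q):
    bps.append(q)

best, LB, UB = branch_and_refine(solve_pwl, f, refine)
```

On this monotone toy the gap closes in one pass because the candidate lands on a breakpoint; in the (WDRC) model the candidate comes from an MILP whose constraints make repeated refinement necessary.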

2.5 Case studies

To demonstrate the effectiveness of the proposed data-driven Wasserstein DRO based

network design approach and the solution algorithm, we consider a specific biomass

with agricultural waste-to-energy network design in this section. Optimal solutions are

found and validated through the optimization process of the network design problem

based on the proposed method.

2.5.1 Case description

In the considered biomass with agricultural waste-to-energy network, there are 216

processing and upgrading technologies as well as 172 materials/compounds. In the case

study, we consider the energy market, which involves the demands for biodiesel,

gasoline, ethanol, methane, and biogas. The problem parameters for technologies, such

as mass balance coefficients, generally are not influenced by the geographical region,

whereas the price parameters hold for the geographical region of USA. Note that the

datasets of problem parameters used in this work can be found in the recently published

papers [240, 252]; thus, the data are recent. Because of

market fluctuations, feedstock prices are subject to uncertainty. Since the feedstock

price data are well-documented [253], we use real price data for the case study.

We also implement the deterministic optimization method and the conventional two-

stage stochastic programming method using the same price data as scenarios, in addition

to the proposed data-driven Wasserstein DRO approach for the purpose of comparison.
All optimization problems are modelled in GAMS 25.0.3 [254]. The computational

experiments are performed on a computer with an Intel (R) Core (TM) i7-6700 CPU @

3.40 GHz and 32 GB RAM. In each iteration of the developed solution algorithm, an

MILP problem is solved with the solver CPLEX 12.8.0. The optimality gap for CPLEX

12.8.0 is set to be 0, and the optimality tolerance for the reformulation-based branch-

and-refine algorithm is 10-6. In the case studies, the radius of Wasserstein ball θ is

obtained through cross validation [124]. The lower and upper bounds of uncertain

parameters in support set Ξ are estimated using the empirical bounds, which are directly

obtained from the training data. Note that the average price is used as a nominal value

for the deterministic optimization method. For the stochastic programming and DRO

approaches, price uncertainty data, which represent possible price realizations, are used

in their corresponding optimization problems. At the end of the case study, a sensitivity

analysis is performed to investigate how the value of θ influences the Wasserstein DRO

solution.

2.5.2 Results and discussions

The problem sizes and computational results of different methods are summarized in

Table 1. A total of 12 training samples is used for these optimization methods. From the

table, it can be observed that the number of continuous decision variables and the

number of constraints in the reformulated problem (WDRC) are both larger than those

in the two-stage stochastic program. This is because auxiliary variables and constraints

are introduced to reformulate the worst-case expectation problem. Although the mixed-

integer optimization problem for biomass network design is NP-hard, it can be solved

within a reasonable amount of time empirically. From Table 1, we can see that the data-

driven Wasserstein DRO problem takes about 50% longer computational time to solve

compared with the deterministic optimization and stochastic programming methods.

Note that the biomass with agricultural waste-to-energy network design problem, which

includes both continuous and discrete decision variables, is an NP-hard problem.

Although the stochastic program and the DRO problem have a similar number of

decision variables and constraints, their model structures are quite different, leading to

different computational times. The successive piecewise linear approximation is

employed to solve the two-stage stochastic program with a nonconvex objective

function. The objective value determined by the conventional stochastic programming

method is $16.37 MM, whereas the objective value determined by the proposed data-

driven Wasserstein DRO approach is $21.67 MM. The reason is that the conventional

stochastic programming method minimizes the expected total cost based on a single

empirical distribution, while the data-driven Wasserstein DRO approach aims for the

lowest worst-case expected cost with respect to a family of candidate feedstock price

distributions. The expected value of perfect information (EVPI) is an important criterion

in decision making under uncertainty, and is used to measure the largest amount a

decision maker would be willing to pay in return for perfect information [45]. Given its

definition, EVPI is suitable for the conventional stochastic programming method rather

than for DRO, because under DRO the true probability distribution required to calculate

the expectation is not known. In this case, the EVPI is $0.42MM for the stochastic programming

method. As mentioned before, the conventional stochastic programming method can be

considered as a special case of the proposed data-driven DRO approach, when the radius

of the Wasserstein ball is set to be zero.
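The EVPI calculation can be illustrated with a minimal sketch: the expected cost of the here-and-now stochastic solution (RP) minus the wait-and-see cost under perfect foresight (WS) gives the EVPI. All numbers and the quadratic cost function below are hypothetical placeholders, not the case-study model.

```python
from statistics import mean

# Two equally likely feedstock-price scenarios (hypothetical numbers).
scenarios = [14.0, 18.0]

def cost(x, xi):
    """Toy quadratic mismatch cost between a decision x and a realized price xi."""
    return (x - xi) ** 2

# Here-and-now: one decision fixed before the uncertainty resolves.
x_sp = mean(scenarios)                       # minimizer of the expected quadratic cost
rp = mean(cost(x_sp, xi) for xi in scenarios)

# Wait-and-see: a clairvoyant decision tailored to each scenario.
ws = mean(cost(xi, xi) for xi in scenarios)

evpi = rp - ws   # value of perfect information
```

In this toy, the EVPI equals the scenario variance of the price, since perfect foresight removes all mismatch cost.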

Table 1. Comparisons of problem sizes and computational results of the deterministic

optimization, the conventional stochastic programming method and the data-driven

Wasserstein DRO approach.

                             Binary      Continuous                  Objective      CPU
                             variables   variables    Constraints    value ($MM)    time (s)
Deterministic optimization     216          777          1,165         16.36          6.0
Stochastic program             216        6,937          9,217         16.37          6.1
WDRC                           216        7,094          9,373         21.67          9.7

To demonstrate the advantages of the proposed approach, we perform an out-of-sample

simulation with the testing feedstock price data. The testing dataset consists of different

feedstock price realizations which are obtained from the same source [253]. The number

of testing samples is 60 in this case study. For each method, we calculate their average

cost, worst-case cost, best-case cost, and standard deviation of cost under different

uncertainty scenarios. These statistics of simulation results are summarized in Table 2.

It can be seen from the table that the average cost and the worst-case cost of the proposed

data-driven DRO approach using Wasserstein metric are 5.7% and 17.4% lower than

those determined by the conventional stochastic programming method, respectively.

Additionally, compared to the stochastic programming method, the data-driven

Wasserstein DRO approach leads to a biomass with agricultural waste-to-energy

network design that is less sensitive to feedstock price variations. Specifically, the costs

under different feedstock price scenarios determined by the data-driven Wasserstein

DRO approach feature a 37.1% smaller standard deviation ($3.41MM vs $5.42MM)

than its stochastic programming counterpart. The simulation results clearly demonstrate

that the data-driven Wasserstein DRO approach compares favorably against the

deterministic optimization method and the conventional stochastic programming

method in terms of the out-of-sample performance. To showcase the difference between

the stochastic programming and DRO, we present the empirical probability distributions

of total cost for both methods in Figure 5.
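The out-of-sample statistics of this kind can be computed with a routine like the following sketch; the cost values below are hypothetical placeholders, not the case-study results.

```python
from statistics import mean, pstdev

def out_of_sample_stats(costs):
    """Summarize simulated total costs ($MM) over the testing scenarios."""
    return {
        "average": mean(costs),
        "worst": max(costs),   # highest cost is the worst case for a minimization problem
        "best": min(costs),
        "std": pstdev(costs),
    }

# Hypothetical cost realizations for two designs under the same five testing
# scenarios (placeholder numbers, not the case-study data).
sp_costs = [15.0, 20.0, 25.0, 30.0, 33.0]    # stochastic-programming design
wdro_costs = [17.0, 19.0, 22.0, 25.0, 27.0]  # DRO design: tighter spread

sp_stats = out_of_sample_stats(sp_costs)
wdro_stats = out_of_sample_stats(wdro_costs)
```

The same four statistics (average, worst-case, best-case, standard deviation) are the ones reported for each method in Table 2.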

Table 2. The out-of-sample performance of the deterministic optimization, the

conventional stochastic programming method and the data-driven Wasserstein DRO

approach.

                             Average cost   Worst-case     Best-case      Standard
                             ($MM)          cost ($MM)     cost ($MM)     deviation ($MM)
Deterministic optimization      22.53          33.38          11.98            5.42
Stochastic program              22.53          33.38          11.98            5.42
Data-driven WDRO                21.24          27.57          14.42            3.41

Figure 5. The empirical probability distributions of total cost for (a) the stochastic

programming method, (b) the proposed data-driven Wasserstein DRO approach.

The network designs determined by the conventional stochastic programming method

and the proposed data-driven WDRO approach are presented in Figure 6 and Figure 7,

respectively. Note that the deterministic optimization method, which uses the average

value of feedstock prices, generates the same optimal network design as the stochastic

programming method. The similar results generated by the deterministic optimization

and stochastic programming methods are ascribed to the parameter setup and the

ranges of uncertainties in this specific case study. For all these methods, the biomass

feedstock of soybeans is selected to produce biodiesel through technologies of handling

and extraction, transesterification and distillation. Polyhydroxybutyrate (PHB – a

biodegradable plastic) is produced as a byproduct. The soybean-based biodiesel

process yields glycerol, which can be used to synthesize PHB. The pyrolysis of

switchgrass is selected in the network because of its ability to produce raw bio-oil. This

type of bio-oil can be transformed into a number of products [255], boosting the

process’s flexibility. Fermentation of cassava is also selected by both methods to make

ethanol. Note that the anaerobic digester (AD) converts dairy manure into biogas, which

can be used as a fuel source or further utilized as a material in other chemical reactions

[256]. As shown in Figure 6 and Figure 7, the best way to produce biogas from dairy

manure is through horizontal plug flow AD. Meanwhile, municipal solid waste is used

to produce methane via landfill methane extraction. By comparing the optimal

processing pathways in Figure 6 and Figure 7, the pathway of glycerol to isobutanol is

selected only in the optimal network determined by the stochastic programming method.

Figure 6. The optimal bioconversion network design determined by the stochastic

programming approach. The optimal production capacity is displayed under processes.

Figure 7. The optimal bioconversion network design determined by the data-driven

Wasserstein DRO approach. The optimal production capacity is displayed under

processes.

The details on cost breakdowns, including capital cost, operating cost, and feedstock

cost are shown in Figure 8. From the donut charts, we can see that more than half of the

total annualized cost comes from purchasing feedstocks for both the stochastic

programming method and the Wasserstein DRO approach. Additionally, the percentage

of the feedstock cost determined by the stochastic programming method is 3% higher,

because a larger quantity of soybeans is purchased in the optimal network design. For

both optimization methods, the capital cost contributes to the second largest portion,

meaning that the selection and capacities of technologies play a critical role in lowering

the total annualized cost.

Figure 8. Cost breakdowns determined by (a) the stochastic programming method, (b)

the proposed data-driven Wasserstein DRO approach.

To take a closer look at the capital costs of different approaches, we present the capital

cost distributions determined by the stochastic programming and Wasserstein DRO

methods in Figure 9 (a) and (b), respectively. From Figure 9 (a), we can see that the

majority of capital cost (71.1%) determined by the stochastic programming method

comes from landfill methane extraction and glycerol to isobutanol process. It indicates

that the processing pathways used to produce methane and isobutanol are expensive to

build. Switchgrass pyrolysis accounts for 6.8% of the capital cost, showing that this

technology is a costly component in the pathway from switchgrass to diesel and

gasoline. As for the capital cost distribution determined by the data-driven Wasserstein

DRO approach, we can see from Figure 9 (b) that the landfill methane extraction

contributes to 61.7% of the capital cost and accounts for the largest portion, which is

similar to the result of the stochastic programming method. The second largest cost

(9.4%) comes from switchgrass pyrolysis, thus again showing the significance of this

technology in the switchgrass processing pathway. Additionally, technologies regarding

the cassava, including cassava peeling and crushing, cassava fermentation, and cassava

distillation, together account for merely 2.2% of the capital cost, implying that the

pathway producing ethanol from cassava is economically favorable. Note that the ratios

between capital investment costs for the two optimization methods can differ, because

the optimal capacities of some technologies obtained by the stochastic programming

method and the DRO approach are not the same.

Figure 9. Capital cost distributions determined by (a) the stochastic programming

method, (b) the proposed data-driven Wasserstein DRO approach.

Following the existing literature [252], the discount rate is set to be 10%. To investigate

the impact of the discount rate on the computational results of the Wasserstein DRO

approach, we conduct a sensitivity analysis and present the result in Figure 10. From

the figure, we can see that the objective value of the DRO approach increases by 18.0%

when the discount rate changes from 5% to 10%. Additionally, the objective value

grows by 17.3% if we further increase the discount rate from 10% to 15%. Note that the
optimal investment decisions, i.e. technology selection and capacity, do not change

when the discount rate increases from 5% to 15%.

Figure 10. Sensitivity analysis of discount rate for the data-driven Wasserstein DRO

approach.

To investigate how the in-sample objective value, out-of-sample average cost, and

computational time of (WDRO) change with the radius of Wasserstein ball, we perform

a sensitivity analysis and present results under different values of parameter θ in Figure

11. The value of θ specifies the size of the Wasserstein ambiguity set, so the decision

maker can use it to adjust the level of conservatism. The ambiguity set encapsulates

more probability distributions as the value of θ increases, meaning that more

distributions are hedged against in the (WDRO) model. Since the (WDRO) model

optimizes the worst-case expected cost with respect to the ambiguity set, increasing the

value of parameter θ leads to a higher in-sample objective value as observed in Figure

11. Additionally, we can observe that the out-of-sample average cost corresponding to

the testing samples decreases from $22.53MM to $21.24MM, when the radius of

Wasserstein ball changes from 0.01 to 0.03. When the radius further increases from 0.03

to 0.15, the out-of-sample performance in terms of average cost remains the same. The

optimal Wasserstein radius obtained from cross-validation is 0.25, which results in the

same out-of-sample average cost ($21.24MM) as the radii that perform best on the

testing data in Figure 11. From the orange line in Figure 11, we can see that increasing

the radius of the Wasserstein ball does not add a significant computational burden:

the computational time for solving the corresponding (WDRO) problem varies only

from 7.1 s to 15.8 s.
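The radius-tuning procedure via cross-validation can be sketched as the loop below; `solve_wdro` is a stand-in stub that only mimics a design growing more conservative with θ, not the actual mixed-integer reformulation, and all data are synthetic.

```python
import random

def solve_wdro(train, theta):
    """Stub standing in for the (WDRO) solve: in the real model this is the
    mixed-integer reformulation; here the 'design' is just a scalar cost
    target that grows more conservative with the ball radius theta."""
    return sum(train) / len(train) + 10.0 * theta

def validation_cost(design, val):
    """Toy out-of-sample score: mean absolute mismatch on held-out scenarios."""
    return sum(abs(design - v) for v in val) / len(val)

random.seed(0)
data = [random.gauss(20.0, 3.0) for _ in range(60)]  # synthetic price samples
train, val = data[:40], data[40:]                    # hold-out split

radii = [0.0, 0.01, 0.03, 0.05, 0.1, 0.15, 0.25]
scores = {t: validation_cost(solve_wdro(train, t), val) for t in radii}
best_theta = min(scores, key=scores.get)
```

In practice, each candidate radius would require a full (WDRO) solve per fold, which is why the reported tuning relies on the modest per-instance solution times above.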

Figure 11. Sensitivity analysis of the in-sample objective value, out-of-sample average

cost, and computational time with different radii of Wasserstein balls.

To demonstrate the efficiency of the proposed solution algorithm, we display the upper

and lower bounds in each iteration of the algorithm for the instance with Wasserstein

radius of 0.1 in Figure 12. In this figure, the green dots represent the upper bounds, and

the yellow circles stand for the lower bounds. The X-axis represents the iteration

number, and the Y-axis denotes the objective function values. From the figure, it can be

seen that the relative optimality gap decreases significantly from 57.9% to 9.8% during

the first two iterations. The reformulation-based branch-and-refine algorithm takes only

three iterations to reduce the relative optimality gap to 0.0%. The result demonstrates

that this solution algorithm is effective for solving the (WDRO) network design problem.
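The stopping logic behind such a run reduces to tracking the relative optimality gap between bounds; a minimal sketch follows, with bound values chosen to mimic the reported gap sequence (57.9% → 9.8% → 0.0%) rather than taken from the solver's raw output.

```python
def relative_gap(ub, lb):
    """Relative optimality gap between an upper and a lower bound."""
    return (ub - lb) / abs(ub)

# Hypothetical bound sequences chosen to mimic the reported gaps.
upper = [26.0, 24.0, 21.67]
lower = [10.95, 21.648, 21.67]

gaps, tol, iterations = [], 1e-6, 0
for ub, lb in zip(upper, lower):
    iterations += 1
    gaps.append(relative_gap(ub, lb))
    if gaps[-1] <= tol:   # converged: the bounds have met
        break
```

A branch-and-refine scheme tightens the lower bound by refining the piecewise linear relaxation and the upper bound by evaluating feasible solutions, terminating once this gap reaches the tolerance.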

Figure 12. Upper and lower bounds in each iteration of the reformulation-based

branch-and-refine algorithm for global optimization of the (WDRO) problem in the

case study.

To further explore the impacts of the number of training uncertainty data on the

computational results, we increase the amount of training samples used in the

optimization methods. In this case study, the number of training samples N increases

from 12 to 100. Specifically, we consider a case study in which 100 feedstock price

uncertainty realizations are used in the optimization problems, and another 100

uncertainty data are utilized for testing their out-of-sample performances. As the number

of training data increases, the computational times of both stochastic programming and

data-driven Wasserstein DRO methods grow to 14.9 seconds and 30.5 seconds,

respectively. The size of training samples does not influence the problem size of

deterministic optimization, which utilizes the average of training data as the nominal

values of parameters. By contrast, the size of training data has an impact on the problem

sizes of stochastic programming and data-driven Wasserstein DRO. This is because the

number of constraints and continuous variables for both methods increases, as the

amount of training samples grows. The results of the out-of-sample performance for

different methods are plotted in Figure 13, where the green diamonds denote the

Wasserstein DRO solution, and the orange circles represent the stochastic programming

solution. For clear visualization, their average costs over all testing scenarios are

represented as the horizontal lines in the figure. Additionally, the statistics of out-of-

sample performances under the larger amount of uncertainty data are summarized in

Table 3. As can be seen from the table, the data-driven Wasserstein DRO approach still

outperforms the conventional stochastic programming method by lowering the average

cost by 3.8% for the testing dataset. We investigate the dependence of this average cost

reduction as well as the dependence of standard deviation reduction on the number of

testing samples, and present the results in Figure 14. It can be observed from the figure

that both the average cost reduction and the standard deviation reduction change slightly

when the number of testing samples is above the threshold of 60.

Figure 13. Out-of-sample performance of stochastic programming and the proposed

Wasserstein DRO method based on the testing of 100 uncertainty scenarios.

Table 3. The out-of-sample performance of the deterministic optimization, the

conventional stochastic programming method and the data-driven WDRO approach

when the number of training data N=100.

                             Average cost   Worst-case     Best-case      Standard
                             ($MM)          cost ($MM)     cost ($MM)     deviation ($MM)
Deterministic optimization      20.43          33.38          10.39            5.94
Stochastic program              20.43          33.38          10.39            5.94
Data-driven WDRO                19.66          27.57          13.04            3.90

Figure 14. The dependences of the average cost reduction and standard deviation

reduction on the number of testing samples.

2.6 Summary

In this work, we proposed a data-driven two-stage Wasserstein distributionally robust

optimization model for the biomass with agricultural waste-to-energy network design

subject to feedstock price uncertainty. Based on the Wasserstein metric and support set,

the data-driven ambiguity set was constructed that encompassed all candidate

distributions of feedstock price. This ambiguity set was formulated as the Wasserstein

ball with a variable radius, which granted more flexibility in adjusting the level of

conservatism compared with other moment-based ambiguity sets. The resulting

Wasserstein distributionally robust optimization approach not only endowed the

operational decisions with full adaptability, but also hedged against the distributional

ambiguity by considering the worst-case expected cost. To improve its tractability, we

derived an equivalent distributionally robust counterpart for the network design problem

by taking advantage of various reformulation techniques to handle the worst-case

expectation. A case study on the biomass with agricultural waste-to-energy network

design was presented. Computational results showed that increasing the size of

ambiguity set did not result in more computational time, and that the proposed method

worked under a large amount of training data. As for the out-of-sample performance,

the proposed approach compared favorably against both deterministic optimization and

stochastic programming by achieving a lower average cost. Specifically, through cross

validation, the optimal Wasserstein radius was tuned to be 0.25, which generated the

best out-of-sample average cost of $21.24MM on the testing samples. The advantage of

the distributionally robust optimization approach lies in its robustness to hedge against

distributional ambiguity. If the underlying true distribution is invariant and the number

of training samples is large enough, the ambiguity of probability distribution is

insignificant. Therefore, the advantage of using ambiguity set, whose size is nonzero, is

not evident. However, when the number of training samples is limited or the probability

distribution is time-variant, the distributional ambiguity is remarkable. In this case, the

merit of the distributionally robust optimization approach becomes more manifest. The

dependence results of average cost reduction and standard deviation reduction on the

number of testing samples showed that these reduction values remained relatively stable

when the number of testing samples was above 60.

2.7 Appendix: Derivation of Wasserstein distributionally robust

counterpart

For the ease of exposition, we present the (WDRO) model for biomass with agricultural

waste-to-energy network design in the following abstract form, in which vectors and

matrices are introduced.

min_{x∈X}  f(x) + max_{P∈𝒫} E_P[ l(x, ξ) ]

s.t.  Ax ≥ g                                                  (2.34)

l(x, ξ) = min_y { c_y^T y + (Gξ)^T y :  W y ≥ h(x) }

where x denotes the vector of all first-stage decisions including Yj and Qj; y is the vector

of second-stage decisions Wj, Pi, and Si; f(x) represents the first-stage cost; ξ is the vector

of uncertain parameters c3,i; and l(x, ξ) represents the second-stage cost. Note that the

second-stage cost is divided into two parts, namely the deterministic cost c_y^T y, which

is unaffected by uncertainty, and the random cost (Gξ)^T y, which depends on the specific

uncertainty realization.

In the following paragraphs, we explain the reformulation of the worst-case expectation

max_{P∈𝒫} E_P[ l(x, ξ) ] step by step. Based on the definition of the data-driven Wasserstein

ambiguity set in (2.2), we can re-express the worst-case expectation max_{P∈𝒫} E_P[ l(x, ξ) ]

as the following optimization problem (2.35)-(2.37) for any feasible first-stage

decisions [245]:

max_{Π_n ∈ 𝓜(Ξ)}  (1/N) Σ_{n=1}^{N} ∫_Ξ l(x, ξ) Π_n(dξ)                        (2.35)

s.t.  (1/N) Σ_{n=1}^{N} ∫_Ξ ‖ξ − ξ^(n)‖ Π_n(dξ) ≤ θ                            (2.36)

      (1/N) ∫_Ξ Π_n(dξ) = 1/N,  ∀n ∈ N_d                                        (2.37)

where  denotes the set of measures, and n is the conditional probability

distribution of ξ1=ξ given that ξ2=ξ(n).

According to the strong duality of the generalized moment problem [257], we can obtain

its dual optimization problem given by (2.38)-(2.40).

min_{λ, τ_n}  λθ + (1/N) Σ_{n=1}^{N} τ_n                                        (2.38)

s.t.  l(x, ξ) − λ‖ξ − ξ^(n)‖ ≤ τ_n,  ∀ξ ∈ Ξ, ∀n ∈ N_d                          (2.39)

      λ ≥ 0                                                                     (2.40)

where λ and τn are the dual variables corresponding to constraints (2.36) and (2.37),

respectively.

Since constraint (2.39) holds for any uncertainty realizations within the support set Ξ,

we can reformulate constraint (2.39) below.

max l  x, ξ     ξ  ξ      n , n  N d
n
(2.41)
ξ  

The left-hand side of constraint (2.41) can be re-expressed as follows:

max_{ξ∈Ξ} [ l(x, ξ) − max_{‖z^n‖_* ≤ λ} (z^n)^T (ξ − ξ^(n)) ]

  = max_{ξ∈Ξ} min_{‖z^n‖_* ≤ λ} [ l(x, ξ) − (z^n)^T (ξ − ξ^(n)) ]              (2.42)

  = min_{‖z^n‖_* ≤ λ} max_{ξ∈Ξ} [ l(x, ξ) − (z^n)^T (ξ − ξ^(n)) ]

where  * denotes the dual norm and zn is the introduced decision variables. Since l1

norm is adopted in this work, the corresponding dual norm is l∞ norm. Note that the

second equality is based on the minimax theorem [124].
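The l1/l∞ duality invoked here can be checked numerically: the dual norm is the support function of the unit ball, and for the l1 ball that supremum is attained at one of the signed unit vectors, recovering the l∞ norm. A small self-contained check:

```python
def linf_norm(z):
    """The l-infinity norm of a vector."""
    return max(abs(v) for v in z)

def dual_of_l1(z):
    """Evaluate sup { z.x : ||x||_1 <= 1 } by scanning the l1 ball's vertices
    (the signed unit vectors +/- e_i); this supremum is exactly the dual norm."""
    best = float("-inf")
    for i in range(len(z)):
        for sign in (1.0, -1.0):
            best = max(best, sign * z[i])   # dot product of z with +/- e_i
    return best

z = [3.0, -7.0, 2.0]
```

Because the l1 ball is a polytope, the linear objective attains its maximum at a vertex, so scanning the 2n signed unit vectors suffices.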

Based on (2.41) and (2.42), we further reformulate constraint (2.41) as follows.

max l  x, ξ   z n T ξ  ξ 
ξ 
 n
   , n  N
n d (2.43)

zn *
  , n  N d (2.44)

For the ease of derivation, we express the support set (2.3) in the following compact

matrix form:

  ξ Cξ  d (2.45)

I   ξ max 
where C    and d   min  .
 I  ξ 

According to the definition of l(x, ξ) in (2.34), we further reformulate the left-hand side

of constraint (2.43) below.


max_{ξ: Cξ ≤ d} [ min_{y^n: W y^n ≥ h(x)} ( c_y^T y^n + (Gξ)^T y^n ) − (z^n)^T (ξ − ξ^(n)) ]

  = min_{y^n: W y^n ≥ h(x)}  max_{ξ: Cξ ≤ d} [ c_y^T y^n + (Gξ)^T y^n − (z^n)^T (ξ − ξ^(n)) ]      (2.46)

  = min_{y^n: W y^n ≥ h(x)}  min_{γ^n: C^T γ^n = G^T y^n − z^n, γ^n ≥ 0} [ d^T γ^n + c_y^T y^n + (z^n)^T ξ^(n) ]

where γ^n represents the vector of dual variables corresponding to the constraints Cξ ≤ d.

Note that the first equality in (2.46) is due to the minimax theorem [124], and the second

equality holds because of the strong duality of linear programs.

By substituting z^n = G^T y^n − C^T γ^n into constraints (2.43) and (2.44), and replacing the

left-hand side of constraint (2.43) with (2.46), we equivalently reformulate the (WDRO)

model (2.34) as (2.47).

min_{x, λ, τ_n, y^n, γ^n}  f(x) + λθ + (1/N) Σ_{n=1}^{N} τ_n

s.t.  Ax ≥ g,  x ∈ X

      (Gξ^(n))^T y^n + c_y^T y^n + (d − Cξ^(n))^T γ^n ≤ τ_n,  ∀n ∈ N_d          (2.47)

      W y^n ≥ h(x),  ∀n ∈ N_d

      ‖G^T y^n − C^T γ^n‖_* ≤ λ,  ∀n ∈ N_d

      γ^n ≥ 0,  ∀n ∈ N_d
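To illustrate the structure of the dual reformulation above, the following sketch evaluates the worst-case expectation for a toy one-dimensional linear loss l(ξ) = ξ on the support [0, hi] by grid search over the dual multiplier λ, and checks it against the closed-form primal optimum. This is a didactic analogue chosen for its simple closed form, not the mixed-integer network design model.

```python
def dual_worst_case(samples, theta, hi):
    """Grid-search the dual multiplier lambda for the loss l(xi) = xi on [0, hi].
    For this loss, sup over the support of (xi - lam*|xi - s|) equals
    max(s, hi - lam*(hi - s)), so the dual objective (cf. (2.38)) is
    lam*theta plus the sample average of that supremum."""
    n = len(samples)
    best, lam = float("inf"), 0.0
    while lam <= 2.0:
        inner = sum(max(s, hi - lam * (hi - s)) for s in samples) / n
        best = min(best, lam * theta + inner)
        lam += 0.001
    return best

def primal_worst_case(samples, theta, hi):
    """Closed-form primal optimum for the same toy: the adversary transports
    probability mass upward within the Wasserstein budget theta, capped by hi."""
    m = sum(samples) / len(samples)
    return m + min(theta, hi - m)

samples = [1.0, 2.0, 3.0]
```

For a small radius the dual minimum sits at λ = 1 (the Lipschitz constant of the loss), and for a radius exceeding the remaining headroom it sits at λ = 0, matching the primal cap at hi; this mirrors how the λθ term in (2.47) prices the ball radius.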

2.8 Nomenclature

Sets

I set of compounds, indexed by i

J set of technologies, indexed by j

Nd set of training uncertainty data, indexed by n

P set of partition points, indexed by p

Parameters

a1,j lower bound of the capacity of technology j

a2,j upper bound of the capacity of technology j

bi availability of compound i

c1,j coefficient for economic evaluation

c2,j coefficient for economic evaluation

c3,i coefficient for economic evaluation of feedstock price

c4,i price of bioproduct i

κij conversion coefficient of compound i in technology j

θ radius of the Wasserstein ball

Binary variable

Yj selection of technology j

SOS1 variable

PEj,p indicator for interval p of technology j

Continuous variables

Ej approximated nonlinear term of technology j

Pi quantity of feedstock i to purchase

PWj,p weighting factor for partition point p of technology j

Qj the capacity of technology j

Si quantity of compound i to sell

Wj operating level of technology j

CHAPTER 3
DEEP LEARNING BASED AMBIGUOUS JOINT CHANCE CONSTRAINED
ECONOMIC DISPATCH UNDER SPATIAL-TEMPORAL CORRELATED WIND
POWER UNCERTAINTY

3.1 Introduction

Economic dispatch (ED) is one of the most fundamental decision-making problems in

power systems operations [258]. It seeks to determine the optimal power output of

available generators for serving electricity demand with the minimum operating cost

[259]. With a pressing need to reduce carbon emissions from fossil fuels, the penetration

of renewable energy sources, especially wind power, into power grids has increased

rapidly in recent years. However, such a high penetration poses a threat to the security

and reliability of large-scale power systems due to the intermittency of renewable power

generation. Therefore, it is of particular importance to investigate the ED problem

subject to the wind power uncertainty [260].

Tremendous research efforts have been devoted to developing ED optimization under

uncertainty models for an effective utilization of wind power. Methods along this

direction can be broadly categorized into three paradigms, namely, robust optimization,

stochastic optimization, and distributionally robust optimization [261]. A robust multi-

period ED model was presented based on a dynamic uncertainty set, which modeled the

temporal and spatial correlations of wind output using linear systems [262]. By

regarding uncertainty sets as decision variables, a robust dispatch framework was

proposed to identify the do-not-exceed limit of renewable generations [263]. In [264],

the storage device dynamics and renewable energy variability were considered in an
affinely adjustable robust multi-period power dispatch. A two-level robust method was

developed to address the multi-microgrid ED problem subject to wind power and tie-line

disconnection uncertainties [265]. To reduce the curtailment of wind power, an interval

output schedule of wind farms and a set-point schedule of conventional generators were

introduced in a look-ahead robust ED formulation [266], as well as in an adaptive robust

formulation [267]. The aforementioned robust ED methods typically suffer from the

issue of conservatism, because they aim to immunize against the worst-case realization

and fail to leverage the available information of wind power distribution.

An alternative approach is stochastic ED [268, 269], which mitigates the conservatism

by exploiting the probability distribution of renewable energy generation. The

percentage of wind energy utilization in ED was ensured by casting it as chance

constraints [270]. In [271, 272], versatile distribution models were suggested and

applied to the stochastic ED problem. To cope with non-Gaussian distributions and

spatial correlations among multiple wind farms, the Gaussian mixture model was

introduced to chance constrained ED problems [273, 274]. In [275], beta kernel density

representation was employed to model wind power distribution in a look-ahead ED

problem with individual chance constraints. To account for joint chance constraints in

power dispatch, an iterative bounding technique was developed using support vector

classification [276]. Recently, chance constrained ED problems were handled with a

scenario approach [277, 278]. In this scenario-based ED method, a quantifiable risk of

constraint violation was guaranteed with a sufficiently large number of uncertainty data

drawn from the underlying true distribution. The above research works on stochastic

ED heavily hinge on the accurate knowledge regarding wind power distributions.

However, such perfect information on the probability distribution of uncertain wind

power is far-fetched in practice. Instead, power system operators typically only have

access to a certain amount of renewable energy data for the ED problem.

Nowadays, data-driven optimization under uncertainty is gaining tremendous popularity

[221], and it emerges as a promising paradigm for decision making in electric power

systems [136, 279-282]. The distributionally robust chance constrained ED framework

is one of those attempts, aiming to take advantage of both robust optimization and

stochastic programming. Rather than assuming an exactly known distribution, it

constructs a family of probability distributions, called the ambiguity set, from a finite

number of data. For distributionally robust power dispatch problems, ambiguity sets for

renewable energy were commonly characterized by moment information [283-287], as

well as additional unimodality information [288]. By using both mean and covariance

to describe the ambiguity set of wind power, a distributionally robust ED model was

proposed for hydro-thermal-wind power systems [289]. To further improve the

flexibility of power systems, the co-optimization of ED and the do-not-exceed limit was

cast as a distributionally robust joint chance constrained program, which focused on the

marginal distribution information without incorporating correlations [290]. A two-sided

distributionally robust chance constraint on transmission line capacity limit, which was

then reformulated as second-order cone constraints, was proposed in [291]. A robust

chance constrained power dispatch problem was investigated based on an ambiguity set

consisting of Gaussian distributions whose mean and variance were within certain

ranges [292]. The co-dispatch of energy, reserve and storage was suggested, and the

confidence bands of cumulative distribution function for a one-dimensional random

variable were used for constructing ambiguity sets [293]. In addition, another widely

adopted means of constructing an ambiguity set is via the notion of statistical distance

between probability distributions [137, 294-297]. The existing distributionally robust

chance constrained ED models primarily focus on individual chance constraints,

ignoring the spatial-temporal correlations of wind power uncertainty embedded within

different constraints. However, we should consider simultaneous constraint satisfaction

for the reliability of power systems, and incorporate the complicated, and likely

nonlinear, correlations in order to further reduce the conservatism of the optimal ED

solutions.

To fill the knowledge gap, we propose a novel data-driven ambiguous joint chance

constrained ED optimization framework by taking advantage of a powerful

unsupervised deep learning technique. To decipher the hidden spatial-temporal

correlations of wind power uncertainty, a class of generative adversarial networks

(GANs), namely f-GAN, is employed. Through a competition by a pair of neural

networks, namely generator and discriminator, the underlying true distribution is

implicitly modeled as a mapping represented by the generator network from a latent

space to an uncertainty space. Notably, GANs are extremely effective in learning

complex distributions of high-dimensional data samples, without presuming any

specific forms of probability distributions [298]. These salient features make GANs

desirable for optimization under uncertainty [221, 299]. Based on the extracted

uncertainty information, an ambiguity set for wind power distributions is devised as an

f-divergence ball in the probability space centered around the distribution embodied in

the generator network. Note that the f-divergence not only plays a key role in designing

a unified framework of deep GANs, but also in providing a natural way to characterize

the distance-based ambiguity set. Rather than disregarding uncertainty correlations and

enforcing ambiguous individual chance constraints separately, the proposed framework

accounts for the simultaneous satisfaction of multiple constraints. As a result, the

proposed framework is capable of greatly alleviating the conservatism of ED solutions.
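As a concrete illustration of an f-divergence-based ambiguity set, the sketch below uses the KL divergence (the f-divergence generated by f(t) = t log t) to test whether a candidate discrete distribution lies within a ball of radius ρ around a hypothetical generator histogram; all numbers are illustrative assumptions, not outputs of a trained f-GAN.

```python
from math import log

def kl_divergence(p, q):
    """KL divergence (the f-divergence with f(t) = t*log t) between two
    discrete distributions on the same finite support."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical histograms: p_gen mimics the distribution represented by the
# trained generator; q_alt is a candidate distribution to screen.
p_gen = [0.5, 0.3, 0.2]
q_alt = [0.4, 0.4, 0.2]

rho = 0.05  # radius of the f-divergence ball (illustrative value)
inside_ambiguity_set = kl_divergence(q_alt, p_gen) <= rho
```

Other choices of f (e.g., total variation or chi-square generators) yield different balls; the framework's appeal is that the same f ties the GAN training objective to the ambiguity set.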

The proposed ED optimization framework ensures that the worst-case probability of

violating constraints with regard to wind utilization is below a tunable risk level. To

facilitate its implementation in large-scale power systems, we develop an efficient

solution method by combining a reformulation technique and a scenario approach.

Additionally, a theoretical bound on the required number of wind scenarios generated

by f-GAN is established for the multiperiod ED problem. To reduce the number of

constraints in the resulting scenario program, a prescreening technique is further

developed through the exploitation of the ED problem structure. Illustrative six-bus

and IEEE 118-bus test systems are used to demonstrate the effectiveness of the proposed

ED framework.
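For intuition on the data complexity involved, a commonly quoted sufficient sample-size bound from the scenario-approach literature (in the style of Calafiore and Campi) is N ≥ (2/ε)(ln(1/δ) + d), with risk level ε, confidence parameter δ, and d decision variables; the tailored bound established in this chapter may differ. A quick calculator:

```python
from math import ceil, log

def scenario_count(epsilon, delta, d):
    """A classical sufficient scenario count for chance-constrained convex
    programs: N >= (2/epsilon)*(ln(1/delta) + d). Illustrative only; the
    chapter's tailored bound for the multi-period ED problem may differ."""
    return ceil((2.0 / epsilon) * (log(1.0 / delta) + d))

# e.g. 24 dispatch variables, 5% risk level, 99.9% confidence (assumed values)
n_scenarios = scenario_count(0.05, 1e-3, 24)
```

The bound grows linearly in the number of decision variables and in 1/ε, which is why the prescreening technique that removes redundant constraints matters for large systems.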

The major contributions of this chapter are summarized as follows.

 A novel deep learning based ambiguous joint chance constrained ED

optimization framework that accounts for the spatial-temporal correlations of

wind power;

 An f-divergence based ambiguity set for wind power distributions using f-GAN,

of which the training objective intimately aligns with the choice of divergence

for characterizing ambiguity set;

 A tailored solution method for ambiguous joint chance constrained ED problem

based on the integration of the reformulation technique with the scenario

approach;

 Theoretical bound on the data complexity of f-GAN for the ambiguous joint

chance constrained multi-period ED problem to guarantee constraint violation

within a tunable risk level.

3.2 Mathematical formulation

In this section, a distributionally robust or ambiguous joint chance constrained ED model

formulation considering intermittent wind power is presented. For the multi-period ED

problem, one schedules energy production and allocation at each time period to minimize

the total cost. The decisions include conventional thermal energy dispatch, wind power

dispatch, and load shedding amount. The available wind power is assumed to be

uncertain.

The multi-period ambiguous joint chance constrained ED optimization model is

formulated as follows. The objective of the ED problem is to minimize the total cost in

eq. (3.1). The total cost includes the operating cost of thermal units, as well as electricity

load shedding cost. Eq. (3.2) enforces the energy balance for each time period. The

minimum and maximum power output limits of each thermal unit are specified in

Constraint (3.3). Constraints (3.4)-(3.5) enforce the ramping rate limits of thermal units

for each time period. The capacity constraints for transmission lines are described in

Constraints (3.6)-(3.7). Constraint (3.8) specifies the non-negativity of ED decisions.

The ambiguous joint chance constraint (3.9) requires that, with a worst-case probability of at least 1 − ε, the power outputs of wind farms cannot exceed the random wind power [133], and that the percentage of wind utilization is at least β. The satisfaction of these two requirements is a random event because the available wind power is random and the wind power dispatch wbt is a scheduled quantity [270]. The wind power Wbt is subject to uncertainty, and 𝒟 is a set of its probability distributions, or ambiguity set. A list of

indices/sets, parameters, and variables is given in the Nomenclature section. The

ambiguous joint chance constraint enjoys the following advantages. First, it provides a

stronger guarantee on overall power systems security than individual chance constraints

by enforcing simultaneous constraint satisfaction. Second, unlike robust optimization, it

enables a systematic trade-off between economic performance and the risk level of

constraint violation [32]. Additionally, compared with conventional joint chance

constraints, it is capable of hedging against the distributional ambiguity caused by the finite amount of available wind power data.

 
\min_{p_{it},\, w_{bt},\, q_{bt}} \;\; \sum_{i} \sum_{t} C_i^V p_{it} + \sum_{b} \sum_{t} C_b^{LS} q_{bt}   (3.1)

\text{s.t.} \;\; \sum_{b} \Big( \sum_{i \in G_b} p_{it} + w_{bt} \Big) = \sum_{b} D_{bt} - \sum_{b} q_{bt}, \quad \forall t   (3.2)

P_i^{\min} \le p_{it} \le P_i^{\max}, \quad \forall i, t   (3.3)

p_{it} - p_{i,t-1} \le RU_i, \quad \forall i, t   (3.4)

p_{i,t-1} - p_{it} \le RD_i, \quad \forall i, t   (3.5)

\sum_{b} K_{bl} \Big( \sum_{i \in G_b} p_{it} + w_{bt} - D_{bt} + q_{bt} \Big) \le F_l, \quad \forall l, t   (3.6)

\sum_{b} K_{bl} \Big( \sum_{i \in G_b} p_{it} + w_{bt} - D_{bt} + q_{bt} \Big) \ge -F_l, \quad \forall l, t   (3.7)

p_{it},\, w_{bt},\, q_{bt} \ge 0, \quad \forall i, b, t   (3.8)

\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P} \Big\{ w_{bt} \le W_{bt},\; \forall b, t; \;\; \sum_{b} \sum_{t} w_{bt} \ge \beta \sum_{b} \sum_{t} W_{bt} \Big\} \ge 1 - \epsilon   (3.9)
The ED problem is cast as the above ambiguous joint chance constrained program.

Notably, the joint feature of ambiguous chance constraints not only leads to a less

conservative model formulation, but also enables further incorporation of correlations

among uncertain wind power at different buses and time periods. By contrast,

conventional distributionally robust chance constrained ED focuses on ambiguous

individual chance constraints (3.10)-(3.11) for the ease of reformulation. Such

ambiguous individual chance constraints can be unreasonably conservative since the

constraints for each bus and each time period are respected separately.

 
\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P} \left\{ w_{bt} \le W_{bt} \right\} \ge 1 - \hat{\epsilon}_{bt}, \quad \forall b, t   (3.10)

\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P} \Big\{ \sum_{b} \sum_{t} w_{bt} \ge \beta \sum_{b} \sum_{t} W_{bt} \Big\} \ge 1 - \hat{\epsilon}   (3.11)

Given the ambiguous individual chance constraints, the existing works typically

construct the ambiguity set 𝒟 by employing the mean and variance of wind power

generation. The knowledge gap to fill is a data-driven ambiguity set that is capable of

accurately capturing complicated spatial-temporal correlations among wind power

sources. Such an informative ambiguity set can be seamlessly integrated with the

developed ambiguous joint chance constrained ED formulation.

3.3 Deep learning based ambiguous joint chance constrained

economic dispatch optimization

In this section, we propose a novel deep learning based ambiguous joint chance

constrained optimization for ED with a high penetration of renewable wind energy. We

first present an introduction to the f-divergence and f-GANs. Then, we develop a deep

learning based ambiguity set of wind power distributions with the f-divergence. Finally,

a data-driven ambiguous joint chance constrained ED model is presented based upon

the mathematical formulation in Section 3.2.

3.3.1 The f-Divergence and f-GANs

In this subsection, we first present the f-divergence. To measure the discrepancy between

two probability distributions P and Q, we introduce the f-divergence, also known as ϕ-

divergence, as follows:

D_f(P \,\|\, Q) = \int f\!\left( \frac{dP}{dQ} \right) dQ   (3.12)

where function f is a convex, lower semi-continuous function satisfying f(1) = 0. The

f-divergence is widely used in the field of information theory and machine learning. It is

general enough to encompass a wide variety of divergences, such as the KL divergence (f(r) = r log r), total variation (f(r) = |r − 1|), and the Pearson χ²-divergence (f(r) = (r − 1)²), among others.

The f-divergence serves two functions in this work. On the one hand, the divergence in eq. (3.12) provides a powerful way to characterize ambiguity sets. On the other hand, it plays a critical role in defining the objective function of an f-GAN model.
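As a minimal numerical illustration (ours, not from the dissertation) of eq. (3.12) for discrete distributions, where the integral reduces to a sum over the common support:

```python
import math

def f_divergence(p, q, f):
    """D_f(P||Q) = sum_x q(x) * f(p(x)/q(x)) for discrete P, Q on a common support."""
    return sum(qx * f(px / qx) for px, qx in zip(p, q))

f_kl = lambda r: r * math.log(r)     # KL divergence generator
f_tv = lambda r: abs(r - 1.0)        # total variation generator
f_chi = lambda r: (r - 1.0) ** 2     # Pearson chi^2 generator

p = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
d_kl = f_divergence(p, q, f_kl)
d_chi = f_divergence(p, q, f_chi)

# Every f-divergence vanishes between identical distributions since f(1) = 0.
assert f_divergence(p, p, f_chi) == 0.0
```

The distributions here are toy examples; in the chapter the relevant pair is the generated and the true wind power distributions.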

In this sense, the f-divergence offers a unique vehicle to link the ambiguity set used in

ambiguous joint chance constraints with the deep learning model.

The promise of GANs is to decipher rich and hierarchical structures to characterize the

complicated wind power distribution, which exhibits spatial-temporal correlations. As a

powerful generalization of vanilla GAN, the f-GAN has achieved great success in

machine learning and computer vision [300]. The f-GAN model leverages the variational

formulation of eq. (3.12), which is presented below [301]:

 
D_f(P \,\|\, Q) = \max_{\varphi} \left\{ \mathbb{E}_{x \sim P}\left[ \varphi(x) \right] - \mathbb{E}_{x \sim Q}\left[ f^*(\varphi(x)) \right] \right\}   (3.13)

where 𝔼 denotes the expectation, and the maximum in the above variational formulation is taken over all possible functions φ. This set of functions can be well approximated by the expressive class of deep neural networks. Note that f^* in eq. (3.13) represents the convex conjugate function of f, which is defined as f^*(t) = \sup_u \{ tu − f(u) \}.
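To make the conjugate and the variational bound concrete, the following sketch (ours, not from the dissertation) checks numerically that for the Pearson χ² generator f(r) = (r − 1)² the conjugate is f*(t) = t + t²/4, and that evaluating the variational objective of eq. (3.13) at the optimal critic φ(x) = f′(p(x)/q(x)) recovers the χ²-divergence for discrete distributions:

```python
def f(r):       return (r - 1.0) ** 2       # Pearson chi^2 generator
def f_prime(r): return 2.0 * (r - 1.0)      # its derivative
def f_star(t):  return t + t * t / 4.0      # closed-form convex conjugate

def f_star_numeric(t, grid):
    # brute-force sup_u { t*u - f(u) } over a grid of u values
    return max(t * u - f(u) for u in grid)

grid = [i / 1000.0 for i in range(0, 5001)]   # u in [0, 5]
assert abs(f_star(0.8) - f_star_numeric(0.8, grid)) < 1e-3

p = [0.2, 0.5, 0.3]
q = [0.3, 0.4, 0.3]
chi2 = sum((px - qx) ** 2 / qx for px, qx in zip(p, q))

# Variational value E_P[phi] - E_Q[f*(phi)] at the optimal critic.
phi = [f_prime(px / qx) for px, qx in zip(p, q)]
var_value = (sum(px * ph for px, ph in zip(p, phi))
             - sum(qx * f_star(ph) for qx, ph in zip(q, phi)))
assert abs(var_value - chi2) < 1e-9
```

In the f-GAN, the discriminator plays the role of this critic, with the grid search replaced by gradient-based training of a neural network.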

In the f-GAN architecture, there are two deep neural networks, namely generator network

G and discriminator network T. Suppose the generator network is parametrized by θ,

while the discriminator network is parametrized by ω. The functions representing the generator and discriminator are denoted by G_θ(·) and T_ω(·), respectively.

Generator: The generator network takes the noise vector Z with a known distribution ℙ_Z as input, and outputs data samples through the up-sampling operations of deconvolutional layers and fully connected layers. Note that ℙ_Z is typically selected as a uniform or Gaussian distribution. The goal of the generator network G is to fool the

discriminator network by generating data samples that mimic the real wind power data.

This is accomplished by tuning the parameter θ to transform the distribution ℙ_Z into some distribution ℙ_G that is “close” to the underlying true data distribution ℙ_r in terms of the f-divergence. The loss function of the generator network G, denoted by L_G, is given as follows.

  
L_G = -\mathbb{E}_{Z \sim \mathbb{P}_Z} \left[ f^*\!\left( T_\omega( G_\theta(Z) ) \right) \right]   (3.14)

where LG is utilized to update parameter θ. A small value of LG implies that the data

samples produced by generator G look realistic through the eyes of discriminator T.

Discriminator: The generative model is pitted against the discriminator model. Given a

generator network, the real data samples and the samples generated by network G are fed

into the discriminator network T. Specifically, the discriminator network aims to

distinguish the generated wind power data from real ones. It employs a series of down-

sampling operations to generate a scalar value T_ω(x), where x is either a sample from the true data distribution or a sample from the distribution ℙ_G induced by the generator network. Let X be a random variable drawn from the true data distribution ℙ_r. The loss function that the

discriminator seeks to minimize is expressed by,

  
L_T = -\mathbb{E}_{X \sim \mathbb{P}_r} \left[ T_\omega(X) \right] + \mathbb{E}_{Z \sim \mathbb{P}_Z} \left[ f^*\!\left( T_\omega( G_\theta(Z) ) \right) \right]   (3.15)

where LT represents the loss function of the discriminator, and is employed to update

parameter ω. In light of eq. (3.13), the discriminator network T serves as a critic that

quantifies the f-divergence between ℙ_r and ℙ_G.

Formally, the f-GAN is framed as the following two-player minimax game with value function V(G_θ, T_ω):

\min_{\theta} \max_{\omega} V(G_\theta, T_\omega) = \mathbb{E}_{X \sim \mathbb{P}_r} \left[ T_\omega(X) \right] - \mathbb{E}_{Z \sim \mathbb{P}_Z} \left[ f^*\!\left( T_\omega( G_\theta(Z) ) \right) \right]   (3.16)

where T_ω(x) = g_f(H_ω(x)). The output activation function g_f : ℝ → dom f^* is introduced to respect the domain of the conjugate function f^* in eq. (3.16), while H_ω(x) is a function parameterized by ω that is exempt from any output restriction.

The competition between the generator network and the discriminator network in the

above minimax game drives both deep neural networks to refine their model parameters.

Eventually, wind power data produced by the generator network become

indistinguishable from the true data [300].

Notably, f-GAN is general enough to incorporate several types of GANs. For example,

the vanilla GAN can be regarded as a special case when f-divergence is chosen as the

Jensen-Shannon divergence [302]. Additionally, by feeding random noise vectors, the

generator network in f-GAN is capable of efficiently sampling “realistic” wind power

data from a probability distribution induced by the weights of the feedforward neural

network. This salient feature is of particular significance in solving the resulting

ambiguous joint chance constrained ED problem using a scenario approach.
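To illustrate this implicit-sampling mechanism, the toy sketch below stands in for a trained generator: a fixed affine map of Gaussian noise rather than a deep network (all names and numbers are illustrative). Pushing shared noise through the map already shows how spatially correlated "wind power" samples arise.

```python
import random

def toy_generator(z1, z2):
    # Stand-in for G_theta: maps a 2-D noise vector to a pair of
    # "wind power" outputs; the shared component z1 induces correlation.
    w1 = 50.0 + 20.0 * z1
    w2 = 45.0 + 15.0 * z1 + 5.0 * z2
    return max(w1, 0.0), max(w2, 0.0)   # wind power is non-negative

random.seed(42)
# Sampling "realistic" data = pushing noise through the (toy) generator.
samples = [toy_generator(random.gauss(0, 1), random.gauss(0, 1))
           for _ in range(2000)]
```

A real f-GAN generator replaces the affine map with trained deconvolutional and fully connected layers, but the sampling interface is identical: noise in, scenario out.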

3.3.2 Deep learning based ambiguity set

In this subsection, we develop a novel ambiguity set for wind power distributions based

on f-GANs. Given infinite capacities of networks G and T, and access to an infinite

amount of wind power data, the distribution ℙ_G is exactly the same as the distribution ℙ_r from a theoretical point of view [302], and their f-divergence reduces to zero accordingly. However, in practice, the probability distribution ℙ_G induced by the generator network G

might not be perfect due to the finite amount of available training wind power data.

Therefore, one needs to construct a family of distributions based on the information

obtained by deep learning, rather than relying on a single probability distribution.

To hedge against the ambiguity of wind power distribution, we develop a set of

distributions using the f-divergence and the deep generator network in f-GAN, which is

expressed by,


  w   D f  w G     (3.17)

where w denotes the probability distribution for the random vector of wind power Wbt,

 represents the set of all probability distributions, and ρ is the divergence tolerance

or the radius of the divergence ball. Parameter ρ can be used to adjust the size of the ambiguity

set, which reflects the risk-aversion level of decision-makers.

The developed ambiguity set is devised as an f-divergence ball centered around the distribution ℙ_G induced by the generator network in the f-GAN. Unlike typical divergence-based ambiguity sets that adopt the discrete empirical distribution as a reference distribution, the proposed ambiguity set utilizes the continuous wind power distribution induced by the generator network, namely ℙ_G, as the reference. As a result, the proposed set is effective in encompassing the continuous underlying true distribution ℙ_r because it avoids being restricted to a discrete support.

The proposed deep learning based ambiguity set enjoys several advantages. First,

compared with conventional moment-based ambiguity sets, the proposed deep learning

based ambiguity set can accurately capture the spatial-temporal correlations of wind

power at different buses and time periods. Second, the proposed method makes no

assumption on the specific form of wind power distributions owing to the power of deep

generative modeling. Another nice feature is stated as follows: if a certain f-divergence is chosen in the f-GAN to learn the distribution ℙ_G, the same type of f-divergence can be employed for constructing the ambiguity set. Specifically, if the χ²-divergence is adopted in the f-GAN, we know from the previous subsection that the learned distribution ℙ_G should be quite close to the true data distribution ℙ_r in terms of the χ²-divergence. Thus, employing

the χ2-divergence to characterize the ambiguity set, rather than using other types of f-

divergence, potentially mitigates the conservatism issue of the corresponding

distributionally robust ED solutions.

3.3.3 Data-driven joint chance constrained ED model

Equipped with the data-driven ambiguity set (3.17), a deep learning based ambiguous

joint chance constrained ED optimization model, denoted by (DL-AJCC-ED), is

formulated as follows. For the ease of exposition, the model is represented in a compact

form with matrices and vectors.

\min_{x} \; c^T x   (3.18)

\text{s.t.} \;\; x \in S   (3.19)

\mathbb{P} \left\{ a_i(x)^T \xi \le b_i(x), \; \forall i \right\} \ge 1 - \epsilon, \quad \forall \mathbb{P} \in \mathcal{D}   (3.20)

where x denotes the vector of decision variables including pit, wbt, and qbt; c denotes the

vector of cost coefficients, and ξ is the vector of uncertain wind power. The objective

function (3.18) represents eq. (3.1), while set S stands for a domain defined by

deterministic constraints without uncertain parameters, including constraints (3.2)-(3.8). Notably, the data-driven joint chance constraint (3.20) corresponds to constraint (3.9), leveraging the deep learning based ambiguity set 𝒟.

The ED problem with a large penetration of wind power is formulated as an ambiguous

joint chance constrained program. However, the resulting optimization problem cannot

be solved directly by any off-the-shelf optimization solvers, since constraint (3.20)

involves an infinite number of probability distributions.

3.4 Solution methodology

To address the computational challenge of solving large-scale (DL-AJCC-ED) problem,

we develop an efficient solution method that integrates reformulation and constraint

sampling techniques in this section.

3.4.1 Ambiguous joint chance constraint reformulation

For constraint (3.20), the corresponding joint chance constraints must be satisfied for all

probability distributions within the deep learning based ambiguity set. Therefore, we

consider the worst-case distribution, and ensure that the joint chance constraints are

respected under the worst-case distribution, as dictated in (3.21).

\inf_{\mathbb{P} \in \mathcal{D}} \mathbb{P} \left\{ a_i(x)^T \xi \le b_i(x), \; \forall i \right\} \ge 1 - \epsilon   (3.21)

where the infimum is taken over the wind power distributions in the deep learning based

ambiguity set.

Since the χ2-divergence is appropriate for small risk levels [280], we employ it for

training f-GAN and constructing the ambiguity set for the rest of this chapter. The

ambiguous joint chance constraint (3.21) can be further reformulated based on the

following proposition, of which the proof can be found in [152].

Proposition 3.1: Given a χ²-divergence based ambiguity set, the ambiguous joint chance constraint (3.21) is respected if and only if the following classical joint chance constraint (3.22) is satisfied.

\mathbb{P}_G \left\{ a_i(x)^T \xi \le b_i(x), \; \forall i \right\} \ge 1 - \bar{\epsilon}   (3.22)

where \bar{\epsilon} denotes the adjusted risk level, given as follows.

\bar{\epsilon} = \epsilon - \frac{ \sqrt{ \rho^2 + 4\rho(\epsilon - \epsilon^2) } - (1 - 2\epsilon)\rho }{ 2\rho + 2 }   (3.23)

By applying Proposition 3.1, we transform the ambiguous joint chance constrained program (DL-AJCC-ED) given in eqs. (3.18)-(3.20) into an ambiguity-free counterpart, given in eqs. (3.18), (3.19), and (3.22), with the probability distribution ℙ_G induced by the deep generator network.
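A quick numerical sketch of eq. (3.23) (the function name is ours), evaluated at the parameter values used later in the case studies (ε = 0.1, ρ = 0.05):

```python
import math

def adjusted_risk(eps, rho):
    """Adjusted risk level of eq. (3.23) for a chi^2-divergence ball of radius rho."""
    return eps - (math.sqrt(rho**2 + 4 * rho * (eps - eps**2))
                  - (1 - 2 * eps) * rho) / (2 * rho + 2)

eps_bar = adjusted_risk(0.1, 0.05)   # roughly 0.0509

# The adjustment is conservative: the ambiguity-free constraint (3.22)
# must be enforced at a stricter risk level than the nominal eps.
assert 0 < eps_bar < 0.1
# Shrinking the ball to radius 0 recovers the nominal risk level.
assert abs(adjusted_risk(0.1, 0.0) - 0.1) < 1e-12
```

A larger radius ρ yields a smaller adjusted risk level, i.e., a more conservative dispatch.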

3.4.2 Scenario approach and sample complexity

Since f-GAN is an implicit generative model, there is no explicit expression for

the distribution ℙ_G. As a result, it is computationally challenging to further derive an

analytical reformulation for constraint (3.22). To address this challenge, we leverage the

efficient sampling power of GANs, and employ the scenario approach.

The scenario approach, a.k.a. constraint sampling, has been widely used in a variety of

optimization problems. The key idea of the scenario approach is to draw independent and identically distributed random scenarios from the probability distribution and enforce

the constraints with respect to all sampled uncertainty scenarios. It is worth noting that

the scenario approach is well suited for the deep learning based joint chance constraint

(3.22), because it only requires scenario sampling from ℙ_G rather than an explicit expression of the wind power distribution. Therefore, the scenario approach facilitates

approximate solutions of the resulting ambiguity-free joint chance-constrained program.

 
Suppose \xi_G^{(1)}, \ldots, \xi_G^{(K)} are the wind power samples produced by the generator network in the f-GAN, where K represents the number of samples. Constraint (3.22) can then be approximated by,

a_i(x)^T \xi_G^{(k)} \le b_i(x), \quad \forall i, \; \forall k \in [K]   (3.24)

where the set [K] := \{1, 2, \ldots, K\}.

The scenario approach provides a theoretical guarantee that, with a sufficiently large

value of K, the optimal solution of the corresponding scenario program satisfies the

ambiguity-free chance constraints (3.22) with a high probability. By exploiting the

structure of the multi-period ED problem, we explicitly derive the data complexity for

the generator network in f-GAN, as provided in the following proposition.

Proposition 3.2: Given a confidence level 1 − β ∈ (0, 1) and a risk level ε ∈ (0, 1) for the (DL-AJCC-ED) problem, if K ≥ N(β, ε), then the optimal solution of the corresponding scenario program is guaranteed to satisfy constraint (3.21) with a probability of at least 1 − β. The explicit expression of N(β, ε) is given in (3.25).

N(\beta, \epsilon) = \frac{ 4(\rho + 1) \left( \ln\frac{1}{\beta} + B_W \cdot N_T + 1 \right) }{ (2\rho + 2)\epsilon + (1 - 2\epsilon)\rho - \sqrt{ \rho^2 + 4\rho(\epsilon - \epsilon^2) } }   (3.25)

where BW represents the number of buses having wind farms, NT denotes the total
number of time periods, and ρ is the radius of the deep learning based ambiguity set.

Proof. The required number of wind power samples is dependent on the number of

support constraints. Note that the support constraints are defined as those constraints,

the removal of which changes the optimal solution. For fixed b and t, the constraint w_{bt} \le W_{bt} is active only for the scenario with the smallest wind power value at bus b and time t among the K samples. Similarly, the constraint \sum_{b} \sum_{t} w_{bt} \ge \beta \sum_{b} \sum_{t} W_{bt} is supported by the scenario with the largest total wind power over all buses and time periods. Therefore, the number of support constraints is at most BW · NT + 1 for the multi-period

ED problem.

Based on the notion of support constraints [277, 303], we obtain the following bound on the data complexity.

N(\beta, \epsilon) \ge \frac{2}{\bar{\epsilon}} \left( \ln\frac{1}{\beta} + B_W \cdot N_T + 1 \right)   (3.26)

Substituting the adjusted risk level of eq. (3.23) for the χ²-divergence into (3.26) yields eq. (3.25), which completes the proof. □

Remark 3.1 The sample complexity in eq. (3.25) is for the generator network in f-GANs.

In contrast to the finite amount of real historical wind power data, the unique merit of f-

GAN lies in its capability of efficiently generating as many data samples as required in

eq. (3.25).
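Eq. (3.25) can be evaluated directly; the sketch below (function name ours) reproduces the scenario count reported for the IEEE 118-bus case study in Section 3.5 (BW = 10, NT = 24, ε = 0.1, confidence 99.9%, ρ = 0.05):

```python
import math

def sample_complexity(beta, eps, rho, bw, nt):
    """N(beta, eps) of eq. (3.25) for a chi^2-divergence ambiguity set."""
    num = 4 * (rho + 1) * (math.log(1 / beta) + bw * nt + 1)
    den = (2 * rho + 2) * eps + (1 - 2 * eps) * rho \
          - math.sqrt(rho**2 + 4 * rho * (eps - eps**2))
    return num / den

# IEEE 118-bus setting: 10 wind farms, 24 periods, 99.9% confidence.
n = sample_complexity(beta=0.001, eps=0.1, rho=0.05, bw=10, nt=24)
K = math.ceil(n)   # number of wind power scenarios to generate
```

The raw bound evaluates to about 9,747.2, so K = 9,748 scenarios are drawn from the generator network, matching the setting reported with Table 5.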

Remark 3.2 We can further leverage a prescreening technique, described as follows. After sampling N(β, ε) wind power scenarios from the generator network, instead of putting all scenarios into (3.24), a prescreening technique can be leveraged to select the most critical uncertainty scenarios. Specifically, the lowest level of wind power at each bus and time period, as well as the highest level of total wind power over all BW buses and NT time periods, are selected beforehand among the N(β, ε) generated data. This technique greatly increases the computational efficiency of solving the (DL-AJCC-ED) problem without affecting the solution quality of the optimal ED decisions.
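Following the support-constraint argument of Proposition 3.2, prescreening reduces the K generated scenarios to at most BW · NT + 1 candidates. A list-based sketch on synthetic data (sizes and values are illustrative):

```python
import random

def prescreen(scenarios):
    """Keep only potentially binding scenarios: the per-component minima
    (supporting w_bt <= W_bt) and the maximum-total scenario (supporting
    the wind-utilization requirement)."""
    n_comp = len(scenarios[0])
    keep = set()
    for j in range(n_comp):
        keep.add(min(range(len(scenarios)), key=lambda k: scenarios[k][j]))
    keep.add(max(range(len(scenarios)), key=lambda k: sum(scenarios[k])))
    return sorted(keep)

random.seed(0)
K, n_comp = 1000, 6   # toy sizes, e.g., 3 buses x 2 periods
scenarios = [[random.uniform(0.0, 100.0) for _ in range(n_comp)]
             for _ in range(K)]
kept = prescreen(scenarios)
assert len(kept) <= n_comp + 1   # at most BW*NT + 1 scenarios survive
```

Only the kept scenarios need to enter (3.24), which explains the dramatic reduction in constraint count reported in Tables 4 and 5.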

3.5 Computational experiments

In this section, case studies on the six-bus and IEEE 118-bus systems are presented. To

demonstrate the advantages of the proposed approach, we also implement the

conventional distributionally robust chance constrained ED (DRCCED) method with an

ambiguity set based on first- and second-order moment information of Wbt, of which the model summary is given as follows.

min eq. (3.1)

s.t. eqs. (3.2)-(3.8), (3.10)-(3.11)

All optimization problems are solved with CPLEX 12.8.0, implemented on a computer

with an Intel (R) Core (TM) i7-6700 CPU @ 3.40 GHz and 32 GB RAM. The optimality

tolerance for CPLEX 12.8.0 is set to 0. In the case studies, the confidence level is set to 99.9%, ρ is 0.05, and the risk level is set to 10%.

3.5.1 Illustrative six-bus system

The six-bus system has three conventional thermal generators and 11 transmission lines

[27]. Additionally, three wind farms are installed at Buses 1-3. To promote a high

penetration of wind energy, the percentage of wind utilization β is set to be 30%. The

load shedding cost is set to be $5/MW [270]. The wind power data we use comes from

the NREL Wind Integration Dataset [304].


The system diagram of the six-bus system is shown in Figure 15 [1, 2]. From Figure 15,

we can see that there are three conventional thermal generators and 11 transmission lines.

Additionally, three wind farms are installed at Buses 1-3, as illustrated in Figure 15.

Figure 15. The schematic of the six-bus system.

The training process of the f-GAN in the six-bus case study is shown in Figure 16. In Figure 16, the x-axis represents the number of iterations, while the y-axis denotes the value of the losses. From Figure 16, we can readily observe that training progresses rapidly at the beginning. As the training moves on, the f-divergence between the real wind power distribution and the generated distribution approaches zero. This implies that the wind data generated by the generator neural

network look as realistic as the real ones, and that they cannot be distinguished by the

discriminator network.

Figure 16. The training process of f-GAN.

Due to the high wind utilization percentage of 30%, the resulting DRCCED with moment information turns out to be infeasible when both constraint (3.10) and constraint (3.11) are imposed. For the ease of comparison, we implement the different ED methods without constraint (3.11) on the efficiency of wind utilization. The number of wind power scenarios used in the proposed approach is determined by Proposition 3.2. According to eq. (3.25), N(β, ε) = 3,102.48, and the number of scenarios is chosen to be 3,103.

The computational results are provided in Table 4. Compared with DRCCED with

moment information, the proposed ED problem has a larger number of constraints

because it introduces a set of constraints for each generated wind power scenario. As a

result, it consumes 2.6 more CPU seconds of solution time. In terms of economic

performance, the proposed ED method is more cost-effective than the DRCCED method

with moment information, slashing the total cost by 33.3%. As can be observed from

the results in Table 4, the proposed method with the prescreening technique significantly

reduces its memory and computational time, which are comparable with those of the

DRCCED method with moment information.

Table 4. Comparisons of problem sizes and computational results for the DRCCED method with moment information and the proposed ED method with/without prescreening in the six-bus test system.

Parameter     | DRCCED with moment information | Proposed ED method | Proposed ED with prescreening
Cont. Var.    | 364      | 364      | 364
Constraints   | 985      | 450,776  | 985
Min. cost ($) | 42,438.8 | 28,308.3 | 28,308.3
CPU time (s)  | 1.2      | 3.8      | 1.1

To illustrate the benefit of the derived theoretical bound in Proposition 3.2, we compare

it with an arbitrary scenario number of 50 in a testing case. The optimal ED decision

determined by this arbitrary scenario number has a constraint violation probability of

28% based on 100 testing wind power scenarios generated by f-GAN. This constraint

violation probability is much higher than the prescribed risk level of 10%, thus

jeopardizing the security of power systems under intermittent wind energy. By contrast,

the theoretical bound leads to an optimal ED decision without constraint violation,

which satisfies the requirement on risk level. Based on the comparison results, the

benefit of the derived theoretical bound lies in that it quantitatively dictates the required

number of scenarios to guarantee a predefined risk level.

To take a closer look at the cost breakdowns in the six-bus case study, we

present the cost distributions determined by the DRCCED method with moment

information and the proposed approach in Figure 17(a) and Figure 17(b), respectively.

From Figure 17(a) and Figure 17(b), we can readily see that the load shedding costs

account for more than 25% of the total costs for both methods. This is ascribed to the

relatively low load-shedding price. Notably, the percentage of load shedding determined

by the DRCCED method with moment information is 23% higher than that of the

proposed approach. The reason is as follows. The DRCCED method

with moment information is less effective in wind power utilization compared with the

proposed approach. Thus, it more frequently resorts to load shedding as an alternative.

Figure 17. The cost breakdown of (a) the DRCCED method with moment

information, and (b) the proposed ED approach.

When the load-shedding price increases from $5/MW to $15/MW, no electricity load is

shed over the entire time horizon (NT=24). The power outputs of conventional

generation units are displayed in Figure 18. From Figure 18, we can see that most of the

conventional power comes from Generator 1. Accordingly, the operating cost of

Generator 1 contributes to 61.24% of the total cost for the proposed ED approach.

Figure 18. The power dispatch of each conventional generator determined by the

proposed ED approach.

3.5.2 IEEE 118-bus system

In this subsection, we consider a large-scale IEEE 118-bus system to demonstrate the

scalability and effectiveness of the proposed deep learning based ED approach. This

system consists of 118 buses, 54 thermal generators, 186 transmission lines, and 91

loads [305]. Moreover, ten wind farms (denoted by WF1-WF10) are installed at Buses

8, 12, 23, 36, 42, 56, 69, 77, 88 and 93. In this case study, the percentage of wind power

utilization β is set to be 15%, and the load shedding cost is $15/MW.

Figure 19. The spatial correlations of the ten wind farm energy outputs for (a) real

wind power data, and (b) wind power data generated by f-GAN. The color darkness of

one single cell represents the level of spatial correlation coefficient for corresponding

two wind farms. Comparison of spatial correlations can be made by focusing on the

darkness patterns of heat maps. The temporal correlations of WF10 for (c) real wind

power data, and (d) wind power data generated by f-GAN. The level of auto-

correlation coefficient is depicted by bar height. Comparison of spatial correlations

can be done by considering the height of each bar for every time lag.

To demonstrate that the adopted f-GAN is able to capture spatial and temporal

correlations, we compute the Pearson correlation and auto-correlation coefficients for

the wind farms located at different buses. Figure 19(a) and 19(b) visualize the spatial

correlation results of the real wind power data and generated data, respectively. Note

that darker colors indicate stronger correlations in these heat maps. By comparing

Figure 19(a) and 19(b), we can easily identify the resembling patterns of spatial

correlation for the real data and generated data. For example, from Figure 19(b), we

observe that the wind power output at WF1 has strong positive correlations with the

ones at WF4, WF7 and WF10, which is exactly consistent with the underlying true

information indicated by Figure 19(a). To characterize the degree of temporal

correlations in wind power time series, the autocorrelation coefficients are calculated

for each wind site. The results of real data and generated data for WF10 are displayed

in Figure 19(c) and 19(d). By inspection, the generated wind output series of WF10 has

similar auto-correlation coefficients as the real ones. Note that the auto-correlation

coefficients of the generated wind data are close to the real ones for the other wind farms as well.

Thus, the wind scenarios generated by f-GAN retain both spatial and temporal

correlations among multiple wind farms.
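The diagnostics in Figure 19 use standard estimators; a pure-Python sketch on synthetic series (stand-ins for the NREL data) shows how the spatial (Pearson) and temporal (auto-)correlation coefficients are computed:

```python
import math, random

def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def autocorr(x, lag):
    """Temporal (auto-)correlation of a series at a given time lag."""
    return pearson(x[:-lag], x[lag:]) if lag else 1.0

random.seed(1)
base = [random.gauss(0, 1) for _ in range(500)]      # shared weather driver
wf1 = [b + 0.1 * random.gauss(0, 1) for b in base]   # site coupled to driver
wf2 = [random.gauss(0, 1) for _ in range(500)]       # independent site

spatial = pearson(wf1, base)   # strong spatial correlation, near 1
indep = pearson(wf1, wf2)      # near 0
```

Applying `pearson` to every pair of wind farm series yields the heat maps of Figures 19(a)-(b), and `autocorr` over a range of lags yields the bar plots of Figures 19(c)-(d).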

The problem sizes and computational results are summarized in Table 5. By setting ε̂_bt = ε̂ = 0.1/241 based on the Bonferroni approximation, the corresponding DRCCED problem with moment information is infeasible. Thus, in Table 5, the computational results for the DRCCED method with moment information (ε̂_bt = ε̂ = 0.1) are provided. Note that N(β, ε) equals 9,747.19 based on eq. (3.25); thus, the number of wind power scenarios K is set to 9,748. In contrast to the DRCCED method with moment information, the proposed approach generates a more cost-effective ED solution

($401,477.9 vs $825,403.8). With the prescreening technique, the proposed ED problem


not only has much fewer constraints for memory savings, but also can be solved around

one order of magnitude faster.

Table 5. Comparisons of problem sizes and computational results for the DRCCED method with moment information and the proposed ED method with/without prescreening in the IEEE 118-bus system.

Parameter     | DRCCED with moment information | Proposed ED method | Proposed ED with prescreening
Cont. Var.    | 7,133     | 7,133      | 7,133
Constraints   | 17,088    | 28,780,485 | 17,088
Min. cost ($) | 825,403.8 | 401,477.9  | 401,477.9
CPU time (s)  | 22.3      | 223.7      | 22.6

Figure 20. The empirical distribution of the wind power utilization efficiency for (a)

DRCCED with moment information, and (b) the proposed approach.

To examine the wind energy utilizations of the DRCCED method with moment

information and the proposed ED approach, we compute the percentage of wind

utilization for each generated wind power scenario and obtain its empirical probability

distributions, as shown in Figure 20. As can be observed from Figure 20(a), the wind

utilization percentage for the DRCCED method with moment information ranges from

18.02% to 29.01%. By contrast, the proposed approach promotes a higher penetration

of renewable energy, with a minimum wind-utilization percentage of 59.11%, which is much higher than the prescribed wind power utilization percentage of 15%. Moreover,

the probability distribution in Figure 20(b) is lopsided with more probability mass

concentrated at higher values. This observation again illustrates that the proposed approach

facilitates an effective use of intermittent renewable wind power.

3.6 Summary

In this work, a novel f-GAN based ambiguous joint chance constrained ED optimization

framework was proposed. The deep learning based ambiguity set well captured the

spatial-temporal correlations of wind power uncertainty. To address the resulting ED

problem, a solution method integrating a reformulation technique with the scenario

approach was developed. Additionally, we derived a theoretical bound on the required

number of generated wind power data, which depended on the number of installed wind

farms and the number of time periods in the ED problem. The comparison results with

an arbitrarily chosen number of scenarios showed that the developed theoretical bound

enjoyed a quantitative guarantee on constraint satisfaction in ED. A prescreening

technique could be further leveraged to speed up the solution process, thus facilitating

the scalability of the proposed approach in the large-scale IEEE 118 bus system.

Computational results showed that the proposed approach outperformed the

conventional method by generating a less conservative ED solution while ensuring a

predefined risk level.

3.7 Nomenclature

Sets and Indices

b Index for buses

i Index for generators

l Index for transmission lines

t Index for time periods

𝒟 Ambiguity set based on deep learning

Parameters

C_i^V Fuel cost of generator i

C_b^LS Load shedding cost of bus b

D_bt Load demand located at bus b at period t

F_l Capacity of transmission line l

K_bl Power flow distribution factor for transmission line l due to the net injection at bus b

P_i^min Minimum power output of generator i

P_i^max Maximum power output of generator i

RU_i Ramp-up rate of generator i

RD_i Ramp-down rate of generator i

W_bt Uncertain wind power generation at bus b at period t

θ Weights of the generator network in f-GANs

ω Weights of the discriminator network in f-GANs

β Prescribed percentage of wind power utilization

ε Risk level for the ambiguous joint chance constraint

ξ Random vector of wind power parameters

Decision Variables

p_it Power output of generator i at time period t

q_bt Load shedding amount at bus b at time period t

w_bt Power dispatch of the wind farm at bus b at time period t

CHAPTER 4
ONLINE LEARNING BASED RISK-AVERSE STOCHASTIC MODEL
PREDICTIVE CONTROL OF CONSTRAINED LINEAR UNCERTAIN SYSTEMS

4.1 Introduction

Over the past few decades, model predictive control (MPC) has established itself as a

modern control strategy with theoretical grounding and a wide variety of applications

[306-309], because of its distinguishing feature of addressing multivariate dynamic

systems subject to control input and state constraints. Implemented in a receding horizon

fashion, MPC solves a finite-horizon optimal control problem at each sampling instant

and only performs the first control action. This procedure is repeated at the next instant

with a new measurement update. However, the presence of uncertainty could inflict

severe performance degradation or even loss of feasibility on conventional MPC if the

uncertainty is not explicitly accounted for [310].
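The receding-horizon procedure described above can be sketched in a few lines; the optimizer stub and plant below are hypothetical placeholders for a scalar integrator, not anything from this work:

```python
def solve_finite_horizon_ocp(x, N):
    # Hypothetical optimizer stub: for the scalar integrator x_{k+1} = x_k + u_k,
    # the first move -x drives the state to the origin; the rest of the plan is zero.
    return [-x] + [0.0] * (N - 1)

def plant_step(x, u):
    # True plant; in the uncertain case a disturbance would enter here.
    return x + u

def receding_horizon(x0, N=5, T=10):
    # At each instant: solve the finite-horizon problem, apply ONLY the first
    # control action, then re-solve at the next instant with the new measurement.
    x, traj = x0, [x0]
    for _ in range(T):
        u_seq = solve_finite_horizon_ocp(x, N)   # full open-loop plan
        x = plant_step(x, u_seq[0])              # first action only
        traj.append(x)                           # new measurement closes the loop
    return traj

traj = receding_horizon(3.0)
```

Re-solving with fresh measurements is what gives MPC feedback, and also what exposes it to the uncertainty issues discussed next.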

Motivated by this fact, considerable research effort has been devoted to designing MPC that accounts for the uncertainty of predictions. Robust MPC strategies, in which disturbances

are modeled using a bounded and deterministic set, aim to satisfy the hard constraints

of states and control inputs for all possible uncertainty realizations [311-314]. Designing

robust MPC is necessary and efficient when state and input constraints need to be

immunized against uncertainty. By exploiting the probabilistic nature of uncertainty in

controller designs, stochastic MPC methods are capable of tolerating constraint

violations in a systematic way [315]. Additionally, stochastic MPC can increase the

112
region of attraction by the means of chance constraints, thus allowing for the systematic

trade-off between constraint satisfaction and control performance [316].

Due to these attractive features, stochastic MPC has stimulated considerable research

interest from the control community [315-317]. The existing literature in stochastic

MPC can be typically grouped into two main categories depending on whether the

knowledge of uncertainty distribution is perfectly known or not. The first category

subsumes research that focuses on tightening or adaptive tightening for chance

constraints via the explicit use of uncertainty distributions [318-325]. The conventional

stochastic MPC strategies rely heavily on the assumption that the probability

distribution of disturbance is known a priori. However, such perfect knowledge of

distribution is rarely available in practice. Instead, only partial information regarding

disturbance distribution can be inferred from data. In such data-driven settings, chance

constraints in these stochastic MPC frameworks are no longer ensured because there

always exists a gap between the underlying true distribution and the estimated one. For

the case with unknown disturbance distribution, the probabilistic constraints can be

reformulated using the scenario-based approach [326-328], or by means of the

Chebyshev-Cantelli inequality [329-331]. Specifically, the Chebyshev-Cantelli

inequality based method guarantees constraint satisfaction for any distributions sharing

the same mean-covariance information, which is in the same spirit as an emerging

paradigm called distributionally robust control [159, 332-334]. In the paradigm of

distributionally robust control, an ambiguity set is a set comprising all possible

probability distributions characterized by certain known properties of stochastic

disturbances. Distributionally robust control methodologies have been proposed for

113
linear systems with probabilistic constraints assuming the first and second-order

moment information of uncertainty distribution. However, only global mean and

covariance information is utilized, and the existing ambiguity set is not updated in an

online manner based on newly collected disturbance data.

With ever-increasing availability of data in control systems, there is a growing trend to

leverage the information embedded within data to improve control performance. The

remarkable progress in machine learning and big data analytics leads to a broad range

of opportunities to integrate data-driven systems with model-based control systems [35].

Additionally, the dramatic growth of computing power has enabled such organic

integration. Recently, learning-based MPC has attracted increasing attention from the

control community [335-338]. One such method leveraged statistical identification tools

to build a data-driven system model for enhanced performance, while utilizing an

approximate model with uncertainty bounds for constraint tightening in robust tube

MPC [335, 339]. Along the same research direction, a learning-based robust MPC was

developed to integrate control design with offline system model learning via a set

membership identification [340]. For single-input single-output systems under stochastic

uncertainty, an adaptive dual MPC was designed [341], where control served as a

probing role to learn system model [342, 343]. To learn system nonlinearities from data,

several MPC strategies leveraged Gaussian process regression which provides system

dynamic model as well as residual bounds [344, 345]. Learning-based MPC is well

suited to exploiting the value of data to enhance control performance [336], and as such is a practical and appealing control tool in the era of big data [346].

However, most existing learning-based MPC methods focus on learning system models, which essentially learn a function by means of regression techniques along

with their error bounds. Although some of these learning-based MPC methods allow for

online or adaptive model learning, most of them focus on the robust control framework.

Few studies have organically integrated online learning with stochastic MPC.

Therefore, from both theoretical and practical standpoints, a systematic investigation on

online learning based stochastic MPC with theoretical guarantees on recursive

feasibility and closed-loop stability is needed.

To fill this research gap, there are several computational and theoretical challenges that

need to be addressed. The first research challenge is how to devise a data-driven

ambiguity set of disturbance distributions in a nonparametric manner such that it is

adaptive to data complexity automatically. In the context of online learning, one can

hardly pin down the complexity of disturbance model at the beginning, since more data

stream in over the runtime of MPC and data complexity can grow over time. Another

key research challenge is how to develop a framework that organically integrates online

learning with MPC for intelligent control. In particular, with more and more data

collected, the online learning method needs to be scalable with sample size in terms of

both memory and computational time. The third challenge lies in the development of a

computationally tractable constraint tightening method which hedges against

distributional ambiguity in stochastic MPC while leveraging structural properties of

uncertainty distribution. This calls for the theory extension in distributionally robust

optimization with the nonparametric ambiguity set, since there are no theoretical results

available for direct use. The fourth key research challenge is how to guarantee recursive

feasibility and stability of online learning-based stochastic MPC. This challenge arises

115
from the integration between online learning with stochastic MPC. Specifically, the

online update of disturbance distribution information could jeopardize the property of

recursive feasibility.

This work proposes an online learning-based risk-averse stochastic MPC framework for

linear time-invariant systems affected by additive disturbance. Instead of assuming

perfect knowledge of disturbance probability distribution, we consider a more realistic

setting where the distribution can be partially inferred from data. To immunize the

control strategy against the distributional ambiguity, Conditional Value-at-Risk (CVaR)

constraints are required to be observed for all candidate distributions in an ambiguity

set. As opposed to the conventional ambiguity set using global mean-covariance

information, a nonparametric data-driven ambiguity set is constructed by taking

advantage of a Dirichlet process mixture model (DPMM) [347]. Specifically, we devise

the ambiguity set based on the structural property, namely multimodality, along with

local first and second-order moment information of each mixture component, the

number of which is automatically derived from disturbance data. During the runtime of

the controller, real-time disturbance data are exploited to adapt the uncertainty model in

an online fashion based on an incremental variational inference algorithm. The

developed distribution-learning-while-control scheme alternates between learning

ambiguity set from real-time disturbance data and controlling the system with updated

uncertainty information. Afterwards, we propose a constraint tightening technique for

the resulting distributionally robust CVaR constraints over the DPMM-based ambiguity

set. Specifically, the constraints are equivalently reformulated as Linear Matrix

Inequality (LMI) constraints, which are amenable for efficient computation.

116
Additionally, we introduce a safe online update scheme for ambiguity set such that the

recursive feasibility and closed-loop stability are ensured. Numerical simulation and

comparison studies show that the proposed MPC method enjoys less-conservative

control performance compared with the conventional distributionally robust control that

uses global mean and covariance information of disturbance. Additionally, thanks to the

online learning scheme, the proposed MPC is advantageous in terms of low constraint

violation percentage under time-varying disturbance distribution.

The major contributions of this work are summarized as follows:

• A novel online Bayesian learning based risk-averse stochastic MPC framework that improves control performance based on real-time data;

• An online data-driven approach with the DPMM to devise ambiguity sets that are self-adaptive to the underlying structure and complexity of real-time disturbance data;

• A novel constraint tightening method for risk-averse stochastic MPC, in which data-driven CVaR constraints over the DPMM-based ambiguity set are equivalently reformulated as computationally tractable LMIs;

• Theoretical guarantees on recursive feasibility and stability of the proposed MPC with the introduction of a novel safe update scheme for ambiguity sets.

One important contribution of this work is a novel and organic integration of online

learning with stochastic MPC under time-varying uncertainty. Moreover, a novel

nonparametric ambiguity set based on the DPMM is first developed in this manuscript.

Note that the objective of the constraint tightening method is to facilitate the solution of

the proposed MPC. To the best of our knowledge, few research works investigate

theoretical guarantees on recursive feasibility and stability of stochastic MPC in the face

117
of time-varying disturbance distribution. Therefore, the establishment of recursive

feasibility and stability with the safe update scheme is another novel contribution,

although some widely used control techniques are employed.

Notation: The notation used in this chapter is standard. For sets 𝒜 and ℬ, 𝒜 ⊕ ℬ = {a + b : a ∈ 𝒜, b ∈ ℬ} and 𝒜 ⊖ ℬ = {a : a + b ∈ 𝒜 for all b ∈ ℬ} denote the Minkowski set addition and the Pontryagin set difference, respectively. [A]_j represents the j-th row of the matrix A, and [a]_j denotes the j-th entry of the vector a. For matrices A and B, their Frobenius inner product is represented as A • B. The concatenated vector of c_{0|k}, …, c_{N−1|k} is denoted by c_{N|k}. For integers p and q, the notation [p, q] represents all the integers between p and q. For a real number α, (α)_+ = max{α, 0}.
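For intervals on the real line, the Minkowski sum and Pontryagin difference just defined reduce to simple endpoint arithmetic; a minimal illustration (the interval representation is ours, purely for intuition):

```python
def minkowski_sum(A, B):
    # A + B = {a + b : a in A, b in B} for intervals A = [a1, a2], B = [b1, b2]
    return (A[0] + B[0], A[1] + B[1])

def pontryagin_diff(A, B):
    # A - B = {a : a + b in A for all b in B}; empty if B is "wider" than A
    lo, hi = A[0] - B[0], A[1] - B[1]
    return (lo, hi) if lo <= hi else None

X = (-4.0, 4.0)   # state constraint interval
W = (-1.0, 1.0)   # disturbance interval
assert minkowski_sum(X, W) == (-5.0, 5.0)
assert pontryagin_diff(X, W) == (-3.0, 3.0)   # the tightened constraint interval
```

The Pontryagin difference is exactly the tightening operation used later in this chapter: states kept inside X ⊖ W remain inside X for every admissible disturbance in W.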


4.2 Problem setup and preliminaries

In this section, we describe the problem setup, including system dynamics,

distributionally robust CVaR constraints on system states, hard constraints on control

inputs, objective function, as well as some basics on risk-averse stochastic MPC.

Consider the following linear, time-invariant uncertain system with additive stochastic disturbance.

x_{k+1} = A x_k + B u_k + w_k    (4.1)

where x_k ∈ ℝ^n is the system state, u_k ∈ ℝ^m denotes the control input, and w_k ∈ ℝ^n represents the additive disturbance. Let 𝕏 = {x ∈ ℝ^n : Hx ≤ h} and 𝕌 = {u ∈ ℝ^m : Gu ≤ g} be the polytopic constraints on state and input, both of which contain the origin in the interior.

We make the following assumptions regarding the system and additive disturbance.
Assumption 4.1 The measurement of the system state x_k is available at time k.

This is a common assumption. At each sampling time t+1, the disturbance realization at the previous time instant can be obtained as w_t = x_{t+1} − A x_t − B u_t. Therefore, we have access to real-time disturbance data for online learning, which refines the knowledge of uncertainty.
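Assumption 4.1 makes the realized disturbance directly computable from consecutive state measurements; a small numpy check (the system matrices below are illustrative, not from this work):

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative double-integrator dynamics
B = np.array([[0.005], [0.1]])

def recover_disturbance(x_next, x, u):
    # w_t = x_{t+1} - A x_t - B u_t  (states measured per Assumption 4.1)
    return x_next - A @ x - B @ u

x_t = np.array([1.0, 0.5])
u_t = np.array([0.2])
w_true = np.array([0.01, -0.02])
x_next = A @ x_t + B @ u_t + w_true          # simulate one step of (4.1)
w_hat = recover_disturbance(x_next, x_t, u_t)
assert np.allclose(w_hat, w_true)            # exact recovery of the disturbance
```

These recovered samples w_t are precisely the data stream fed to the online learning scheme later in the chapter.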

Assumption 4.2 The matrix pair (A, B) is controllable.

Assumption 4.3 The additive disturbance w has a bounded and convex support set 𝕎 = {w : Ew ≤ f}, which contains the origin in the interior.

Given the state measurement at time k, the prediction model in MPC is given by

x_{l+1|k} = A x_{l|k} + B u_{l|k} + w_{l|k},  x_{0|k} = x_k    (4.2)

where x_{l|k} and u_{l|k} represent the l-step-ahead state and control input predicted at time k, respectively. The disturbance w_{l|k} denotes the l-step-ahead stochastic disturbance with distributional properties available at time k.

By leveraging the probability distribution of disturbance w, stochastic MPC allows

constraint violation via the use of chance constraints. Following the literature [318, 348,

349], we consider chance constraints on the states, as in (4.3). The advantage of such "one-step-ahead" chance constraints is that they facilitate the recursive feasibility analysis and ensure the closed-loop performance.

ℙ( [H]_i x_{l+1|k} ≤ [h]_i | x_{l|k} ) ≥ 1 − [ε]_i,  i ∈ [1, p]    (4.3)

where H ∈ ℝ^{p×n}, h ∈ ℝ^p, and the parameter [ε]_i is a pre-specified risk level for the i-th constraint on the system state. Note that (4.3) becomes a hard constraint when the probability mass of all disturbances is strictly greater than zero and [ε]_i is set to zero.

Chance constraints ensure that the state constraints are satisfied with a probability of at least 1 − [ε]_i, relying on the assumption that the disturbance distribution is perfectly known.

Stochastic MPC can enlarge the feasible region of the corresponding finite-horizon

optimal control problem via tunable risk levels in chance constraints, thus improving

upon objective performance. By assuming disturbance distribution, the tightening

parameters can be computed offline by the inverse of the cumulative distribution. In

practice, however, such precise knowledge of probability is rarely available, and only

partial information can be inferred from historical as well as real-time disturbance data.

Due to the finite amount of uncertainty data, the assumed disturbance distribution could deviate from the underlying true distribution. Consequently, the actual constraint violation resulting from conventional stochastic MPC could become worse than the pre-specified one. Additionally, the chance constraints per se (in the form of (4.3)) focus on the frequency of constraint violation and fail to account for the violation magnitude, unless the magnitude is penalized in the cost function as implemented with soft constraints [159, 350]. Therefore, we introduce the definitions of CVaR and distributionally robust CVaR constraints as follows.

Definition 4.1 (Conditional Value-at-Risk). For a given measurable loss L: ℝ^n → ℝ, probability distribution ℙ on ℝ^n, and tolerance level ε ∈ (0, 1), the Conditional Value-at-Risk (CVaR) of the random loss L at level ε with respect to ℙ is defined below.

ℙ-CVaR_ε(L) = inf_{ζ∈ℝ} { ζ + (1/ε) 𝔼_ℙ[ (L − ζ)_+ ] }    (4.4)

where 𝔼_ℙ denotes the expectation with respect to the probability ℙ. The CVaR can be interpreted as the conditional expectation of the loss L above the (1 − ε)-quantile of the probability distribution of L.
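Under a discrete empirical distribution, the infimum in (4.4) is attained at one of the loss values, and CVaR_ε reduces to the average of the largest ε-fraction of losses; a minimal numpy sketch:

```python
import numpy as np

def cvar(losses, eps):
    """Evaluate P-CVaR_eps(L) = inf_zeta { zeta + E[(L - zeta)_+] / eps }
    for the empirical distribution placing equal mass on `losses`."""
    losses = np.sort(np.asarray(losses, dtype=float))
    # The objective is convex and piecewise linear in zeta, with kinks at the
    # loss values, so scanning the losses themselves finds the infimum.
    return min(z + np.maximum(losses - z, 0.0).mean() / eps for z in losses)

# 10 equally likely losses 1..10; at eps = 0.2, CVaR is the mean of the top 20%
L = np.arange(1.0, 11.0)
assert np.isclose(cvar(L, 0.2), 9.5)   # mean of {9, 10}
```

The "average of the worst cases" reading is what makes CVaR penalize the magnitude of violations, not merely their frequency.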

To address these issues of conventional chance constraints, we propose to use a distributionally robust CVaR version of the constraints (4.3), which is provided as follows.

sup_{ℙ∈𝒫(k)} ℙ-CVaR_{[ε]_i}( [H]_i x_{l+1|k} − [h]_i ) ≤ 0    (4.5)

where 𝒫(k) is an ambiguity set constructed based on the disturbance data information available up to time k, and ℙ denotes the conditional probability given x_{l|k}. Unlike the chance constraints, the distributionally robust CVaR constraints (4.5) not only hedge against the distributional ambiguity, but also penalize severe constraint violations that could be detrimental to system safety.

Meanwhile, hard constraints are imposed on the control inputs due to the physical limitations of actuators, as shown below.

G u_{l|k} ≤ g    (4.6)

where G ∈ ℝ^{q×m} and g ∈ ℝ^q.

Following the common practice in stochastic MPC, we split the predicted state into its nominal part and stochastic error part as follows:

x_{l|k} = z_{l|k} + e_{l|k}    (4.7)

where z_{l|k} and e_{l|k} denote the nominal part and the stochastic error part of the predicted state x_{l|k}, respectively.

An MPC strategy with a prediction horizon N is considered. By employing error feedback, the predicted input for the uncertain system can be represented by

u_{l|k} = K e_{l|k} + v_{l|k}    (4.8)

where K is a stabilizing feedback gain, and v_{l|k} represents the predicted control input for the nominal system corresponding to z_{l|k}. The nominal control v_{l|k} can be formulated as follows.

v_{l|k} = K z_{l|k} + c_{l|k}    (4.9)

where c_{l|k} ∈ ℝ^m, l = 0, …, N−1, are decision variables in the receding-horizon optimal control problem, and c_{l|k} = 0 for l ≥ N.

With the predictive control laws (4.8) and (4.9), the system dynamics for the nominal state and the stochastic error are given below.

z_{l+1|k} = A z_{l|k} + B v_{l|k},  z_{0|k} = x_k    (4.10)

e_{l+1|k} = Φ e_{l|k} + w_{l|k},  e_{0|k} = 0    (4.11)

where Φ = A + BK is Schur stable with the stabilizing feedback gain K.
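The decomposition (4.7)-(4.11) can be checked numerically: propagating the nominal state z and the error e separately reproduces the state of the uncertain system exactly. Matrices, gain, and disturbance bounds below are illustrative assumptions:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])   # illustrative system
B = np.array([[0.005], [0.1]])
K = np.array([[-1.0, -1.5]])             # illustrative stabilizing gain
Phi = A + B @ K                          # error-dynamics matrix in (4.11)

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0]); z = x.copy(); e = np.zeros(2)   # x_{0|k} = z_{0|k}, e_{0|k} = 0
for c_l in [np.array([0.1]), np.array([0.0]), np.array([0.0])]:
    w = rng.uniform(-0.01, 0.01, size=2)   # bounded additive disturbance
    v = K @ z + c_l                        # nominal input (4.9)
    u = K @ e + v                          # actual input (4.8)
    x = A @ x + B @ u + w                  # uncertain system (4.1)
    z = A @ z + B @ v                      # nominal dynamics (4.10)
    e = Phi @ e + w                        # error dynamics (4.11)
assert np.allclose(x, z + e)               # decomposition (4.7) holds at every step
```

Because the error e collects all disturbance effects, constraints can be tightened for z alone, which is the basis of the tube-style tightening used below.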

We consider the cost of the nominal system in this work. Specifically, the control objective is to minimize the infinite-horizon cost at sampling time k, as given below.

J = Σ_{l=0}^{∞} ( z_{l|k}^T Q z_{l|k} + v_{l|k}^T R v_{l|k} )    (4.12)

where Q ∈ ℝ^{n×n}, Q ⪰ 0, R ∈ ℝ^{m×m}, and R ≻ 0.

Furthermore, the following assumption on detectability is made such that there exists an LQR solution.

Assumption 4.4 The matrix pair (A, Q^{1/2}) is detectable.

Suppose the feedback gain matrix K is chosen to be LQR-optimal; then we can further rewrite (4.12) as (4.13). The detailed derivation is provided in Appendix A.

J = Σ_{l=0}^{N−1} c_{l|k}^T ( R + B^T P B ) c_{l|k} + x_k^T P x_k    (4.13)

where P denotes the solution of the Lyapunov equation Φ^T P Φ + Q + K^T R K = P. Note that the second term x_k^T P x_k is a constant. A quadratic finite-horizon cost is given by

J_N( c_{N|k} ) = Σ_{l=0}^{N−1} c_{l|k}^T ( R + B^T P B ) c_{l|k}    (4.14)

where c_{N|k} = [ c_{0|k}^T, …, c_{N−1|k}^T ]^T represents the decision vector at time k with a horizon of length N.
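The matrix P in (4.13) solves a discrete Lyapunov equation; since Φ is Schur stable, a plain fixed-point iteration converges to it without any specialized solver. The matrices and gain below are illustrative, not from this work:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
K = np.array([[-1.0, -1.5]])            # assumed given (e.g., LQR-optimal)
Q = np.eye(2); R = np.eye(1)

Phi = A + B @ K                          # Schur stable for this illustrative K
Qbar = Q + K.T @ R @ K
# Fixed-point iteration for Phi^T P Phi + Qbar = P, i.e.,
# P = sum_{l>=0} (Phi^T)^l Qbar Phi^l, which converges since Phi is Schur.
P = Qbar.copy()
for _ in range(1000):
    P = Phi.T @ P @ Phi + Qbar
assert np.allclose(Phi.T @ P @ Phi + Qbar, P, atol=1e-6)  # Lyapunov residual
S = R + B.T @ P @ B                      # weight on c_{l|k} in (4.13)-(4.14)
assert S[0, 0] > 0                       # positive definite, as R > 0
```

With P in hand, the finite-horizon cost (4.14) is just a block-diagonal quadratic form in the decision vector c_{N|k}.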

4.3 Online learning based risk-averse stochastic MPC

In this section, we propose a novel risk-averse stochastic MPC framework based on

online Bayesian learning. We first develop a data-driven approach to construct an

ambiguity set for the stochastic disturbance based on the DPMM. Then, an efficient

constraint tightening method for the CVaR constraints on system states over the

ambiguity set is developed for the synthesis of stochastic predictive controller. Finally,

based on an online safe update scheme, the predictive control algorithm that organically

integrates online learning with risk-averse stochastic MPC is described.

4.3.1 Online Bayesian learning for streaming disturbance measurement

data

To automatically decipher the structural property of the disturbance distribution, we

employ a nonparametric Bayesian model, called DPMM, which is briefly described as

follows.

The Dirichlet process (DP) constitutes a fundamental building block for the DPMM that

relies on mixtures to characterize data distribution. The DP is technically a probability

distribution over distributions. Suppose a random distribution G follows a DP

parameterized by a concentration parameter α and a base measure G0 over space Θ0,

denoted as G ~ DP(α, G₀). For any fixed partition (A₁, …, A_r) of Θ₀, we have the following.

( G(A₁), …, G(A_r) ) ~ Dir( αG₀(A₁), …, αG₀(A_r) )    (4.15)

where Dir represents the Dirichlet distribution.

Following the stick-breaking procedure [351], a random draw from the DP can be expressed as G = Σ_{k=1}^{∞} π_k δ(φ_k), where π_k = β̄_k Π_{j=1}^{k−1} (1 − β̄_j) is the weight, φ_k is sampled from G₀, and δ(φ_k) denotes the Dirac delta function at φ_k. The parameter β̄_k represents the proportion broken off the remaining stick, and follows a Beta distribution, denoted as β̄_k ~ Beta(1, α).
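The stick-breaking construction can be simulated directly by truncating the infinite sequence; a minimal numpy sketch:

```python
import numpy as np

def stick_breaking_weights(alpha, K, rng):
    """Draw the first K stick-breaking weights pi_k = beta_k * prod_{j<k}(1 - beta_j),
    with beta_k ~ Beta(1, alpha)."""
    betas = rng.beta(1.0, alpha, size=K)
    # Length of stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)[:-1]])
    return betas * remaining

rng = np.random.default_rng(42)
pi = stick_breaking_weights(alpha=1.0, K=50, rng=rng)
assert np.all(pi >= 0.0)
assert pi.sum() < 1.0      # truncation leaves an (exponentially tiny) unbroken stub
```

Smaller α concentrates mass on the first few sticks (few clusters); larger α spreads it out, which is how the DP adapts the number of mixture components to the data.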

The Bayesian nonparametric model, i.e., the DPMM, employs φ_k as the parameters of some data distribution. Based on the DP, we summarize the basic form of a DPMM as follows [352, 353]:

{ π_k, φ_k }_{k=1}^{∞} ~ DP(α, G₀)
l_n ~ Mult(π)    (4.16)
o_n ~ F(φ_{l_n})

where Mult denotes a multinomial distribution, l_n is the label indicating the component or cluster of observation o_n, n is an index ranging from 1 to N_d, and the data o₁, …, o_{N_d} are distributed according to a family of distributions F. Based on the stick-breaking process, G is discrete with probability one. Such discreteness further induces the clustering of the data.
data.

Due to its computational efficiency, the variational inference has become a method of

choice for approximating the conditional distribution of latent variables in the DPMM

given observed data [353]. In the variational inference, the problem of computing the

posterior distribution is formulated as an optimization problem, which can be solved

using a coordinate ascent method. Following the literature [352], we use mixtures of Gaussians in this work. Therefore, the base measure can be chosen as a normal–Wishart (NW) distribution, so that φ_k = (μ_k, H_k) comprises the mean vector and precision matrix of the k-th Gaussian component. A variational distribution is used to approximate the true posterior in terms of the Kullback–Leibler divergence [353, 354].

For the online learning setting, suppose that the real-time data wt is collected for the

control system. To learn from the streaming data, we employ an online variational

inference algorithm in this work [347]. This algorithm features faster computation and

bounded memory requirement for each round of learning. It is well-suited to learning the

distribution online from real-time data over the runtime of MPC. The algorithm iterates

between a model-building phase and a compression phase. In the model-building phase, a clump C_s is a set of indices such that, for all i, j ∈ C_s, the disturbance data w_i and w_j are generated from the same mixture component. Disturbance data within the same clump are summarized via their average sufficient statistics, which encapsulate all the information needed for the purpose of inference [347]. The new disturbance data at time t are used to update the inference results and are then discarded to reduce the memory overhead. The clump constraints are determined in the compression phase in a top-down recursive fashion. As a result, the algorithm is attractive in terms of both bounded memory requirement and fast computation: the computational burden of each model update does not grow with the amount of disturbance data already processed. For more

details on this online learning algorithm for the DPMM, we refer the readers to [347].

Based on the online variational inference results available up to time k, we propose a

data-driven ambiguity set, as given in Definition 4.2, by leveraging both multimodality

and moment information of each mixture component.

Definition 4.2 (Data-driven ambiguity set). The ambiguity set based on the DPMM, denoted as 𝒫(k), is defined as follows.

𝒫(k) = Σ_{j=1}^{m(k)} π_j^(k) 𝒫_j( μ_j^(k), Σ_j^(k) )    (4.17)

where 𝕎 = {w : Ew ≤ f} is the support set of the additive disturbance under Assumption 4.3, m(k) denotes the number of mixture components, the mixing weight π_j^(k) indicates the occurring probability of each mixture component, 𝒫_j represents a basic ambiguity set, and μ_j^(k) and Σ_j^(k), respectively, denote the mean and covariance estimates for the j-th mixture component obtained from the online learning algorithm.

Note that the support set plays a key role in ensuring recursive feasibility by means of the terminal set.

The data-driven ambiguity set for the stochastic disturbance based on the DPMM is

devised as a weighted Minkowski sum of several basic ambiguity sets, the number of

which is automatically determined from disturbance data using the online variational

inference algorithm. Each basic ambiguity set 𝒫_j is cast as follows.

𝒫_j( μ_j^(k), Σ_j^(k) ) = { ρ ∈ ℳ_+ :  ∫_𝕎 ρ(dξ) = 1,  ∫_𝕎 ξ ρ(dξ) = μ_j^(k),  ∫_𝕎 ξ ξ^T ρ(dξ) = Σ_j^(k) + μ_j^(k) ( μ_j^(k) )^T }    (4.18)

where ℳ_+ represents the set of positive Borel measures on ℝ^n, and ρ is a positive measure.

There are several highlights of the proposed data-driven ambiguity set. First, the ambiguity set is devised in a nonparametric manner so that it automatically accommodates its complexity to the underlying structure and complexity of the disturbance data. Second, each basic ambiguity set is devised using mean and covariance information, which endows the resulting stochastic MPC with enormous computational benefits. Third, the proposed ambiguity set leverages fine-grained distribution information, namely local moment information. This feature implies that the resulting stochastic MPC enjoys less conservative control performance compared with a control method using a conventional ambiguity set based on global moment information.
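Given component assignments from the inference step (here replaced by hypothetical hard labels rather than the DPMM posterior), the ingredients of (4.17) — mixing weights, local means, and local covariances — are simple per-cluster moments; a numpy sketch:

```python
import numpy as np

def component_moments(W, labels):
    """Estimate (pi_j, mu_j, Sigma_j) for each mixture component from
    disturbance samples W (one sample per row) and cluster labels."""
    comps, n = {}, len(W)
    for j in np.unique(labels):
        Wj = W[labels == j]
        pi_j = len(Wj) / n                             # occurring probability
        mu_j = Wj.mean(axis=0)                         # local mean
        Sigma_j = np.cov(Wj, rowvar=False, bias=True)  # local covariance
        comps[int(j)] = (pi_j, mu_j, Sigma_j)
    return comps

rng = np.random.default_rng(1)
W = np.vstack([rng.normal(-2.0, 0.1, (60, 2)),   # bimodal disturbance data
               rng.normal(+2.0, 0.1, (140, 2))])
labels = np.array([0] * 60 + [1] * 140)
comps = component_moments(W, labels)
assert abs(comps[0][0] - 0.3) < 1e-9 and abs(comps[1][0] - 0.7) < 1e-9
```

A single global mean/covariance for this bimodal data would sit between the two modes with an inflated covariance, which is exactly the conservatism the local-moment ambiguity set avoids.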

4.3.2 A novel constraint tightening based on distributionally robust CVaR

constrained optimization

In this section, we present a novel constraint tightening technique for the distributionally robust CVaR constraints (4.5) over the data-driven ambiguity set 𝒫(k), as well as constraint tightening for the input constraints G u_{l|k} ≤ g. The purpose of constraint tightening is to obtain constraints on the states of the nominal system such that the state constraints of the uncertain system are satisfied with a frequency of at least 1 − [ε]_i.

For state constraints, the corresponding distributionally robust CVaR constraints can be

equivalently reformulated as LMI constraints on the predicted nominal system states, as

given in the following theorem.

Theorem 4.1 (Constraint tightening). The system satisfies the data-driven distributionally robust CVaR constraints (4.5) if and only if the nominal system satisfies the tightened constraints z_{l+1|k} ∈ 𝕏̄(k) ⊖ ( ⊕_{i=1}^{l} Φ^i 𝕎 ), with 𝕏̄(k) given below.

𝕏̄(k) = { z ∈ ℝ^n : Hz ≤ h − η(k) }    (4.19)

where [η(k)]_i is the optimal objective value of the following problem (4.20).

min η

s.t.  [ε]_i β_i + Σ_{j=1}^{m} π_j ( t_ij + μ_j^T λ_ij + ( Σ_j + μ_j μ_j^T ) • Λ_ij ) ≤ 0

[ Λ_ij                          ½( λ_ij + E^T τ_ij ) ]
[ ½( λ_ij + E^T τ_ij )^T        t_ij − f^T τ_ij      ] ⪰ 0,  ∀j        (4.20)

[ Λ_ij                                     ½( λ_ij − [H]_i^T + E^T θ_ij ) ]
[ ½( λ_ij − [H]_i^T + E^T θ_ij )^T         t_ij + β_i + η − f^T θ_ij      ] ⪰ 0,  ∀j

Λ_ij ⪰ 0,  τ_ij ≥ 0,  θ_ij ≥ 0,  ∀j

Note that we drop the index k from m(k), π_j^(k), μ_j^(k), and Σ_j^(k) for notational simplicity.
k k k

The constraints in (4.20) are LMI constraints. The proof of Theorem 4.1 is provided in

Appendix B.

Remark 4.1 The support set is assumed to be a polyhedron in Assumption 4.3, because

it facilitates the LMI reformulation in the proposed constraint tightening. Specifically,

constraints with a polyhedral support set can be reformulated as computationally

efficient LMI constraints in the proposed MPC. In principle, the assumption on

polyhedral support can be relaxed to a general compact convex set. In this case, those

constraints still admit robust counterparts, yet more complicated than LMIs.

The optimization problem in (4.20) is a computationally tractable semi-definite program

(SDP), which can be solved efficiently using the off-the-shelf optimization solvers, such

as SEDUMI and MOSEK. The distribution information of disturbance is learned online

through the incremental variational inference algorithm, and is further incorporated to

(4.20) to perform constraint tightening, thus improving the control performance in an

online fashion.
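For intuition about what (4.20) computes, consider the standard special case of a single component and no support constraint: the worst-case CVaR of a linear loss a^T w − b over all distributions with mean μ and covariance Σ admits the closed form a^T μ − b + sqrt((1−ε)/ε) · sqrt(a^T Σ a). This is a well-known moment-ambiguity result, not the full mixture SDP (4.20); a numpy sketch of the resulting tightening offset:

```python
import numpy as np

def wc_cvar_offset(a, mu, Sigma, eps):
    """Worst-case CVaR_eps tightening for a halfspace a^T w <= b under a
    single mean-covariance ambiguity set (closed form, no support set):
    offset = a^T mu + kappa * sqrt(a^T Sigma a), kappa = sqrt((1-eps)/eps)."""
    kappa = np.sqrt((1.0 - eps) / eps)
    return a @ mu + kappa * np.sqrt(a @ Sigma @ a)

a = np.array([1.0, 0.0])       # constraint row [H]_i
mu = np.zeros(2)
Sigma = np.diag([0.04, 0.01])  # local covariance estimate
eta = wc_cvar_offset(a, mu, Sigma, eps=0.1)
assert np.isclose(eta, 0.6)    # kappa = 3 for eps = 0.1, std = 0.2
```

Problem (4.20) generalizes this intuition to several weighted components and a polyhedral support, which is why it takes the form of an SDP rather than a closed formula.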
For the hard constraints on control inputs G u_{l|k} ≤ g, the corresponding constraint tightening result is obtained as follows, based on the tube-based strategy.

v_{l|k} ∈ 𝕌 ⊖ K( ⊕_{i=0}^{l−1} Φ^i 𝕎 )    (4.21)

where 𝕌 = { u ∈ ℝ^m : Gu ≤ g }.

4.3.3 The proposed online learning based risk-averse stochastic MPC

algorithm with a safe update scheme

In this section, we first introduce the finite-horizon optimal control problem of the proposed MPC with a safe ambiguity set update scheme. Then, the overall description of the resulting online learning based risk-averse stochastic MPC algorithm is provided in detail.

In the online learning based risk-averse stochastic MPC paradigm, the finite-horizon optimal control problem is solved repeatedly online. The optimal control problem to be solved at time k is denoted as (OL-SMPC_k), given in (4.22)-(4.27).

min_{c_{N|k}}  J_N( c_{N|k} )    (4.22)

s.t.  z_{l+1|k} = A z_{l|k} + B v_{l|k},  z_{0|k} = x_k    (4.23)

v_{l|k} = K z_{l|k} + c_{l|k}    (4.24)

z_{l+1|k} ∈ ℤ_{l+1}^(k),  l ∈ [0, N−1]    (4.25)

v_{l|k} ∈ 𝕍_l,  l ∈ [0, N−1]    (4.26)

z_{N|k} ∈ ℤ_f^(k)    (4.27)

where 𝕍_l denotes the tightened input constraint set in (4.21), and the state constraint sets ℤ_{l+1}^(k) and the terminal set ℤ_f^(k) are defined using a safe update scheme described as follows.

In the developed update scheme, the condition is identified under which it remains safe

to utilize the tightened constraints with the updated ambiguity set in the risk-averse

stochastic MPC. Specifically, if a candidate solution satisfies the tightened constraints resulting from the current results of online variational inference, one can safely incorporate

the newly-learned uncertainty information into the control problem; otherwise, one

resorts to the tightened constraints from the previous sampling time.

To describe the condition checked by the safe update scheme, we need the definition of

candidate solution given as follows.

Definition 4.3 (Candidate solution) Given an optimal solution $\mathbf{c}_{N|k}^* = \left\{ c_{0|k}^*, c_{1|k}^*, \ldots, c_{N-1|k}^* \right\}$ to the MPC problem at time k, the candidate solution at time instant k+1 is defined by

$$\tilde{\mathbf{c}}_{k+1} = \left\{ c_{1|k}^*, \ldots, c_{N-1|k}^*, 0 \right\} \quad (4.28)$$

Note that it is common to employ the candidate solution as the shifted optimal input augmented by zero [314, 355]. This work employs the dual-mode prediction paradigm [356]: Mode 1 corresponds to $v_{l|k} = K z_{l|k} + c_{l|k}$, $l = 0, \ldots, N-1$, while Mode 2 corresponds to $v_{l|k} = K z_{l|k}$, $l \ge N$. As a result, the terminal controller $v_{l|k} = K z_{l|k}$ for the nominal system $z_{l+1|k} = A z_{l|k} + B v_{l|k}$ steers the nominal system state (in the terminal set) to the origin. The last zero term in the candidate solution comes from the shifted solution in Mode 2 ($c_{N|k} = 0$), so requiring the last term to be zero is not restrictive. The explicitly given candidate solution plays a critical role not only in establishing recursive feasibility, but also in proving closed-loop stability [314, 355].
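The shift-and-append-zero construction of Definition 4.3 can be sketched in a few lines (illustrative code, not from the dissertation):

```python
import numpy as np

# Build the shifted candidate solution of Definition 4.3 from the optimal
# perturbation sequence c*_{0|k}, ..., c*_{N-1|k}.
def candidate_solution(c_opt: np.ndarray) -> np.ndarray:
    """Shift the optimal input perturbations one step and append zero."""
    # c_opt has shape (N, m): one m-dimensional perturbation per step.
    return np.vstack([c_opt[1:], np.zeros((1, c_opt.shape[1]))])

c_opt = np.array([[1.0], [0.5], [0.25]])       # N = 3, m = 1
cand = candidate_solution(c_opt)
print(cand.ravel())                            # [0.5  0.25 0.  ]
```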

Based on the candidate solution, the safe update scheme checks the following

conditions:

$$\tilde{z}_{l|k+1} \in \hat{\mathbb{Z}}_l^{(k+1)} \quad (4.29)$$

$$\tilde{z}_{N|k+1} \in \hat{\mathbb{Z}}_f^{(k+1)} \quad (4.30)$$

where $\tilde{z}_{l|k+1}$ denotes the predicted state of the nominal system corresponding to the candidate solution $\tilde{\mathbf{c}}_{k+1}$, set $\hat{\mathbb{Z}}_l^{(k+1)} = \mathbb{X}^{(k+1)} \ominus \left( \bigoplus_{i=1}^{l} \Phi^i \mathbb{W} \right)$, and set $\hat{\mathbb{Z}}_f^{(k+1)}$ is defined using terminal constraints.

To formally define the terminal set $\hat{\mathbb{Z}}_f^{(k+1)}$, we define the (maximal) robust positively invariant set below [357].

Definition 4.4 (Robust positively invariant set) A set Ω is a robust positively invariant set for the system $x_{k+1} = f\left( x_k, w_k \right)$ and constraint sets $\left( \mathbb{X}, \mathbb{W} \right)$ if $\Omega \subseteq \mathbb{X}$ and $x_{k+1} \in \Omega$, $\forall w_k \in \mathbb{W}$, for every $x_k \in \Omega$.

Definition 4.5 (Maximal robust positively invariant set) A set Ω is a maximal robust positively invariant (MRPI) set for the system $x_{k+1} = f\left( x_k, w_k \right)$ and constraint sets $\left( \mathbb{X}, \mathbb{W} \right)$ if Ω is a robust positively invariant set and contains all robust positively invariant sets.

Remark 4.2 The MRPI sets are computed by using a standard approach based on the

recursion of predecessor sets, a.k.a. backward reachable sets [357, 358].
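The predecessor-set recursion of Remark 4.2 is easiest to see in one dimension, where each set is an interval. The sketch below (illustrative, not the dissertation's polytope implementation) recurses $\Omega_{i+1} = \Omega_i \cap \text{Pre}(\Omega_i)$ for $x^+ = a x + w$, $|w| \le \bar{w}$, under $|x| \le \bar{x}$:

```python
# Minimal sketch: MRPI set of a scalar system x+ = a*x + w, |w| <= w_bar,
# subject to |x| <= x_bar, via backward reachable (predecessor) sets.
def mrpi_interval(a: float, w_bar: float, x_bar: float, max_iter: int = 100):
    b = x_bar                      # current set is the interval [-b, b]
    for i in range(max_iter):
        # Pre([-b, b]) = {x : a*x + w in [-b, b] for all |w| <= w_bar}
        pre = (b - w_bar) / abs(a)
        if pre < 0:
            return None, i + 1     # no nonempty RPI set exists
        b_new = min(b, pre)
        if b_new == b:             # finite-step convergence to the MRPI set
            return b, i + 1
        b = b_new
    raise RuntimeError("did not converge")

bound, iters = mrpi_interval(a=0.5, w_bar=0.3, x_bar=1.0)
print(bound, iters)                # 1.0 1  (the constraint set is itself RPI)
```

The recursion terminates in a finite number of steps, matching the finite-determination property cited in Remark 4.6; it returns an empty answer when the disturbance is too large relative to the constraint set, which is the failure mode discussed for time-varying supports in Section 4.4.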

Definition 4.6 (Terminal set) The terminal set $\hat{\mathbb{Z}}_f^{(k)}$ is defined as the maximal robust positively invariant (MRPI) set for the system $z_{N|k+1} = \Phi z_{N|k} + \Phi^N w_k$ that satisfies the tightened constraints $z_{N|k} \in \hat{\mathbb{Z}}_N^{(k)}$ and $K z_{N|k} \in \mathbb{V}_N$.

Based on Definition 4.6, the terminal set must be an MRPI set satisfying $\Phi \hat{\mathbb{Z}}_f^{(k)} \oplus \Phi^N \mathbb{W} \subseteq \hat{\mathbb{Z}}_f^{(k)}$ and $z \in \hat{\mathbb{Z}}_f^{(k)} \Rightarrow z \in \hat{\mathbb{Z}}_N^{(k)}, \; K z \in \mathbb{V}_N$.

Note that the terminal controller using the feedback gain K respects state and input

constraints, when operated in the terminal set.

It is safe to update the ambiguity set when an indicator called flag equals 1; otherwise, when flag = 0, updating the ambiguity set could jeopardize the recursive feasibility of the proposed MPC. Therefore, the corresponding tightened constraints adopted in the online learning based stochastic MPC are given by

$$\mathbb{Z}_l^{(k+1)} = \text{flag} \cdot \hat{\mathbb{Z}}_l^{(k+1)} + \left( 1 - \text{flag} \right) \cdot \mathbb{Z}_l^{(k)} \quad (4.31)$$

Similarly, we can safely update the terminal set in the following way:

$$\mathbb{Z}_f^{(k+1)} = \text{flag} \cdot \hat{\mathbb{Z}}_f^{(k+1)} + \left( 1 - \text{flag} \right) \cdot \mathbb{Z}_f^{(k)} \quad (4.32)$$
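The flag logic of the safe update scheme reduces to a membership check on the candidate trajectory followed by a selection between the new and old sets. A minimal sketch (all names illustrative, sets encoded as interval bounds for simplicity):

```python
# Safe update scheme sketch: keep the newly tightened sets only if the
# candidate solution's predicted nominal states satisfy them; otherwise
# fall back to the sets from the previous sampling time.
def safe_update(z_candidate, in_hat_Z, in_hat_Zf, hat_sets, old_sets):
    """z_candidate: predicted nominal states under the candidate solution
    (last entry plays the terminal-state role); in_hat_Z(l, z) and
    in_hat_Zf(z) are membership tests for the updated tightened sets."""
    *z_path, z_terminal = z_candidate
    flag = all(in_hat_Z(l, z) for l, z in enumerate(z_path)) \
        and in_hat_Zf(z_terminal)
    return (1, hat_sets) if flag else (0, old_sets)

# Toy usage with 1-D sets encoded as per-step interval bounds.
hat_bounds = [0.8, 0.7, 0.6]     # updated (tighter) bounds
old_bounds = [1.0, 0.9, 0.8]     # bounds from the previous sampling time
flag, used = safe_update(
    [0.5, 0.65, 0.3],
    in_hat_Z=lambda l, z: abs(z) <= hat_bounds[l],
    in_hat_Zf=lambda z: abs(z) <= 0.5,
    hat_sets=hat_bounds, old_sets=old_bounds)
print(flag, used)                # 1 [0.8, 0.7, 0.6]
```

When any predicted state violates the updated sets, the function returns flag = 0 and the previous sets, which is exactly the fallback that preserves recursive feasibility.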

The proposed MPC algorithm is detailed in Figure 21. The online-learning based risk-

averse stochastic MPC algorithm can be roughly divided into two blocks: (i) offline

computation of the sets, ambiguity set construction based on historical disturbance data,

and (ii) online learning from real-time disturbance data and online optimization. At each

time, the MPC strategy only implements the first control action. From Figure 21, we can see that the algorithm alternates between online optimal control of the system and online learning from real-time disturbance data. If the candidate solution satisfies the conditions $\tilde{z}_{l|k+1} \in \hat{\mathbb{Z}}_l^{(k+1)}$ and $\tilde{z}_{N|k+1} \in \hat{\mathbb{Z}}_f^{(k+1)}$, the newly learned uncertainty information is incorporated into the

predictive control strategy to improve the control performance over its runtime. In the

MPC framework, a finite-horizon optimal control problem is solved at each sampling

time, and only the first control action is performed as the control input $u_k$. The corresponding first control action is $u_{0|k}^* = K e_{0|k} + v_{0|k}^*$. Since $e_{0|k} = 0$, we further have $u_{0|k}^* = 0 + v_{0|k}^* = v_{0|k}^*$. Therefore, the control input is $u_k = v_{0|k}^*$, as in Step 5 of the algorithm.

Algorithm: Online-learning based stochastic MPC algorithm

1: Offline: Given the initial state $x_0$, construct an ambiguity set in (4.17) from historical data, and determine $\mathbb{Z}_l^{(0)}$ using (4.19)-(4.20), $\mathbb{V}_l$ in (4.21), and $\mathbb{Z}_f^{(0)}$ based on Definition 4.6.
2: Online:
3: for k = 0, 1, ... do
4:   Solve the optimal control problem (OL-SMPCk) in (4.22)-(4.27);
5:   Apply the control policy in (4.8) for l = 0, i.e. $u_k = v_{0|k}^*$;
6:   Measure the current system state $x_{k+1}$ and obtain $w_k$ using (4.1);
7:   Run online learning for the DPMM in (4.16) with real-time data $w_k$;
8:   Update the ambiguity set in (4.17), and obtain $\hat{\mathbb{Z}}_l^{(k+1)}$ using (4.19)-(4.20) and $\hat{\mathbb{Z}}_f^{(k+1)}$ based on Definition 4.6;
9:   if $\tilde{z}_{l|k+1} \in \hat{\mathbb{Z}}_l^{(k+1)}$ in (4.29) and $\tilde{z}_{N|k+1} \in \hat{\mathbb{Z}}_f^{(k+1)}$ in (4.30) then
10:    flag = 1;
11:  else
12:    flag = 0;
13:  end
14:  Determine the sets $\mathbb{Z}_l^{(k+1)}$ and $\mathbb{Z}_f^{(k+1)}$ using (4.31)-(4.32);
15: end

Figure 21. The pseudocode of the proposed online-learning based risk-averse

stochastic MPC algorithm.

Remark 4.3 The upper bound on the memory requirement of the online learning algorithm is $\left( \frac{n^2 + 3n}{2} + 1 \right) N_c + n N_s$, where $N_c$ is the number of clumps, $N_s$ is the number of singlets, and $n$ denotes the data dimension. Note that the cost of storing the sufficient statistics for each clump is $\frac{n^2 + 3n}{2} + 1$. Compared with the batch learning algorithm, the computational complexity of the adopted online learning algorithm is only $O\left( K \left( N_c + N_s + 1 \right) \right)$ during the model building phase [347], where K denotes the maximum number of components.
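The memory bound of Remark 4.3 is a simple count of stored statistics. A quick evaluation (numbers illustrative):

```python
# Evaluate the memory bound (n^2 + 3n)/2 + 1 per clump plus n per singlet.
def memory_upper_bound(n: int, n_clumps: int, n_singlets: int) -> int:
    per_clump = (n * n + 3 * n) // 2 + 1   # sufficient statistics per clump
    return per_clump * n_clumps + n * n_singlets

# e.g. 2-D disturbance data, 10 clumps, 5 singlets:
print(memory_upper_bound(n=2, n_clumps=10, n_singlets=5))   # 70
```

The bound grows quadratically in the data dimension n (through the covariance statistics) but only linearly in the number of clumps and singlets, which is why the online scheme scales well over long runtimes.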

4.4 The theoretical properties of the proposed online learning based

risk-averse stochastic MPC

In this section, the properties of the proposed online learning based risk-averse

stochastic MPC algorithm, namely recursive feasibility (formally defined in Definition

4.7) and closed-loop stability, are derived.

Definition 4.7 (Recursive feasibility) If the finite-horizon optimization problem of MPC

is initially feasible, it remains feasible at all subsequent sampling instants.

As pointed out in Section 4.1, the important property of MPC, namely recursive

feasibility, might be compromised by the time-varying probabilistic constraints with

online updated ambiguity set of disturbance distributions. To this end, a novel safe

update scheme for the ambiguity set is developed for the MPC framework, along with

the terminal set or terminal constraints [357].

By employing the safe ambiguity set update scheme developed in Section 4.3.3, recursive feasibility and closed-loop stability are ensured even though the disturbance distribution might be time-varying. Note that the disturbance support $\Xi$ is assumed to be time-invariant for ease of exposition, as indicated in Assumption 4.3, where

matrices E and f are not indexed by time k. For the time-varying support, the derivation

and conclusion of proposed constraint tightening in Theorem 4.1 are still valid. The

only issue with the varying support is that the MRPI set could become empty when the

support becomes sufficiently large, which further leads to the infeasibility issue.

A standard assumption underpinning tube based MPC on the terminal set is made as

follows [312, 313].

Assumption 4.5 There exists a nonempty terminal set $\mathbb{Z}_f^*$ for the tightened constraints based on a worst-case scheme, i.e. $\hat{\mathbb{Z}}_l^* = \mathbb{X}^* \ominus \left( \bigoplus_{i=1}^{l} \Phi^i \mathbb{W} \right)$ with $\mathbb{X}^* = \left\{ z \in \mathbb{R}^n \mid [H]_i z \le [h]_i - \eta_i^0 \right\}$, where $\eta_i^0 = \max_{\xi \in \Xi} [H]_i \xi$.

Proposition 4.1 Under Assumption 4.5, the terminal MRPI set $\mathbb{Z}_f^{(k)}$ is always nonempty based on the constraint tightening in Theorem 4.1.

The proof of Proposition 4.1 is given in Appendix C.

Remark 4.4 Based on the proof in Appendix C, we can see that the set $\mathbb{Z}_f^*$ is a robust positively invariant set, not necessarily the MRPI set, for the updated tightened constraints. Therefore, when a very limited computational budget is imposed, the offline-computed set $\mathbb{Z}_f^*$ can serve as the terminal set to guarantee recursive feasibility and stability without re-computing the MRPI set at each time step.

First, we prove the recursive feasibility of the proposed MPC, as given in the following

theorem.

Theorem 4.2 (Recursive feasibility) Let $\mathcal{F}\left( x_k \right)$ denote the feasible region of the finite horizon optimal control problem (OL-SMPCk) for state $x_k$. If $\mathcal{F}\left( x_0 \right) \neq \emptyset$, then given Assumptions 4.1-4.5, we have $\mathcal{F}\left( x_k \right) \neq \emptyset$, $\forall k$.

The proof of Theorem 4.2 is provided in Appendix D.

Remark 4.5 Since the distribution of stochastic disturbance can be arbitrarily time-

varying, the feasibility of the candidate solution based on the adaptive constraints

(without using the safe update scheme) cannot hold universally. To the best of our

knowledge, the developed safe update scheme presents the first attempt to successfully

address such a recursive feasibility issue for stochastic MPC subject to time-varying

disturbance distributions.

Before proving the stability of the proposed MPC, we provide the definition of minimal

RPI set as follows.

Definition 4.8 (Minimal robust positively invariant set) The minimal robust positively invariant set $R_\infty$ is defined as follows.

$$R_\infty = \lim_{l \to \infty} \bigoplus_{i=0}^{l} \Phi^i \mathbb{W} \quad (4.33)$$
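In the scalar case, the Minkowski sum in (4.33) reduces to a geometric series of interval bounds, so $R_\infty$ can be approximated by truncating the sum (a sketch with illustrative parameters, not the approximation method of [361]):

```python
# Approximate the minimal RPI set R_inf = ⊕_{i>=0} Phi^i W of (4.33) for a
# scalar system x+ = phi*x + w with |w| <= w_bar: each summand Phi^i W is
# the interval [-|phi|^i * w_bar, |phi|^i * w_bar].
def min_rpi_bound(phi: float, w_bar: float, tol: float = 1e-12) -> float:
    assert abs(phi) < 1, "Phi must be Schur stable"
    bound, term = 0.0, w_bar
    while term > tol:
        bound += term              # add the i-th Minkowski summand
        term = abs(phi) * term     # next term: |phi|^(i+1) * w_bar
    return bound                   # approaches w_bar / (1 - |phi|)

print(round(min_rpi_bound(phi=0.5, w_bar=0.3), 6))   # 0.6
```

The truncation error after stopping is at most $\text{tol}/(1-|\phi|)$, mirroring the a priori error bound of the approximation method cited in Section 4.5.1.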

The following theorem establishes the stability of the closed-loop system with the

proposed MPC strategy.

Theorem 4.3 (Closed-loop stability) Given that $\mathcal{F}\left( x_0 \right) \neq \emptyset$, the closed-loop system state asymptotically converges to a neighborhood of the origin under the proposed online-learning based risk-averse stochastic MPC.


The proof of Theorem 4.3 is provided in Appendix E.

4.5 Numerical examples

In this section, we apply the proposed online-learning based stochastic MPC to

numerical examples to illustrate its effectiveness and advantages. We also implement

the risk-averse stochastic MPC using global mean and covariance in the ambiguity set

and the risk-averse stochastic MPC without online learning, in addition to the proposed

MPC approach for the purpose of comparison. The online learning algorithm and the

MPC control strategies are implemented in MATLAB R2018a. The computational

experiments are performed on a computer with an Intel (R) Core (TM) i7-6700 CPU @

3.40 GHz and 32 GB RAM. We use the YALMIP toolbox in MATLAB R2018a [359].

The GUROBI 8.0 solver is adopted to solve the finite-horizon optimal control problem,

and SeDuMi 1.3 is employed to solve the constraint tightening problem (4.20). The

related sets, including terminal sets and the robust positively invariant set, are obtained

via multi-parametric toolbox 3.0 [360]. The prediction horizon N is set to be nine. Note

that the distributionally robust control method (Van Parys et al., 2016) considers a setting similar to ours, i.e., a risk-averse stochastic control setting where the

disturbance distribution is only partially known. This is why we consider the

comparison with this existing distributionally robust control method.

4.5.1 Example with disturbance data distribution having multimodality

In this section, we use a benchmark numerical example, control of the constrained

sampled double integrator [312], to demonstrate the effectiveness of the proposed MPC.

The system dynamics in this benchmark example is defined by


 1 1  0.5 
xk 1    xk    uk  wk (4.34)
 0 1 1 

The state and control constraints in the risk-averse stochastic MPC is given as follows.

  
k
 
sup  -CVaR 0.2  0 1 xl 1 k  2  0 (4.35)


  u u  5  (4.36)

The initial condition of system states x0   5, 2  , and the support set of disturbance
T

 
is   w w   0.6 . In the numerical example, the matrix gain K is decided to be the

1 0
unconstrained LQR solution with matrix Q    and R=0.01. Specifically, the
0 1

 0.6696 0.3370 
matrix gain K   0.6609  1.3261 , so     . Set R∞ can be
 0.6609 0.3261

computed using an approximation method [361], in which an upper bound on the

approximation error can be specified a priori.
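The LQR gain for this benchmark can be recovered from the discrete algebraic Riccati equation. A sketch (not the dissertation's MATLAB code; assumes the standard $u = Kx$ sign convention used in the chapter):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Double integrator of (4.34) with Q = I, R = 0.01.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
Q = np.eye(2)
R = np.array([[0.01]])

P = solve_discrete_are(A, B, Q, R)                   # DARE solution
K = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)   # u = K x convention
Phi = A + B @ K                                      # closed-loop matrix
print(np.round(K, 4))
print(max(abs(np.linalg.eigvals(Phi))) < 1)          # Phi is Schur stable
```

With the small input weight R = 0.01, the resulting closed loop is strongly damped, which is why the spectral radius of $\Phi$ is well inside the unit circle.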

To demonstrate control performance and computational time, 100 closed-loop

simulations are performed with a simulation horizon of 20 time steps. The closed-loop

cost function is $J_{\text{cost}} = \sum_{k=1}^{T_s} \left( x_k^T Q x_k + u_k^T R u_k \right)$, where the simulation horizon length is $T_s = 20$.

Compared with the risk-averse stochastic MPC with a mean-covariance ambiguity set,

the proposed MPC method exploits the fine-grained uncertainty information, and is less

conservative, reducing the closed-loop cost by an average of 9.77% over all

simulation runs. To take a closer look at the computational time breakdown of the

proposed MPC, we present the average computational time for online learning,

constraint tightening through solving (4.20), online control via solving (OL-SMPCk),
and computing the MRPI set in Figure 22. The average values of computational time

are calculated over 2,000 (20 × 100) time steps. From the results, we can see that the

average computational time for the online learning is merely 0.344 s. It is

computationally efficient to update the parameter η online, and it takes 0.215 s on average to

solve optimization problem (4.20). Even though additional computation is needed at

each time step, the proposed MPC not only enables the incorporation of updated

distribution information to improve control performance, but also features an acceptable

computational cost. Since the example is a benchmark problem, the computational time

comparison is representative of the computational performance for the proposed

approach.

Remark 4.6 The adopted approach of computing the MRPI set is computationally

tractable, because the MRPI set is guaranteed to be computed in a finite number of

recursions [358]. In the above numerical example, it takes only 0.057s on average to

compute the MRPI set, and the longest time of computing the MRPI set is 0.105s.

Figure 22. The average computational times of the proposed online learning based

risk-averse stochastic MPC method over 2,000 time steps.

4.5.2 Example with a time-varying disturbance distribution

To demonstrate the significance of online learning in the proposed MPC approach, we

consider an example with a time-varying disturbance distribution. Specifically, the

standard deviation used to generate historical disturbance data is 0.005, while the

standard deviation increases to 0.3 for the real-time disturbance data generation. In this

1 0
example, matrix Q    , R=1, and the support set of disturbance is
0 1

 
  w w   0.10 , and the distributionally CVaR constraints is given below.

  
k
 
sup  -CVaR 0.15  0 1 xl 1 k  1.2  0 (4.37)

To take a closer look at constraint violations under time-varying disturbance

distributions, 100 runs of simulations are performed with different realizations of

disturbance sequences. Figure 23 shows a set of state trajectories using the proposed

MPC for a simulation horizon of 20 time steps. Note that the prediction horizon N=9.

For the proposed MPC strategy, the average constraint violation in the first nine steps

over all simulations is 7.2%, even if the disturbance variation becomes significant

because of the larger standard deviation. In contrast, the performance of risk-averse

stochastic MPC using distributionally robust CVaR constraints without online learning

deteriorates, and the corresponding average constraint violation increases to 22.3%.

This constraint violation is higher than the prescribed tolerance of 15.0%, thus

jeopardizing the safety of the control system. Notably, the risk-averse stochastic MPC

without online learning scheme implements constraint tightening offline using historical

disturbance data and its parameter η remains constant over time.

Figure 23. (a): The closed-loop trajectories of system states for the proposed online

learning based risk-averse stochastic MPC with 100 realizations of disturbance

sequences, (b): The zoom-in view of state trajectories near the upper limit of x(2).

By leveraging the online update of the ambiguity set, the time-varying distribution is

well captured by the proposed MPC method, and the constraint tightening is adaptive to

the distribution accordingly. Figure 24 shows the parameter η in the constraint

tightening of the proposed MPC over time in a simulation run. From Figure 24, we can

readily see that the effect of adaptation in the proposed MPC is evident. Specifically, η

increases from 0.016 to 0.10 based on the updated information of the stochastic

disturbance. The values of flag equal to one over the entire horizon, which indicates that

the adaptive solution is activated at each time step.

Figure 24. The online adaption of constraint tightening parameters in the proposed

MPC for time-varying disturbance distribution in a simulation.

4.6 Summary

In this chapter, an online learning-based risk-averse stochastic MPC framework was

developed for the control of linear time-invariant systems affected by additive

disturbance. It incorporated online learning into MPC with desirable theoretical control

guarantees and less conservative control performance. Based on the DPMM, a

systematic approach to construct the ambiguity set from real-time disturbance data was

developed, which leveraged the structural property of multimodality and local moment

information. Additionally, the exact reformulation for the distributionally robust CVaR

constraints was derived as LMI constraints to facilitate constraint tightening. The online

adaptation of ambiguity set with real-time measurements helped to improve control

performance by actively learning disturbance distribution online. We introduced the

safe update scheme to ensure the recursive feasibility and stability of the resulting MPC.

The computational results demonstrated that the control performance of the developed

MPC is less conservative compared with the one using mean and covariance information

of disturbance distribution. Additionally, for the case with time-varying disturbance

distribution, the deterioration of constraint satisfaction was ameliorated in the proposed

online learning based MPC framework.

4.7 Appendix A: The derivation of control objective

Based on the dual-mode prediction dynamics, the nominal system $z_{l+1|k} = A z_{l|k} + B v_{l|k}$, $z_{0|k} = x_k$, can be characterized by the autonomous system shown below [356].

$$y_{l+1|k} = \Psi y_{l|k}, \quad l \ge 0 \quad (4.38)$$

where the initial state is $y_{0|k} = \begin{bmatrix} z_{0|k}^T & c_{0|k}^T & \cdots & c_{N-1|k}^T \end{bmatrix}^T$, the system matrix is $\Psi = \begin{bmatrix} \Phi & B\Gamma \\ 0 & \Pi \end{bmatrix}$, the matrix $\Gamma = \begin{bmatrix} I_m & 0 & \cdots & 0 \end{bmatrix}$, and the matrix $\Pi = \begin{bmatrix} 0 & I_m & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & I_m \\ 0 & 0 & \cdots & 0 \end{bmatrix}$ is the block shift matrix.

Based on the autonomous system dynamics, the stage cost of $J_\infty = \sum_{l=0}^{\infty} \left( z_{l|k}^T Q z_{l|k} + v_{l|k}^T R v_{l|k} \right)$ can be written as below.

$$z_{l|k}^T Q z_{l|k} + v_{l|k}^T R v_{l|k} = z_{l|k}^T Q z_{l|k} + \left( K z_{l|k} + c_{l|k} \right)^T R \left( K z_{l|k} + c_{l|k} \right) = y_{l|k}^T \hat{Q} y_{l|k} \quad (4.39)$$

where the matrix $\hat{Q} = \begin{bmatrix} Q + K^T R K & K^T R \Gamma \\ \Gamma^T R K & \Gamma^T R \Gamma \end{bmatrix}$.

Thus, we can rewrite $J_\infty$ as follows.

$$J_\infty = \sum_{l=0}^{\infty} y_{l|k}^T \hat{Q} y_{l|k} = y_{0|k}^T \Theta y_{0|k} \quad (4.40)$$

where $\Theta$ is the positive definite solution of the Lyapunov equation $\Theta = \Psi^T \Theta \Psi + \hat{Q}$.

We express the matrix $\Theta$ as $\Theta = \begin{bmatrix} \Theta_z & \Theta_{zc} \\ \Theta_{cz} & \Theta_c \end{bmatrix}$. By substituting $\Psi$, $\Theta$, and $\hat{Q}$ into the equation $\Theta = \Psi^T \Theta \Psi + \hat{Q}$, we have

$$\Theta_z = \Phi^T \Theta_z \Phi + Q + K^T R K \quad (4.41)$$

$$\Theta_{cz} = \Pi^T \Theta_{cz} \Phi + \Gamma^T \left( B^T \Theta_z \Phi + R K \right) \quad (4.42)$$

$$\Theta_c = \Gamma^T B^T \Theta_z B \Gamma + \Pi^T \Theta_{cz} B \Gamma + \Gamma^T B^T \Theta_{zc} \Pi + \Pi^T \Theta_c \Pi + \Gamma^T R \Gamma \quad (4.43)$$

Since the feedback gain K is the LQR optimal solution and from (4.41), $\Theta_z$ is the unique solution of the Riccati equation, so $\Theta_z = P$. Based on the expression of the LQR solution $K = -\left( B^T \Theta_z B + R \right)^{-1} B^T \Theta_z A$, we can further have

$$B^T \Theta_z \Phi + R K = B^T \Theta_z A + \left( B^T \Theta_z B + R \right) K = B^T \Theta_z A - \left( B^T \Theta_z B + R \right) \left( B^T \Theta_z B + R \right)^{-1} B^T \Theta_z A = 0 \quad (4.44)$$

According to (4.42) and (4.44), we have $\Theta_{cz} = \Pi^T \Theta_{cz} \Phi$, which implies that $\Theta_{cz} = 0$ is the solution. Since the matrix $\Theta$ is symmetric, we further have $\Theta_{zc} = 0$. Therefore, equation (4.43) leads to the following.

$$\Theta_c = \Pi^T \Theta_c \Pi + \Gamma^T \left( R + B^T P B \right) \Gamma \quad (4.45)$$

Let the matrix $\Theta_c = \begin{bmatrix} \Theta_c^{1,1} & \cdots & \Theta_c^{1,N} \\ \vdots & \ddots & \vdots \\ \Theta_c^{N,1} & \cdots & \Theta_c^{N,N} \end{bmatrix}$. By plugging the block matrix into (4.45), we arrive at the following expression:

$$\Theta_c^{i,j} = \begin{cases} R + B^T P B, & i = j \\ 0, & i \ne j \end{cases} \quad (4.46)$$

Based on (4.40) and (4.46), we have $J_\infty = \sum_{l=0}^{N-1} c_{l|k}^T \left( R + B^T P B \right) c_{l|k} + x_k^T P x_k$.

4.8 Appendix B. Proof of Theorem 4.1

Proof. Based on $x_{l|k} = z_{l|k} + e_{l|k}$ and $e_{l+1|k} = \Phi e_{l|k} + w_{l|k}$, we have

$$x_{l+1|k} = z_{l+1|k} + \Phi e_{l|k} + w_{l|k} \quad (4.47)$$

Note that the conditional distributionally robust CVaR constraints need to hold for any reachable $e_{l|k}$. Therefore, we leverage the tube-based method, and the original distributionally robust CVaR constraints can be cast as follows.

$$z_{l+1|k} \in \hat{\mathbb{Z}}_{l+1}^{(k)} = \mathbb{X}^{(k)} \ominus \left( \bigoplus_{i=1}^{l} \Phi^i \mathbb{W} \right) \quad (4.48)$$

where the set $\mathbb{X}^{(k)}$ is explicitly given by

$$\mathbb{X}^{(k)} = \left\{ z \;\middle|\; \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( [H]_i z - [h]_i + [H]_i w_{l|k} \right) \le 0, \; i = 1, \ldots, p \right\} \quad (4.49)$$

We will provide a constraint tightening method to reformulate the following constraint.

$$\sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( [H]_i z - [h]_i + [H]_i w_{l|k} \right) \le 0 \quad (4.50)$$

We can further reformulate the distributionally robust CVaR constraint according to the definition of CVaR and the stochastic minimax theorem [139, 362, 363].

$$\begin{aligned} & \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( [H]_i z - [h]_i + [H]_i w_{l|k} \right) \\ &= \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \inf_{\beta_i} \left\{ \beta_i + \frac{1}{\epsilon_i} \mathbb{E}_{\mathbb{P}}\left[ [H]_i z - [h]_i + [H]_i w_{l|k} - \beta_i \right]^{+} \right\} \\ &= \inf_{\beta_i} \left\{ \beta_i + \frac{1}{\epsilon_i} \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{E}_{\mathbb{P}}\left[ [H]_i z - [h]_i + [H]_i w_{l|k} - \beta_i \right]^{+} \right\} \end{aligned} \quad (4.51)$$

where $\mathbb{E}_{\mathbb{P}}\left[ \cdot \right]$ represents the expectation with respect to the probability distribution $\mathbb{P}$. The first equality holds based on Definition 4.1. The second equality is based on a stochastic minimax theorem.

The worst-case expectation problem $\sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{E}_{\mathbb{P}}\left[ [H]_i z - [h]_i + [H]_i w_{l|k} - \beta_i \right]^{+}$ can be rewritten as the following optimization problem.

$$\begin{aligned} \sup_{\nu_1, \ldots, \nu_m} \;& \sum_{j=1}^{m} \int_{\Xi} \left[ [H]_i z - [h]_i + [H]_i \xi - \beta_i \right]^{+} \nu_j\left( \xi \right) d\xi \\ \text{s.t.} \;& \int_{\Xi} \nu_j\left( \xi \right) d\xi = \pi_j, \quad j = 1, \ldots, m \\ & \int_{\Xi} \xi\, \nu_j\left( \xi \right) d\xi = \pi_j \mu_j \\ & \int_{\Xi} \xi \xi^T \nu_j\left( \xi \right) d\xi = \pi_j \left( \Sigma_j + \mu_j \mu_j^T \right) \end{aligned} \quad (4.52)$$
 

By taking the dual of optimization problem (4.52), we have (4.53)-(4.56).

$$\min_{t_{ij}, \omega_{ij}, \Omega_{ij}} \; \sum_{j=1}^{m} \pi_j \left[ t_{ij} + \mu_j^T \omega_{ij} + \left\langle \Sigma_j + \mu_j \mu_j^T, \Omega_{ij} \right\rangle \right] \quad (4.53)$$

$$\text{s.t.} \quad t_{ij} + \xi^T \omega_{ij} + \xi^T \Omega_{ij} \xi \ge 0, \quad \forall \xi \in \Xi, \; \forall j \quad (4.54)$$

$$t_{ij} + \xi^T \omega_{ij} + \xi^T \Omega_{ij} \xi \ge [H]_i z - [h]_i + [H]_i \xi - \beta_i, \quad \forall \xi \in \Xi, \; \forall j \quad (4.55)$$

$$\Omega_{ij} \succeq 0, \quad \forall j \quad (4.56)$$

where $t_{ij}$, $\omega_{ij}$, and $\Omega_{ij}$ are the dual variables corresponding to the constraints in the ambiguity set. Since constraints (4.54)-(4.55) are semi-infinite constraints, some further reformulations are needed.

The constraint $t_{ij} + \xi^T \omega_{ij} + \xi^T \Omega_{ij} \xi \ge 0$, $\forall \xi \in \Xi$, $\forall j$, can be reformulated as follows.

$$\min_{\xi \in \Xi} \left( t_{ij} + \xi^T \omega_{ij} + \xi^T \Omega_{ij} \xi \right) \ge 0, \quad \forall j \quad (4.57)$$

We can further reformulate constraint (4.57) as the following LMI constraints, based on the duality of convex quadratic programs and the Schur complement [172].

$$\begin{bmatrix} \Omega_{ij} & \frac{1}{2}\left( \omega_{ij} + E^T \lambda_{ij} \right) \\ \frac{1}{2}\left( \omega_{ij} + E^T \lambda_{ij} \right)^T & t_{ij} - f^T \lambda_{ij} \end{bmatrix} \succeq 0, \quad j = 1, \ldots, m \quad (4.58)$$

where $\lambda_{ij} \ge 0$ is a vector of dual variables corresponding to the constraints $E \xi \le f$.

Similarly, constraints (4.55) can be recast as the LMI constraints below.

$$\begin{bmatrix} \Omega_{ij} & \frac{1}{2}\left( \omega_{ij} - [H]_i^T + E^T \varphi_{ij} \right) \\ \frac{1}{2}\left( \omega_{ij} - [H]_i^T + E^T \varphi_{ij} \right)^T & t_{ij} + \beta_i + [h]_i - [H]_i z - f^T \varphi_{ij} \end{bmatrix} \succeq 0, \quad \forall j \quad (4.59)$$

where $\varphi_{ij} \ge 0$ is a vector of dual variables corresponding to the constraints $E \xi \le f$.

Thus, the distributionally robust CVaR constraint $\sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( [H]_i z - [h]_i + [H]_i w_{l|k} \right) \le 0$ over the DPMM-based ambiguity set is reformulated as follows.

$$\begin{cases} \epsilon_i \beta_i + \sum_{j=1}^{m} \pi_j \left[ t_{ij} + \mu_j^T \omega_{ij} + \left\langle \Sigma_j + \mu_j \mu_j^T, \Omega_{ij} \right\rangle \right] \le 0 \\[4pt] \begin{bmatrix} \Omega_{ij} & \frac{1}{2}\left( \omega_{ij} + E^T \lambda_{ij} \right) \\ \frac{1}{2}\left( \omega_{ij} + E^T \lambda_{ij} \right)^T & t_{ij} - f^T \lambda_{ij} \end{bmatrix} \succeq 0, \quad \forall j \\[4pt] \begin{bmatrix} \Omega_{ij} & \frac{1}{2}\left( \omega_{ij} - [H]_i^T + E^T \varphi_{ij} \right) \\ \frac{1}{2}\left( \omega_{ij} - [H]_i^T + E^T \varphi_{ij} \right)^T & t_{ij} + \beta_i + [h]_i - [H]_i z - f^T \varphi_{ij} \end{bmatrix} \succeq 0, \quad \forall j \\[4pt] \lambda_{ij} \ge 0, \quad \varphi_{ij} \ge 0, \quad \Omega_{ij} \succeq 0, \quad \forall j \end{cases} \quad (4.60)$$

We consider the following constraint:

$$[H]_i z \le [h]_i - \eta_i^{(k)} \quad (4.61)$$

where

$$\begin{aligned} \eta_i^{(k)} = \min \;& \eta \\ \text{s.t.} \;& \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( -\eta + [H]_i w_{l|k} \right) \le 0 \end{aligned} \quad (4.62)$$

The constraint in (4.62) can be converted into constraints (4.20). Then, we convert (4.61) and (4.62) as follows.

$$[H]_i z_{l+1|k} \le [h]_i - \eta_i^{(k)}, \quad \eta_i^{(k)} = \min \left\{ \eta_i \;\middle|\; \sup_{\mathbb{P} \in \mathcal{P}^{(k)}} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( -\eta_i + [H]_i w_{l|k} \right) \le 0 \right\} \quad (4.63)$$

We reformulate constraint (4.63) into (4.60) using the same reformulation technique. This completes the proof. □

4.9 Appendix C: Proof of Proposition 4.1

Proof. First, we introduce the ambiguity set $\mathcal{P}^* = \left\{ \mathbb{P} \;\middle|\; \int_{\Xi} \mathbb{P}\left( d\xi \right) = 1 \right\}$. Consider the constraint tightening based on the ambiguity set $\mathcal{P}^*$ below.

$$\begin{aligned} \eta_i^0 = \min \;& \eta \\ \text{s.t.} \;& \sup_{\mathbb{P} \in \mathcal{P}^*} \mathbb{P}\text{-CVaR}_{\epsilon_i}\left( -\eta + [H]_i w_{l|k} \right) \le 0 \end{aligned} \quad (4.64)$$

By using a technique similar to that in Appendix B, we can reformulate (4.64) as follows.

$$\begin{aligned} \eta_i^0 = \min \;& \eta \\ \text{s.t.} \;& \epsilon_i \beta_i + t_i \le 0 \\ & t_i \ge 0 \\ & t_i \ge -\eta - \beta_i + \max_{\xi \in \Xi} [H]_i \xi \end{aligned} \quad (4.65)$$

According to the constraints in (4.65), we further have $\eta \ge \max_{\xi \in \Xi} [H]_i \xi + \left( \frac{1}{\epsilon_i} - 1 \right) t_i$. Since $t_i \ge 0$, we have $\eta_i^0 = \max_{\xi \in \Xi} [H]_i \xi$, $\forall i$.

Since $\mathcal{P}^{(k)} \subseteq \mathcal{P}^*$, it holds that $\eta_i^0 \ge \eta_i^{(k)}$, $\forall i, k$. Given this inequality, we can further have $\hat{\mathbb{Z}}_l^* \subseteq \mathbb{Z}_l^{(k)}$, $\forall k$. Based on $\hat{\mathbb{Z}}_l^* \subseteq \mathbb{Z}_l^{(k)}$, $\forall k$, and Definition 4.6, $\mathbb{Z}_f^*$ is a robust positively invariant set (not necessarily the MRPI set) for the updated tightened constraints. Since $\mathbb{Z}_f^*$ is nonempty under Assumption 4.5, there always exists a nonempty MRPI set according to Definition 4.5. This completes the proof. □

4.10 Appendix D: Proof of Theorem 4.2

Proof. Given that $\mathcal{F}\left( x_k \right) \neq \emptyset$, we want to show $\mathcal{F}\left( x_{k+1} \right) \neq \emptyset$. Suppose $\mathbf{c}_{N|k}^* = \left\{ c_{0|k}^*, c_{1|k}^*, \ldots, c_{N-1|k}^* \right\}$ is the optimal solution to problem (OL-SMPCk) at time k. Hence, the corresponding optimal nominal state evolution is given below.

$$z_{l+1|k}^* = \Phi z_{l|k}^* + B c_{l|k}^*, \quad z_{0|k}^* = x_k \quad (4.66)$$

Based on the optimal solution, an explicit candidate solution can be constructed as $\tilde{\mathbf{c}}_{k+1} = \left\{ c_{1|k}^*, \ldots, c_{N-1|k}^*, 0 \right\}$ using Definition 4.3. The nominal state under the candidate input $\tilde{\mathbf{c}}_{k+1}$ is as follows.

$$\tilde{z}_{0|k+1} = x_{k+1} = A x_k + B u_k + w_k = \left( A + BK \right) x_k + B c_{0|k}^* + w_k = z_{1|k}^* + w_k \quad (4.67)$$

Based on $\tilde{z}_{0|k+1} = z_{1|k}^* + w_k$, we further have the following equalities.

$$\tilde{z}_{1|k+1} = \Phi \tilde{z}_{0|k+1} + B \tilde{c}_{0|k+1} = \Phi \left( z_{1|k}^* + w_k \right) + B c_{1|k}^* = z_{2|k}^* + \Phi w_k \quad (4.68)$$

According to $\tilde{z}_{0|k+1} = z_{1|k}^* + w_k$ and $\tilde{z}_{1|k+1} = z_{2|k}^* + \Phi w_k$, we have the following relation by induction.

$$\tilde{z}_{l|k+1} = z_{l+1|k}^* + \Phi^l w_k, \quad l \in [0, N-1] \quad (4.69)$$

Similarly, we have the relationship between the optimal solution at time k and the candidate at time k+1, as presented by

$$\tilde{v}_{l|k+1} = v_{l+1|k}^* + K \Phi^l w_k, \quad l \in [0, N-1] \quad (4.70)$$

For the optimal control problem (OL-SMPCk+1), we consider two scenarios, namely flag = 0 and flag = 1. For the scenario in which flag = 0, we have $\mathbb{Z}_l^{(k+1)} = \mathbb{Z}_l^{(k)}$ according to (4.31). Based on $\tilde{z}_{l|k+1} = z_{l+1|k}^* + \Phi^l w_k$ and $\tilde{v}_{l|k+1} = v_{l+1|k}^* + K \Phi^l w_k$, we have the following:

$$z_{l+1|k}^* \in \mathbb{Z}_{l+1}^{(k)} \;\Rightarrow\; \tilde{z}_{l|k+1} \in \mathbb{Z}_l^{(k+1)} \quad (4.71)$$

$$v_{l+1|k}^* \in \mathbb{V}_{l+1} \;\Rightarrow\; \tilde{v}_{l|k+1} \in \mathbb{V}_l \quad (4.72)$$

Next, we derive the candidate nominal state at the end of the horizon as follows.

$$\tilde{z}_{N|k+1} = A \tilde{z}_{N-1|k+1} + B \tilde{v}_{N-1|k+1} = A \left( z_{N|k}^* + \Phi^{N-1} w_k \right) + B \left( K z_{N|k}^* + K \Phi^{N-1} w_k \right) = \Phi z_{N|k}^* + \Phi^N w_k \quad (4.73)$$

Note that $\mathbb{Z}_f^{(k+1)} = \mathbb{Z}_f^{(k)}$ because flag = 0. Based on the definition of the terminal set, we have the following.

$$z_{N|k}^* \in \mathbb{Z}_f^{(k)} \;\Rightarrow\; \tilde{z}_{N|k+1} \in \mathbb{Z}_f^{(k+1)} \quad (4.74)$$

Having now checked all the constraints, we conclude that the candidate solution is feasible for problem (OL-SMPCk+1) if flag = 0.

For the scenario where flag = 1, the constructed solution satisfies the constraints in (OL-SMPCk+1) by construction of the safe update scheme in Section 4.3.3, which completes the proof. □

4.11 Appendix E: Proof of Theorem 4.3

Proof. We define the optimal objective value of problem (OL-SMPCk) as $J_k$ below.

$$J_k = V_N^*\left( x_k \right) = \sum_{l=0}^{N-1} \left\| c_{l|k}^* \right\|_{R + B^T P B}^{2} \quad (4.75)$$

Since the candidate solution is feasible and the objective function remains the same for both scenarios flag = 0 and flag = 1, we have

$$J_{k+1} \le \tilde{V}_N\left( x_{k+1} \right) = \sum_{l=0}^{N-1} \left\| \tilde{c}_{l|k+1} \right\|_{R + B^T P B}^{2} = \sum_{l=1}^{N-1} \left\| c_{l|k}^* \right\|_{R + B^T P B}^{2} = J_k - \left\| c_{0|k}^* \right\|_{R + B^T P B}^{2} \quad (4.76)$$

where $\tilde{V}_N\left( x_{k+1} \right)$ represents the objective value corresponding to the candidate solution.

By rearranging (4.76), we have the following inequality.

$$J_{k+1} - J_k \le -\left\| c_{0|k}^* \right\|_{R + B^T P B}^{2} \quad (4.77)$$

By summing this inequality from k = 0, we have the following.

$$\sum_{k=0}^{\infty} \left\| c_{0|k}^* \right\|_{R + B^T P B}^{2} \le J_0 - J_\infty \quad (4.78)$$

Based on (4.78), we have $\lim_{k \to \infty} c_{0|k}^* = 0$ for both scenarios flag = 0 and flag = 1. Additionally, the convergence of the state to a neighborhood of the origin is further established below.

Then, we consider the asymptotic dynamic behavior of the system state as follows.

$$\lim_{k \to \infty} x_k = \lim_{k \to \infty} \left( \Phi^k x_0 + \sum_{i=1}^{k} \Phi^{i-1} B c_{0|k-i}^* + \sum_{i=1}^{k} \Phi^{i-1} w_{k-i} \right) = \lim_{k \to \infty} \sum_{i=1}^{k} \Phi^{i-1} w_{k-i} \quad (4.79)$$

Note that the second equality holds because $\Phi$ is Schur and $\lim_{k \to \infty} c_{0|k}^* = 0$.

According to (4.79), we have the following relation.

$$\lim_{k \to \infty} x_k \in R_\infty = \bigoplus_{i=0}^{\infty} \Phi^i \mathbb{W} \quad (4.80)$$

Based on (4.80), the system state converges to a neighborhood of the origin, namely the minimal RPI set, under the proposed online-learning based risk-averse stochastic MPC strategy. This completes the proof. □

CHAPTER 5
A TRANSFORMATION-PROXIMAL BUNDLE ALGORITHM FOR SOLVING
LARGE-SCALE MULTISTAGE ADAPTIVE ROBUST OPTIMIZATION
PROBLEMS

5.1 Introduction

In recent years, robust optimization has become an increasingly popular methodology

to immunize optimization problems against uncertainty among both control and

optimization communities [98, 334, 364-367]. Robust optimization can be roughly

classified into three categories: static robust optimization, two-stage Adaptive Robust

Optimization (ARO), and multistage ARO. In static robust optimization, all the

decisions are made prior to observing uncertainty realizations [368]. By contrast, two-

stage ARO allows recourse decisions to be adaptive to realized uncertainties [106], thus

typically generating less conservative solutions than static robust optimization [369].

As a result, the two-stage ARO method has a variety of applications [370]. To overcome

the limitation of two-stage structures, multistage ARO emerges as a practical yet more

computationally challenging paradigm for sequential decision making processes under

uncertainty [110]. In the multistage setting, the decision maker can dynamically adjust

decisions based on the observed uncertainty realizations [371]. Multistage ARO

problems are prevalent in various control problems, including constrained robust finite-

horizon optimal control [364], multiperiod portfolio optimization [372-374], and robust

feedback model predictive control problems [375, 376].

Despite its attractiveness in modeling dynamic decision making under uncertainty, ARO

problems in general are notoriously demanding to solve [377]. To this end, extensive

156
research effort has been made toward solution techniques for ARO problems. One

popular approach is the affine control policy (the so called affine decision rule)

approximation [106], in which recourse decisions are restricted to be affine functions of

uncertainty realizations [378, 379]. In this way, the ARO problem reduces to a static (single-stage) robust optimization problem, which can be further addressed efficiently using

duality-based reformulation or constraint generation [112, 380]. However, the affine control policy sacrifices optimality for tractability [182, 381-383]. Instead of relying on

control policies, the K-adaptability method devises K contingency plans beforehand and

picks the best among these preselected plans after observing the uncertainty realization

[384, 385]. Reformulation-approximation methods conservatively express the two-stage ARO problem as a single-level optimization problem [386, 387]. The Benders

decomposition and extreme point enumeration were proposed as two exact solution

techniques exclusively suitable for two-stage ARO problems [103, 388-390]. Despite

the broad application scope of the multistage setting, solution techniques for multistage

ARO problems remain limited in the existing literature, and they usually suffer from an unsatisfactory trade-off between solution quality and computational tractability.

Hence, the research objective of our work is to propose a general algorithmic strategy

for multistage ARO and demonstrate its use in robust optimal control.

This chapter proposes a novel multi-to-two transformation-proximal bundle algorithm

to solve Multistage Adaptive Robust Mixed-Integer Linear Programs (MARMILPs). In

a multistage decision-making setting, decision variables can be partitioned into two

different groups, namely state decision variables and control/local decision variables

[391, 392]. We first propose a novel multi-to-two transformation scheme that converts

the multistage ARO problem into an equivalent two-stage counterpart. Specifically, by

enforcing only state decision variables to be affine functions of uncertainty, the original

MARMILP is reduced into a Two-stage Adaptive Robust Mixed-Integer Linear

Program (TARMILP). The proposed transformation scheme frees control decision

variables from the affine control policy restriction, thereby leading to a higher-quality

robust optimization solution [393]. We perform theoretical analysis to prove that such

transformation is valid if state decisions follow causal control policies, such as affine

and piecewise affine control policy [106, 381]. The multi-to-two transformation scheme

is general enough to be combined with existing two-stage ARO solution algorithms for

solving MARMILPs. Specifically, we adopt a proximal bundle algorithm for the exact

solution of the resulting TARMILP. Since the worst-case recourse function in the two-

stage ARO problem lacks an analytical expression and can be non-smooth, the bundle

method is employed with an oracle evaluating the function value and its sub-gradients

at a query point [394]. The Moreau-Yosida regularization is leveraged to determine the

next iteration in a decomposition framework [395]. Notably, the assumption on stage-

wise independence of uncertainty is not required for the multi-to-two transformation

scheme. As a result, the proposed algorithmic framework can accommodate temporal

dynamics exhibited by uncertainties across different time stages. Convergence analysis

of the proposed algorithm is presented for any types of uncertainty sets. Compared with

existing multistage ARO solution methods, including the affine control policy method

[106] and the piecewise affine control policy approach [396], the proposed algorithm

enjoys a more attractive trade-off between solution quality and computational

tractability. The affine control policy method assumes that both state and control

decision variables are affine functions of uncertainty [106], which is stronger than the

assumption adopted in this work. Chen & Zhang (2009) split the uncertainty into its positive and negative parts and apply affine control policies to the parameterized uncertainties. Thus, the piecewise affine control policy developed by Chen & Zhang (2009) essentially assumes a piecewise affine dependence on the uncertainty for both state and control decision variables. Additionally, these approaches do not require an oracle

for evaluating the function value and its sub-gradients [106, 396], whereas the proposed

method needs such an oracle to obtain cutting planes. To test and evaluate the performance

of the proposed algorithm, an application to constrained robust optimal control of

dynamic inventory systems is presented. Although this chapter only presents the

applications to robust optimal inventory control and process network planning, it

focuses on the development of a general methodology for multistage ARO, which can

be potentially applied to a variety of control problems.

The major contributions of this work are summarized as follows.

• A novel multi-to-two transformation scheme along with its theoretical analysis for

the solution of multistage ARO problems by applying decision rules only to

adjustable state decisions;

• A transformation-proximal bundle algorithm for solving multistage ARO problems

that provides an attractive trade-off between solution quality and tractability;

• An efficient procedure to construct lower bounds of MARMILPs based on a

scenario-tree problem with uncertainty scenarios generated by the cutting-plane

based proximal bundle algorithm;

• Application to the constrained robust optimal control of dynamic inventory systems

under demand uncertainty alongside comparisons with affine and piecewise affine

disturbance-feedback control policies.

5.2 The multi-to-two transformation scheme

In this section, we propose a novel multi-to-two transformation scheme for multistage

ARO problems. By employing the affine control policy only to state decision variables,

the proposed scheme can transform the original multistage ARO problem into its

equivalent two-stage ARO counterpart. First, a general model formulation of an

MARMILP is presented. We then develop the transformation scheme, in which affine

control policies are only applied to state decision variables. Finally, a theoretical

analysis is performed to prove that the MARMILP is converted to an equivalent two-

stage ARO problem via this multi-to-two transformation scheme.

In multistage ARO problems, decisions are made sequentially, and uncertainties are

revealed over time stages. The MARMILP in its general form is shown as follows:

$$
\begin{aligned}
\min_{x,\,s_t(\cdot),\,y_t(\cdot)}\ \max_{u\in U}\quad & c'x+\sum_{t=1}^{T}\left[d_t's_t(u^t)+f_t'y_t(u^t)\right] \\
\text{s.t.}\quad & T_t x+A_t s_t(u^t)+B_t s_{t-1}(u^{t-1})+W_t y_t(u^t)=h_t^0+H_t u_t,\quad \forall u\in U,\ \forall t \\
& L_t x+E_t s_t(u^t)+G_t y_t(u^t)\le m_t^0+M_t u_t,\quad \forall u\in U,\ \forall t
\end{aligned} \tag{5.1}
$$

where T is the total number of time stages, u1, …, uT are uncertainties revealed over T

stages, x is a vector of “here-and-now” decisions made prior to any uncertainty

realizations, s1, …, sT are adjustable state decision variables, and y1, …, yT are adjustable

control decision variables. Note that the “here-and-now” decisions x include continuous

and integer variables, while the adjustable or recourse decisions involve continuous

decision variables. The prime symbol ′ stands for the transpose of a generic vector. Let

vector $u^t=[u_1',\ldots,u_t']'$ denote the concatenation of the past uncertainty realizations, and let $u=[u_1',\ldots,u_T']'$. $c$ is the vector of cost coefficients corresponding to “here-and-now”

decisions. dt and ft are the vectors of cost coefficients corresponding to state decisions

and control decisions made at stage t, respectively. The state decision variables link

optimization problems of successive stages, while control decisions are only involved

in the current time stage [391]. Also note that a large class of multistage ARO problems

can be reformulated in this form through the introduction of additional variables and

constraints [392].

Remark 5.1 Constrained robust optimal control problems of linear systems with

disturbance-feedback control policies can be regarded as special instances of the

multistage ARO problem (5.1) without the “here-and-now” decisions x. The equality constraints in (5.1) describe the state dynamics of a discrete-time linear system subject to

additive disturbance.

Decisions st(·) and yt(·) are general control policies or mappings, enabling the recourse

actions to be fully adaptive to observed uncertainty realizations. The multistage ARO

problem given in (5.1) is computationally intractable due to the infinite dimensionality of the mappings or policies. To this end, the affine control policy is resorted to as a tractable approximation technique that restricts both st(·) and yt(·) to be affine functions of the

uncertainty realizations. However, such computational tractability induced by the conventional decision-rule-based method usually comes at a considerable expense of

solution quality. Note that, in conventional robust optimal control, the control policies

st(·) and yt(·) are restricted to be affine with respect to the uncertainty or disturbance. The key idea of the proposed multi-to-two transformation scheme is to restrict only the state policy st(·) to follow an affine control policy, as shown in (5.2), while endowing the control variables yt(·) with full adjustability to the observed uncertainty realizations.

$$
s_t(u^t)=P_t u^t+q_t \tag{5.2}
$$

where Pt and qt are the coefficients of the affine function and must be determined before

uncertainty realizations. Note that Pt is a matrix, qt is a vector, and they are of

appropriate dimensions. The above control policy is causal or non-anticipative, because

it only depends on the past uncertainty realizations $u^t$ instead of future ones. After plugging the control policy (5.2) into the multistage ARO problem (5.1), the MARMILP under the multi-to-two transformation scheme can be formulated as

follows:

$$
\begin{aligned}
\min_{x,\,P_t,\,q_t,\,y_t(\cdot)}\ \max_{u\in U}\quad & \left(c'x+\sum_{t=1}^{T}d_t'q_t\right)+\sum_{t=1}^{T}\left[d_t'P_t u^t+f_t'y_t(u^t)\right] \\
\text{s.t.}\quad & A_t\left(P_t u^t+q_t\right)+B_t\left(P_{t-1}u^{t-1}+q_{t-1}\right)+W_t y_t(u^t)=h_t^0+H_t u_t-T_t x,\quad \forall u\in U,\ \forall t \\
& E_t\left(P_t u^t+q_t\right)+G_t y_t(u^t)\le m_t^0+M_t u_t-L_t x,\quad \forall u\in U,\ \forall t
\end{aligned} \tag{5.3}
$$
where the control decision yt(·) remains a general function of the uncertainty realizations.
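The causality of the policy (5.2) can be made concrete with a small sketch (plain Python; the dimensions and numbers below are purely illustrative, not data from this chapter). The policy consumes only the realizations observed up to stage t, so anticipativity is ruled out structurally rather than by an extra constraint.

```python
def affine_state_policy(P_t, q_t, u_history):
    """Evaluate the affine state policy s_t(u^t) = P_t u^t + q_t.

    P_t: matrix as a list of rows; q_t: intercept vector;
    u_history: the realizations u_1, ..., u_t observed so far.
    """
    # u^t concatenates only the PAST realizations, so the policy is
    # causal (non-anticipative) by construction: future stages never enter.
    u_t = [v for u in u_history for v in u]
    assert all(len(row) == len(u_t) for row in P_t), "P_t must match dim(u^t)"
    return [sum(p * v for p, v in zip(row, u_t)) + q
            for row, q in zip(P_t, q_t)]

# Illustrative two-stage data: scalar uncertainty per stage, two state variables.
P2 = [[1.0, 0.5],
      [0.0, 2.0]]          # coefficients of s_2 on u^2 = [u_1, u_2]
q2 = [1.0, -1.0]
s2 = affine_state_policy(P2, q2, [[2.0], [4.0]])   # u_1 = 2.0, u_2 = 4.0
print(s2)  # [5.0, 7.0]
```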

For ease of exposition, we present the nested formulation of the multistage ARO problem (5.3) in (5.4).

$$
\min_{\hat{x}\in\Omega_0}\left\{ f_0(\hat{x})+\max_{u_1\in U_1}\ \min_{y_1\in\Omega_1(\hat{x},u^1)}\left[f_1(y_1)+\cdots+\max_{u_T\in U_T}\ \min_{y_T\in\Omega_T(\hat{x},u^T)} f_T(y_T)\right]\right\} \tag{5.4}
$$

where $\hat{x}=(x,\{P_t,q_t\}_{t=1}^{T})$ collects the aggregated “here-and-now” decisions, the set $\Omega_0$ represents its feasible region, and the set $\Omega_t(\hat{x},u^t)$, given in (5.5), is the feasible region of the adjustable control decisions at stage $t$. $U_1,\ldots,U_T$ denote the uncertainty sets of the uncertain parameters at the different time stages.
different time stages.

$$
\Omega_t(\hat{x},u^t)=\left\{ y_t \,\middle|\,
\begin{aligned}
& A_t\left(P_t u^t+q_t\right)+B_t\left(P_{t-1}u^{t-1}+q_{t-1}\right)+W_t y_t=h_t^0+H_t u_t-T_t x \\
& E_t\left(P_t u^t+q_t\right)+G_t y_t\le m_t^0+M_t u_t-L_t x
\end{aligned}\right\} \tag{5.5}
$$
The objective functions in the nested multistage ARO formulation (5.4) at different time

stages are explicitly defined in (5.6).

$$
f_0(\hat{x})=c'x+\sum_{t=1}^{T}d_t'q_t, \qquad f_t(y_t)=f_t'y_t+d_t'P_t u^t,\quad t=1,\ldots,T \tag{5.6}
$$

Since we do not assume uncertainty to be stage-wise independent, uncertainty set U in

the MARMILP can be treated as a “joint” uncertainty set. In this sense, the uncertainty

set for stage t is given by (5.7),

$$
U_t=\operatorname{Proj}_{u_t} U\left(u_1,\ldots,u_{t-1}\right) \tag{5.7}
$$

where Ut is defined as the projection of uncertainty set U onto ut given the values of u1

to ut-1.

By employing affine control policies only for the state decisions, the multi-to-two transformation scheme converts (5.1) into problem (5.4). The following theorem provides a theoretical proof that the multistage ARO problem (5.4) is equivalent to a two-stage ARO problem. Therefore, the multistage ARO problem is reduced to a two-stage ARO problem through the proposed transformation scheme.


Theorem 5.1 If the affine control policy (5.2) is used only for the adjustable state decisions, the multistage ARO problem (5.1) is transformed into the two-stage ARO problem given below.

$$
\min_{\hat{x}\in\Omega_0}\ \left(c'x+\sum_{t=1}^{T}d_t'q_t\right)+\max_{u\in U}\ \min_{\{\,y\,:\,y_t\in\Omega_t(\hat{x},u^t),\,\forall t\,\}}\ \sum_{t=1}^{T}\left(d_t'P_t u^t+f_t'y_t\right) \tag{5.8}
$$

where $y=[y_1',\ldots,y_T']'$ is the vector of concatenated control decisions.

Proof. Since the multistage ARO problem (5.1) is reformulated as (5.4) by applying the affine control policy (5.2), we only need to establish the equivalence between optimization problems (5.4) and (5.8). Considering the max-min optimization problem in (5.4) at $t=T-1$, we have

$$
\begin{aligned}
& \max_{u_{T-1}\in U_{T-1}}\ \min_{y_{T-1}\in\Omega_{T-1}(\hat{x},u^{T-1})}\left[f_{T-1}(y_{T-1})+\max_{u_T\in U_T}\ \min_{y_T\in\Omega_T(\hat{x},u^T)} f_T(y_T)\right] \\
&= \max_{u_{T-1}\in U_{T-1}}\left\{\min_{y_{T-1}\in\Omega_{T-1}(\hat{x},u^{T-1})} f_{T-1}(y_{T-1})+\max_{u_T\in U_T}\ \min_{y_T\in\Omega_T(\hat{x},u^T)} f_T(y_T)\right\} \\
&= \max_{u_{T-1}\in U_{T-1}}\ \max_{u_T\in U_T}\left\{\min_{y_{T-1}\in\Omega_{T-1}(\hat{x},u^{T-1})} f_{T-1}(y_{T-1})+\min_{y_T\in\Omega_T(\hat{x},u^T)} f_T(y_T)\right\} \\
&= \max_{(u_{T-1},u_T)\in\operatorname{Proj}_{u_{T-1},u_T} U(u_1,\ldots,u_{T-2})}\ \sum_{t=T-1}^{T}\ \min_{y_t\in\Omega_t(\hat{x},u^t)} f_t(y_t)
\end{aligned} \tag{5.9}
$$

The first equality in (5.9) is based on the fact that the optimization problem at $t=T$ does not involve control decisions at stage $T-1$. The second equality in (5.9) is valid because the feasible region of $y_{T-1}$ and $f_{T-1}(y_{T-1})$ do not depend on $u_T$. The above derivation can be performed backward until $t=1$, and as a result, the nested formulation collapses. Therefore, we can further rewrite the nested formulation (5.4) as follows.

$$
\begin{aligned}
& \min_{\hat{x}\in\Omega_0}\left\{f_0(\hat{x})+\max_{(u_1,\ldots,u_T)\in\operatorname{Proj}_{u_1,\ldots,u_T} U}\ \sum_{t=1}^{T}\ \min_{y_t\in\Omega_t(\hat{x},u^t)} f_t(y_t)\right\} \\
&= \min_{\hat{x}\in\Omega_0}\left\{f_0(\hat{x})+\max_{u\in U}\ \sum_{t=1}^{T}\ \min_{y_t\in\Omega_t(\hat{x},u^t)} f_t(y_t)\right\} \\
&= \min_{\hat{x}\in\Omega_0}\left\{f_0(\hat{x})+\max_{u\in U}\ \min_{\{\,y\,:\,y_t\in\Omega_t(\hat{x},u^t),\,\forall t\,\}}\ \sum_{t=1}^{T} f_t(y_t)\right\}
\end{aligned} \tag{5.10}
$$

The first equality in (5.10) is due to the definition of projection. The second equality is

valid because the inner minimization problem can be decoupled by stage, given x̂ and

u. According to (5.6) and (5.10), the multistage ARO problem (5.4) is equivalent to a

two-stage ARO problem (5.8), which concludes the proof. □
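The interchange argument behind this proof can be checked numerically on a toy instance with finite uncertainty and decision sets (all data below are made up for illustration). Because the stage-1 cost and feasible set depend only on $u_1$, the nested max-min value coincides with the joint two-stage value, mirroring the collapse in (5.9) and (5.10).

```python
from itertools import product

# Toy finite instance (all data illustrative). Stage-1 objects depend only on
# u_1; stage-2 objects may depend on the full history (u_1, u_2).
U1, U2 = [-1.0, 1.0], [-1.0, 1.0]
Y1, Y2 = [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]

def f1(y1, u1):
    return (y1 - u1) ** 2

def f2(y2, u1, u2):
    return abs(y2 - u1 - u2)

# Nested (multistage) value: max_{u1} [min_{y1} f1 + max_{u2} min_{y2} f2].
nested = max(
    min(f1(y1, u1) for y1 in Y1)
    + max(min(f2(y2, u1, u2) for y2 in Y2) for u2 in U2)
    for u1 in U1
)

# Two-stage value: a single max over the joint scenario (u1, u2), with the
# inner minimization decoupled by stage, as in (5.10).
two_stage = max(
    min(f1(y1, u1) for y1 in Y1) + min(f2(y2, u1, u2) for y2 in Y2)
    for u1, u2 in product(U1, U2)
)

print(nested, two_stage)  # 3.0 3.0 -- the two formulations agree
assert nested == two_stage
```

If f1 were allowed to depend on the future realization u2, the pull-out of the inner max would fail, which is exactly why the theorem requires the state policy to be causal.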

Remark 5.2 Following a similar procedure, we can readily prove that such

transformation scheme is still valid if the adjustable state decision variables follow other

types of causal control policies. The proposed scheme is general enough to embrace

more advanced control policies, such as piecewise affine and polynomial control

policies.

Remark 5.3 One highlight of the proposed transformation scheme lies in its capability

of being employed in conjunction with existing two-stage ARO solution algorithms for

solving MARMILPs. Accordingly, the multi-to-two transformation scheme opens a new

avenue for a variety of multistage ARO solution algorithms.

5.3 Transformation-proximal bundle algorithm

In this section, a proximal bundle method is first adopted for solving the resulting two-

stage ARO problem (5.8). We then propose an algorithmic framework for the solution of

multistage ARO problems by combining the proposed multi-to-two transformation

scheme with the proximal bundle method and present the convergence analysis of the

proposed solution algorithm.

5.3.1 A multistage robust optimization solution algorithm

The proximal bundle algorithm has proved to be an efficient solution method in various

optimization areas, such as non-smooth optimization [397], robust optimization [398],

and stochastic programming [399]. In the following, we present the proximal bundle

method for solving the two-stage ARO problem (5.8).

The worst-case recourse function of the two-stage ARO problem, denoted by $Q(\hat{x})$, is given in (5.11).

$$
Q(\hat{x})=\max_{u\in U}\ \min_{\{\,y\,:\,y_t\in\Omega_t(\hat{x},u^t),\,\forall t\,\}}\ \sum_{t=1}^{T}\left(d_t'P_t u^t+f_t'y_t\right) \tag{5.11}
$$

where the “max-min” optimization problem is often referred to as an adversarial

optimization problem.

Based on the definition of the worst-case recourse function, the two-stage ARO problem

(5.8) can be considered as a minimization problem whose objective function is given by

(5.12).

$$
F(\hat{x})=\left(c'x+\sum_{t=1}^{T}d_t'q_t\right)+Q(\hat{x}) \tag{5.12}
$$

where $F(\hat{x})$ is the objective function of the two-stage ARO problem (5.8).

Due to the multi-level optimization structure, the objective function $F(\hat{x})$ does not have an analytical expression and is computationally expensive to evaluate. As a class of regularized cutting-plane methods, the proximal bundle method has proved suitable for addressing this type of optimization setting [400]. In the proximal bundle method, the bundle information includes the past query points $\hat{x}^l$ ($l=1,\ldots,k$), their corresponding function values $F(\hat{x}^l)$, and sub-gradients of the function $F$ at these query points. We need to solve the max-min optimization problem in (5.11) to obtain the function value and a sub-gradient at a query point. To this end, the two-level optimization problem (5.11) is equivalently reformulated into a single-level one by replacing the inner minimization problem in (5.11) with its KKT conditions [401]. The resulting sub-problem, denoted as (SUP), is given by

$$
\begin{aligned}
\max_{u\in U,\; y_t,\,\varphi_t,\,\pi_t}\quad & \sum_{t=1}^{T}\left(d_t'P_t u^t+f_t'y_t\right) \\
\text{s.t.}\quad & W_t'\varphi_t+G_t'\pi_t=f_t,\quad \forall t \\
& \pi_t\ge 0,\quad \forall t \\
& y_t\in\Omega_t(\hat{x},u^t),\quad \forall t \\
& \left[G_t y_t+E_t\left(P_t u^t+q_t\right)-m_t^0-M_t u_t+L_t x\right]_i\cdot\left[\pi_t\right]_i=0,\quad \forall t,\ \forall i
\end{aligned} \tag{SUP}
$$

where φt and πt are the dual variables corresponding to the constraints in (5.5) at stage

t, and i denotes the element index of a vector.

To address bilinear terms in the complementary slackness constraints in (SUP), we

introduce binary variables and linearize these constraints into (5.13) by using the big-M

method, which is a standard technique used in literature [402].

$$
\begin{aligned}
& \left[\pi_t\right]_i\le M\left[w_t\right]_i,\quad \forall t,\ \forall i \\
& \left[m_t^0+M_t u_t-L_t x-E_t\left(P_t u^t+q_t\right)-G_t y_t\right]_i\le M\left[1-w_t\right]_i,\quad \forall t,\ \forall i
\end{aligned} \tag{5.13}
$$

where wt represents a vector of binary decision variables, and M is a large positive

number. If uncertainty set U is a polyhedral set, the reformulation of (SUP) is a Mixed-

Integer Linear Program (MILP).
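As a quick sanity check on this big-M linearization (a standalone sketch, not tied to the model data above), the helper below confirms that, for values bounded by M, the pair of constraints in (5.13) admits a binary w exactly when the complementarity product of the dual variable and the constraint slack is zero.

```python
def bigM_feasible(pi, slack, M=1e4):
    """Return True iff some binary w satisfies pi <= M*w and slack <= M*(1-w)."""
    return any(pi <= M * w and slack <= M * (1 - w) for w in (0, 1))

# Complementary pairs (one of the two quantities is zero) are representable...
assert bigM_feasible(0.0, 5.0)       # pi = 0, slack > 0  -> w = 0 works
assert bigM_feasible(3.0, 0.0)       # pi > 0, slack = 0  -> w = 1 works
# ...while a strictly complementarity-violating pair is cut off.
assert not bigM_feasible(3.0, 5.0)   # pi > 0 and slack > 0 -> infeasible
print("big-M linearization matches pi * slack == 0 for bounded values")
```

The equivalence only holds when M genuinely upper-bounds the dual variables and slacks, which is why the big-M constant must be chosen large enough for the instance at hand.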

With the sub-gradients and function values, we build the optimality cutting-plane model for $F(\hat{x})$ shown in (5.14) [398].

$$
F_k(\hat{x})=\max_{l=1,\ldots,k}\left\{F(\hat{x}^l)+\left\langle g^l,\hat{x}-\hat{x}^l\right\rangle\right\} \tag{5.14}
$$
where Fk  xˆ  is the optimality cutting plane model at the k-th iteration. gl is one sub-

gradient of the objective function F at the l-th query point and can be obtained using

optimal dual variables in the same way as the Benders decomposition [389]. Notably,

Fk  xˆ  is a lower piecewise linear approximation of function F. Note that the two-stage

ARO problem (5.8) may not satisfy the relative complete recourse assumption.

Therefore, for some query point xˆ l , there exist certain uncertainty realizations that

 
render the second-stage optimization problem infeasible. This implies F xˆ l   or

equivalently xˆ l  dom F , where dom represents the domain of a function. In the

proximal bundle method, for a given xˆ l , we either derive a lower linearization of

function F (optimality cut) or obtain a cutting plane that separates xˆ l and dom F

(feasibility cut) [398]. To check whether xˆ l  dom F or not, the following Feasibility

Problem (FP) needs to be solved.

$$
\begin{aligned}
\max_{u\in U}\ \min_{y_t,\,\alpha_t^{+},\,\alpha_t^{-},\,\beta_t}\quad & \sum_{t=1}^{T}\left(\mathbf{1}'\alpha_t^{+}+\mathbf{1}'\alpha_t^{-}+\mathbf{1}'\beta_t\right) \\
\text{s.t.}\quad & A_t\left(P_t u^t+q_t\right)+B_t\left(P_{t-1}u^{t-1}+q_{t-1}\right)+W_t y_t+\alpha_t^{+}-\alpha_t^{-}=h_t^0+H_t u_t-T_t x,\quad \forall t \\
& E_t\left(P_t u^t+q_t\right)+G_t y_t-\beta_t\le m_t^0+M_t u_t-L_t x,\quad \forall t \\
& \alpha_t^{+},\,\alpha_t^{-},\,\beta_t\ge 0,\quad \forall t
\end{aligned} \tag{FP}
$$

where $\alpha_t^{+}$, $\alpha_t^{-}$, and $\beta_t$ are slack variables, and $\mathbf{1}$ is the vector of ones of appropriate dimension. Similar to problem (SUP), we reformulate the max-min problem (FP) into a single-level optimization problem using the KKT conditions and linearize the complementary slackness constraints using the big-M method [402]. Let $\theta(\hat{x}^l)$ denote the optimal value of problem (FP) associated with a query point $\hat{x}^l$. If $\theta(\hat{x}^l)=0$, there exist feasible second-stage decisions for any uncertainty realization in the uncertainty set $U$; thus, we have $\hat{x}^l\in\operatorname{dom}F$ and only need optimality cuts. If $\theta(\hat{x}^l)>0$, the worst-case uncertainty realization can lead to the nonexistence of feasible recourse decisions. As a result, a feasibility cut is required.

To determine the next query point, we consider the Moreau-Yosida regularization of $F_k(\hat{x})$ given by (5.15),

$$
G(\hat{x})=F_k(\hat{x})+\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2 \tag{5.15}
$$

where zk is the stability center for the k-th iteration and tk is the positive proximal

parameter [394, 400]. Note that the stability center represents the best current iterate.

The proximal bundle method uses the regularization term to make sure that the next

iterate is not far away from the stability center. In the proximal bundle algorithm, we

iteratively refine the cutting plane models by adding new query points on the fly. The

optimal solution of the following Master Problem (MP) provides the next query point.

$$
\begin{aligned}
\min_{\hat{x}\in\Omega_0,\,\eta}\quad & \eta+\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2 \\
\text{s.t.}\quad & \eta\ge F(\hat{x}^l)+\left\langle g^l,\hat{x}-\hat{x}^l\right\rangle,\quad \forall l\in L_o \\
& 0\ge \theta(\hat{x}^l)+\left\langle g_f^l,\hat{x}-\hat{x}^l\right\rangle,\quad \forall l\in L_f
\end{aligned} \tag{MP}
$$

where $\eta$ is an auxiliary variable, and $L_o$ and $L_f$ denote the index sets of optimality and feasibility cuts, respectively. Akin to the Benders decomposition, the constraint $\eta\ge F(\hat{x}^l)+\langle g^l,\hat{x}-\hat{x}^l\rangle$ corresponds to an optimality cut, while $0\ge\theta(\hat{x}^l)+\langle g_f^l,\hat{x}-\hat{x}^l\rangle$ is a feasibility cut. Besides the cuts derived in the dual space, optimality cuts in the primal space can be added as well [103]. It is worth noting that the above master problem is a Mixed-Integer Quadratically Constrained Program (MIQCP), which can be solved efficiently using off-the-shelf optimization solvers such as CPLEX and GUROBI.

In the proximal bundle method, the expected decrease $\delta_k$ defined in (5.16) is used to determine whether to update the stability center or to remain at the current one. The expected decrease is also used to check the stopping criterion of the proximal bundle algorithm.

$$
\delta_k=F(z^k)-\left[F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left\|\hat{x}^{k+1}-z^k\right\|^2\right] \tag{5.16}
$$

where $\hat{x}^{k+1}$ is an optimal solution to (MP). To circumvent unnecessary moves, the proximal bundle method updates the stability center only when the objective is sufficiently decreased, i.e., $F(\hat{x}^{k+1})\le F(z^k)-m\,\delta_k$.
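The regularized iteration can be illustrated end to end on a one-dimensional toy problem, minimizing the non-smooth convex function |x − 3| (chosen purely for illustration). The sketch below keeps a bundle of cuts, solves the strictly convex master problem by ternary search rather than by an MIQCP solver, computes the expected decrease as in (5.16), and applies the serious-step test; all tolerances and parameters are illustrative.

```python
def F(x):                       # toy non-smooth convex objective (illustrative)
    return abs(x - 3.0)

def subgrad(x):                 # a subgradient of F at x
    return 1.0 if x >= 3.0 else -1.0

def solve_master(bundle, z, t, lo=-10.0, hi=10.0):
    """Minimize max-of-cuts + (1/(2t))*(x - z)^2 by ternary search.

    The proximal term makes the master strictly convex, so ternary
    search suffices here; the chapter instead solves an MIQCP at this step.
    """
    def model(x):
        return max(fv + g * (x - xq) for xq, fv, g in bundle)
    def obj(x):
        return model(x) + (x - z) ** 2 / (2.0 * t)
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    x = 0.5 * (lo + hi)
    return x, model(x)

z, t, m, tol = -5.0, 1.0, 0.3, 1e-6          # center, prox parameter, m, tolerance
bundle = [(z, F(z), subgrad(z))]             # cuts: (query point, value, subgradient)
for _ in range(100):
    x_new, model_val = solve_master(bundle, z, t)
    delta = F(z) - (model_val + (x_new - z) ** 2 / (2.0 * t))   # expected decrease
    if delta <= tol:
        break                                # stopping test
    bundle.append((x_new, F(x_new), subgrad(x_new)))            # refine the model
    if F(x_new) <= F(z) - m * delta:         # serious step: move the center
        z = x_new                            # otherwise: null step, keep z
print(round(z, 3), round(F(z), 3))           # 3.0 0.0
```

The stability center only moves on serious steps, exactly as in the full algorithm; null steps merely enrich the cutting-plane model until the next candidate achieves a sufficient decrease.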
The proximal bundle method is adopted for the two-stage ARO problem and designed

in a decomposition framework to transform the original tri-level optimization problem

into a single-level problem. The cutting-plane models are refined gradually in each

iteration, and the stability center is guaranteed to converge to the optimal solution. The

convergence analysis of the proposed transformation-proximal bundle algorithm is

presented in Section 5.3.2.

The pseudocode of the proposed transformation-proximal bundle algorithm for solving

multistage ARO problems is shown in Figure 25. The proposed algorithmic framework

comprises two primary blocks connected in series. The first block is the multi-to-two transformation step that converts the multistage ARO problem into a two-stage ARO

problem. The second block is the proximal bundle method, which is employed to

address the resulting two-stage ARO problem. The proposed algorithm iteratively

solves a master problem, a feasibility problem, and a subproblem, until the expected

decrease reaches its predefined tolerance δtol. The transformation-proximal bundle

algorithm provides an attractive trade-off between solution quality and computational

tractability by organically integrating the multi-to-two transformation scheme with the

regularized cutting-plane machinery.

Algorithm. Transformation-proximal bundle algorithm
1: Step 1 (Initialization)
2: Set $k \leftarrow 0$, $flag \leftarrow 0$, and choose $m$, $t_k$, and $\delta_{tol}$;
3: Step 2 (Transformation step)
4: Substitute the adjustable state decisions with the affine control policy in (5.2);
5: while $flag = 0$
6:   Step 3 (Master problem)
7:   Solve master problem (MP) to obtain $\hat{x}^{k+1}$, $\eta^{k+1}$;
8:   Update $\delta_k \leftarrow F(z^k) - F_k(\hat{x}^{k+1}) - \frac{1}{2t_k}\|\hat{x}^{k+1}-z^k\|^2$;
9:   Step 4 (Stopping test)
10:  if $\delta_k \le \delta_{tol}$ then
11:    $flag \leftarrow 1$ and stop the while loop;
12:  end
13:  Step 5 (Call oracle)
14:  Solve feasibility problem (FP) to obtain $\theta(\hat{x}^{k+1})$ and $g_f^{k+1}$;
15:  Solve subproblem (SUP) to obtain $F(\hat{x}^{k+1})$ and $g^{k+1}$;
16:  Step 6 (Update stability center)
17:  if $F(\hat{x}^{k+1}) \le F(z^k) - m\,\delta_k$ then
18:    Update the stability center: $z^{k+1} \leftarrow \hat{x}^{k+1}$ and $F(z^{k+1}) \leftarrow F(\hat{x}^{k+1})$;
19:  else
20:    Retain the stability center: $z^{k+1} \leftarrow z^k$ and $F(z^{k+1}) \leftarrow F(z^k)$;
21:  end
22:  Step 7 (Update cutting-plane models)
23:  if $\theta(\hat{x}^{k+1}) = 0$ then
24:    Update $F_{k+1}(\hat{x}) \leftarrow \max\left\{F_k(\hat{x}),\ F(\hat{x}^{k+1}) + \langle g^{k+1}, \hat{x}-\hat{x}^{k+1}\rangle\right\}$;
25:  else
26:    Update the feasibility cutting-plane model;
27:  end
28:  $k \leftarrow k+1$;
29: end
30: Return $z^k$ and $F(z^k)$.

Figure 25. The pseudocode of the transformation-proximal bundle algorithm.

5.3.2 Convergence analysis

In this subsection, we present the convergence analysis of the proposed algorithm.

Proposition 5.1 $F(\hat{x})$ in (5.12) is a convex function in $\hat{x}$.

Proof. Based on (5.12), $F(\hat{x})$ is the sum of a linear function in $\hat{x}$ and $Q(\hat{x})$, so we only need to show the convexity of $Q(\hat{x})$. We rewrite the function $Q(\hat{x})$ as (5.17).

$$
Q(\hat{x})=\max_{u\in U}\left\{\sum_{t=1}^{T}d_t'P_t u^t+\min_{\{\,y\,:\,y_t\in\Omega_t(\hat{x},u^t),\,\forall t\,\}}\ \sum_{t=1}^{T}f_t'y_t\right\}
=\max_{u\in U}\left\{\sum_{t=1}^{T}d_t'P_t u^t+R(\hat{x},u)\right\} \tag{5.17}
$$

For any $u\in U$, $\sum_{t=1}^{T}d_t'P_t u^t$ is a linear function in $\hat{x}$. Let $\lambda\in[0,1]$, let $y_{1t}^{*}$ be the optimal solution of the minimization problem involved in $R(\hat{x}_1,u)$, and let $y_{2t}^{*}$ be the optimal solution of the minimization problem involved in $R(\hat{x}_2,u)$. Therefore, we have

$$
R\left(\lambda\hat{x}_1+(1-\lambda)\hat{x}_2,\,u\right)\le\sum_{t=1}^{T}f_t'\left(\lambda y_{1t}^{*}+(1-\lambda)y_{2t}^{*}\right)
=\lambda R(\hat{x}_1,u)+(1-\lambda)R(\hat{x}_2,u) \tag{5.18}
$$

The inequality in (5.18) is based on the fact that $\lambda y_{1t}^{*}+(1-\lambda)y_{2t}^{*}$ is feasible for the minimization problem involved in $R(\lambda\hat{x}_1+(1-\lambda)\hat{x}_2,u)$, because the constraints defining $\Omega_t(\hat{x},u^t)$ are affine in $(\hat{x},y_t)$. Based on (5.18), $R(\hat{x},u)$ is a convex function in $\hat{x}$ for any $u\in U$. Following the pointwise maximum property, $F(\hat{x})$ is a convex function. □

To facilitate the exposition, we define the linearization errors at the stability center $z^k$ in (5.19).

$$
e_l=F(z^k)-\left[F(\hat{x}^l)+\left\langle g^l,z^k-\hat{x}^l\right\rangle\right],\quad \forall l \tag{5.19}
$$

Based on the convexity of $F(\hat{x})$, we have $e_l\ge 0$. With the definition of $e_l$, we can rewrite $F_k(\hat{x})$ as (5.20).

$$
\begin{aligned}
F_k(\hat{x}) &= \max_{l=1,\ldots,k}\left\{F(z^k)-e_l-\left\langle g^l,z^k-\hat{x}^l\right\rangle+\left\langle g^l,\hat{x}-\hat{x}^l\right\rangle\right\} \\
&= F(z^k)+\max_{l=1,\ldots,k}\left\{-e_l+\left\langle g^l,\hat{x}-z^k\right\rangle\right\}
\end{aligned} \tag{5.20}
$$

To prove the convergence of the proposed algorithm, we first present five lemmas and

one proposition as follows [394, 403].

Lemma 5.1 Consider the following Regularized Optimization Problem (ROP):

$$
\min_{\hat{x}}\ F_k(\hat{x})+\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2 \tag{ROP}
$$

Then, its dual problem is shown in (5.21).

$$
\max_{\alpha\in\Delta^k}\ F(z^k)-\frac{t_k}{2}\left\|\sum_{l=1}^{k}\alpha_l g^l\right\|^2-\sum_{l=1}^{k}\alpha_l e_l,
\qquad \Delta^k=\left\{\alpha\in\mathbb{R}_{+}^{k}\ \middle|\ \sum_{l=1}^{k}\alpha_l=1\right\} \tag{5.21}
$$
Proof. By using the epigraph reformulation, we have (5.22).

$$
\begin{aligned}
\min_{\hat{x},\,\eta}\quad & \eta+\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2 \\
\text{s.t.}\quad & \eta\ge F(z^k)-e_l+\left\langle g^l,\hat{x}-z^k\right\rangle,\quad l=1,\ldots,k
\end{aligned} \tag{5.22}
$$

The Lagrangean function is

$$
L=\frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2+\eta\left(1-\sum_{l=1}^{k}\alpha_l\right)+\sum_{l=1}^{k}\alpha_l\left(F(z^k)-e_l+\left\langle g^l,\hat{x}-z^k\right\rangle\right)
$$

Based on the KKT conditions of this problem, we have

$$
\frac{\partial L}{\partial \hat{x}}=\frac{1}{t_k}\left(\hat{x}-z^k\right)+\sum_{l=1}^{k}\alpha_l g^l=0,
\qquad \frac{\partial L}{\partial \eta}=1-\sum_{l=1}^{k}\alpha_l=0
$$

Thus, the dual objective function is given in (5.23),

$$
\begin{aligned}
& \frac{1}{2t_k}\left\|\hat{x}-z^k\right\|^2+\sum_{l=1}^{k}\alpha_l\left(F(z^k)-e_l+\left\langle g^l,\hat{x}-z^k\right\rangle\right) \\
&= F(z^k)-\sum_{l=1}^{k}\alpha_l e_l+\frac{t_k}{2}\left\|\sum_{l=1}^{k}\alpha_l g^l\right\|^2-t_k\left\|\sum_{l=1}^{k}\alpha_l g^l\right\|^2 \\
&= F(z^k)-\sum_{l=1}^{k}\alpha_l e_l-\frac{t_k}{2}\left\|\sum_{l=1}^{k}\alpha_l g^l\right\|^2
\end{aligned} \tag{5.23}
$$

which completes the proof. □

Lemma 5.2 Suppose $\alpha$ is an optimal solution to the optimization problem in (5.21). Then, we have

(i) $\hat{g}^k\in\partial F_k(\hat{x}^{k+1})$;

(ii) $F_k(\hat{x}^{k+1})=F(z^k)-t_k\left\|\hat{g}^k\right\|^2-\hat{e}_k$;

(iii) $\delta_k=\dfrac{t_k}{2}\left\|\hat{g}^k\right\|^2+\hat{e}_k$;

(iv) $\hat{g}^k\in\partial_{\hat{e}_k}F(z^k)$;

where $\hat{g}^k=\sum_{l=1}^{k}\alpha_l g^l$ and $\hat{e}_k=\sum_{l=1}^{k}\alpha_l e_l$.

Proof. (i) We can obtain

$$
\hat{g}^k=-\frac{1}{t_k}\left(\hat{x}^{k+1}-z^k\right) \tag{5.24}
$$

Since $\hat{x}^{k+1}$ is an optimal solution to (ROP), we have $0\in\partial F_k(\hat{x}^{k+1})+\frac{1}{t_k}\left(\hat{x}^{k+1}-z^k\right)$. Therefore, based on (5.24), we arrive at (i).

(ii) Based on strong duality, we have

$$
F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left\|\hat{x}^{k+1}-z^k\right\|^2=F(z^k)-\hat{e}_k-\frac{t_k}{2}\left\|\hat{g}^k\right\|^2 \tag{5.25}
$$

Based on (5.25) and (5.24), we further have

$$
F_k(\hat{x}^{k+1})=F(z^k)-\hat{e}_k-\frac{t_k}{2}\left\|\hat{g}^k\right\|^2-\frac{1}{2t_k}\left\|\hat{x}^{k+1}-z^k\right\|^2
=F(z^k)-\hat{e}_k-t_k\left\|\hat{g}^k\right\|^2 \tag{5.26}
$$

(iii) According to (5.16) and (5.26), we have (5.27).

$$
\delta_k=F(z^k)-\left[F(z^k)-\hat{e}_k-t_k\left\|\hat{g}^k\right\|^2\right]-\frac{1}{2t_k}\left\|t_k\hat{g}^k\right\|^2
=\hat{e}_k+\frac{t_k}{2}\left\|\hat{g}^k\right\|^2 \tag{5.27}
$$

(iv) Since $F(\hat{x})$ is convex, based on Proposition 5.1 and Lemma 5.2 (i), we have, for any $\hat{x}$,

$$
\begin{aligned}
F(\hat{x}) &\ge F_k(\hat{x}^{k+1})+\left\langle\hat{g}^k,\hat{x}-\hat{x}^{k+1}\right\rangle \\
&= F(z^k)-t_k\left\|\hat{g}^k\right\|^2-\hat{e}_k+\left\langle\hat{g}^k,\hat{x}-z^k\right\rangle+\left\langle\hat{g}^k,z^k-\hat{x}^{k+1}\right\rangle \\
&= F(z^k)+\left\langle\hat{g}^k,\hat{x}-z^k\right\rangle-\hat{e}_k
\end{aligned} \tag{5.28}
$$

The first equality is based on Lemma 5.2 (ii), and the second equality is based on equation (5.24). □

Definition 5.1 (Serious Steps). For the proposed algorithm, serious steps refer to those

steps in which the stability center is changed.

Lemma 5.3 Suppose $F^*$ is the optimal value of $\min F(\hat{x})$ and $F^*>-\infty$. Then, we have inequality (5.29).

$$
\sum_{k\in L_s}\delta_k\le\frac{F(z^0)-F^*}{m} \tag{5.29}
$$

where $L_s$ denotes the set of iterations having serious steps.

Proof. For a serious step, we have

$$
F(z^k)-F(\hat{x}^{k+1})=F(z^k)-F(z^{k+1})\ge m\,\delta_k
$$

By taking a summation over the set of serious steps, we arrive at

$$
F(z^0)-F^*\ge\sum_{k\in L_s}\left[F(z^k)-F(z^{k+1})\right]\ge\sum_{k\in L_s}m\,\delta_k \tag{5.30}
$$

By rearranging (5.30) and noting that $F^*>-\infty$, we have (5.29), which completes the proof. □

Assumption 5.1 For an infinite number of serious steps, i.e., $|L_s|=\infty$, the sequence $\left\{F(z^k)\right\}_{k\in L_s}$ is assumed to converge, with $\lim_{k\in L_s}F(z^k)=F_*>-\infty$.

Note that here we do not assume the converged value $F_*$ is the optimal value of the two-stage ARO problem (5.8), which is denoted by the separate symbol $F^*$ introduced in Lemma 5.3. Theorem 5.2 will prove that $F_*$ is indeed the optimal value.

Lemma 5.4 For an infinite number of serious steps, we have

(i) If $\sum_{k\in L_s}t_k=\infty$, then $\liminf_{k\in L_s}\left\|\hat{g}^k\right\|=0$;

(ii) If $0<t_k\le c$ and $\arg\min_{\hat{x}}F(\hat{x})\neq\emptyset$, then $\left\{z^k\right\}_{k\in L_s}$ is bounded.

Proof. (i) Using Lemma 5.2 (iii) together with Lemma 5.3, we have inequality (5.31).

$$
\sum_{k\in L_s}\frac{t_k}{2}\left\|\hat{g}^k\right\|^2\le\sum_{k\in L_s}\delta_k\le\frac{F(z^0)-F^*}{m} \tag{5.31}
$$

Since $\sum_{k\in L_s}t_k=\infty$, we can conclude that zero is a cluster point of $\left\{\left\|\hat{g}^k\right\|\right\}_{k\in L_s}$.

(ii) Let $\hat{x}^*\in\arg\min_{\hat{x}}F(\hat{x})$. For $k\in L_s$, we have the following:

$$
\begin{aligned}
\left\|\hat{x}^*-z^{k+1}\right\|^2 &= \left\|\hat{x}^*-z^k\right\|^2+\left\|z^k-z^{k+1}\right\|^2+2\left\langle\hat{x}^*-z^k,\,z^k-z^{k+1}\right\rangle \\
&= \left\|\hat{x}^*-z^k\right\|^2+t_k^2\left\|\hat{g}^k\right\|^2+2t_k\left\langle\hat{x}^*-z^k,\,\hat{g}^k\right\rangle \\
&\le \left\|\hat{x}^*-z^k\right\|^2+2t_k\left(F(\hat{x}^*)-F(z^k)+\hat{e}_k+\frac{t_k}{2}\left\|\hat{g}^k\right\|^2\right) \\
&= \left\|\hat{x}^*-z^k\right\|^2+2t_k\left(F(\hat{x}^*)-F(z^k)+\delta_k\right) \\
&\le \left\|\hat{x}^*-z^k\right\|^2+2t_k\,\delta_k
\end{aligned} \tag{5.32}
$$

where the first inequality follows from Lemma 5.2 (iv) and the last inequality holds because $F(\hat{x}^*)\le F(z^k)$. Summing (5.32) over the set $L_s$ leads to the following inequality:

$$
\left\|\hat{x}^*-z^{k+1}\right\|^2\le\left\|\hat{x}^*-z^0\right\|^2+2\sum_{l\in L_s}t_l\,\delta_l\le\left\|\hat{x}^*-z^0\right\|^2+2c\sum_{l\in L_s}\delta_l \tag{5.33}
$$

Based on (5.33) and Lemma 5.3, we then have (ii). □

Definition 5.2 (Null Steps). For the proposed algorithm, null steps are those steps in

which the stability center remains the same.

Lemma 5.5 If there is a finite number of serious steps, i.e., $|L_s|<\infty$, let $k_0$ be the index of the last serious step, $\left\{\hat{x}^k\right\}_{k>k_0}$ be the sequence of null steps, and $z^{k_0}$ be the stability center generated by the last serious step. Then, for $k>k_0$,

$$
F(z^{k_0})-\delta_k+\frac{1}{2t_k}\left\|\hat{x}-\hat{x}^{k+1}\right\|^2
=F_k(\hat{x}^{k+1})+\left\langle\hat{g}^k,\hat{x}-\hat{x}^{k+1}\right\rangle+\frac{1}{2t_k}\left\|\hat{x}-z^{k_0}\right\|^2 \tag{5.34}
$$
Proof. Starting from the left-hand side of (5.34), we have (5.35).

$$
\begin{aligned}
& F(z^{k_0})-\delta_k+\frac{1}{2t_k}\left\|\hat{x}-\hat{x}^{k+1}\right\|^2 \\
&= F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left(\left\|\hat{x}-\hat{x}^{k+1}\right\|^2+\left\|\hat{x}^{k+1}-z^{k_0}\right\|^2\right) \\
&= F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left(\left\|\hat{x}-z^{k_0}\right\|^2-2\left\langle\hat{x}-\hat{x}^{k+1},\,\hat{x}^{k+1}-z^{k_0}\right\rangle\right) \\
&= F_k(\hat{x}^{k+1})+\frac{1}{2t_k}\left\|\hat{x}-z^{k_0}\right\|^2+\left\langle\hat{g}^k,\hat{x}-\hat{x}^{k+1}\right\rangle
\end{aligned} \tag{5.35}
$$

The first equality is based on (5.16), while the third equality is based on equation (5.24). □

Lemma 5.6 For the proposed algorithm, the following equality and inequality hold.

(i) $F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle = F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - z^k\rangle - \hat{e}_k$

(ii) $F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle \le F_{k+1}(\hat{x}^{k+2})$
Proof. (i) Starting from the right-hand side of Lemma 5.6 (i), we have (5.36).

$$\begin{aligned}
F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - z^k\rangle - \hat{e}_k &= F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle + \langle \hat{g}^k, \hat{x}^{k+1} - z^k\rangle - \hat{e}_k \\
&= F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle - t_k\|\hat{g}^k\|^2 - \hat{e}_k \\
&= F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle
\end{aligned} \tag{5.36}$$

where the second equality holds according to (5.24), and the third according to Lemma 5.2 (ii).

(ii) Based on the expressions of $\hat{g}^k$ and $\hat{e}_k$ in Lemma 5.2, we can have

$$\begin{aligned}
F(z^k) + \langle \hat{g}^k, \hat{x}^{k+2} - z^k\rangle - \hat{e}_k &= F(z^k) + \sum_{l=1}^{k} \alpha_l\Big(-e_l + \big\langle g^l, \hat{x}^{k+2} - z^k\big\rangle\Big) \\
&\le F(z^k) + \max_{l=1,\ldots,k}\Big(-e_l + \big\langle g^l, \hat{x}^{k+2} - z^k\big\rangle\Big) \\
&= F_k(\hat{x}^{k+2}) \le F_{k+1}(\hat{x}^{k+2})
\end{aligned} \tag{5.37}$$

The first inequality is based on the fact that $\boldsymbol{\alpha} \in \mathbb{R}_+^k$ and $\sum_{l=1}^{k} \alpha_l = 1$, the second equality holds because of (5.20), and the last inequality is according to (5.14).

Based on Lemma 5.6 (i) and (5.37), we have Lemma 5.6 (ii). □

Assumption 5.2 The sequence $\{t_k\}$ is positive and nonincreasing.

Proposition 5.2 If there is a finite number of serious steps, let $k_0$ be the index of the last serious step, $\{\hat{x}^k\}_{k > k_0}$ be the sequence of null steps, and $z^{k_0}$ be the stability center generated by the last serious step. Then $\delta_k \to 0$.

Proof. Using Lemma 5.5 with $\hat{x} = \hat{x}^{k+2}$, we have

$$\begin{aligned}
F(z^{k_0}) - \delta_k + \frac{1}{2t_k}\|\hat{x}^{k+2} - \hat{x}^{k+1}\|^2 &= F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, \hat{x}^{k+2} - \hat{x}^{k+1}\rangle + \frac{1}{2t_k}\|\hat{x}^{k+2} - z^{k_0}\|^2 \\
&\le F_{k+1}(\hat{x}^{k+2}) + \frac{1}{2t_k}\|\hat{x}^{k+2} - z^{k_0}\|^2 \\
&\le F_{k+1}(\hat{x}^{k+2}) + \frac{1}{2t_{k+1}}\|\hat{x}^{k+2} - z^{k_0}\|^2 \\
&= F(z^{k_0}) - \delta_{k+1}
\end{aligned} \tag{5.38}$$

where the first equality is based on Lemma 5.5, the first inequality is according to Lemma 5.6, the second inequality is valid because of Assumption 5.2, and the last equality is based on (5.16). By rearranging (5.38), we have $\delta_k \ge \delta_{k+1} + \frac{1}{2t_k}\|\hat{x}^{k+2} - \hat{x}^{k+1}\|^2$, so $\delta_k$ is nonincreasing.

Using Lemma 5.5 one more time with $\hat{x} = z^{k_0}$, we have (5.39).

$$F(z^{k_0}) - \delta_k + \frac{1}{2t_k}\|z^{k_0} - \hat{x}^{k+1}\|^2 = F_k(\hat{x}^{k+1}) + \langle \hat{g}^k, z^{k_0} - \hat{x}^{k+1}\rangle \le F_k(z^{k_0}) \le F(z^{k_0}) \tag{5.39}$$

Therefore, we have $\|z^{k_0} - \hat{x}^{k+1}\|^2 \le 2\delta_k t_k \le 2\delta_{k_0} t_{k_0}$ due to the fact that $\delta_k$ is decreasing and $t_k$ is nonincreasing. Thus, $\{\hat{x}^k\}$ is bounded. Since the serious step test fails for any step beyond $k_0$, we have $m\,\delta_k > F(z^{k_0}) - F(\hat{x}^{k+1})$.

Based on (5.16), we can have $\delta_k \le F(z^{k_0}) - F_k(\hat{x}^{k+1})$. Therefore, we have

$$(1 - m)\,\delta_k \le F(\hat{x}^{k+1}) - F_k(\hat{x}^{k+1}) = \big(F(\hat{x}^{k+1}) - F(\hat{x}^{k})\big) + \big(F_k(\hat{x}^{k}) - F_k(\hat{x}^{k+1})\big) \le 2\Lambda\,\|\hat{x}^{k+1} - \hat{x}^{k}\| \tag{5.40}$$

The equality in (5.40) is based on the fact that $F(\hat{x}^{k}) = F_k(\hat{x}^{k})$, and $\Lambda$ denotes a common Lipschitz constant of $F$ and $F_k$ on a bounded set containing $\{\hat{x}^k\}$. Therefore, we can obtain

$$\delta_k - \delta_{k+1} \ge \frac{1}{2t_k}\|\hat{x}^{k+2} - \hat{x}^{k+1}\|^2 \ge \frac{(1 - m)^2\,\delta_{k+1}^2}{8\Lambda^2 t_k} \ge \frac{(1 - m)^2\,\delta_{k+1}^2}{8\Lambda^2 t_{k_0}} \tag{5.41}$$

By summing (5.41) over $k \ge k_0$, we have

$$\frac{(1 - m)^2}{8\Lambda^2 t_{k_0}} \sum_{k \ge k_0} \delta_{k+1}^2 \le \sum_{k \ge k_0} \big(\delta_k - \delta_{k+1}\big) \le \delta_{k_0}.$$

Thus, $\delta_k \to 0$. □

Theorem 5.2 For $\delta_{tol} = 0$, the proposed transformation-proximal bundle algorithm converges to the globally optimal solution of (5.8) asymptotically; for $\delta_{tol} > 0$, it is guaranteed to converge in a finite number of steps.

Proof. For $\delta_{tol} = 0$, the transformation-proximal bundle algorithm loops forever. There are two exclusive scenarios: (1) the algorithm implements an infinite number of serious steps; (2) after a finite number of serious steps, the algorithm implements only null steps.

If there is an infinite number of serious steps, we have $\hat{e}_k \to 0$. Therefore, we have $\hat{g}^k \in \partial_{\hat{e}_k} F(z^k)$ for all $k \in L_s$. Based on Lemma 5.4, we can further conclude that $0 \in \partial F(z^k)$ in the limit $k \to \infty$, which implies that the algorithm converges to the globally optimal solution of (5.8) asymptotically.

Under scenario 2, we have $\delta_k \to 0$ based on Proposition 5.2. Also, $\delta_k \to 0$ implies $\hat{e}_k \to 0$ according to Lemma 5.2 (iii). Thus, the algorithm still converges to the globally optimal solution of (5.8) asymptotically. For $\delta_{tol} > 0$, suppose the algorithm does not converge in a finite number of iterations; then we have $\delta_k > \delta_{tol} > 0$ for all $k$, which contradicts the fact that $\delta_k \to 0$, established by Lemma 5.3 and Proposition 5.2. □

Remark 5.4 Since $e_l$ defined in (5.19) is the linearization error of a convex function, both $e_l$ and $\hat{e}_k = \sum_{l=1}^{k} \alpha_l e_l$ (with $\alpha_l \ge 0$) are nonnegative. According to Lemma 5.2 (iii), we have $\delta_k \ge 0$ for all $k$. If $\delta_{tol} = 0$, the stopping criterion becomes $\delta_k < 0$. Consequently, the stopping condition will never be met, and the algorithm will loop forever.

Remark 5.5 Since $\delta_k \ge 0$ for all $k$, the partial sums of the series $\sum_{k \in L_s} \delta_k$ are nondecreasing. According to Lemma 5.3, the series is also bounded from above. Therefore, the series has a limit, and its convergence is guaranteed.
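To make the serious-step/null-step mechanics above concrete, the following self-contained sketch runs a toy one-dimensional proximal bundle iteration: it maintains a cutting-plane model, solves the prox subproblem (here crudely, by grid search instead of the LP/QP master problem), computes the expected decrease, and applies the descent test with parameter m. All names are ours and the example is purely illustrative, not the dissertation's implementation.

```python
def proximal_bundle_1d(f, g, z0, t=1.0, m=0.3, tol=1e-6, max_iter=100):
    """Toy 1-D proximal bundle method: build a cutting-plane model of the
    convex function f from subgradients g, take a prox step toward the
    model minimizer, and move the stability center only if the actual
    decrease is at least m times the expected decrease delta."""
    z = z0
    cuts = [(z0, f(z0), g(z0))]  # bundle of (point, value, subgradient)

    def model(x):  # piecewise-linear lower approximation of f
        return max(fv + gv * (x - xv) for xv, fv, gv in cuts)

    for _ in range(max_iter):
        # prox subproblem, solved crudely on a grid around the center
        grid = [z + 0.001 * i for i in range(-5000, 5001)]
        x_new = min(grid, key=lambda x: model(x) + (x - z) ** 2 / (2 * t))
        delta = f(z) - (model(x_new) + (x_new - z) ** 2 / (2 * t))
        if delta < tol:  # expected decrease is negligible: stop
            break
        if f(z) - f(x_new) >= m * delta:
            z = x_new  # serious step: move the stability center
        cuts.append((x_new, f(x_new), g(x_new)))  # both step types enrich the model
    return z
```

On $f(x) = |x - 2|$ started from $z^0 = 0$, the iteration performs a few serious steps to reach the minimizer and then stops once the expected decrease falls below the tolerance.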

5.4 The lower bounding technique

In this section, we devise a lower bounding technique, which serves to assess the solution quality of multistage ARO solution algorithms. Both the affine control policy and the

proposed transformation-proximal bundle algorithm are approximation solution

approaches for solving computationally intractable MARMILPs, and they both yield

upper bounds on the optimal value of the original multistage robust optimization

problem. To measure the loss of optimality, we leverage the proposed solution algorithm

developed in the previous section in conjunction with the scenario-tree based method

[404]. The proposed lower bounding technique is presented in this section.

There are in general two types of lower bounds, namely a priori bounds and a posteriori bounds. A priori lower bounding methods evaluate the worst-case bound for any problem instance of MARMILPs. However, this type of lower bound might be too pessimistic for a specific problem instance. As such, we focus on a posteriori lower bounding techniques, which can provide a lower bound on the optimal value of a specific MARMILP instance. A posteriori results fit our purpose of assessing and comparing the loss of optimality incurred by different multistage solution algorithms in the computational experiments of the next section.

The idea of scenario-tree-based lower bounding approach is to replace the uncertainty

set in MARMILPs with a finite number of uncertainty scenarios. The resulting scenario-

tree problem yields a lower bound, because it is a relaxation of the original MARMILP.

It is worth noting that the quality of lower bounds depends heavily on the choice of the

scenario set. Motivated by this observation, we resort to the uncertainty scenario set

constructed within the transformation-proximal bundle algorithmic framework.


Specifically, uncertainty scenarios are directly constructed from the subproblem (SUP)

and the feasibility problem (FP) during the oracle calls. This yields optimality or

feasibility cuts, which are then fed back to the master problem in each iteration. When

the proposed solution algorithm converges, the scenario set can be obtained by

collecting all uncertainty scenarios. The resulting scenario-tree counterpart is shown as

follows.
$$\begin{aligned}
\min_{x,\,s_t(\cdot),\,y_t(\cdot)}\ \max_{u \in \hat{U}}\ & c^{\mathrm{T}}x + \sum_{t=1}^{T}\Big(d_t^{\mathrm{T}}\,s_t(u^t) + f_t^{\mathrm{T}}\,y_t(u^t)\Big) \\
\text{s.t.}\ & A_t\,s_t(u^t) + B_t\,s_{t-1}(u^{t-1}) + W_t\,y_t(u^t) \le h_t^0 + H_t u_t + T_t x, \quad \forall u \in \hat{U},\ \forall t \\
& E_t\,s_t(u^t) + G_t\,y_t(u^t) \le m_t^0 + M_t u_t + L_t x, \quad \forall u \in \hat{U},\ \forall t \\
& u^{t(i)} = u^{t(j)} \Rightarrow s_t(u^{t(i)}) = s_t(u^{t(j)}),\ y_t(u^{t(i)}) = y_t(u^{t(j)}), \quad \forall i, j, t \\
& \hat{U} = \big\{u^{(1)}, \ldots, u^{(N)}\big\}
\end{aligned} \tag{5.42}$$

where $u^{(i)}$ is an element of the scenario set $\hat{U}$ and $N$ denotes the total number of

uncertainty scenarios. Note that additional constraints are introduced to model the non-

anticipativity restriction in the multistage decision-making setting [45]. To be more

specific, if the trajectories of two uncertainty scenarios are the same up to stage t, the

corresponding recourse decisions cannot be distinguished. Using epigraph

reformulation, we can equivalently transform (5.42) into the following Scenario-Tree

Multistage Adaptive Robust Mixed-Integer Linear Program (STMARMILP), as shown

in (5.43).

$$\begin{aligned}
\min_{x,\,s_t,\,y_t,\,\eta}\ & c^{\mathrm{T}}x + \eta \\
\text{s.t.}\ & \eta \ge \sum_{t=1}^{T}\Big(d_t^{\mathrm{T}}\,s_t(u^{t(i)}) + f_t^{\mathrm{T}}\,y_t(u^{t(i)})\Big), \quad i = 1, \ldots, N \\
& A_t\,s_t(u^{t(i)}) + B_t\,s_{t-1}(u^{(t-1)(i)}) + W_t\,y_t(u^{t(i)}) \le h_t^0 + H_t u_t^{(i)} + T_t x, \quad i = 1, \ldots, N,\ \forall t \\
& E_t\,s_t(u^{t(i)}) + G_t\,y_t(u^{t(i)}) \le m_t^0 + M_t u_t^{(i)} + L_t x, \quad i = 1, \ldots, N,\ \forall t \\
& u^{t(i)} = u^{t(j)} \Rightarrow s_t(u^{t(i)}) = s_t(u^{t(j)}),\ y_t(u^{t(i)}) = y_t(u^{t(j)}), \quad \forall i, j, t
\end{aligned} \tag{5.43}$$

The above scenario-based problem constitutes an MILP problem, which can be solved

to global optimality by employing the branch-and-cut methods implemented in

optimization solvers like CPLEX and GUROBI. In this sense, obtaining the lower

bound boils down to solving a computationally efficient STMARMILP, in which

critical uncertainty realizations are identified through the proposed solution algorithm.

We quantitatively assess the solution quality of different algorithms using the relative optimality gap defined by $\frac{UB - LB}{0.5\,(UB + LB)}$, where $UB$ denotes the upper bound and $LB$

represents the lower bound obtained via the STMARMILP. Note that this gap is an

indication of solution quality: a small gap implies a near-optimal solution, while a large

gap suggests a significant loss of optimality. Before closing this section, we summarize

the inequality relationship between bounds in the following theorem.

Theorem 5.3 For any specific problem instance of MARMILPs, the following inequalities (5.44) hold.

$$\nu_S \le \nu^* \le \nu_{TPB} \le \nu_{ADR} \tag{5.44}$$

where $\nu_S$, $\nu^*$, $\nu_{TPB}$, and $\nu_{ADR}$ denote the optimal values of the STMARMILP, the MARMILP, the TARMILP, and the affinely adjustable robust counterpart, respectively.

Proof. Since the scenario set is a subset of the uncertainty set ($\hat{U} \subseteq U$), the scenario-tree counterpart STMARMILP is a relaxation of the original multistage ARO problem, as it enforces only a subset of the constraints. Hence, the objective value of the STMARMILP provides a lower bound for the original multistage ARO problem ($\nu_S \le \nu^*$).

In the original MARMILP, the recourse decisions are general functions of uncertainty. In both the affine control policy and the proposed transformation-proximal bundle algorithm, all or some of the recourse variables are restricted to a fixed functional form of the uncertainty realizations, thus providing upper bounds on the optimal value of the original multistage ARO problem ($\nu^* \le \nu_{ADR}$ and $\nu^* \le \nu_{TPB}$). Additionally, any feasible solution of the affinely adjustable robust counterpart is also feasible for the TARMILP, because state decisions are restricted to the affine control policy while control decisions remain general functions of uncertainty in the proposed solution algorithm. Therefore, we have $\nu_{TPB} \le \nu_{ADR}$. □
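As a small numerical sanity check (the function names are ours), the relative optimality gap defined earlier and the bound ordering in (5.44) can be evaluated as follows:

```python
def relative_gap(ub, lb):
    """Relative optimality gap: (UB - LB) / (0.5 * (UB + LB))."""
    return (ub - lb) / (0.5 * (ub + lb))

def bounds_ordered(v_s, v_star, v_tpb, v_adr):
    """Check the chain of inequalities (5.44): v_S <= v* <= v_TPB <= v_ADR."""
    return v_s <= v_star <= v_tpb <= v_adr
```

For instance, an upper bound of 105 against a lower bound of 100 gives a gap of 5 / 102.5, i.e. roughly 4.9%.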

Remark 5.6 The proposed algorithm provides an upper bound on $\nu^*$ for the following reasons. First, the upper bound is based on $F(z^k)$, instead of the lower piecewise linear approximation $F_k(z^k)$. Second, although we add the feasibility cuts on-the-fly into problem (MP) to obtain a candidate solution $\hat{x}^k$, this candidate solution must be feasible in order to become a stability center. This is because if it is not feasible, i.e. its corresponding objective value of (FP) is strictly positive, the condition for the serious step (Line 17 of the algorithm pseudocode) cannot be met, due to $F(\hat{x}^k) = +\infty$.

5.5 Applications

5.5.1 Application 1: Robust optimal inventory control

In this section, we apply the proposed transformation-proximal bundle algorithm to

robust finite-horizon optimal inventory control problems. Extensive comparisons

between the affine control policy [106], the piecewise affine control policy [396, 405],

and the proposed multi-to-two transformation-based algorithm are made in terms of

solution quality and computational efficiency. All optimization problems are solved

with CPLEX 12.8.0, implemented on a computer with an Intel (R) Core (TM) i7-6700

CPU @ 3.40 GHz and 32 GB RAM. The optimality tolerance for CPLEX 12.8.0 is set

to be 0. The tolerance for expected decrease δtol is set to be 0.1.

Inventory control plays a critical role in improving customer services as well as in

boosting profits. Due to the market fluctuations, customer demands are inevitably

subject to uncertainty [406-408]. These uncertainties are typically revealed sequentially

over the entire time horizon. In this application, we consider a single-item multiperiod

robust optimal inventory control problem under demand uncertainty [381, 409, 410]. In

such a problem, a decision maker needs to serve customer demand as far as possible at

a minimum cost. There are two types of orders, standard orders and express orders, that

can be placed after knowing uncertainty realization at the beginning of each period. A

standard order of product arrives at the end of the time period, while the costlier express

orders arrive immediately. Any excess inventories are stored in a warehouse and incur

the holding cost. If customer demands are backlogged, the backlog cost should be paid.
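Under the stated timing convention, a standard order placed in one period first becomes available in the next inventory balance, while an express order is available immediately; one demand trajectory can therefore be simulated directly, matching the inventory balance (5.46). The sketch below uses our own names and is only illustrative:

```python
def simulate_inventory(demand, std_orders, exp_orders, I0=0.0):
    """Inventory balance: the standard order from the previous period plus the
    current express order serve current demand; negative inventory is backlog."""
    inventory, path = I0, []
    for t, d in enumerate(demand):
        arriving_std = std_orders[t - 1] if t >= 1 else 0.0
        inventory = inventory + arriving_std + exp_orders[t] - d
        path.append(inventory)
    return path
```

For demands [5, 5] with a standard order of 10 and an express order of 2 in the first period, the inventory path is [-3.0, 2.0]: the first period ends backlogged, and the arriving standard order restores a positive level in the second period.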

The robust finite-horizon optimal inventory control problem under demand uncertainty is shown as follows. The objective is to minimize the total cost, which is given in (5.45). The total cost includes ordering, holding, and backlog costs incurred over all time periods. The constraints can be classified into inventory dynamics constraints (5.46), ordering bound constraints (5.47)-(5.48), and real-valued mapping constraints (5.49). The uncertainty set for demand is given in (5.50).

$$\min_{x_t(\cdot),\,y_t(\cdot),\,I_t(\cdot)}\ \max_{\xi \in U}\ \sum_{t=1}^{T}\Big(c_1\,x_t(\xi^t) + c_2\,y_t(\xi^t) + c_H\big[I_t(\xi^t)\big]_+ + c_B\big[-I_t(\xi^t)\big]_+\Big) \tag{5.45}$$

$$\text{s.t.}\quad I_t(\xi^t) = I_{t-1}(\xi^{t-1}) + x_{t-1}(\xi^{t-1}) + y_t(\xi^t) - \xi_t, \quad \forall \xi \in U,\ \forall t \tag{5.46}$$

$$x_t(\xi^t) \ge 0, \quad \forall \xi \in U,\ \forall t \tag{5.47}$$

$$y_t(\xi^t) \ge 0, \quad \forall \xi \in U,\ \forall t \tag{5.48}$$

$$I_t(\xi^t),\ x_t(\xi^t),\ y_t(\xi^t) \in \mathbb{R}, \quad \forall t \tag{5.49}$$

$$U = \Big\{\boldsymbol{\xi} \,:\, l_t \le \xi_t \le u_t,\ \forall t,\ \sum_{t=1}^{T} \xi_t \le \frac{T\,\xi^{\max}}{2}\Big\} \tag{5.50}$$

where $x_t$ is the decision variable for the standard order of the product at the beginning of time period $t$, $y_t$ denotes the decision on the express order of the product at the beginning of time period $t$, and $I_t$ is the inventory level at time period $t$. Moreover, $\xi_t$ denotes the uncertain demand at time period $t$, and $\xi^t = [\xi_1, \ldots, \xi_t]'$ represents the uncertainty realizations available up to time period $t$. $T$ denotes the total length of the time horizon. $c_1$ and $c_2$ represent the unit costs of standard and express orders, respectively. $c_H$ and $c_B$ are the unit holding and unit backlogging costs, respectively. In the uncertainty set, the lower and upper bounds of the uncertain product demand are denoted by $l_t$ and $u_t$, respectively. The constant $\xi^{\max}$ represents the highest possible level of product demand for each time period. The operator $[\cdot]_+$ in (5.45) represents $\max(\cdot, 0)$ and can be tackled by the following epigraph reformulations (5.51) and (5.52).

$$\eta_t^H(\xi^t) \ge I_t(\xi^t), \quad \eta_t^H(\xi^t) \ge 0 \tag{5.51}$$

$$\eta_t^B(\xi^t) \ge -I_t(\xi^t), \quad \eta_t^B(\xi^t) \ge 0 \tag{5.52}$$
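For intuition, the per-period holding/backlog term $c_H[I_t]_+ + c_B[-I_t]_+$ that the epigraph variables represent can be evaluated directly (a minimal sketch; the function name is ours):

```python
def stage_cost(inventory, c_h, c_b):
    """Holding cost on positive inventory, backlog cost on shortage;
    at most one of [I]_+ and [-I]_+ is nonzero."""
    holding = max(inventory, 0.0)   # [I]_+
    backlog = max(-inventory, 0.0)  # [-I]_+
    return c_h * holding + c_b * backlog
```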

By employing the variable substitution $\hat{I}_t = I_t + x_t$, the above robust finite-horizon optimal inventory control problem assumes the same formulation as the multistage ARO problem. As a result, $\hat{I}_t$ is a state decision variable, while $x_t$, $y_t$, $\eta_t^H$, and $\eta_t^B$ are control variables. The multi-to-two transformation scheme is utilized by applying the affine control policy to the state decision variables.
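As an illustration, restricting a state variable to an affine control policy means evaluating a rule of the form $\pi_{t,0} + \sum_{s \le t} \pi_{t,s}\,\xi_s$ once the demand history is revealed. The coefficient containers below (`pi0`, `Pi`) are our own hypothetical names:

```python
def affine_state(pi0, Pi, xi, t):
    """Evaluate an affine decision rule for the state at stage t, given the
    revealed demand history xi[0..t] (coefficient names are illustrative)."""
    return pi0[t] + sum(Pi[t][s] * xi[s] for s in range(t + 1))
```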

In this application, 25 instances are randomly generated to compare the performances

of different control strategies. The number of time periods T is set to be 5. The initial

inventory of the product is assumed to be zero. The unit costs for standard order, express

order, backlog, and holding are chosen randomly following the uniform distributions:

c1~Unif(0, 5), c2~Unif(5, 10), cB ~Unif(0, 10), and cH ~Unif(0, 5). Lower and upper

bounds of the product demand are generated according to the following distributions: lt

~Unif(0, 15) and ut ~Unif(75, 100). Note that Unif denotes the uniform distribution. The highest value of product demand $\xi^{\max}$ is set to be 100.
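One way to draw such a random instance, following the distributions stated above (the function name and dictionary keys are our illustrative choices):

```python
import random

def sample_instance(T=5, seed=0):
    """Draw one random inventory-control instance with the stated distributions."""
    rng = random.Random(seed)
    return {
        "c1": rng.uniform(0, 5),    # unit cost, standard order
        "c2": rng.uniform(5, 10),   # unit cost, express order
        "cB": rng.uniform(0, 10),   # unit backlog cost
        "cH": rng.uniform(0, 5),    # unit holding cost
        "l": [rng.uniform(0, 15) for _ in range(T)],    # demand lower bounds
        "u": [rng.uniform(75, 100) for _ in range(T)],  # demand upper bounds
        "xi_max": 100.0,            # highest possible demand level
    }
```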

The computational results are summarized in Appendix A. For each problem instance, the relative gap is calculated as $\frac{UB - LB}{0.5\,(UB + LB)}$. Note that $LB$ is the lower bound
obtained using the proposed scenario-tree-based lower bounding technique, so it is the

same for different control policies in a specific instance. Accordingly, a large value of

the relative gap implies a high value of UB, which means a large loss of optimality

incurred by the corresponding control policy. In the application, the affine control policy

suffers from severe suboptimality. Its largest relative gap can reach as high as 53.43%,

and the average relative gap is 25.72%. By contrast, the control policy determined by

the proposed transformation-proximal bundle algorithm consistently outperforms both the affine control policy and the piecewise affine control policy across all the

problem instances. More specifically, the control policy resulting from the proposed

algorithm has a relative gap of 1.33% on average, while its highest relative gap is merely

4.27%. Additionally, it can yield near-optimal control strategies for Instances 13, 16, 17

and 21 with relative gaps below 0.02%. In terms of computational time, the robust

optimal inventory control problems using affine control policy and piecewise affine

control policy are more efficient to solve compared to the proposed approach, since they

involve solving only one linear programming problem. However, the proposed

approach solves the robust optimal inventory control problem instances within only 20.8

seconds on average. Note that the average computational times for solving the

reformulated (SUP) and (FP) are 0.41 seconds and 0.25 seconds, respectively. It is worth

noting that the inventory plan is typically made in a large time scale of days and weeks

[408]. Therefore, the computational time difference between the affine control policy and the proposed approach is insignificant. The solution quality in terms of the optimality gap is paramount, as long as the computational time is within a reasonable range. In this sense, the proposed approach provides an attractive trade-off between solution quality and computational tractability.
To better understand the inventory management decisions, we present the results of a

single problem instance (Instance 13) determined by the affine control policy and the

control policy determined by the proposed algorithm in Figure 26. In this particular

instance, we show the inventory profiles over the entire time horizon. From the figure,

we can observe that the affine control policy tends to keep much higher inventory levels

of the product than the proposed transformation-proximal bundle method does.

Specifically, the inventory levels at period 3 and period 4 determined by the affine

control policy are more than double those of the proposed control policy, respectively.

As a result, the excessive inventory incurs additional costs, rendering the induced robust

optimal inventory control policy over-conservative.

Figure 26. Inventory profiles determined by different control policies under the worst-

case uncertainty realization.

We present the cost breakdowns determined by the affine control policy and the control

policy determined by the proposed algorithm in Figure 27. From the pie charts, we can

observe that a major part of the total cost comes from ordering standard delivery of

products for both control policies. Although express orders can more promptly serve the

customer demands, they are too expensive to be adopted by either control policy. Notably,

the percentage of holding cost determined by the affine control policy is 14% higher

than that of the proposed one due to their different inventory levels.

We further compare the proposed algorithmic framework with a data-driven approach

that samples uncertainty scenarios from the uncertainty set following the uniform

distribution [404, 411]. It is worth noting that the data-driven approach, which relies on

scenario sampling, only provides a lower bound of the original multistage ARO problem

due its relaxation. To guarantee a fair comparison, we employ the STMARMILP in the

proposed framework, since it also provides a lower bound. Additionally, the same

number of uncertainty scenarios is used in the data-driven approach as that of

STMARMILP. We present the computational results in Figure 28, where the X-axis

denotes the index of instances and the Y-axis represents the lower bounds of total cost

in multiperiod inventory control. As can be observed from the figure, the proposed

approach compares favorably against the data-driven approach by generating tighter

lower bounds in each instance.

Figure 27. Cost breakdowns determined by (a) the affine control policy, (b) the

proposed control policy.

Figure 28. Lower bounds of multi-period inventory cost determined by the proposed

method and the data-driven approach.

To investigate the impacts of the number of uncertainty scenarios on the data-driven

approach, we select Instance 1 and plot lower bound ratios and computational times

under different numbers of scenarios in Figure 29. Note that the lower bound ratio is

defined as the ratio between the lower bounds generated by the data-driven approach

and STMARMILP. From the figure, we can see that the computational time of the data-

driven approach increases significantly as the total number of scenarios increases.

Although its corresponding lower bound becomes tighter when using more uncertainty

scenarios, the data-driven approach consumes 27.1 times more computational time than

the proposed method and still generates a less tight lower bound (lower bound ratio is

0.932) when the total number of scenarios is 10,000.

Figure 29. The impacts of the number of uncertainty scenarios on the generated lower

bound of the original multistage ARO problem and computational time in the data-

driven approach.

To further investigate the performance of the proposed algorithm under different

number of time stages, we implement computational experiments with T=10 and T=15.

For each value of T, 25 randomly generated robust optimal inventory control instances

are used to evaluate and compare different control policies as before. The computational

results for each problem instance with T=10 and T=15 are presented in Table A2 and

Table A3 of Appendix A, respectively. From these tables, we can see that the solution

qualities of both the affine control policy and piecewise affine control policy deteriorate

remarkably as the number of time stages increases. Specifically, their average relative

gaps soar significantly from 25.72% to 34.88% when the value of T changes from 5 to

15, while the largest relative gap changes from 53.43% to 111.20%. In stark contrast,

the average gap of the proposed control policy is increased by only 0.35%, which

demonstrates its consistent performance across different numbers of time periods.

Notably, the largest relative gap of the proposed solution algorithm increases from 4.27% to 6.29% when the value of T increases from 5 to 15. It is worth mentioning that the

proposed control policy compares favorably against the other two control policies in all

problem instances. Moreover, the average computational time of the proposed algorithm

increases from 20.8s to 493.2s, which is still a reasonable amount of time for inventory

control problems. Since the solution obtained from the proposed algorithm is a stability center, the resulting inventory control policy is guaranteed to be feasible according to

Remark 5.6.

5.5.2 Application 2: Process network planning

In this section, a multi-period strategic planning problem of process networks is

presented to demonstrate the applicability of the proposed multi-to-two transformation-

based algorithm. Chemical manufacturers often build integrated chemical complexes

that consist of interconnected processes and various chemicals [412]. The objective of

the process network planning is to maximize the net present value (NPV) over the

strategic planning horizon. The considered chemical process network, which is shown

in Figure 30, consists of five chemicals (A-E) and three processes (P1, P2, and P3). In

the figure, chemicals A-C represent raw materials, which can be either purchased from

suppliers or produced by certain processes. For example, Chemical C can be either

manufactured by process P3 or purchased from a supplier. Chemicals D and E are final

products, which are sold to the markets. In this application, we consider five time

periods over the 10-year planning horizon, and the duration of each period is two years.

It is assumed that all the processes do not have initial capacities, and they can be

installed at the beginning of the planning horizon. For the demand uncertainty set,

d 0jt  100 ,  jt  0.85 , and   0.6 . The mass balance relationships are given in Table

6.

Figure 30. The schematic of a small-scale process network.

Table 6. Mass balance relationships for different processes.

Process Mass balance relationship

Process 1 0.63 A + 0.58 C → E

Process 2 0.64 A → D

Process 3 1.25 B → 0.90 C + E
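The stoichiometry of Table 6 can be encoded as signed coefficients (negative for consumption, positive for production, per unit of main product); the sketch below, with our own names, computes the net chemical flows for given process operating levels:

```python
# Mass-balance coefficients from Table 6 (negative = consumed, positive = produced).
PROCESSES = {
    "P1": {"A": -0.63, "C": -0.58, "E": 1.0},
    "P2": {"A": -0.64, "D": 1.0},
    "P3": {"B": -1.25, "C": 0.90, "E": 1.0},
}

def net_flows(levels):
    """Net production (+) or consumption (-) of each chemical for the given
    process operating levels, e.g. levels = {"P1": 10.0, "P3": 5.0}."""
    flows = {}
    for proc, lvl in levels.items():
        for chem, coeff in PROCESSES[proc].items():
            flows[chem] = flows.get(chem, 0.0) + coeff * lvl
    return flows
```

Running Process 1 at level 10 and Process 3 at level 5, for example, yields 15 units of E while consuming A, B, and a net amount of C, since Process 3's C output partially offsets Process 1's C demand.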

The process network planning determines the purchase levels of feedstocks, sales of

final products, capacity expansion, and production profiles of processes at each time

period, in order to maximize the NPV over the strategic planning horizon. The

multistage ARO model for process network planning under demand uncertainty is

formulated as follows. The objective is to maximize the NPV, which is given in (5.53).

The constraints can be classified into capacity expansion constraints (5.54)-(5.55), mass

balance constraints (5.56), production level constraints (5.57), supply and demand

constraints (5.58)-(5.59), non-negativity constraints (5.60)-(5.64), and integrity

constraints (5.65). The data-driven uncertainty set of demand is defined in (5.66)

following the literature [413]. The “here-and-now” decision is binary decision Yit, while

all the other continuous decisions constitute the “wait-and-see” decisions. Based on

definitions of state and control decision variables, Qit is the adjustable state decision,

while QEit, Wit, Pjt, and Sjt are the adjustable control decisions. A list of indices/sets,

parameters and variables is given in the Nomenclature section.


$$\begin{aligned}
\max_{\substack{QE_{it},\,Q_{it},\,Y_{it},\\ W_{it},\,P_{jt},\,S_{jt}}}\ \min_{d \in U}\ & \sum_{j}\sum_{t}\gamma_{jt}\,S_{jt}(d^t) - \sum_{i}\sum_{t}c1_{it}\,QE_{it}(d^t) - \sum_{i}\sum_{t}c2_{it}\,Y_{it} \\
& - \sum_{i}\sum_{t}c3_{it}\,W_{it}(d^t) - \sum_{j}\sum_{t}c4_{jt}\,P_{jt}(d^t)
\end{aligned} \tag{5.53}$$

$$\text{s.t.}\quad qe_{it}^{L}\,Y_{it} \le QE_{it}(d^t) \le qe_{it}^{U}\,Y_{it}, \quad \forall d \in U,\ \forall i, t \tag{5.54}$$

$$Q_{it}(d^t) = Q_{i,t-1}(d^{t-1}) + QE_{it}(d^t), \quad \forall d \in U,\ \forall i, t \tag{5.55}$$

$$P_{jt}(d^t) + \sum_{i}\mu_{ij}\,W_{it}(d^t) - S_{jt}(d^t) = 0, \quad \forall d \in U,\ \forall j, t \tag{5.56}$$

$$W_{it}(d^t) \le Q_{it}(d^t), \quad \forall d \in U,\ \forall i, t \tag{5.57}$$

$$P_{jt}(d^t) \le su_{jt}, \quad \forall d \in U,\ \forall j, t \tag{5.58}$$

$$S_{jt}(d^t) \le d_{jt}, \quad \forall d \in U,\ \forall j, t \tag{5.59}$$

$$QE_{it}(d^t) \ge 0, \quad \forall d \in U,\ \forall i, t \tag{5.60}$$

$$Q_{it}(d^t) \ge 0, \quad \forall d \in U,\ \forall i, t \tag{5.61}$$

$$P_{jt}(d^t) \ge 0, \quad \forall d \in U,\ \forall j, t \tag{5.62}$$

$$S_{jt}(d^t) \ge 0, \quad \forall d \in U,\ \forall j, t \tag{5.63}$$

$$W_{it}(d^t) \ge 0, \quad \forall d \in U,\ \forall i, t \tag{5.64}$$

$$Y_{it} \in \{0, 1\}, \quad \forall i, t \tag{5.65}$$

$$U = \Big\{d \,:\, (1 - \varepsilon_{jt})\,d_{jt}^{0} \le d_{jt} \le (1 + \varepsilon_{jt})\,d_{jt}^{0},\ \forall j, t;\ \sum_{j}\sum_{t} d_{jt} \le (1 + \Delta)\sum_{j}\sum_{t} d_{jt}^{0}\Big\} \tag{5.66}$$

The above multistage adaptive robust process network planning problem is

computationally intractable because all the “wait-and-see” decisions are expressed as

general functions of demand uncertainty. To this end, we employ the multi-to-two

transformation scheme and restrict only state decision Qit to follow affine control policy.

As a result, the above multistage robust process network planning problem is

transformed into a two-stage ARO problem. In contrast, the affine and piecewise affine

control policies restrict all the adjustable decisions Qit, QEit, Wit, Pjt, and Sjt to be affine

and piecewise affine functions of demand uncertainty realizations, respectively.

The computational results are provided in Table 7. In this application, the proposed

solution algorithm increases the NPV by 6.27% (from $121.2MM to $128.8MM)

compared with the affine control policy. In terms of solution quality, the proposed solution algorithm demonstrates superior performance relative to the other two approaches and

generates a high-quality solution with a relative gap of 3.36%. Notably, the proposed

computational algorithm can solve this multistage ARO problem within merely 24.2

seconds, which is a reasonable amount of time given its high solution quality. It can be

concluded that the proposed multi-to-two transformation-based algorithm can provide

an attractive trade-off between solution quality and computational tractability. The

optimal design and planning decisions at time period 5 determined by the affine decision

rule method and the proposed solution method are shown in Figure 31 (a) and Figure

31 (b), respectively. In Figure 31, the optimal total capacities are displayed under

operating processes.

Table 7. Computational results of different methods in the process network planning application.

                     Affine control  Piecewise affine  Transformation-proximal       Scenario-tree
                     policy method   control policy    bundle algorithm              problem
                                     method            Master problem   Subproblem
Binary decisions     15              15                15               130          15
Cont. decisions      12,213          14,588            1,952            181          2,282
Constraints          6,957           13,682            3,079            467          4,367
Max. NPV ($MM)       121.2           121.2             128.8                         133.2
CPU time (s)         0.8             1.6               24.2                          0.4

To illustrate the optimal capacity expansion activities, we present the capacity profiles

during the entire planning horizon determined by the affine decision rule method and

the developed solution algorithm in Figure 32 (a) and Figure 32 (b), respectively. As

can be observed from Figure 32 (a), Process 3 is expanded at the beginning of time

period 1, and it is further expanded at the second time period in the solution determined

by the affine decision rule approach. By contrast, Process 3 is not selected to be

expanded from time period 2 in the solution determined by the transformation-proximal

bundle algorithm. Additionally, the optimal total capacities of some processes

determined by the two solution methods are different. For example, the optimal total

capacity of Process 2 is 123.4 kt/y at the end of planning horizon determined by the

affine decision rule approach, while the corresponding capacity is 30.6 kt/y larger for

the developed solution method.

Figure 31. The optimal design and planning decisions at the end of the planning

horizon determined by (a) the affine decision rule method, and (b) the transformation-

proximal bundle algorithm.

Figure 32. Optimal capacity expansion decisions over the entire planning horizon

determined by (a) the affine decision rule method, and (b) the transformation-proximal

bundle algorithm.

The details on revenues and cost breakdown, including fixed investment cost, variable

investment cost, operating cost, and purchase cost, are shown in Figure 33. As can be

observed from the bar charts, the proposed approach generates $27.53MM higher

revenues than the conventional affine decision rule method, which demonstrates that the

proposed approach is less conservative by allowing the control decisions to be fully

adjustable to demand uncertainty realizations. From the pie charts in Figure 33, we can

see that more than 40% of the total cost comes from purchasing feedstock for both

network planning solution methods. In addition, the percentage of the variable

investment cost for the developed solution algorithm is 5% higher than that determined

by the affine decision rule method, because optimal process capacities determined by

the proposed transformation-proximal bundle method are larger in its optimal network

structure.

Figure 33. Revenues and cost breakdown determined by the affine decision rule

method and the transformation-proximal bundle algorithm.

Next, we consider a larger-scale petrochemical process network, which includes 10

chemicals, six processes, four suppliers, and six markets [412]. The detailed network

schematic is depicted in Figure 34, where specific chemical names are listed. Chemicals

A-D represent feedstock, which can be purchased from suppliers or manufactured by

certain processes. For instance, Chemical D (Nitric Acid) can be either purchased from

a supplier or produced by Process 3. Chemicals E-J are products, which are sold to

markets to earn revenue. This complex process network offers flexibility in that

many manufacturing options are available. For example, Chemical F is a type of product

and can be manufactured by Process 1 and Process 3. Chemical B serves as a feedstock

to Process 3, Process 4, and Process 6. In this case study, we consider four time periods

over the planning horizon, and the duration of each time period is two years. It is

assumed that all processes can be installed at the beginning of time period 1. For the

demand uncertainty set, d⁰jt = 200, δjt = 0.8, and Γ = 0.6.
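For intuition, membership in such a budget-style demand uncertainty set can be checked numerically. The sketch below is illustrative only: the function name, the three-chemical example, and the budget scaling (budget times dimension) are assumptions, not necessarily the exact formulation used in this work.

```python
import numpy as np

def in_budget_set(d, d0, delta, gamma):
    """Membership test for a box-plus-budget uncertainty set: each demand
    stays within +/- delta*d0 of its nominal value, and the total scaled
    deviation is capped by the budget gamma times the dimension."""
    z = np.abs(d - d0) / (delta * d0)   # scaled deviations, in [0, 1] if feasible
    return bool(np.all(z <= 1.0) and z.sum() <= gamma * d.size)

d0 = np.full(3, 200.0)     # nominal demands, d0_jt = 200
delta = np.full(3, 0.8)    # maximum relative deviations, delta_jt = 0.8
print(in_budget_set(np.array([230.0, 200.0, 180.0]), d0, delta, 0.6))  # True
```

A larger budget gamma admits more simultaneous deviations and hence a more conservative robust solution.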

Figure 34. The schematic of a large-scale petrochemical process network where

chemical names are listed.

Unlike the affine decision rule method, which restricts all adaptive decisions to be affine

functions of demand uncertainty realizations, the proposed solution algorithm allows

for full adjustability in the local decisions, thus boosting the NPV by 4.43% (an increase

from $646.2MM to $674.8MM) for nominal demand uncertainty realizations. This

computational result illustrates that the developed solution algorithm is capable of

generating superior planning decisions compared with the conventional decision rule

method. Figure 35 shows more details on NPV results, including the revenue and cost

breakdown in each time period, for the affine decision rule method (represented by ADR for

notation brevity in the figure) and the transformation-proximal bundle algorithm

(denoted by TPB in the figure), respectively. We can observe from Figure 35 that

investment costs occupy 48.6% of total costs in the first time period for both solution

methods. This result can be well explained by the fact that most chemical processes are

expanded or built within the first time period. During the last three time periods, the

majority of costs come from process operation and feedstock purchase. Notably, the

revenues determined by the optimal planning decisions of the proposed solution method

are 3.27%, 5.02%, and 4.48% higher, compared with the affine decision rule approach,

in Period 2, Period 3, and Period 4, respectively.

Figure 35. Revenues and cost breakdown at each time period determined by the affine

decision rule method (denoted by ADR in the figure) and the transformation-proximal

bundle algorithm (denoted by TPB in the figure).

To illustrate the optimal capacity expansion activity, we present the capacity profiles

during the entire planning horizon for the proposed approach in Figure 36. From Figure

36, we can see that a total of five processes are selected to be built at time period 1 in

the optimal process network determined by the proposed approach. By comparing

capacity expansions of different processes, we can conclude that the optimal expansion

frequency of Process 4 is the highest among all processes. This is partially ascribed to

the fact that a total of three products (Chemical J, Chemical H, and Chemical I) are

closely connected to Process 4. Additionally, the optimal total capacity of Process 4

reaches 180.0 kt/y at the end of the planning horizon.

Figure 36. Optimal capacity expansion decisions over the entire planning horizon

determined by the transformation-proximal bundle algorithm.

In Figure 37 (a) and Figure 37 (b), we further present the optimal purchase levels of

feedstock determined by the conventional affine decision rule method and the proposed

solution algorithm, respectively. From the bar charts, we can observe a similar trend for

both solution methods that the purchase level of each feedstock increases as time

evolves. This is because feedstock purchases grow to accommodate

manufacturing needs as process capacities expand over the planning horizon. By

comparing Figure 37 (a) and Figure 37 (b), a notable difference is that the total

purchase amount of Chemical B determined by the proposed approach is 6.99% larger

than that of the affine decision rule method. In addition, the proposed solution algorithm

increases the total purchase amount of Chemical A during the entire planning horizon

by 2.43% compared with the conventional decision rule method.

Figure 37. Optimal feedstock purchase at each time stage determined by (a) the affine

decision rule method, and (b) the transformation-proximal bundle algorithm.

To take a closer look at the optimal adjustable decisions on product sale, we present the

results for the affine decision rule method and the proposed approach as spider charts

shown in Figure 38 (a) and Figure 38 (b), respectively. Among all the products

(Chemicals E-J), there are significant increases in the sale amount of Chemical G at Period

2, Period 3, and Period 4. The sale level of Chemical G could reasonably be expected

to rise when the corresponding feedstock to Process 6 increases (as shown in Figure 37).

Compared with the optimal sale level of Chemical E in Figure 38 (a), the optimal sale

amount of Chemical E determined by the proposed approach decreases slightly by

2.01% at Period 2, while it increases markedly by 7.55% at Period 3 and by 7.55% at

Period 4.
Figure 38. Spider charts showing optimal sale quantities (kt/y) of final products at

each time stage determined by (a) the affine decision rule method, and (b) the

transformation-proximal bundle algorithm.

5.6 Summary

In this work, a novel transformation-proximal bundle algorithmic framework was

proposed for solving a broad class of MARMILP problems efficiently. We first

proposed a multi-to-two transformation scheme, in which only state decision variables

were restricted to be affine functions. By employing the proposed scheme, the original

multi-stage ARO problem was proven to be equivalent to a two-stage

ARO problem. The proximal bundle algorithm was further developed as an efficient

global optimization algorithm for the resulting two-stage ARO problem. Since the local

decisions were exempt from the affine decision rule restriction, the proposed solution

algorithm sacrificed less optimality for computational tractability than

conventional decision rule methods. The computational results showed that the

proposed transformation-proximal bundle algorithm significantly outperformed the

conventional solution methods in terms of solution quality.
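To make the proximal bundle idea concrete, the following one-dimensional sketch minimizes a generic nonsmooth convex function via a cutting-plane model plus a proximal term around a stability center, with serious/null steps. It is a minimal, assumption-laden illustration (the function name, toy objective, and step parameters are made up), not the mixed-integer master problem solved in this chapter.

```python
from scipy.optimize import minimize_scalar

def proximal_bundle_1d(f, subgrad, x0, t=1.0, tol=1e-6, max_iter=200):
    """Minimal 1-D proximal bundle sketch: maintain a cutting-plane model
    of a convex f, minimize model + proximal term, and take a serious
    step only when the actual decrease matches the predicted one."""
    center, f_center = x0, f(x0)
    cuts = [(x0, f_center, subgrad(x0))]
    for _ in range(max_iter):
        model = lambda x: max(fi + gi * (x - xi) for xi, fi, gi in cuts)
        res = minimize_scalar(lambda x: model(x) + (x - center) ** 2 / (2 * t),
                              bounds=(center - 100, center + 100), method="bounded")
        x_new = res.x
        predicted = f_center - model(x_new)          # model-predicted decrease
        if predicted < tol:                          # model is accurate: stop
            break
        f_new = f(x_new)
        cuts.append((x_new, f_new, subgrad(x_new)))  # add a cut at the trial point
        if f_center - f_new >= 0.5 * predicted:      # serious step
            center, f_center = x_new, f_new
        # otherwise a null step: keep the center; the model just got richer
    return center

# Toy problem: minimize the nonsmooth convex f(x) = |x - 3|
x_star = proximal_bundle_1d(lambda x: abs(x - 3.0),
                            lambda x: 1.0 if x >= 3.0 else -1.0, x0=0.0)
```

Here the master problem is a scalar minimization; in the setting of this chapter the master is a mixed-integer program over design decisions, so this is only a structural analogy.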

5.7 Appendix: Tables of computational results in Application 1

Table 8. Computational performances of different solution algorithms in the

multistage robust inventory control problem under demand uncertainty for T=5.

Instance   Affine control            Piecewise affine           Transformation-proximal
No.        policy method             control policy method      bundle algorithm
           Time (s)   Rel. Gap (%)   Time (s)   Rel. Gap (%)    Time (s)   Rel. Gap (%)
1 0.2 53.43 0.2 53.43 14.4 4.14
2 0.2 12.67 0.2 12.67 24 0.25
3 0.2 20.22 0.2 20.22 12.2 0.37
4 0.2 29.98 0.2 29.98 18.8 4.27
5 0.2 32.70 0.2 32.70 18 2.65
6 0.2 14.63 0.2 14.63 17.9 2.54
7 0.2 35.95 0.2 35.95 71.1 0.26
8 0.2 6.84 0.2 6.84 20.7 0.04
9 0.2 7.56 0.2 7.56 16.9 1.14
10 0.2 37.58 0.2 37.58 14.7 3.49
11 0.2 33.69 0.2 33.69 15.4 0.33
12 0.2 32.72 0.2 32.72 19.5 0.26
13 0.2 18.88 0.2 18.88 17.7 0.01
14 0.2 36.89 0.2 36.89 17.9 0.67
15 0.3 26.60 0.2 26.60 11.8 1.96
16 0.2 9.27 0.2 9.27 13.1 0.02
17 0.2 42.24 0.2 42.24 20.6 0.02
18 0.2 9.27 0.2 9.27 42.7 0.07
19 0.2 35.00 0.2 35.00 19.1 0.28
20 0.3 27.00 0.2 27.00 13.4 3.70
21 0.2 36.82 0.2 36.82 24.1 0.02
22 0.2 18.57 0.3 18.57 15.8 3.63
23 0.2 25.71 0.2 25.71 21.5 0.46
24 0.2 2.50 0.2 2.50 17.6 0.34
25 0.2 36.27 0.2 36.27 21.3 2.33

Table 9. Computational performances of different solution algorithms in the

multistage robust inventory control problem under demand uncertainty for T=10.

Instance   Affine control            Piecewise affine           Transformation-proximal
No.        policy method             control policy method      bundle algorithm
           Time (s)   Rel. Gap (%)   Time (s)   Rel. Gap (%)    Time (s)   Rel. Gap (%)
1 0.3 12.21 0.3 12.21 106.9 0.00
2 0.2 94.38 0.3 94.38 49.2 5.61
3 0.3 74.34 0.3 74.34 64.2 2.14
4 0.3 27.81 0.3 27.81 45.4 0.16
5 0.3 64.86 0.3 64.86 28.4 3.28
6 0.4 2.29 0.4 2.29 95.6 0.16
7 0.3 45.81 0.3 45.81 275.5 4.03
8 0.3 16.86 0.3 16.86 42.6 0.55
9 0.2 12.83 0.3 12.83 109.3 3.17
10 0.3 59.16 0.4 59.16 78.9 1.88
11 0.2 55.61 0.3 55.61 55.7 2.32
12 0.3 50.81 0.3 50.81 107.3 1.12
13 0.3 99.36 0.4 99.36 31.6 4.31
14 0.3 12.52 0.3 12.52 112.7 2.22
15 0.3 12.94 0.4 12.94 21.7 0.17
16 0.3 14.36 0.4 14.36 52.6 0.15
17 0.3 67.94 0.3 67.94 261.7 4.42
18 0.3 19.73 0.3 19.73 52.4 1.15
19 0.3 49.75 0.3 49.75 92.9 2.64
20 0.3 14.82 0.3 14.82 33.7 0.24
21 0.3 2.28 0.3 2.28 26.0 0.22
22 0.3 17.29 0.3 17.29 73.9 0.18
23 0.3 72.57 0.4 72.57 55.1 0.73
24 0.3 20.11 0.3 20.11 188.1 0.67
25 0.3 1.02 0.3 1.02 21.6 0.03

Table 10. Computational performances of different solution algorithms in the

multistage robust inventory control problem under demand uncertainty for T=15.

Instance   Affine control            Piecewise affine           Transformation-proximal
No.        policy method             control policy method      bundle algorithm
           CPU (s)    Rel. Gap (%)   CPU (s)    Rel. Gap (%)    CPU (s)    Rel. Gap (%)
1 0.6 33.46 1.0 33.46 188.8 1.30
2 0.7 85.11 1.1 85.11 70.5 4.01
3 0.7 111.20 0.9 111.20 36.8 6.04
4 0.6 8.85 0.8 8.85 383.8 0.21
5 0.6 18.29 1.1 18.29 47.2 0.19
6 0.6 4.98 1.9 4.98 56.7 1.87
7 0.6 69.21 1.0 69.21 127.7 2.92
8 0.6 6.22 0.7 6.22 457.3 1.29
9 0.6 19.06 0.9 19.06 32.7 0.21
10 0.6 42.42 0.7 42.42 103.9 1.45
11 0.8 27.12 1.0 27.12 4450.5 1.34
12 0.7 62.52 1.3 62.52 503.9 4.06
13 0.6 26.68 1.0 26.68 1061.6 0.84
14 0.8 39.73 0.9 39.73 57.3 1.03
15 0.6 43.04 0.9 43.04 137.5 6.29
16 0.7 26.40 1.0 26.40 569.4 1.67
17 0.8 43.50 1.0 43.50 84.7 1.87
18 0.6 17.75 1.0 17.75 42.5 0.08
19 0.6 25.49 0.7 25.49 959.6 1.31
20 0.7 24.17 1.0 24.17 1853.7 2.18
21 0.6 35.44 0.8 35.44 64.9 0.43
22 0.8 50.92 1.0 50.92 203.4 0.53
23 0.7 4.68 1.1 4.68 639.2 0.60
24 0.6 9.10 1.0 9.10 117.3 0.22
25 0.7 36.62 1.1 36.62 78.7 0.06

5.8 Nomenclature

Robust optimal inventory control

Sets/indices

t index of time periods

Parameters

c1 unit cost of standard order

c2 unit cost of express order

cB unit backlog cost

cH unit holding cost

lt lower bound of product demand at the beginning of time period t

ut upper bound of product demand at the beginning of time period t

ξt uncertain product demand at the beginning of time period t

ξmax maximum level of product demand

Continuous variables

It inventory of products at time period t

xt standard order of product at the beginning of time period t

yt express order of product at the beginning of time period t

Process network planning

Sets/indices

I set of processes indexed by i

J set of chemicals indexed by j or k

T set of time periods indexed by t or n

Parameters

c1it variable investment cost for process i in time period t

c2it fixed investment cost for process i in time period t

c3it unit operating cost for process i in time period t

c4jt purchase price of chemical j in time period t

djt demand of chemical j in time period t

qeitL lower bound for capacity expansion of process i in time period t

qeitU upper bound for capacity expansion of process i in time period t


sujt supply of chemical j in time period t

vjt sale price of chemical j in time period t

κij mass balance coefficient for chemical j in process i

Binary variables

Yit binary variable that indicates whether process i is chosen for expansion

in time period t

Continuous variables

Pjt purchase amount of chemical j in time period t

Qit total capacity of process i in time period t

QEit capacity expansion of process i in time period t

Sjt sale amount of chemical j in time period t

Wit operation level of process i in time period t

CHAPTER 6
CONCLUSIONS

Data-driven optimization under uncertainty has been investigated with emphasis on

four main aspects, namely the two-stage adaptive distributionally robust optimization,

deep-learning-based ambiguous chance constrained optimization, a learning-while-

optimizing framework, and the algorithm design for large-scale multistage robust

optimization problems in this dissertation. A series of novel contributions are made to

data-driven optimization modeling frameworks, efficient solution algorithms, along

with applications. We believe that the research in this dissertation lays a solid foundation

for future studies in this area. Additionally, the proposed frameworks and solution

algorithms are general enough to deal with a variety of applications of optimization

under uncertainty, such as those for supply chain management, energy systems, and

process control. A summary of the dissertation as well as future research directions

are provided in the following.

We propose a novel data-driven Wasserstein distributionally robust optimization model

for hedging against uncertainty in the optimal biomass with agricultural waste-to-energy

network design. Instead of assuming perfect knowledge of the probability

distribution for uncertain parameters, we construct a data-driven ambiguity set of

candidate distributions based on the Wasserstein metric, which is utilized to quantify

their distances from the data-based empirical distribution. Equipped with this ambiguity

set, the two-stage distributionally robust optimization model not only accommodates

the sequential decision making at design and operational stages, but also hedges against

the distributional ambiguity arising from a finite amount of uncertainty data. A solution

algorithm is further developed to solve the resulting two-stage distributionally robust

mixed-integer nonlinear program. To demonstrate the effectiveness of the proposed

approach, we present a case study of a biomass with agricultural waste-to-energy

network including 216 technologies and 172 compounds. Computational results show

that the data-driven Wasserstein distributionally robust optimization approach has a

better out-of-sample performance in terms of a 5.7% lower average cost and a 37.1%

smaller cost standard deviation compared with the conventional stochastic

programming method.
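The Wasserstein distance that underlies the ambiguity ball can be computed directly for one-dimensional empirical samples. The sketch below is purely illustrative: the synthetic price data, the half-and-half split, and the radius floor are assumptions, and the actual ambiguity radius is chosen via concentration results rather than this heuristic.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
prices = rng.normal(50.0, 5.0, size=200)   # synthetic feedstock price data

# Distance between two halves of the sample gives a rough feel for how far
# plausible distributions sit from the data-based empirical distribution.
d = wasserstein_distance(prices[:100], prices[100:])
radius = max(d, 1e-3)   # illustrative ambiguity-ball radius with a small floor
```

As the sample size grows, such empirical distances shrink, which is why data-driven radii are typically decreasing functions of the number of samples.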

We propose a novel deep learning based ambiguous joint chance constrained ED

framework for a high penetration of renewable energy. By leveraging a deep GAN, an f-

divergence-based ambiguity set of wind power distributions is constructed as a ball in

the probability space centered at the distribution induced by a generator network.

Specifically, wind power data are utilized to train the f-GAN, whose discriminator

network criticizes the performance of the generator network in terms of f-divergence.

Consequently, the proposed framework closely links the training objective of deep

learning with the characterization of ambiguity set via the same type of divergence.

Additionally, the GAN is well suited for capturing the complicated temporal and spatial

correlations among renewable energy sources. Based upon this ambiguity set, a data-

driven joint chance constrained ED model is developed to hedge against distributional

uncertainty present in multiple constraints regarding wind power utilization. To

facilitate its solution process, the resulting distributionally robust chance constraints are

equivalently reformulated as ambiguity-free chance constraints, which are further

tackled using a scenario approach. This scenario approach leverages the sampling

efficiency of the generator network due to the feedforward nature of neural networks.

A theoretical a priori bound on the required number of synthetic wind power data

generated by f-GAN is explicitly derived for the multi-period ED problem to guarantee

a predefined risk level. By exploiting the ED problem structure, a prescreening technique

is employed to greatly boost both computational and memory efficiencies. The

effectiveness and scalability of the proposed approach are demonstrated through the six-

bus and IEEE 118-bus systems. Computational results show that the proposed approach

is more cost-effective compared with the conventional distributionally robust chance

constrained optimization method.
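In the generic scenario-approach literature, a priori sample counts take the familiar form N ≥ (2/ε)(ln(1/β) + d) for violation level ε, confidence 1 − β, and d decision variables. The helper below computes this classical bound; the dissertation derives a bound tailored to the multi-period ED problem, so treat this as a stand-in, not the exact expression used here.

```python
import math

def scenario_sample_bound(eps, beta, n_vars):
    """Classical scenario-approach sample size: with at least
    (2/eps) * (ln(1/beta) + n_vars) sampled scenarios, the scenario
    solution violates the chance constraint with probability at most
    eps, with confidence at least 1 - beta."""
    return math.ceil((2.0 / eps) * (math.log(1.0 / beta) + n_vars))

print(scenario_sample_bound(0.05, 1e-3, 10))  # -> 677
```

Note the mild (logarithmic) dependence on the confidence parameter β, which is why very high confidence levels remain affordable when sampling from a fast feedforward generator.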

We investigate the problem of designing data-driven stochastic MPC for linear time-

invariant systems under additive stochastic disturbance, whose probability distribution

is unknown but can be partially inferred from data. We propose a novel online learning-

based risk-averse stochastic MPC framework in which CVaR constraints on system

states are required to hold for a family of distributions called an ambiguity set. The

ambiguity set is constructed from disturbance data by leveraging a Dirichlet process

mixture model that is self-adaptive to the underlying data structure and complexity.

Specifically, the structural property of multimodality is exploited, so that the first and

second-order moment information of each mixture component is incorporated into the

ambiguity set. A novel constraint tightening strategy is then developed based on an

equivalent reformulation of distributionally robust CVaR constraints over the proposed

ambiguity set. As more data are gathered during the runtime of the controller, the ambiguity

set is updated online using real-time disturbance data, which enables the risk-averse

stochastic MPC to cope with time-varying disturbance distributions. The employed

online variational inference algorithm obviates learning all collected data from scratch,

and therefore the proposed MPC is endowed with the guaranteed computational

complexity of online learning. The guarantees on recursive feasibility and closed-loop

stability of the proposed MPC are established via a safe update scheme. Numerical

examples are used to illustrate the effectiveness and advantages of the proposed MPC.
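The CVaR constraints above are enforced through a distributionally robust reformulation over the moment-based ambiguity set; purely for intuition, the empirical CVaR of a loss sample, i.e., the average of the worst (1 − α) tail, can be estimated as follows (the function name and data are illustrative, not part of the proposed MPC).

```python
import numpy as np

def empirical_cvar(losses, alpha=0.95):
    """Empirical CVaR_alpha: the mean of the worst (1 - alpha) fraction
    of losses (larger losses are worse)."""
    losses = np.sort(np.asarray(losses, dtype=float))
    k = max(1, int(np.ceil((1.0 - alpha) * losses.size)))
    return losses[-k:].mean()

samples = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
print(empirical_cvar(samples, alpha=0.8))  # worst 20% of 5 samples -> 100.0
```

Because CVaR averages over the tail rather than thresholding it, constraining CVaR penalizes the magnitude of rare constraint violations, which is what makes it attractive for risk-averse control.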

We develop a novel transformation-proximal bundle algorithm for MARMILPs. By

partitioning recourse decisions into state and control decisions, the proposed algorithm

applies affine control policy only to state decisions and allows control decisions to be

fully adaptive to uncertainty. In this way, the MARMILP is proven to transform

into an equivalent two-stage ARO problem. The proposed multi-to-two transformation

remains valid for other types of causal control policies besides the affine one.

Importantly, this transformation scheme is general enough to be employed with any

two-stage ARO solution algorithms for MARMILPs, thus opening a new avenue for a

variety of multistage ARO solution algorithms. The proximal bundle method is

developed for the resulting two-stage problem. We theoretically show finite

convergence of the proposed algorithm with any positive tolerance. To quantitatively

assess solution quality, we develop a scenario-tree-based lower bounding technique.

The proposed generic approach can be applied to a variety of control problems, such as

constrained robust optimal control. A robust optimal inventory control application is

presented to demonstrate its effectiveness and computational scalability. In this

application, the affine disturbance-feedback control policy suffers from severe

suboptimality with an average gap of 34.88%, while the proposed algorithm generates

near-optimal solutions with an average gap of merely 1.68%.

The future research directions include closed-loop data-driven optimization and the

data-driven optimization framework incorporating “prior” knowledge. The framework

of data-driven optimization under uncertainty could be considered as a “hybrid” system

that integrates the data-driven system based on machine learning to extract useful and

relevant information from data, and the model-based system based on mathematical

programming to derive the optimal decisions from the information. Existing data-driven

optimization approaches adopt a sequential and open-loop scheme, which could be

further improved by introducing feedback steps from the model-based system to the

data-driven system. A "closed-loop" data-driven optimization paradigm that explores the

information feedback to fully couple upstream machine learning and downstream

mathematical programming could be a more effective and rigorous approach. In

addition to uncertainty data, some available domain-specific knowledge or “prior”

knowledge could serve as another informative input to the data-driven system. Relying

solely on the data to develop the uncertainty model could unfavorably influence the

downstream mathematical programming. The prior knowledge describes what the

decision maker knows about the uncertainty, and it can come in different forms. For

example, the prior knowledge could be the structural information of probability

distributions, upper and lower bounds of uncertain parameters or certain correlation

relationships among uncertainties. Incorporating such "prior" knowledge into the data-

driven optimization framework could be substantially useful and provide more reliable

results in the face of messy data.

Another future research direction on data augmentation driven optimization under

uncertainty deserves more effort. The imbalanced volume of different uncertainty data

sources and a small data regime pose new challenges to the existing data-driven

decision making under uncertainty frameworks. The imbalance of datasets would lead

to deteriorating performance of data-driven optimization. Specifically, this data

imbalance has two main adverse effects. First, decision makers would lose much

information embedded within the majority data class if they synthesize both minority

and majority uncertainty data through down-sampling. Based on the research in

this thesis, the inefficient use of uncertainty data information would negatively

influence the solution quality of data-driven optimization under uncertainty. Second, if

one builds data-driven uncertainty models for the minority and majority datasets

separately, correlation information between uncertainty data sources from different

subsystems is discarded. Additionally, the existing data-driven robust optimization

methods tend to suffer severely from the issue of “small data”, which has a direct impact

on uncertainty set construction. The small data regime could under-fit machine learning

models, thereby comprising the performance of data-driven optimization. Given more

and more brand new systems employed, this type of small data regime can be frequently

encountered in data-driven decision making under uncertainty.

The integration of data-driven robust optimization with data augmentation is a

promising general framework for coping with the limited amount of uncertainty data

and could potentially improve the generalization performance of machine learning in

data-driven decision making. The key idea of a tentative methodology is to generate

uncertainty data from the existing uncertainty data. As these newly generated data are

totally unseen, an uncertainty model built from augmented data is more likely to have a

superior generalization property. Moreover, the data augmentation techniques can

increase the volume of the minority data class, and therefore are well suited for addressing

the imbalanced uncertainty data issue. In the literature of machine learning, data

augmentation is becoming increasingly popular, and has witnessed various successful

applications [414, 415]. First, data augmentation is useful for data imbalance during

training. Second, real data could convey private information, so using artificially

generated data helps protect data privacy. One potential method for

data augmentation is resampling. The resampled uncertainty data are

used to augment the minority data class and to make the overall dataset balanced.

Another promising way is employing deep learning, especially deep generative models,

to generate synthetic uncertainty data for the purpose of data augmentation. The

complicated and unseen data patterns can be potentially captured by the powerful deep

learning techniques, and seamlessly incorporated into robust optimization. To be

more specific, a data-driven uncertainty set would be constructed from a hybrid use of

the majority dataset and the augmented minority dataset. Then, this data-driven

uncertainty set could be further integrated into dynamic robust optimization, which

holds great promise for various applications in data-driven design, operation,

and control for uncertain systems.
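The resampling route discussed above can be sketched in a few lines: bootstrap the minority uncertainty-data class with replacement until it matches the majority class. All sizes and distributions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
minority = rng.normal(10.0, 1.0, size=30)      # scarce uncertainty data class
majority_size = 300                            # size of the abundant class

# Bootstrap: sample the minority class with replacement up to the majority
# size, yielding a balanced dataset for uncertainty-set construction.
augmented = rng.choice(minority, size=majority_size, replace=True)
balanced = np.concatenate([minority, augmented])
```

A deep generative model would replace the `rng.choice` step with draws from a trained generator, producing genuinely unseen samples rather than repetitions of the observed ones.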

REFERENCES

[1] L. T. Biegler and I. E. Grossmann, "Retrospective on optimization," Comput.


Chem. Eng., vol. 28, no. 8, pp. 1169-1192, 2004, doi:
https://doi.org/10.1016/j.compchemeng.2003.11.003.
[2] I. E. Grossmann and L. T. Biegler, "Part II. Future perspective on optimization,"
Comput. Chem. Eng., vol. 28, no. 8, pp. 1193-1218, 2004, doi:
https://doi.org/10.1016/j.compchemeng.2003.11.006.
[3] V. Sakizlis, J. D. Perkins, and E. N. Pistikopoulos, "Recent advances in
optimization-based simultaneous process and control design," Comput. Chem.
Eng., vol. 28, no. 10, pp. 2069-2086, 2004, doi:
https://doi.org/10.1016/j.compchemeng.2004.03.018.
[4] N. V. Sahinidis, "Optimization under uncertainty: State-of-the-art and
opportunities," Comput. Chem. Eng., vol. 28, no. 6–7, pp. 971-983, 2004.
[Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0098135403002369.
[5] M. L. Liu and N. V. Sahinidis, "Optimization in Process Planning under
Uncertainty," Ind. Eng. Chem. Res., vol. 35, no. 11, pp. 4154-4165, 1996/01/01
1996, doi: 10.1021/ie9504516.
[6] J. Acevedo and E. N. Pistikopoulos, "Stochastic optimization based algorithms
for process synthesis under uncertainty," Comput. Chem. Eng., vol. 22, no. 4, pp.
647-671, 1998, doi: http://dx.doi.org/10.1016/S0098-1354(97)00234-2.
[7] Z. Li and M. G. Ierapetritou, "Process scheduling under uncertainty: Review and
challenges," Comput. Chem. Eng., vol. 32, no. 4–5, pp. 715-727, 2008, doi:
http://dx.doi.org/10.1016/j.compchemeng.2007.03.001.
[8] A. Ben-Tal and A. Nemirovski, "Robust optimization – methodology and
applications," Math. Program., journal article vol. 92, no. 3, pp. 453-480, 2002,
doi: 10.1007/s101070100286.
[9] I. E. Grossmann, R. M. Apap, B. A. Calfa, P. García-Herreros, and Q. Zhang,
"Recent advances in mathematical programming techniques for the optimization
of process systems under uncertainty," Comput. Chem. Eng., vol. 91, pp. 3-14,
2016, doi: http://dx.doi.org/10.1016/j.compchemeng.2016.03.002.
[10] E. N. Pistikopoulos, "Uncertainty in process design and operations," Comput.
Chem. Eng., vol. 19, pp. 553-563, 1995, doi: http://dx.doi.org/10.1016/0098-
1354(95)87094-6.
[11] Y. Chen, Z. H. Yuan, and B. Z. Chen, "Process optimization with consideration
of uncertainties-An overview," Chin. J. Chem. Eng., vol. 26, no. 8, pp. 1700-
1706, Aug 2018, doi: 10.1016/j.cjche.2017.09.010.
[12] S. John Walker, "Big data: A revolution that will transform how we live, work,
and think," ed: Taylor & Francis, 2014.
[13] S. Yin and O. Kaynak, "Big Data for Modern Industry: Challenges and Trends,"
Proceedings of the IEEE, vol. 103, no. 2, pp. 143-146, 2015, doi:
10.1109/JPROC.2015.2388958.

[14] S. J. Qin, "Process data analytics in the era of big data," AIChE J., vol. 60, no.
9, pp. 3092-3100, 2014. [Online]. Available:
http://dx.doi.org/10.1002/aic.14523.
[15] V. Venkatasubramanian, "DROWNING IN DATA: Informatics and modeling
challenges in a data-rich networked world," AIChE J., vol. 55, no. 1, pp. 2-8,
2009. [Online]. Available: http://dx.doi.org/10.1002/aic.11756.
[16] J. Li et al., "Data-driven mathematical modeling and global optimization
framework for entire petrochemical planning operations," AIChE J., vol. 62, no.
9, pp. 3020-3040, 2016, doi: 10.1002/aic.15220.
[17] S. Yin, X. Li, H. Gao, and O. Kaynak, "Data-Based Techniques Focused on
Modern Industry: An Overview," IEEE Transactions on Industrial Electronics,
vol. 62, no. 1, pp. 657-667, 2015, doi: 10.1109/TIE.2014.2308133.
[18] V. Venkatasubramanian, "The promise of artificial intelligence in chemical
engineering: Is it here, finally?," AIChE J., vol. 65, no. 2, pp. 466-478, 2019, doi:
doi:10.1002/aic.16489.
[19] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning. MIT
press Cambridge, 2016.
[20] I. E. Grossmann, "Advances in mathematical programming models for
enterprise-wide optimization," Comput. Chem. Eng., vol. 47, pp. 2-18, 2012.
[Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0098135412002220.
[21] M. I. Jordan and T. M. Mitchell, "Machine learning: Trends, perspectives, and
prospects," Science, vol. 349, no. 6245, pp. 255-260, 2015. [Online]. Available:
http://science.sciencemag.org/content/sci/349/6245/255.full.pdf.
[22] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, p. 436,
2015, doi: 10.1038/nature14539.
[23] D. Bertsimas, V. Gupta, and N. Kallus, "Data-driven robust optimization," arXiv
preprint arXiv:1401.0212, 2013.
[24] D. Bertsimas and A. Thiele, "Robust and data-driven optimization: Modern
decision-making under uncertainty," INFORMS tutorials in operations research:
models, methods, and applications for innovative decision making, pp. 95-122,
2006.
[25] B. A. Calfa, A. Agarwal, S. J. Bury, J. M. Wassick, and I. E. Grossmann, "Data-
Driven Simulation and Optimization Approaches To Incorporate Production
Variability in Sales and Operations Planning," Ind. Eng. Chem. Res., vol. 54, no.
29, pp. 7261-7272, 2015. [Online]. Available:
http://dx.doi.org/10.1021/acs.iecr.5b01273.
[26] B. A. Calfa, A. Agarwal, I. E. Grossmann, and J. M. Wassick, "Data-driven
multi-stage scenario tree generation via statistical property and distribution
matching," Comput. Chem. Eng., vol. 68, pp. 7-23, 2014. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S009813541400129X.
[27] B. A. Calfa, I. E. Grossmann, A. Agarwal, S. J. Bury, and J. M. Wassick, "Data-
driven individual and joint chance-constrained optimization via kernel
smoothing," Comput. Chem. Eng., vol. 78, pp. 51-69, Jul 2015, doi:
10.1016/j.compchemeng.2015.04.012.

[28] T. Campbell and J. P. How, "Bayesian nonparametric set construction for robust
optimization," in American Control Conference (ACC), 2015, 1-3 July 2015
2015, pp. 4216-4221, doi: 10.1109/ACC.2015.7171991.
[29] R. Jiang and Y. Guan, "Data-driven chance constrained stochastic program,"
Mathematical Programming, journal article vol. 158, no. 1, pp. 291-327, 2015,
doi: 10.1007/s10107-015-0929-7.
[30] R. Levi, G. Perakis, and J. Uichanco, "The data-driven newsvendor problem:
new bounds and insights," Operations Research, vol. 63, no. 6, pp. 1294-1306,
2015.
[31] Y. Zhang, Y. Feng, and G. Rong, "Data-driven chance constrained and robust
optimization under matrix uncertainty," Ind. Eng. Chem. Res., vol. 55, no. 21,
pp. 6145-6160, 2016. [Online]. Available:
http://dx.doi.org/10.1021/acs.iecr.5b04973.
[32] C. Ning and F. You, "Data-driven adaptive nested robust optimization: General
modeling framework and efficient computational algorithm for decision making
under uncertainty," AIChE J., vol. 63, no. 9, pp. 3790-3817, 2017, doi:
10.1002/aic.15717.
[33] C. Ning and F. You, "Data-Driven Adaptive Robust Unit Commitment under
Wind Power Uncertainty: A Bayesian Nonparametric Approach," IEEE Trans.
Power Syst., 2019, doi: 10.1109/TPWRS.2019.2891057.
[34] C. Ning and F. You, "Data-driven decision making under uncertainty integrating
robust optimization with principal component analysis and kernel smoothing
methods," Comput. Chem. Eng., vol. 112, pp. 190-210, 2018, doi:
https://doi.org/10.1016/j.compchemeng.2018.02.007.
[35] C. Shang and F. You, "A data-driven robust optimization approach to stochastic
model predictive control," Journal of Process Control, vol. 75, pp. 24-39, 2019.
[36] C. Shang, W.-H. Chen, A. D. Stroock, and F. You, "Robust Model Predictive
Control of Irrigation Systems with Active Uncertainty Learning and Data
Analytics," arXiv preprint arXiv:1810.05947, 2018.
[37] W. C. Rooney and L. T. Biegler, "Optimal process design with model parameter
uncertainty and process variability," AIChE J., vol. 49, no. 2, pp. 438-449, Feb
2003, doi: 10.1002/aic.690490214.
[38] P. M. Verderame, J. A. Elia, J. Li, and C. A. Floudas, "Planning and Scheduling
under Uncertainty: A Review Across Multiple Sectors," Ind. Eng. Chem. Res.,
vol. 49, no. 9, pp. 3993-4017, May 2010, doi: 10.1021/ie902009k.
[39] A. Mesbah, "Stochastic model predictive control: An overview and
perspectives for future research," IEEE Control Systems
Magazine, vol. 36, no. 6, pp. 30-44, Dec 2016, doi: 10.1109/mcs.2016.2602087.
[40] A. Krieger and E. N. Pistikopoulos, "Model predictive control of anesthesia
under uncertainty," Comput. Chem. Eng., vol. 71, pp. 699-707, Dec 2014, doi:
10.1016/j.compchemeng.2014.07.025.
[41] D. W. Griffith, V. M. Zavala, and L. T. Biegler, "Robustly stable economic
NMPC for non-dissipative stage costs," Journal of Process Control, vol. 57, pp.
116-126, Sep 2017, doi: 10.1016/j.jprocont.2017.06.016.

[42] T. Y. Chiu and P. D. Christofides, "Robust control of particulate processes using
uncertain population balances," AIChE J., vol. 46, no. 2, pp. 266-280, Feb 2000,
doi: 10.1002/aic.690460207.
[43] N. V. Sahinidis, "Optimization under uncertainty: state-of-the-art and
opportunities," Comput. Chem. Eng., vol. 28, no. 6-7, pp. 971-983, Jun 2004,
doi: 10.1016/j.compchemeng.2003.09.017.
[44] I. E. Grossmann, R. M. Apap, B. A. Calfa, P. Garcia-Herreros, and Q. Zhang,
"Recent advances in mathematical programming techniques for the optimization
of process systems under uncertainty," Comput. Chem. Eng., vol. 91, pp. 3-14,
Aug 2016, doi: 10.1016/j.compchemeng.2016.03.002.
[45] J. R. Birge and F. Louveaux, Introduction to stochastic programming. Springer
Science & Business Media, 2011.
[46] J. R. Birge, "State-of-the-Art-Survey—Stochastic Programming: Computation
and Applications," INFORMS J. Comput., vol. 9, no. 2, pp. 111-133, 1997, doi:
10.1287/ijoc.9.2.111.
[47] A. Gupta and C. D. Maranas, "Managing demand uncertainty in supply chain
planning," Comput. Chem. Eng., vol. 27, no. 8, pp. 1219-1227, 2003, doi:
http://dx.doi.org/10.1016/S0098-1354(03)00048-6.
[48] R. M. Van Slyke and R. Wets, "L-shaped linear programs with applications
to optimal control and stochastic programming," SIAM Journal on Applied
Mathematics, vol. 17, no. 4, pp. 638-663, 1969, doi: 10.1137/0117061.
[49] G. Laporte and F. V. Louveaux, "The integer L-shaped method for stochastic
integer programs with complete recourse," Oper. Res. Lett., vol. 13, no. 3,
pp. 133-142, Apr 1993, doi: 10.1016/0167-6377(93)90002-x.
[50] F. Oliveira, V. Gupta, S. Hamacher, and I. E. Grossmann, "A Lagrangean
decomposition approach for oil supply chain investment planning under
uncertainty with risk considerations," Comput. Chem. Eng., vol. 50, pp. 184-195,
Mar 2013, doi: 10.1016/j.compchemeng.2012.10.012.
[51] S. Küçükyavuz and S. Sen, "An introduction to two-stage stochastic mixed-
integer programming," in Leading Developments from INFORMS Communities:
INFORMS, 2017, pp. 1-27.
[52] C. C. Caroe and R. Schultz, "Dual decomposition in stochastic integer
programming," Oper. Res. Lett., vol. 24, no. 1-2, pp. 37-45, Feb-Mar 1999, doi:
10.1016/s0167-6377(98)00050-9.
[53] S. Ahmed, M. Tawarmalani, and N. V. Sahinidis, "A finite branch-and-bound
algorithm for two-stage stochastic integer programs," Math. Program., vol. 100,
no. 2, pp. 355-377, Jun 2004, doi: 10.1007/s10107-003-0475-6.
[54] C. Li and I. E. Grossmann, "An improved L-shaped method for two-stage
convex 0–1 mixed integer nonlinear stochastic programs," Comput. Chem. Eng.,
vol. 112, pp. 165-179, 2018, doi:
https://doi.org/10.1016/j.compchemeng.2018.01.017.

[55] M. G. Ierapetritou and E. N. Pistikopoulos, "Design of multiproduct batch
plants with uncertain demands," Comput. Chem. Eng., vol. 19, pp. S627-S632,
1995, doi: 10.1016/0098-1354(95)00130-t.
[56] A. Bonfill, M. Bagajewicz, A. Espuña, and L. Puigjaner, "Risk Management in
the Scheduling of Batch Plants under Uncertain Market Demand," Ind. Eng.
Chem. Res., vol. 43, no. 3, pp. 741-750, 2004, doi: 10.1021/ie030529f.
[57] A. Bonfill, A. Espuña, and L. Puigjaner, "Addressing Robustness in Scheduling
Batch Processes with Uncertain Operation Times," Ind. Eng. Chem. Res., vol.
44, no. 5, pp. 1524-1534, 2005, doi: 10.1021/ie049732g.
[58] J. Steimel and S. Engell, "Conceptual design and optimization of chemical
processes under uncertainty by two-stage programming," Comput. Chem. Eng.,
vol. 81, pp. 200-217, Oct 2015, doi: 10.1016/j.compchemeng.2015.05.016.
[59] P. Liu, E. N. Pistikopoulos, and Z. Li, "Decomposition Based Stochastic
Programming Approach for Polygeneration Energy Systems Design under
Uncertainty," Ind. Eng. Chem. Res., vol. 49, no. 7, pp. 3295-3305, 2010, doi:
10.1021/ie901490g.
[60] X. Peng, T. W. Root, and C. T. Maravelias, "Optimization-based process
synthesis under seasonal and daily variability: Application to concentrating solar
power," AIChE J., vol. (doi:10.1002/aic.16458), no. 0, doi:
doi:10.1002/aic.16458.
[61] J. Y. Gao and F. Q. You, "Deciphering and handling uncertainty in shale gas
supply chain design and optimization: Novel modeling framework and
computationally efficient solution algorithm," AIChE J., vol. 61, no. 11, pp.
3739-3755, Nov 2015, doi: 10.1002/aic.15032.
[62] F. Q. You, J. M. Wassick, and I. E. Grossmann, "Risk Management for a Global
Supply Chain Planning Under Uncertainty: Models and Algorithms," AIChE J.,
vol. 55, no. 4, pp. 931-946, Apr 2009, doi: 10.1002/aic.11721.
[63] B. H. Gebreslassie, Y. Yao, and F. You, "Design under uncertainty of
hydrocarbon biorefinery supply chains: Multiobjective stochastic programming
models, decomposition algorithm, and a Comparison between CVaR and
downside risk," AIChE J., vol. 58, no. 7, pp. 2155-2179, 2012, doi:
10.1002/aic.13844.
[64] L. J. Zeballos, C. A. Méndez, and A. P. Barbosa-Povoa, "Design and Planning
of Closed-Loop Supply Chains: A Risk-Averse Multistage Stochastic
Approach," Ind. Eng. Chem. Res., vol. 55, no. 21, pp. 6236-6249, 2016, doi:
10.1021/acs.iecr.5b03647.
[65] X. Li, A. Tomasgard, and P. I. Barton, "Nonconvex Generalized Benders
Decomposition for Stochastic Separable Mixed-Integer Nonlinear Programs," J.
Optim. Theory Appl., vol. 151, no. 3, pp. 425-454, Dec 2011, doi:
10.1007/s10957-011-9888-1.
[66] V. Gupta and I. E. Grossmann, "A new decomposition algorithm for multistage
stochastic programs with endogenous uncertainties," Comput. Chem. Eng., vol.
62, pp. 62-79, Mar 2014, doi: 10.1016/j.compchemeng.2013.11.011.

[67] V. Goel and I. E. Grossmann, "A Class of stochastic programs with decision
dependent uncertainty," Math. Program., vol. 108, no. 2-3, pp. 355-394, Jan
2007, doi: 10.1007/s10107-006-0715-7.
[68] A. Prékopa, Stochastic Programming (Mathematics and its Applications, vol.
324). Dordrecht: Kluwer Academic Publishers, 1995.
[69] A. Charnes and W. W. Cooper, "Chance-constrained programming," Manage.
Sci., vol. 6, no. 1, pp. 73-79, 1959, doi: 10.1287/mnsc.6.1.73.
[70] P. Li, H. Arellano-Garcia, and G. Wozny, "Chance constrained programming
approach to process optimization under uncertainty," Comput. Chem. Eng., vol.
32, no. 1, pp. 25-45, 2008, doi:
https://doi.org/10.1016/j.compchemeng.2007.05.009.
[71] B. L. Miller and H. M. Wagner, "Chance constrained programming with joint
constraints," Oper. Res., vol. 13, no. 6, pp. 930-945, 1965, doi:
10.1287/opre.13.6.930.
[72] X. Liu, S. Kucukyavuz, and J. Luedtke, "Decomposition algorithms for two-
stage chance-constrained programs," Math. Program., vol. 157, no. 1, pp. 219-
243, May 2016, doi: 10.1007/s10107-014-0832-7.
[73] M. A. Quddus, S. Chowdhury, M. Marufuzzaman, F. Yu, and L. K. Bian, "A
two-stage chance-constrained stochastic programming model for a bio-fuel
supply chain network," International Journal of Production Economics, vol. 195,
pp. 27-44, Jan 2018, doi: 10.1016/j.ijpe.2017.09.019.
[74] J. Luedtke and S. Ahmed, "A sample approximation approach for optimization
with probabilistic constraints," SIAM J. Optim., vol. 19, no. 2, pp. 674-699,
2008, doi: 10.1137/070702928.
[75] L. J. Hong, Y. Yang, and L. W. Zhang, "Sequential Convex Approximations to
Joint Chance Constrained Programs: A Monte Carlo Approach," Oper. Res., vol.
59, no. 3, pp. 617-630, May-Jun 2011, doi: 10.1287/opre.1100.0910.
[76] F. E. Curtis, A. Wachter, and V. M. Zavala, "A sequential algorithm for
solving nonlinear optimization problems with chance constraints," SIAM J.
Optim., vol. 28, no. 1, pp. 930-958, 2018, doi: 10.1137/16m109003x.
[77] A. Nemirovski and A. Shapiro, "Convex approximations of chance constrained
programs," SIAM J. Optim., vol. 17, no. 4, pp. 969-996, 2006, doi:
10.1137/050622328.
[78] C. D. Maranas, "Optimization accounting for property prediction uncertainty in
polymer design," Comput. Chem. Eng., vol. 21, pp. S1019-S1024, 1997.
[79] A. Gupta, C. D. Maranas, and C. M. McDonald, "Mid-term supply chain
planning under demand uncertainty: customer demand satisfaction and
inventory management," Comput. Chem. Eng., vol. 24, no. 12, pp. 2613-2621,
Dec 2000, doi: 10.1016/s0098-1354(00)00617-7.
[80] F. Q. You and I. E. Grossmann, "Stochastic Inventory Management for Tactical
Process Planning Under Uncertainties: MINLP Models and Algorithms," AIChE
J., vol. 57, no. 5, pp. 1250-1277, May 2011, doi: 10.1002/aic.12338.

[81] D. J. Yue and F. Q. You, "Planning and Scheduling of Flexible Process
Networks Under Uncertainty with Stochastic Inventory: MINLP Models and
Algorithm," AIChE J., vol. 59, no. 5, pp. 1511-1532, May 2013, doi:
10.1002/aic.13924.
[82] W. Shen, Z. Li, B. Huang, and N. M. Jan, "Chance-Constrained Model
Predictive Control for SAGD Process Using Robust Optimization
Approximation," Ind. Eng. Chem. Res., 2018/11/01 2018, doi:
10.1021/acs.iecr.8b03207.
[83] M. Cannon, B. Kouvaritakis, and X. J. Wu, "Probabilistic Constrained MPC for
Multiplicative and Additive Stochastic Uncertainty," IEEE Trans. Autom.
Control., vol. 54, no. 7, pp. 1626-1632, Jul 2009, doi: 10.1109/tac.2009.2017970.
[84] P. Li, H. Arellano-Garcia, and G. Wozny, "Chance constrained programming
approach to process optimization under uncertainty," Comput. Chem. Eng., vol.
32, no. 1-2, pp. 25-45, Jan-Feb 2008, doi: 10.1016/j.compchemeng.2007.05.009.
[85] Y. Yang, P. Vayanos, and P. I. Barton, "Chance-Constrained Optimization for
Refinery Blend Planning under Uncertainty," Ind. Eng. Chem. Res., vol. 56, no.
42, pp. 12139-12150, Oct 2017, doi: 10.1021/acs.iecr.7b02434.
[86] S. S. Liu, S. S. Farid, and L. G. Papageorgiou, "Integrated Optimization of
Upstream and Downstream Processing in Biopharmaceutical Manufacturing
under Uncertainty: A Chance Constrained Programming Approach," Ind. Eng.
Chem. Res., vol. 55, no. 16, pp. 4599-4612, Apr 2016, doi:
10.1021/acs.iecr.5b04403.
[87] K. Mitra, R. D. Gudi, S. C. Patwardhan, and G. Sardar, "Midterm supply chain
planning under uncertainty: A multiobjective chance constrained programming
framework," Ind. Eng. Chem. Res., vol. 47, no. 15, pp. 5501-5511, Aug 2008,
doi: 10.1021/ie0710364.
[88] J. Yang, H. Gu, and G. Rong, "Supply Chain Optimization for Refinery with
Considerations of Operation Mode Changeover and Yield Fluctuations," Ind.
Eng. Chem. Res., vol. 49, no. 1, pp. 276-287, Jan 2010, doi: 10.1021/ie900968x.
[89] F. Q. You and I. E. Grossmann, "Balancing Responsiveness and Economics in
Process Supply Chain Design with Multi-Echelon Stochastic Inventory," AIChE
J., vol. 57, no. 1, pp. 178-192, Jan 2011, doi: 10.1002/aic.12244.
[90] F. Q. You and I. E. Grossmann, "Mixed-Integer Nonlinear Programming Models
and Algorithms for Large-Scale Supply Chain Design with Stochastic Inventory
Management," Ind. Eng. Chem. Res., vol. 47, no. 20, pp. 7802-7817, Oct 2008,
doi: 10.1021/ie800257x.
[91] Y. Yuan, Z. Li, and B. Huang, "Robust optimization under correlated uncertainty:
Formulations and computational study," Comput. Chem. Eng., vol. 85, pp. 58-
71, 2016. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0098135415003464.
[92] A. Ben-Tal and A. Nemirovski, "Robust solutions of Linear Programming
problems contaminated with uncertain data," Math. Program., vol. 88, pp.
411-424, 2000.
[93] D. Bertsimas and M. Sim, "The price of robustness," Oper. Res., vol. 52, no. 1,
pp. 35-53, 2004.

[94] A. Ben-Tal, L. E. Ghaoui, and A. Nemirovski, Robust Optimization. Princeton
University Press, 2009.
[95] C. Gregory, K. Darby-Dowman, and G. Mitra, "Robust optimization and
portfolio selection: The cost of robustness," Eur. J. Oper. Res., vol. 212, no. 2,
pp. 417-428, 2011, doi: http://dx.doi.org/10.1016/j.ejor.2011.02.015.
[96] T. Assavapokee, M. J. Realff, and J. C. Ammons, "Min-Max Regret Robust
Optimization Approach on Interval Data Uncertainty," J. Optim. Theory Appl.,
vol. 137, no. 2, pp. 297-316, 2008, doi: 10.1007/s10957-007-
9334-6.
[97] A. L. Soyster, "Technical Note—Convex Programming with Set-Inclusive
Constraints and Applications to Inexact Linear Programming," Oper. Res., vol.
21, no. 5, pp. 1154-1157, 1973, doi: doi:10.1287/opre.21.5.1154.
[98] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of
robust optimization," SIAM Rev., vol. 53, no. 3, pp. 464-501, 2011.
[99] Á. Lorca, X. A. Sun, E. Litvinov, and T. Zheng, "Multistage adaptive robust
optimization for the unit commitment problem," Oper. Res., vol. 64, no. 1,
pp. 32-51, 2016.
[100] A. Lorca and X. A. Sun, "Adaptive robust optimization with dynamic
uncertainty sets for multi-period economic dispatch under significant wind,"
IEEE Trans. Power Syst., vol. 30, no. 4, pp. 1702-1713, 2015.
[101] A. Atamtürk and M. Zhang, "Two-stage robust network flow and design under
demand uncertainty," Oper. Res., vol. 55, no. 4, pp. 662-673, 2007.
[102] D. Bertsimas, E. Litvinov, X. A. Sun, J. Zhao, and T. Zheng, "Adaptive Robust
Optimization for the Security Constrained Unit Commitment Problem," IEEE
Trans. Power Syst., vol. 28, no. 1, pp. 52-63, 2013.
[103] B. Zeng and L. Zhao, "Solving two-stage robust optimization problems using a
column-and-constraint generation method," Oper. Res. Lett., vol. 41, no. 5, pp.
457-461, 2013. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0167637713000618.
[104] Q. Zhang, M. F. Morari, I. E. Grossmann, A. Sundaramoorthy, and J. M. Pinto,
"An adjustable robust optimization approach to scheduling of continuous
industrial processes providing interruptible load," Comput. Chem. Eng., vol. 86,
pp. 106-119, 2016, doi: http://dx.doi.org/10.1016/j.compchemeng.2015.12.018.
[105] N. H. Lappas and C. E. Gounaris, "Multi-stage Adjustable Robust Optimization
for Process Scheduling under Uncertainty," AIChE Journal, vol. 62, no. 5, pp.
1646-1667, 2016, doi: 10.1002/aic.15183.
[106] A. Ben-Tal, A. Goryashko, E. Guslitzer, and A. Nemirovski, "Adjustable robust
solutions of uncertain linear programs," Math. Program., vol. 99, no. 2, pp. 351-
376, 2004, doi: 10.1007/s10107-003-0454-y.
[107] H. Shi and F. You, "A computational framework and solution algorithms for
two-stage adaptive robust scheduling of batch manufacturing processes under
uncertainty," AIChE J., vol. 62, no. 3, pp. 687-703, 2016. [Online]. Available:
http://dx.doi.org/10.1002/aic.15067.
[108] J. Gong, D. J. Garcia, and F. You, "Unraveling optimal biomass processing
routes from bioconversion product and process networks under uncertainty: An
adaptive robust optimization approach," ACS Sustain. Chem. Eng., vol. 4, no. 6,
pp. 3160-3173, 2016, doi: 10.1021/acssuschemeng.6b00188.
[109] J. Gong and F. You, "Optimal processing network design under uncertainty for
producing fuels and value-added bioproducts from microalgae: Two-stage
adaptive robust mixed integer fractional programming model and
computationally efficient solution algorithm," AIChE J., vol. 63, no. 2, pp. 582-
600, 2017, doi: 10.1002/aic.15370.
[110] E. Delage and D. A. Iancu, "Robust multistage decision making." Catonsville,
MD: INFORMS Tutorials in Operations Research, 2015, pp. 20-46.
[111] C. Ning and F. You, "A Transformation-Proximal Bundle Algorithm for Solving
Large-Scale Multistage Adaptive Robust Optimization Problems," arXiv
preprint arXiv:1810.05931, 2018.
[112] C. Ning and F. You, "A Data-Driven Multistage Adaptive Robust Optimization
Framework for Planning and Scheduling under Uncertainty," AIChE J., vol. 63,
no. 10, pp. 4343–4369, 2017, doi: 10.1002/aic.15792.
[113] K. McLean and X. Li, "Robust Scenario Formulations for Strategic Supply
Chain Optimization under Uncertainty," Ind. Eng. Chem. Res., vol. 52, no. 16,
pp. 5721-5734, 2013, doi: 10.1021/ie303114r.
[114] D. Yue and F. You, "Optimal supply chain design and operations under multi-
scale uncertainties: Nested stochastic robust optimization modeling framework
and solution algorithm," AIChE J., vol. 62, no. 9, pp. 3041-3055, 2016, doi:
10.1002/aic.15255.
[115] C. Liu, C. Lee, H. Chen, and S. Mehrotra, "Stochastic Robust Mathematical
Programming Model for Power System Optimization," IEEE Trans. Power Syst.,
vol. 31, no. 1, pp. 821-822, 2016, doi: 10.1109/TPWRS.2015.2394320.
[116] L. Baringo and A. Baringo, "A Stochastic Adaptive Robust Optimization
Approach for the Generation and Transmission Expansion Planning," IEEE
Trans. Power Syst., vol. 33, no. 1, pp. 792-802, Jan 2018, doi:
10.1109/tpwrs.2017.2713486.
[117] C. Y. Zhao and Y. P. Guan, "Unified Stochastic and Robust Unit Commitment,"
IEEE Trans. Power Syst., vol. 28, no. 3, pp. 3353-3361, Aug 2013, doi:
10.1109/tpwrs.2013.2251916.
[118] G. D. Liu, Y. Xu, and K. Tomsovic, "Bidding Strategy for Microgrid in Day-
Ahead Market Based on Hybrid Stochastic/Robust Optimization," IEEE
Transactions on Smart Grid, vol. 7, no. 1, pp. 227-237, Jan 2016, doi:
10.1109/tsg.2015.2476669.
[119] E. Keyvanshokooh, S. M. Ryan, and E. Kabir, "Hybrid robust and stochastic
optimization for closed-loop supply chain network design using accelerated
Benders decomposition," Eur. J. Oper. Res., vol. 249, no. 1, pp. 76-92, Feb 2016,
doi: 10.1016/j.ejor.2015.08.028.
[120] P. Parpas, B. Rustem, and E. Pistikopoulos, "Global optimization of robust
chance constrained problems," Journal of Global Optimization, vol. 43, no. 2-3,
pp. 231-247, Mar 2009, doi: 10.1007/s10898-007-9244-z.

[121] J. E. Smith and R. L. Winkler, "The optimizer's curse: Skepticism and
postdecision surprise in decision analysis," Manage. Sci., vol. 52, no. 3, pp. 311-
322, 2006, doi: 10.1287/mnsc.1050.0451.
[122] E. Delage and Y. Y. Ye, "Distributionally Robust Optimization Under Moment
Uncertainty with Application to Data-Driven Problems," Oper. Res., vol. 58, no.
3, pp. 595-612, May-Jun 2010, doi: 10.1287/opre.1090.0741.
[123] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann, "A distributionally
robust perspective on uncertainty quantification and chance constrained
programming," Math. Program., vol. 151, no. 1, pp. 35-62, Jun 2015, doi:
10.1007/s10107-015-0896-z.
[124] P. M. Esfahani and D. Kuhn, "Data-driven distributionally robust optimization
using the Wasserstein metric: performance guarantees and tractable
reformulations," Math. Program., vol. 171, no. 1-2, pp. 115-166, 2018, doi:
10.1007/s10107-017-1172-1.
[125] C. Shang and F. Q. You, "Distributionally robust optimization for planning and
scheduling under uncertainty," Comput. Chem. Eng., vol. 110, pp. 53-68, Feb
2018, doi: 10.1016/j.compchemeng.2017.12.002.
[126] G. C. Calafiore and L. El Ghaoui, "On distributionally robust chance-
constrained linear programs," J. Optim. Theory Appl., vol. 130, no. 1, pp. 1-22,
Jul 2006, doi: 10.1007/s10957-006-9084-x.
[127] J. Gao, C. Ning, and F. You, "Data-driven distributionally robust optimization
of shale gas supply chains under uncertainty," AIChE J., vol.
(doi:10.1002/aic.16488), no. 0, 2018, doi: doi:10.1002/aic.16488.
[128] Z. Hu and L. J. Hong, "Kullback-Leibler divergence constrained distributionally
robust optimization," Available at Optimization Online, 2013.
[129] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust Stochastic Lot-Sizing by
Means of Histograms," Production and Operations Management, vol. 22, no. 3,
pp. 691-710, May-Jun 2013, doi: 10.1111/j.1937-5956.2012.01420.x.
[130] G. Bayraksan and D. K. Love, "Data-Driven Stochastic Programming Using
Phi-Divergences," in The Operations Research Revolution, 2015, pp. 1-19.
[131] G. A. Hanasusanto and D. Kuhn, "Conic Programming Reformulations of Two-
Stage Distributionally Robust Linear Programs over Wasserstein Balls," Oper.
Res., vol. 66, no. 3, pp. 849-869, May-Jun 2018, doi: 10.1287/opre.2017.1698.
[132] D. Bertsimas, M. Sim, and M. Zhang, "Adaptive Distributionally Robust
Optimization," Manage. Sci., vol. 0, no. 0, p. null, doi: 10.1287/mnsc.2017.2952.
[133] P. Xiong, P. Jirutitijaroen, and C. Singh, "A Distributionally Robust
Optimization Model for Unit Commitment Considering Uncertain Wind Power
Generation," IEEE Trans. Power Syst., vol. 32, no. 1, pp. 39-49, Jan 2017, doi:
10.1109/tpwrs.2016.2544795.
[134] Y. W. Chen, Q. L. Guo, H. B. Sun, Z. S. Li, W. C. Wu, and Z. H. Li, "A
Distributionally Robust Optimization Model for Unit Commitment Based on
Kullback-Leibler Divergence," IEEE Trans. Power Syst., vol. 33, no. 5, pp.
5147-5160, Sep 2018, doi: 10.1109/tpwrs.2018.2797069.

[135] C. Duan, L. Jiang, W. L. Fang, and J. Liu, "Data-Driven Affinely Adjustable
Distributionally Robust Unit Commitment," IEEE Trans. Power Syst., vol. 33,
no. 2, pp. 1385-1398, Mar 2018, doi: 10.1109/tpwrs.2017.2741506.
[136] C. Y. Zhao and Y. P. Guan, "Data-Driven Stochastic Unit Commitment for
Integrating Wind Generation," IEEE Trans. Power Syst., vol. 31, no. 4, pp. 2587-
2596, Jul 2016, doi: 10.1109/tpwrs.2015.2477311.
[137] C. Wang, R. Gao, F. Qiu, J. Wang, and L. Xin, "Risk-Based Distributionally
Robust Optimal Power Flow With Dynamic Line Rating," IEEE Trans. Power
Syst., vol. 33, no. 6, pp. 6074-6086, 2018, doi: 10.1109/TPWRS.2018.2844356.
[138] Y. Guo, K. Baker, E. Dall'Anese, Z. Hu, and T. Summers, "Stochastic Optimal
Power Flow Based on Data-Driven Distributionally Robust Optimization," in
2018 Annual American Control Conference (ACC), Jun 2018, pp. 3840-3846,
doi: 10.23919/ACC.2018.8431542.
[139] S. Zymler, D. Kuhn, and B. Rustem, "Distributionally robust joint chance
constraints with second-order moment information," Math. Program., vol. 137,
no. 1-2, pp. 167-198, 2013.
[140] B. Li, R. Jiang, and J. L. Mathieu, "Ambiguous risk constraints with moment
and unimodality information," Math. Program., Nov 2017, doi:
10.1007/s10107-017-1212-x.
[141] Z. Chen, S. Peng, and J. Liu, "Data-Driven Robust Chance Constrained
Problems: A Mixture Model Approach," J. Optim. Theory Appl., vol. 179, no. 3,
pp. 1065-1085, Dec 2018, doi: 10.1007/s10957-018-
1376-4.
[142] L. El Ghaoui, M. Oks, and F. Oustry, "Worst-case Value-at-Risk and robust
portfolio optimization: A conic programming approach," Oper. Res., vol. 51, no.
4, pp. 543-556, Jul-Aug 2003.
[143] J. Cheng, E. Delage, and A. Lisser, "Distributionally Robust Stochastic
Knapsack Problem," SIAM J. Optim., vol. 24, no. 3, pp. 1485-1506, 2014, doi:
10.1137/130915315.
[144] Y. Zhang, R. Jiang, and S. Shen, "Ambiguous Chance-Constrained Binary
Programs under Mean-Covariance Information," SIAM J. Optim., vol. 28, no. 4,
pp. 2922-2944, 2018, doi: 10.1137/17m1158707.
[145] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann, "Ambiguous Joint
Chance Constraints Under Mean and Dispersion Information," Oper. Res., vol.
65, no. 3, pp. 751-767, May-Jun 2017, doi: 10.1287/opre.2016.1583.
[146] W. Wiesemann, D. Kuhn, and M. Sim, "Distributionally Robust Convex
Optimization," Oper. Res., vol. 62, no. 6, pp. 1358-1376, Nov-Dec 2014, doi:
10.1287/opre.2014.1314.
[147] W. Z. Yang and H. Xu, "Distributionally robust chance constraints for non-linear
uncertainties," Math. Program., vol. 155, no. 1-2, pp. 231-265, Jan 2016, doi:
10.1007/s10107-014-0842-5.
[148] W. J. Xie and S. Ahmed, "On Deterministic Reformulations of Distributionally
Robust Joint Chance Constrained Optimization Problems," SIAM J. Optim., vol.
28, no. 2, pp. 1151-1182, 2018, doi: 10.1137/16m1094725.

[149] K. Postek, A. Ben-Tal, D. den Hertog, and B. Melenberg, "Robust Optimization
with Ambiguous Stochastic Constraints Under Mean and Dispersion
Information," Oper. Res., vol. 66, no. 3, pp. 814-833, May-Jun 2018, doi:
10.1287/opre.2017.1688.
[150] J. Lasserre and T. Weisser, "Distributionally robust polynomial chance-
constraints under mixture ambiguity sets," 2018.
[151] E. Erdogan and G. Iyengar, "Ambiguous chance constrained problems and
robust optimization," Math. Program., vol. 107, no. 1-2, pp. 37-61, Jun 2006,
doi: 10.1007/s10107-005-0678-0.
[152] R. W. Jiang and Y. P. Guan, "Data-driven chance constrained stochastic
program," Math. Program., vol. 158, no. 1-2, pp. 291-327, Jul 2016, doi:
10.1007/s10107-015-0929-7.
[153] Z. Chen, D. Kuhn, and W. Wiesemann, "Data-Driven Chance Constrained
Programs over Wasserstein Balls," arXiv preprint arXiv:1809.00210, 2018.
[154] R. Ji and M. Lejeune, "Data-Driven Distributionally Robust Chance-
Constrained Programming with Wasserstein Metric," 2018.
[155] R. Gao and A. J. Kleywegt, "Distributionally robust stochastic optimization with
Wasserstein distance," arXiv preprint arXiv:1604.02199, 2016.
[156] W. Xie, "On Distributionally Robust Chance Constrained Program with
Wasserstein Distance," arXiv preprint arXiv:1806.07418, 2018.
[157] A. R. Hota, A. Cherukuri, and J. Lygeros, "Data-Driven Chance Constrained
Optimization under Wasserstein Ambiguity Sets," arXiv preprint
arXiv:1805.06729, 2018.
[158] W. Xie and S. Ahmed, "Distributionally Robust Chance Constrained Optimal
Power Flow with Renewables: A Conic Reformulation," IEEE Trans. Power
Syst., vol. 33, no. 2, pp. 1860-1867, 2018, doi: 10.1109/TPWRS.2017.2725581.
[159] B. P. G. Van Parys, D. Kuhn, P. J. Goulart, and M. Morari, "Distributionally
Robust Control of Constrained Stochastic Systems," IEEE Trans. Autom.
Control., vol. 61, no. 2, pp. 430-442, Feb 2016, doi: 10.1109/tac.2015.2444134.
[160] S. Ghosal and W. Wiesemann, "The Distributionally Robust Chance
Constrained Vehicle Routing Problem," Available on Optimization Online, 2018.
[161] D. E. Bell, "Regret in Decision Making under Uncertainty," Oper. Res., vol. 30,
no. 5, pp. 961-981, 1982, doi: 10.1287/opre.30.5.961.
[162] C. Ning and F. You, "Adaptive robust optimization with minimax regret
criterion: Multiobjective optimization framework and computational algorithm
for planning and scheduling under uncertainty," Comput. Chem. Eng., vol. 108,
no. Supplement C, pp. 425-447, 2018, doi:
https://doi.org/10.1016/j.compchemeng.2017.09.026.
[163] C. Ning and F. You, "Data-driven stochastic robust optimization: General
computational framework and algorithm leveraging machine learning for
optimization under uncertainty in the big data era," Comput. Chem. Eng., vol.
111, pp. 115-133, 2018, doi:
https://doi.org/10.1016/j.compchemeng.2017.12.015.

[164] C. Shang, X. Huang, and F. You, "Data-driven robust optimization based on
kernel learning," Comput. Chem. Eng., vol. 106, pp. 464-479, 2017, doi:
https://doi.org/10.1016/j.compchemeng.2017.07.004.
[165] D. Bertsimas, V. Gupta, and N. Kallus, "Data-driven robust optimization," Math.
Program., vol. 167, no. 2, pp. 235-292, Feb 2018, doi:
10.1007/s10107-017-1125-8.
[166] Y. Zhang, X. Z. Jin, Y. P. Feng, and G. Rong, "Data-driven robust optimization
under correlated uncertainty: A case study of production scheduling in ethylene
plant (Reprinted from computers and Chemical Engineering, vol 109, pg 48-67,
2017)," Comput. Chem. Eng., vol. 116, pp. 17-36, Aug 2018, doi:
10.1016/j.compchemeng.2017.10.039.
[167] Y. Zhang, Y. P. Feng, and G. Rong, "Data-driven rolling-horizon robust
optimization for petrochemical scheduling using probability density contours,"
Comput. Chem. Eng., vol. 115, pp. 342-360, Jul 2018, doi:
10.1016/j.compchemeng.2018.04.013.
[168] L. Zhao, C. Ning, and F. You, "Operational optimization of industrial steam
systems under uncertainty using data-Driven adaptive robust optimization,"
AIChE J., doi: 10.1002/aic.16500.
[169] F. Miao et al., "Data-Driven Robust Taxi Dispatch Under Demand
Uncertainties," IEEE Transactions on Control Systems Technology, vol. 27, no.
1, pp. 175-191, Jan 2019, doi: 10.1109/tcst.2017.2766042.
[170] G. Calafiore and M. C. Campi, "Uncertain convex programs: randomized
solutions and confidence levels," Math. Program., vol. 102, no. 1, pp. 25-46,
Jan 2005, doi: 10.1007/s10107-003-0499-y.
[171] M. C. Campi, S. Garatti, and M. Prandini, "The scenario approach for systems
and control design," Annual Reviews in Control, vol. 33, no. 2, pp. 149-157, Dec
2009, doi: 10.1016/j.arcontrol.2009.07.001.
[172] S. Boyd and L. Vandenberghe, Convex optimization. Cambridge university press,
2004.
[173] M. C. Campi and S. Garatti, "The Exact Feasibility of Randomized Solutions of
Uncertain Convex Programs," SIAM J. Optim., vol. 19, no. 3, pp. 1211-1230,
2008, doi: 10.1137/07069821x.
[174] X. J. Zhang, S. Grammatico, G. Schildbach, P. Goulart, and J. Lygeros, "On the
sample size of random convex programs with structured dependence on the
uncertainty," Automatica, vol. 60, pp. 182-188, Oct 2015, doi:
10.1016/j.automatica.2015.07.013.
[175] T. Kanamori and A. Takeda, "Worst-Case Violation of Sampled Convex
Programs for Optimization with Uncertainty," J. Optim. Theory Appl., vol. 152,
no. 1, pp. 171-197, Jan 2012, doi: 10.1007/s10957-011-9923-2.
[176] G. Calafiore, "On the Expected Probability of Constraint Violation in Sampled
Convex Programs," J. Optim. Theory Appl., vol. 143, no. 2, pp. 405-412, Nov
2009, doi: 10.1007/s10957-009-9579-3.
[177] P. M. Esfahani, T. Sutter, and J. Lygeros, "Performance Bounds for the Scenario
Approach and an Extension to a Class of Non-Convex Programs," IEEE Trans.
Autom. Control., vol. 60, no. 1, pp. 46-58, Jan 2015, doi:
10.1109/tac.2014.2330702.
[178] G. C. Calafiore, "Random convex programs," SIAM J. Optim., vol. 20,
no. 6, pp. 3427-3464, 2010, doi: 10.1137/090773490.
[179] M. C. Campi and S. Garatti, "A Sampling-and-Discarding Approach to Chance-
Constrained Optimization: Feasibility and Optimality," J. Optim. Theory Appl.,
vol. 148, no. 2, pp. 257-280, Feb 2011, doi: 10.1007/s10957-010-9754-6.
[180] M. C. Campi and S. Garatti, "Wait-and-judge scenario optimization," Math.
Program., vol. 167, no. 1, pp. 155-189, Jan 2018, doi: 10.1007/s10107-016-
1056-9.
[181] N. Kariotoglou, K. Margellos, and J. Lygeros, "On the computational
complexity and generalization properties of multi-stage and stage-wise coupled
scenario programs," Systems & Control Letters, vol. 94, pp. 63-69, Aug 2016,
doi: 10.1016/j.sysconle.2016.05.009.
[182] P. Vayanos, D. Kuhn, and B. Rustem, "A constraint sampling approach for
multi-stage robust optimization," Automatica, vol. 48, no. 3, pp. 459-471,
2012, doi: 10.1016/j.automatica.2011.12.002.
[183] G. Calafiore, D. Lyons, and L. Fagiano, "On mixed-integer random convex
programs," in 2012 IEEE 51st IEEE Conference on Decision and Control (CDC),
10-13 Dec. 2012 2012, pp. 3508-3513, doi: 10.1109/CDC.2012.6426905.
[184] J. A. De Loera, R. N. La Haye, D. Oliveros, and E. Roldan-Pensado, "Chance-
Constrained Convex Mixed-Integer Optimization and Beyond: Two Sampling
Algorithms within S-Optimization," Journal of Convex Analysis, vol. 25, no. 1,
pp. 201-218, 2018.
[185] M. Chamanbaz, F. Dabbene, R. Tempo, V. Venkataramanan, and Q. G. Wang,
"Sequential Randomized Algorithms for Convex Optimization in the Presence
of Uncertainty," IEEE Trans. Autom. Control., vol. 61, no. 9, pp. 2565-2571,
Sep 2016, doi: 10.1109/tac.2015.2494875.
[186] T. Alamo, R. Tempo, A. Luque, and D. R. Ramirez, "Randomized methods for
design of uncertain systems: Sample complexity and sequential algorithms,"
Automatica, vol. 52, pp. 160-172, Feb 2015, doi:
10.1016/j.automatica.2014.11.004.
[187] G. Calafiore, "Repetitive Scenario Design," IEEE Trans. Autom. Control., vol.
62, no. 3, pp. 1125-1137, Mar 2017, doi: 10.1109/tac.2016.2575859.
[188] K. You, R. Tempo, and P. Xie, "Distributed Algorithms for Robust Convex
Optimization via the Scenario Approach," IEEE Trans. Autom. Control., pp. 1-
1, 2018, doi: 10.1109/TAC.2018.2828093.
[189] K. Margellos, A. Falsone, S. Garatti, and M. Prandini, "Distributed Constrained
Optimization and Consensus in Uncertain Networks via Proximal
Minimization," IEEE Trans. Autom. Control., vol. 63, no. 5, pp. 1372-1387,
May 2018, doi: 10.1109/tac.2017.2747505.
[190] L. Carlone, V. Srivastava, F. Bullo, and G. C. Calafiore, "Distributed Random
Convex Programming via Constraints Consensus," SIAM Journal on Control
and Optimization, vol. 52, no. 1, pp. 629-662, 2014, doi: 10.1137/120885796.
[191] A. Care, S. Garatti, and M. C. Campi, "FAST-Fast Algorithm for the Scenario
Technique," Oper. Res., vol. 62, no. 3, pp. 662-671, May-Jun 2014, doi:
10.1287/opre.2014.1257.
[192] M. C. Campi, S. Garatti, and F. A. Ramponi, "A General Scenario Theory for
Nonconvex Optimization and Decision Making," IEEE Trans. Autom. Control.,
vol. 63, no. 12, pp. 4067-4078, 2018, doi: 10.1109/TAC.2018.2808446.
[193] T. Alamo, R. Tempo, and E. F. Camacho, "Randomized Strategies for
Probabilistic Solutions of Uncertain Feasibility and Optimization Problems,"
IEEE Trans. Autom. Control., vol. 54, no. 11, pp. 2545-2559, Nov 2009, doi:
10.1109/tac.2009.2031207.
[194] G. Calafiore, F. Dabbene, and R. Tempo, "Research on probabilistic methods
for control system design," Automatica, vol. 47, no. 7, pp. 1279-1293, Jul 2011,
doi: 10.1016/j.automatica.2011.02.029.
[195] S. Grammatico, X. J. Zhang, K. Margellos, P. Goulart, and J. Lygeros, "A
Scenario Approach for Non-Convex Control Design," IEEE Trans. Autom.
Control., vol. 61, no. 2, pp. 334-345, Feb 2016, doi: 10.1109/tac.2015.2433591.
[196] A. R. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic Modeling Using Deep
Belief Networks," IEEE Transactions on Audio, Speech, and Language Processing,
vol. 20, no. 1, pp. 14-22, Jan 2012, doi: 10.1109/tasl.2011.2109382.
[197] Z. P. Zhang and J. S. Zhao, "A deep belief network based fault diagnosis model
for complex chemical processes," Comput. Chem. Eng., vol. 107, pp. 395-407,
Dec 2017, doi: 10.1016/j.compchemeng.2017.02.041.
[198] C. Shang, F. Yang, D. X. Huang, and W. X. Lyu, "Data-driven soft sensor
development based on deep learning technique," Journal of Process Control,
vol. 24, no. 3, pp. 223-233, Mar 2014, doi: 10.1016/j.jprocont.2014.01.012.
[199] E. Gawehn, J. A. Hiss, and G. Schneider, "Deep learning in drug discovery,"
Molecular informatics, vol. 35, no. 1, pp. 3-14, 2016.
[200] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with
Deep Convolutional Neural Networks," Communications of the Acm, vol. 60, no.
6, pp. 84-90, Jun 2017, doi: 10.1145/3065386.
[201] Y. Wu, H. Tan, L. Qin, B. Ran, and Z. Jiang, "A hybrid deep learning based
traffic flow prediction method and its understanding," Transportation Research
Part C: Emerging Technologies, vol. 90, pp. 166-180, 2018, doi:
https://doi.org/10.1016/j.trc.2018.03.001.
[202] A. Graves, A. R. Mohamed, and G. Hinton, "Speech recognition with deep
recurrent neural networks," in 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), 2013, pp. 6645-6649.
[203] J. Vermaak and E. C. Botha, "Recurrent neural networks for short-term load
forecasting," IEEE Trans. Power Syst., vol. 13, no. 1, pp. 126-132, 1998, doi:
10.1109/59.651623.
[204] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural
computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[205] J. Potočnik, "Renewable Energy Sources and the Realities of Setting an Energy
Agenda," Science, vol. 315, no. 5813, pp. 810-811, 2007, doi:
10.1126/science.1139086.
[206] H. Kopetz, "Build a biomass energy market," Nature, vol. 494, pp. 29-31,
2013, doi: 10.1038/494029a.
[207] D. Yue, F. You, and S. W. Snyder, "Biomass-to-bioenergy and biofuel supply
chain optimization: Overview, key issues and challenges," Comput. Chem. Eng.,
vol. 66, pp. 36-56, 2014, doi:
http://dx.doi.org/10.1016/j.compchemeng.2013.11.016.
[208] Z. Hu, Y. Wang, and Z. Wen, "Alkali (NaOH) pretreatment of switchgrass by
radio frequency-based dielectric heating," Appl. Biochem. Biotechnol., vol. 148,
no. 1-3, pp. 71-81, 2008, doi: 10.1007/s12010-007-8083-1.
[209] M. Safar et al., "Catalytic effects of potassium on biomass pyrolysis, combustion
and torrefaction," Applied Energy, vol. 235, pp. 346-355, 2019, doi:
https://doi.org/10.1016/j.apenergy.2018.10.065.
[210] V. Benedetti, F. Patuzzi, and M. Baratieri, "Characterization of char from
biomass gasification and its similarities with activated carbon in adsorption
applications," Applied Energy, vol. 227, pp. 92-99, 2018, doi:
https://doi.org/10.1016/j.apenergy.2017.08.076.
[211] W. Zhang, J. R. Barone, and S. Renneckar, "Biomass Fractionation after
Denaturing Cell Walls by Glycerol Thermal Processing," ACS Sustain. Chem.
Eng., vol. 3, no. 3, pp. 413-420, 2015, doi: 10.1021/sc500564g.
[212] T. Damartzis and A. Zabaniotou, "Thermochemical conversion of biomass to
second generation biofuels through integrated process design—A review,"
Renewable and Sustainable Energy Reviews, vol. 15, no. 1, pp. 366-378, 2011,
doi: https://doi.org/10.1016/j.rser.2010.08.003.
[213] K. Dutta, A. Daverey, and J.-G. Lin, "Evolution retrospective for alternative
fuels: First to fourth generation," Renewable Energy, vol. 69, pp. 114-122, 2014,
doi: https://doi.org/10.1016/j.renene.2014.02.044.
[214] R. A. Lee and J.-M. Lavoie, "From first- to third-generation biofuels: Challenges
of producing a commodity from a biomass of increasing complexity," Animal
Frontiers, vol. 3, no. 2, pp. 6-11, 2013, doi: 10.2527/af.2013-0010.
[215] L. Gil-Carrera, J. D. Browne, I. Kilgallon, and J. D. Murphy, "Feasibility study
of an off-grid biomethane mobile solution for agri-waste," Applied Energy, vol.
239, pp. 471-481, 2019, doi: https://doi.org/10.1016/j.apenergy.2019.01.141.
[216] J. Lee et al., "Pyrolysis process of agricultural waste using CO2 for waste
management, energy recovery, and biochar fabrication," Applied Energy, vol.
185, pp. 214-222, 2017, doi: https://doi.org/10.1016/j.apenergy.2016.10.092.
[217] M. Rajinipriya, M. Nagalakshmaiah, M. Robert, and S. Elkoun, "Importance of
Agricultural and Industrial Waste in the Field of Nanocellulose and Recent
Industrial Developments of Wood Based Nanocellulose: A Review," ACS
Sustain. Chem. Eng., vol. 6, no. 3, pp. 2807-2828, 2018, doi:
10.1021/acssuschemeng.7b03437.
[218] W. H. Chen et al., "A comprehensive analysis of food waste derived liquefaction
bio-oil properties for industrial application," Applied Energy, vol. 237, pp. 283-
291, 2019, doi: 10.1016/j.apenergy.2018.12.084.
[219] D. J. Garcia and F. You, "Multiobjective optimization of product and process
networks: General modeling framework, efficient global optimization algorithm,
and case studies on bioconversion," AIChE J., vol. 61, no. 2, pp. 530-554, 2015,
doi: 10.1002/aic.14666.
[220] A. Soroudi and T. Amraee, "Decision making under uncertainty in energy
systems: State of the art," Renewable and Sustainable Energy Reviews, vol. 28,
pp. 376-384, 2013, doi: https://doi.org/10.1016/j.rser.2013.08.039.
[221] C. Ning and F. You, "Optimization under uncertainty in the era of big data and
deep learning: When machine learning meets mathematical programming,"
Comput. Chem. Eng., vol. 125, pp. 434-448, 2019, doi:
https://doi.org/10.1016/j.compchemeng.2019.03.034.
[222] C. Ning and F. You, "A data-driven multistage adaptive robust optimization
framework for planning and scheduling under uncertainty," AIChE J., vol. 63,
no. 10, pp. 4343-4369, 2017, doi: 10.1002/aic.15792.
[223] P. Daoutidis, W. A. Marvin, S. Rangarajan, and A. I. Torres, "Engineering
Biomass Conversion Processes: A Systems Perspective," AIChE J., vol. 59, no.
1, pp. 3-18, 2013, doi: 10.1002/aic.13978.
[224] S. Rangarajan, A. Bhan, and P. Daoutidis, "Rule-Based Generation of
Thermochemical Routes to Biomass Conversion," Ind. Eng. Chem. Res., vol. 49,
no. 21, pp. 10459-10470, 2010, doi: 10.1021/ie100546t.
[225] J. Kim, S. M. Sen, and C. T. Maravelias, "An optimization-based assessment
framework for biomass-to-fuel conversion strategies," Energy & Environmental
Science, vol. 6, no. 4, pp. 1093-1104, 2013, doi: 10.1039/c3ee24243a.
[226] J. Gong and F. You, "Global Optimization for Sustainable Design and Synthesis
of Algae Processing Network for CO2 Mitigation and Biofuel Production Using
Life Cycle Optimization," AIChE J., vol. 60, no. 9, pp. 3195-3210, 2014, doi:
10.1002/aic.14504.
[227] J. Gong and F. You, "Sustainable design and synthesis of energy systems,"
Current Opinion in Chemical Engineering, vol. 10, pp. 77-86, 2015, doi:
10.1016/j.coche.2015.09.001.
[228] S. Bairamzadeh, M. Saidi-Mehrabad, and M. S. Pishvaee, "Modelling different
types of uncertainty in biofuel supply network design and planning: A robust
optimization approach," Renewable Energy, vol. 116, pp. 500-517, 2018, doi:
https://doi.org/10.1016/j.renene.2017.09.020.
[229] C. Caldeira, O. Swei, F. Freire, L. C. Dias, E. A. Olivetti, and R. Kirchain,
"Planning strategies to address operational and price uncertainty in biodiesel
production," Applied Energy, vol. 238, pp. 1573-1581, 2019, doi:
10.1016/j.apenergy.2019.01.195.
[230] K. Tong, J. Gong, D. Yue, and F. You, "Stochastic Programming Approach to
Optimal Design and Operations of Integrated Hydrocarbon Biofuel and
Petroleum Supply Chains," ACS Sustain. Chem. Eng., vol. 2, no. 1, pp. 49-61,
2014, doi: 10.1021/sc4002671.
[231] A. Osmani and J. Zhang, "Economic and environmental optimization of a large
scale sustainable dual feedstock lignocellulosic-based bioethanol supply chain
in a stochastic environment," Applied Energy, vol. 114, pp. 572-587, 2014, doi:
10.1016/j.apenergy.2013.10.024.
[232] D. Bertsimas, S. Shtern, and B. Sturt, "A Data-Driven Approach for Multi-Stage
Linear Optimization," optimization-online.org, 2019.
[233] Z. Chen, M. Sim, and P. Xiong, "Robust Stochastic Optimization: The Synergy
of Robust Optimization and Stochastic Programming," optimization-online.org,
2019.
[234] W. Xie, "On Distributionally Robust Chance Constrained Programs with
Wasserstein Distance," arXiv preprint arXiv:1806.07418, 2018.
[235] E. Delage and Y. Ye, "Distributionally Robust Optimization Under Moment
Uncertainty with Application to Data-Driven Problems," Oper. Res., vol. 58, no.
3, pp. 595-612, 2010, doi: 10.1287/opre.1090.0741.
[236] K. L. Hoffman, "A method for globally minimizing concave functions over
convex sets," Math. Program., vol. 20, no. 1, pp. 22-32, 1981, doi:
10.1007/bf01589330.
[237] S. S. Toor, L. Rosendahl, and A. Rudolf, "Hydrothermal liquefaction of biomass:
A review of subcritical water technologies," Energy, vol. 36, no. 5, pp. 2328-
2342, 2011, doi: https://doi.org/10.1016/j.energy.2011.03.013.
[238] D. J. Garcia and F. You, "Systems engineering opportunities for agricultural and
organic waste management in the food-water-energy nexus," Current Opinion
in Chemical Engineering, vol. 18, pp. 23-31, 2017, doi:
10.1016/j.coche.2017.08.004.
[239] P. Morone, A. Koutinas, N. Gathergood, M. Arshadi, and A. Matharu, "Food
waste: Challenges and opportunities for enhancing the emerging bio-economy,"
Journal of Cleaner Production, vol. 221, pp. 10-16, 2019, doi:
https://doi.org/10.1016/j.jclepro.2019.02.258.
[240] J. Nicoletti, C. Ning, and F. You, "Incorporating Agricultural Waste-to-Energy
Pathways into Biomass Product and Process Network through Data-Driven
Nonlinear Adaptive Robust Optimization," Energy, vol. 180, pp. 556-571, 2019.
[241] R. Hakawati, B. M. Smyth, G. McCullough, F. De Rosa, and D. Rooney, "What
is the most energy efficient route for biogas utilization: Heat, electricity or
transport?," Applied Energy, vol. 206, pp. 1076-1087, 2017, doi:
10.1016/j.apenergy.2017.08.068.
[242] Y. Y. Jin, T. Chen, X. Chen, and Z. X. Yu, "Life-cycle assessment of energy
consumption and environmental impact of an integrated food waste-based
biogas plant," Applied Energy, vol. 151, pp. 227-236, 2015, doi:
10.1016/j.apenergy.2015.04.058.
[243] R. Campuzano and S. González-Martínez, "Characteristics of the organic
fraction of municipal solid waste and methane production: A review," Waste
Manage. (Oxford), vol. 54, pp. 3-12, 2016, doi:
https://doi.org/10.1016/j.wasman.2016.05.016.
[244] C. Villani, Optimal transport: old and new. Springer Science & Business Media,
2008.
[245] C. Zhao and Y. Guan, "Data-driven risk-averse stochastic optimization with
Wasserstein metric," Oper. Res. Lett., vol. 46, no. 2, pp. 262-267, 2018, doi:
https://doi.org/10.1016/j.orl.2018.01.011.
[246] T. A. Reddy, Applied data analysis and modeling for energy engineers and
scientists. Springer Science & Business Media, 2011.
[247] M. Rizwan, J. H. Lee, and R. Gani, "Optimal design of microalgae-based
biorefinery: Economics, opportunities and challenges," Applied Energy, vol. 150,
pp. 69-79, 2015, doi: https://doi.org/10.1016/j.apenergy.2015.04.018.
[248] Z. Zheng et al., "Effect of dairy manure to switchgrass co-digestion ratio on
methane production and the bacterial community in batch anaerobic digestion,"
Applied Energy, vol. 151, pp. 249-257, 2015, doi:
https://doi.org/10.1016/j.apenergy.2015.04.078.
[249] C. G. Gutierrez-Arriaga, M. Serna-Gonzalez, J. M. Ponce-Ortega, and M. M. El-
Halwagi, "Sustainable Integration of Algal Biodiesel Production with Steam
Electric Power Plants for Greenhouse Gas Mitigation," ACS Sustain. Chem. Eng.,
vol. 2, no. 6, pp. 1388-1403, 2014, doi: 10.1021/sc400436a.
[250] J. Seader, W. D. Seider, and D. R. Lewin, Product and process design principles:
synthesis, analysis and evaluation. Wiley, 2004.
[251] D. Bertsimas, M. Sim, and M. Zhang, "Adaptive Distributionally Robust
Optimization," Manage. Sci., vol. 65, no. 2, pp. 604-618, 2019, doi:
10.1287/mnsc.2017.2952.
[252] D. J. Garcia and F. You, "Network-Based Life Cycle Optimization of the Net
Atmospheric CO2-eq Ratio (NACR) of Fuels and Chemicals Production from
Biomass," ACS Sustain. Chem. Eng., vol. 3, no. 8, pp. 1732-1744, 2015, doi:
10.1021/acssuschemeng.5b00262.
[253] IndexMundi, "Commodities," 2019. [Online]. Available:
https://www.indexmundi.com/commodities/
[254] R. E. Rosenthal, GAMS: A User’s Guide. Washington, DC: GAMS Development
Corporation, 2008.
[255] J. Remon, P. Arcelus-Arrillaga, L. Garcia, and J. Arauzo, "Simultaneous
production of gaseous and liquid biofuels from the synergetic co-valorisation of
bio-oil and crude glycerol in supercritical water," Applied Energy, vol. 228, pp.
2275-2287, 2018, doi: 10.1016/j.apenergy.2018.07.093.
[256] I. Ullah Khan et al., "Biogas as a renewable energy fuel – A review of biogas
upgrading, utilisation and storage," Energy Convers. Manage., vol. 150, pp. 277-
294, 2017, doi: https://doi.org/10.1016/j.enconman.2017.08.035.
[257] A. Shapiro, "On Duality Theory of Conic Linear Problems," in Semi-Infinite
Programming: Recent Advances, M. Á. Goberna and M. A. López Eds. Boston,
MA: Springer US, 2001, pp. 135-165.
[258] A. J. Conejo and L. Baringo, Power system operations. Springer, 2018.
[259] X. Xia and A. M. Elaiw, "Optimal dynamic economic dispatch of generation: A
review," Electric Power Systems Research, vol. 80, no. 8, pp. 975-986, 2010,
doi: https://doi.org/10.1016/j.epsr.2009.12.012.
[260] J. Hetzer, D. C. Yu, and K. Bhattarai, "An Economic Dispatch Model
Incorporating Wind Power," IEEE Transactions on Energy Conversion, vol. 23,
no. 2, pp. 603-611, 2008, doi: 10.1109/TEC.2007.914171.
[261] A. Alqurashi, A. H. Etemadi, and A. Khodaei, "Treatment of uncertainty for next
generation power systems: State-of-the-art in stochastic optimization," Electric
Power Systems Research, vol. 141, pp. 233-245, 2016, doi:
https://doi.org/10.1016/j.epsr.2016.08.009.
[262] Á. Lorca and X. A. Sun, "Adaptive Robust Optimization With Dynamic
Uncertainty Sets for Multi-Period Economic Dispatch Under Significant Wind,"
IEEE Trans. Power Syst., vol. 30, no. 4, pp. 1702-1713, 2015, doi:
10.1109/TPWRS.2014.2357714.
[263] J. Zhao, T. Zheng, and E. Litvinov, "Variable Resource Dispatch Through Do-
Not-Exceed Limit," IEEE Trans. Power Syst., vol. 30, no. 2, pp. 820-828, 2015,
doi: 10.1109/TPWRS.2014.2333367.
[264] R. A. Jabr, S. Karaki, and J. A. Korbane, "Robust Multi-Period OPF With
Storage and Renewables," IEEE Trans. Power Syst., vol. 30, no. 5, pp. 2790-
2799, 2015, doi: 10.1109/TPWRS.2014.2365835.
[265] H. Qiu, B. Zhao, W. Gu, and R. Bo, "Bi-Level Two-Stage Robust Optimal
Scheduling for AC/DC Hybrid Multi-Microgrids," IEEE Transactions on Smart
Grid, vol. 9, no. 5, pp. 5455-5466, 2018, doi: 10.1109/TSG.2018.2806973.
[266] W. Wu, J. Chen, B. Zhang, and H. Sun, "A Robust Wind Power Optimization
Method for Look-Ahead Power Dispatch," IEEE Transactions on Sustainable
Energy, vol. 5, no. 2, pp. 507-515, 2014, doi: 10.1109/TSTE.2013.2294467.
[267] Z. Li, W. Wu, B. Zhang, and B. Wang, "Adjustable Robust Real-Time Power
Dispatch With Large-Scale Wind Power Integration," IEEE Transactions on
Sustainable Energy, vol. 6, no. 2, pp. 357-368, 2015, doi:
10.1109/TSTE.2014.2377752.
[268] Z. Lin, H. Chen, Q. Wu, W. Li, M. Li, and T. Ji, "Mean-tracking model based
stochastic economic dispatch for power systems with high penetration of wind
power," Energy, vol. 193, p. 116826, 2020, doi:
https://doi.org/10.1016/j.energy.2019.116826.
[269] R. Lu, T. Ding, B. Qin, J. Ma, X. Fang, and Z. Y. Dong, "Multi-Stage Stochastic
Programming to Joint Economic Dispatch for Energy and Reserve with
Uncertain Renewable Energy," IEEE Transactions on Sustainable Energy, pp.
1-1, 2019, doi: 10.1109/TSTE.2019.2918269.
[270] F. Qiu and J. Wang, "Chance-Constrained Transmission Switching With
Guaranteed Wind Power Utilization," IEEE Trans. Power Syst., vol. 30, no. 3,
pp. 1270-1278, 2015, doi: 10.1109/TPWRS.2014.2346987.
[271] Z. Zhang, Y. Sun, D. W. Gao, J. Lin, and L. Cheng, "A Versatile Probability
Distribution Model for Wind Power Forecast Errors and Its Application in
Economic Dispatch," IEEE Trans. Power Syst., vol. 28, no. 3, pp. 3114-3125,
2013, doi: 10.1109/TPWRS.2013.2249596.
[272] C. Tang et al., "Look-Ahead Economic Dispatch With Adjustable Confidence
Interval Based on a Truncated Versatile Distribution Model for Wind Power,"
IEEE Trans. Power Syst., vol. 33, no. 2, pp. 1755-1767, 2018, doi:
10.1109/TPWRS.2017.2715852.
[273] Z. Wang, C. Shen, F. Liu, X. Wu, C. Liu, and F. Gao, "Chance-Constrained
Economic Dispatch With Non-Gaussian Correlated Wind Power Uncertainty,"
IEEE Trans. Power Syst., vol. 32, no. 6, pp. 4880-4893, 2017, doi:
10.1109/TPWRS.2017.2672750.
[274] Y. Yang, W. Wu, B. Wang, and M. Li, "Analytical Reformulation for Stochastic
Unit Commitment Considering Wind Power Uncertainty with Gaussian Mixture
Model," IEEE Trans. Power Syst., pp. 1-1, 2019, doi:
10.1109/TPWRS.2019.2960389.
[275] B. Khorramdel, A. Zare, C. Y. Chung, and P. Gavriliadis, "A Generic Convex
Model for a Chance-Constrained Look-Ahead Economic Dispatch Problem
Incorporating an Efficient Wind Power Distribution Modeling," IEEE Trans.
Power Syst., vol. 35, no. 2, pp. 873-886, 2020, doi:
10.1109/TPWRS.2019.2940288.
[276] K. Baker and A. Bernstein, "Joint Chance Constraints in AC Optimal Power
Flow: Improving Bounds Through Learning," IEEE Transactions on Smart Grid,
vol. 10, no. 6, pp. 6376-6385, 2019, doi: 10.1109/TSG.2019.2903767.
[277] M. S. Modarresi et al., "Scenario-Based Economic Dispatch With Tunable Risk
Levels in High-Renewable Power Systems," IEEE Trans. Power Syst., vol. 34,
no. 6, pp. 5103-5114, 2019, doi: 10.1109/TPWRS.2018.2874464.
[278] H. Ming, L. Xie, M. C. Campi, S. Garatti, and P. R. Kumar, "Scenario-Based
Economic Dispatch With Uncertain Demand Response," IEEE Transactions on
Smart Grid, vol. 10, no. 2, pp. 1858-1868, 2019, doi:
10.1109/TSG.2017.2778688.
[279] X. Geng and L. Xie, "Data-driven decision making in power systems with
probabilistic guarantees: Theory and applications of chance-constrained
optimization," Annual Reviews in Control, vol. 47, pp. 341-363, 2019, doi:
https://doi.org/10.1016/j.arcontrol.2019.05.005.
[280] O. Ciftci, M. Mehrtash, and A. Kargarian, "Data-Driven Nonparametric Chance-
Constrained Optimization for Microgrid Energy Management," IEEE
Transactions on Industrial Informatics, vol. 16, no. 4, pp. 2447-2457, 2020, doi:
10.1109/TII.2019.2932078.
[281] W. Sun, M. Zamani, M. R. Hesamzadeh, and H. Zhang, "Data-Driven
Probabilistic Optimal Power Flow With Nonparametric Bayesian Modeling and
Inference," IEEE Transactions on Smart Grid, vol. 11, no. 2, pp. 1077-1090,
2020, doi: 10.1109/TSG.2019.2931160.
[282] C. Ning and F. You, "Data-Driven Adaptive Robust Unit Commitment Under
Wind Power Uncertainty: A Bayesian Nonparametric Approach," IEEE Trans.
Power Syst., vol. 34, no. 3, pp. 2409-2418, 2019, doi:
10.1109/TPWRS.2019.2891057.
[283] W. Wei, F. Liu, and S. Mei, "Distributionally Robust Co-Optimization of
Energy and Reserve Dispatch," IEEE Transactions on Sustainable Energy, vol.
7, no. 1, pp. 289-300, 2016, doi: 10.1109/TSTE.2015.2494010.
[284] Y. L. Zhang, S. Q. Shen, and J. L. Mathieu, "Distributionally Robust Chance-
Constrained Optimal Power Flow With Uncertain Renewables and Uncertain
Reserves Provided by Loads," IEEE Trans. Power Syst., vol. 32, no. 2, pp. 1378-
1388, Mar 2017, doi: 10.1109/tpwrs.2016.2572104.
[285] M. Shahidehpour, Y. Zhou, Z. Wei, S. Chen, Z. Li, and G. Sun, "Distributionally
Robust Co-optimization of Energy and Reserve for Combined Distribution
Networks of Power and District Heating," IEEE Trans. Power Syst., pp. 1-1,
2019, doi: 10.1109/TPWRS.2019.2954710.
[286] Z. Shi, H. Liang, S. Huang, and V. Dinavahi, "Distributionally Robust Chance-
Constrained Energy Management for Islanded Microgrids," IEEE Transactions
on Smart Grid, vol. 10, no. 2, pp. 2234-2244, 2019, doi:
10.1109/TSG.2018.2792322.
[287] X. Lu, K. W. Chan, S. Xia, B. Zhou, and X. Luo, "Security-Constrained
Multiperiod Economic Dispatch With Renewable Energy Utilizing
Distributionally Robust Optimization," IEEE Transactions on Sustainable
Energy, vol. 10, no. 2, pp. 768-779, 2019, doi: 10.1109/TSTE.2018.2847419.
[288] B. Li, R. Jiang, and J. L. Mathieu, "Distributionally Robust Chance-Constrained
Optimal Power Flow Assuming Unimodal Distributions With Misspecified
Modes," IEEE Transactions on Control of Network Systems, vol. 6, no. 3, pp.
1223-1234, 2019, doi: 10.1109/TCNS.2019.2930872.
[289] Y. Chen, W. Wei, F. Liu, and S. Mei, "Distributionally robust hydro-thermal-
wind economic dispatch," Applied Energy, vol. 173, pp. 511-519, 2016, doi:
https://doi.org/10.1016/j.apenergy.2016.04.060.
[290] H. Ma, R. Jiang, and Z. Yan, "Distributionally Robust Co-Optimization of
Power Dispatch and Do-Not-Exceed Limits," IEEE Trans. Power Syst., vol. 35,
no. 2, pp. 887-897, 2020, doi: 10.1109/TPWRS.2019.2941635.
[291] W. J. Xie and S. Ahmed, "Distributionally Robust Chance Constrained Optimal
Power Flow with Renewables: A Conic Reformulation," IEEE Trans. Power
Syst., vol. 33, no. 2, pp. 1860-1867, Mar 2018, doi:
10.1109/tpwrs.2017.2725581.
[292] M. Lubin, Y. Dvorkin, and S. Backhaus, "A Robust Approach to Chance
Constrained Optimal Power Flow With Renewable Generation," IEEE Trans.
Power Syst., vol. 31, no. 5, pp. 3840-3849, 2016, doi:
10.1109/TPWRS.2015.2499753.
[293] C. Duan, L. Jiang, W. Fang, J. Liu, and S. Liu, "Data-Driven Distributionally
Robust Energy-Reserve-Storage Dispatch," IEEE Transactions on Industrial
Informatics, vol. 14, no. 7, pp. 2826-2836, 2018, doi: 10.1109/TII.2017.2771355.
[294] H. Zhang, Z. Hu, E. Munsing, S. J. Moura, and Y. Song, "Data-Driven Chance-
Constrained Regulation Capacity Offering for Distributed Energy Resources,"
IEEE Transactions on Smart Grid, vol. 10, no. 3, pp. 2713-2725, 2019, doi:
10.1109/TSG.2018.2809046.
[295] Y. Guo, K. Baker, E. Dall'Anese, Z. Hu, and T. Summers, "Data-based
distributionally robust stochastic optimal power flow, Part I: Methodologies,"
IEEE Trans. Power Syst., pp. 1-1, 2018, doi: 10.1109/TPWRS.2018.2878385.
[296] C. Ordoudis, V. A. Nguyen, D. Kuhn, and P. Pinson, "Energy and Reserve
Dispatch with Distributionally Robust Joint Chance Constraints," 2018.
[297] Y. Chen, Q. Guo, H. Sun, Z. Li, W. Wu, and Z. Li, "A Distributionally Robust
Optimization Model for Unit Commitment Based on Kullback–Leibler
Divergence," IEEE Trans. Power Syst., vol. 33, no. 5, pp. 5147-5160, 2018, doi:
10.1109/TPWRS.2018.2797069.
[298] Y. Z. Chen, Y. S. Wang, D. Kirschen, and B. S. Zhang, "Model-Free Renewable
Scenario Generation Using Generative Adversarial Networks," IEEE Trans.
Power Syst., vol. 33, no. 3, pp. 3265-3275, May 2018, doi:
10.1109/tpwrs.2018.2794541.
[299] S. Zhao and F. You, "Distributionally Robust Chance Constrained Programming
with Generative Adversarial Networks (GANs)," AIChE J., vol. 66, no. 6, p.
e16963, 2020.
[300] S. Nowozin, B. Cseke, and R. Tomioka, "f-GAN: Training generative neural
samplers using variational divergence minimization," in Advances in Neural
Information Processing Systems, 2016, pp. 271-279.
[301] X. Nguyen, M. J. Wainwright, and M. I. Jordan, "Estimating Divergence
Functionals and the Likelihood Ratio by Convex Risk Minimization," IEEE
Transactions on Information Theory, vol. 56, no. 11, pp. 5847-5861, 2010, doi:
10.1109/TIT.2010.2068870.
[302] I. J. Goodfellow et al., "Generative Adversarial Nets," in Advances in
Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes,
N. D. Lawrence, and K. Q. Weinberger, Eds., 2014, pp. 2672-2680.
[303] G. Schildbach, L. Fagiano, and M. Morari, "Randomized Solutions to Convex
Programs with Multiple Chance Constraints," SIAM J. Optim., vol. 23, no. 4, pp.
2479-2501, 2013, doi: 10.1137/120878719.
[304] C. Draxl, A. Clifton, B.-M. Hodge, and J. McCaa, "The Wind Integration
National Dataset (WIND) Toolkit," Applied Energy, vol. 151, pp. 355-366, 2015,
doi: https://doi.org/10.1016/j.apenergy.2015.03.121.
[305] G. Morales-España, "Unit commitment: computational performance, system
representation and wind uncertainty management," Ph.D. dissertation, Comillas
Pontifical University, 2014.
[306] D. Q. Mayne, "Model predictive control: Recent developments and future
promise," Automatica, vol. 50, no. 12, pp. 2967-2986, 2014, doi:
https://doi.org/10.1016/j.automatica.2014.10.128.
[307] M. Morari and J. Lee, "Model predictive control: past, present and future,"
Comput. Chem. Eng., vol. 23, no. 4, pp. 667-682, 1999, doi:
https://doi.org/10.1016/S0098-1354(98)00301-9.
[308] D. Q. Mayne, J. B. Rawlings, C. V. Rao, and P. O. M. Scokaert, "Constrained
model predictive control: Stability and optimality," Automatica, vol. 36, no. 6,
pp. 789-814, 2000, doi: https://doi.org/10.1016/S0005-1098(99)00214-9.
[309] S. J. Qin and T. A. Badgwell, "A survey of industrial model predictive control
technology," Control Engineering Practice, vol. 11, no. 7, pp. 733-764, 2003,
doi: https://doi.org/10.1016/S0967-0661(02)00186-7.
[310] J. B. Rawlings and D. Q. Mayne, Model predictive control: Theory and design.
Nob Hill Pub., 2009.
[311] A. Bemporad and M. Morari, "Robust model predictive control: A survey," in
Robustness in identification and control: Springer, 1999, pp. 207-226.
[312] D. Q. Mayne, M. M. Seron, and S. V. Raković, "Robust model predictive control
of constrained linear systems with bounded disturbances," Automatica, vol. 41,
no. 2, pp. 219-224, 2005, doi:
https://doi.org/10.1016/j.automatica.2004.08.019.
[313] W. Langson, I. Chryssochoos, S. V. Raković, and D. Q. Mayne, "Robust model
predictive control using tubes," Automatica, vol. 40, no. 1, pp. 125-133,
2004, doi: https://doi.org/10.1016/j.automatica.2003.08.009.
[314] L. Chisci, J. A. Rossiter, and G. Zappa, "Systems with persistent disturbances:
predictive control with restricted constraints," Automatica, vol. 37, no. 7, pp.
1019-1028, 2001, doi: https://doi.org/10.1016/S0005-1098(01)00051-6.
[315] A. Mesbah, "Stochastic Model Predictive Control: An Overview and
Perspectives for Future Research," IEEE Control Systems Magazine, vol. 36, no.
6, pp. 30-44, 2016, doi: 10.1109/MCS.2016.2602087.
[316] M. Farina, L. Giulioni, and R. Scattolini, "Stochastic linear Model Predictive
Control with chance constraints – A review," Journal of Process Control, vol.
44, pp. 53-67, 2016, doi: https://doi.org/10.1016/j.jprocont.2016.03.005.
[317] D. Mayne, "Robust and stochastic model predictive control: Are we going in the
right direction?," Annual Reviews in Control, vol. 41, pp. 184-192, 2016, doi:
https://doi.org/10.1016/j.arcontrol.2016.04.006.
[318] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgöwer, "Constraint-Tightening
and Stability in Stochastic Model Predictive Control," IEEE Trans. Autom.
Control., vol. 62, no. 7, pp. 3165-3177, 2017, doi: 10.1109/TAC.2016.2625048.
[319] D. Muñoz-Carpintero, G. Hu, and C. J. Spanos, "Stochastic Model Predictive
Control with adaptive constraint tightening for non-conservative chance
constraints satisfaction," Automatica, vol. 96, pp. 32-39, 2018, doi:
https://doi.org/10.1016/j.automatica.2018.06.026.
[320] B. Kouvaritakis, M. Cannon, S. V. Raković, and Q. Cheng, "Explicit use of
probabilistic distributions in linear predictive control," Automatica, vol. 46, no.
10, pp. 1719-1724, 2010, doi: https://doi.org/10.1016/j.automatica.2010.06.034.
[321] M. Cannon, B. Kouvaritakis, S. V. Raković, and Q. Cheng, "Stochastic Tubes
in Model Predictive Control With Probabilistic Constraints," IEEE Trans.
Autom. Control., vol. 56, no. 1, pp. 194-200, 2011, doi:
10.1109/TAC.2010.2086553.
[322] L. Dai, Y. Xia, Y. Gao, B. Kouvaritakis, and M. Cannon, "Cooperative
distributed stochastic MPC for systems with state estimation and coupled
probabilistic constraints," Automatica, vol. 61, pp. 89-96, 2015, doi:
https://doi.org/10.1016/j.automatica.2015.07.025.
[323] M. Korda, R. Gondhalekar, F. Oldewurtel, and C. N. Jones, "Stochastic MPC
Framework for Controlling the Average Constraint Violation," IEEE Trans.
Autom. Control., vol. 59, no. 7, pp. 1706-1721, 2014, doi:
10.1109/TAC.2014.2310066.
[324] D. Chatterjee and J. Lygeros, "On Stability and Performance of Stochastic
Predictive Control Techniques," IEEE Trans. Autom. Control., vol. 60, no. 2, pp.
509-514, 2015, doi: 10.1109/TAC.2014.2335274.
[325] D. Chatterjee, P. Hokayem, and J. Lygeros, "Stochastic Receding Horizon
Control With Bounded Control Inputs: A Vector Space Approach," IEEE Trans.
Autom. Control., vol. 56, no. 11, pp. 2704-2710, 2011, doi:
10.1109/TAC.2011.2159422.
[326] G. Schildbach, L. Fagiano, C. Frei, and M. Morari, "The scenario approach for
Stochastic Model Predictive Control with bounds on closed-loop constraint
violations," Automatica, vol. 50, no. 12, pp. 3009-3018, 2014, doi:
https://doi.org/10.1016/j.automatica.2014.10.035.
[327] G. C. Calafiore and L. Fagiano, "Stochastic model predictive control of LPV
systems via scenario optimization," Automatica, vol. 49, no. 6, pp. 1861-1866,
2013, doi: https://doi.org/10.1016/j.automatica.2013.02.060.
[328] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgöwer, "Stochastic MPC with
offline uncertainty sampling," Automatica, vol. 81, pp. 176-183, 2017, doi:
https://doi.org/10.1016/j.automatica.2017.03.031.
[329] M. Farina, L. Giulioni, L. Magni, and R. Scattolini, "An approach to output-
feedback MPC of stochastic linear discrete-time systems," Automatica, vol. 55,
pp. 140-149, 2015, doi: https://doi.org/10.1016/j.automatica.2015.02.039.
[330] M. Farina and R. Scattolini, "Model predictive control of linear systems with
multiplicative unbounded uncertainty and chance constraints," Automatica, vol.
70, pp. 258-265, 2016, doi: https://doi.org/10.1016/j.automatica.2016.04.008.
[331] J. A. Paulson and A. Mesbah, "An efficient method for stochastic optimal
control with joint chance constraints for nonlinear systems," International
Journal of Robust and Nonlinear Control, 2017.
[332] P. Sopasakis, D. Herceg, A. Bemporad, and P. Patrinos, "Risk-averse model
predictive control," Automatica, vol. 100, pp. 281-288, 2019, doi:
https://doi.org/10.1016/j.automatica.2018.11.022.
[333] S. Singh, Y. Chow, A. Majumdar, and M. Pavone, "A Framework for Time-
Consistent, Risk-Sensitive Model Predictive Control: Theory and Algorithms,"
IEEE Trans. Autom. Control., vol. 64, no. 7, pp. 2905-2912, 2019, doi:
10.1109/TAC.2018.2874704.
[334] I. Yang, "A dynamic game approach to distributionally robust safety
specifications for stochastic systems," Automatica, vol. 94, pp. 94-101, Aug
2018, doi: 10.1016/j.automatica.2018.04.022.
[335] A. Aswani, H. Gonzalez, S. S. Sastry, and C. Tomlin, "Provably safe and robust
learning-based model predictive control," Automatica, vol. 49, no. 5, pp. 1216-
1226, 2013, doi: https://doi.org/10.1016/j.automatica.2013.02.003.
[336] U. Rosolia and F. Borrelli, "Learning Model Predictive Control for Iterative
Tasks. A Data-Driven Control Framework," IEEE Trans. Autom. Control., vol.
63, no. 7, pp. 1883-1896, 2018, doi: 10.1109/TAC.2017.2753460.
[337] U. Rosolia, X. Zhang, and F. Borrelli, "Data-Driven Predictive Control for
Autonomous Systems," Annual Review of Control, Robotics, and Autonomous
Systems, vol. 1, no. 1, pp. 259-286, 2018, doi: 10.1146/annurev-control-060117-
105215.
[338] T. Koller, F. Berkenkamp, M. Turchetta, and A. Krause, "Learning-Based
Model Predictive Control for Safe Exploration," in 2018 IEEE Conference on
Decision and Control (CDC), 2018, pp. 6059-6066, doi:
10.1109/CDC.2018.8619572.
[339] D. Limon, J. Calliess, and J. M. Maciejowski, "Learning-based Nonlinear Model
Predictive Control," IFAC-PapersOnLine, vol. 50, no. 1, pp. 7769-7776, 2017,
doi: https://doi.org/10.1016/j.ifacol.2017.08.1050.
[340] E. Terzi, L. Fagiano, M. Farina, and R. Scattolini, "Learning-based predictive
control for linear systems: A unitary approach," Automatica, vol. 108, p. 108473,
2019, doi: https://doi.org/10.1016/j.automatica.2019.06.025.
[341] T. A. N. Heirung, B. E. Ydstie, and B. Foss, "Dual adaptive model predictive
control," Automatica, vol. 80, pp. 340-348, 2017, doi:
https://doi.org/10.1016/j.automatica.2017.01.030.
[342] N. M. Filatov and H. Unbehauen, "Survey of adaptive dual control methods,"
IEE Proceedings Control Theory and Applications, vol. 147, no. 1, pp. 118-128,
2000.
[343] L. Hewing, K. P. Wabersich, M. Menner, and M. N. Zeilinger, "Learning-Based
Model Predictive Control: Toward Safe Learning in Control," Annual Review of
Control, Robotics, and Autonomous Systems, vol. 3, no. 1, pp. 269-296, 2020,
doi: 10.1146/annurev-control-090419-075625.
[344] L. Hewing, J. Kabzan, and M. N. Zeilinger, "Cautious model predictive control
using Gaussian process regression," arXiv preprint arXiv:1705.10702, 2017.
[345] R. Soloperto, M. A. Müller, S. Trimpe, and F. Allgöwer, "Learning-Based
Robust Model Predictive Control with State-Dependent Uncertainty," IFAC-
PapersOnLine, vol. 51, no. 20, pp. 442-447, 2018, doi:
https://doi.org/10.1016/j.ifacol.2018.11.052.
[346] Z. Wu, D. Rincon, and P. D. Christofides, "Real-Time Adaptive Machine-
Learning-Based Predictive Control of Nonlinear Processes," Ind. Eng. Chem.
Res., 2019, doi: 10.1021/acs.iecr.9b03055.
[347] R. Gomes, M. Welling, and P. Perona, "Incremental learning of nonparametric
Bayesian mixture models," in 2008 IEEE Conference on Computer Vision and
Pattern Recognition, 2008, pp. 1-8, doi: 10.1109/CVPR.2008.4587370.
[348] J. A. Paulson, T. L. M. Santos, and A. Mesbah, "Mixed stochastic-deterministic
tube MPC for offset-free tracking in the presence of plant-model mismatch,"
Journal of Process Control, 2018, doi:
https://doi.org/10.1016/j.jprocont.2018.04.010.
[349] M. Korda, R. Gondhalekar, J. Cigler, and F. Oldewurtel, "Strongly feasible
stochastic model predictive control," in 2011 50th IEEE Conference on Decision
and Control and European Control Conference, 2011, pp. 1245-1251, doi:
10.1109/CDC.2011.6161250.
[350] S. Samuelson and I. Yang, "Safety-Aware Optimal Control of Stochastic
Systems Using Conditional Value-at-Risk," in 2018 Annual American Control
Conference (ACC), 2018, pp. 6285-6290, doi: 10.23919/ACC.2018.8430957.
[351] J. Sethuraman, "A Constructive Definition of Dirichlet Priors," Statistica
Sinica, vol. 4, no. 2, pp. 639-650, 1994.
[352] T. Campbell and J. P. How, "Bayesian Nonparametric Set Construction for
Robust Optimization," in 2015 American Control Conference (ACC), 2015, pp.
4216-4221.
[353] D. M. Blei and M. I. Jordan, "Variational Inference for Dirichlet Process
Mixtures," Bayesian Anal., vol. 1, no. 1, pp. 121-143, 2006, doi: 10.1214/06-
ba104.
[354] K. Kurihara, M. Welling, and N. Vlassis, "Accelerated variational Dirichlet
process mixtures," in Advances in neural information processing systems, 2007,
pp. 761-768.
[355] F. D. Brunner, W. Heemels, and F. Allgöwer, "Robust event-triggered MPC
with guaranteed asymptotic bound and average sampling rate," IEEE Trans.
Autom. Control., vol. 62, no. 11, pp. 5694-5709, 2017.
[356] B. Kouvaritakis and M. Cannon, Model Predictive Control: Classical, Robust
and Stochastic. Springer: New York, NY, USA, 2016.
[357] F. Blanchini, "Set invariance in control," Automatica, vol. 35, no. 11, pp. 1747-
1767, 1999, doi: https://doi.org/10.1016/S0005-1098(99)00113-2.
[358] I. Kolmanovsky and E. G. Gilbert, "Theory and computation of disturbance
invariant sets for discrete-time linear systems," Mathematical Problems in
Engineering, vol. 4, no. 4, pp. 317-367, 1998.
[359] J. Lofberg, "YALMIP: A toolbox for modeling and optimization in MATLAB,"
in 2004 IEEE international conference on robotics and automation (IEEE Cat.
No. 04CH37508), 2004: IEEE, pp. 284-289.
[360] M. Herceg, M. Kvasnica, C. N. Jones, and M. Morari, "Multi-parametric
toolbox 3.0," in 2013 European control conference (ECC), 2013: IEEE, pp. 502-
510.
[361] S. V. Rakovic, E. C. Kerrigan, K. I. Kouramas, and D. Q. Mayne, "Invariant
approximations of the minimal robust positively invariant set," IEEE Trans.
Autom. Control., vol. 50, no. 3, pp. 406-410, 2005.
[362] A. Shapiro and A. Kleywegt, "Minimax analysis of stochastic problems,"
Optimization Methods and Software, vol. 17, no. 3, pp. 523-542, 2002.
[363] G. A. Hanasusanto, D. Kuhn, S. W. Wallace, and S. Zymler, "Distributionally
robust multi-item newsvendor problems with multimodal demand
distributions," Math. Program., vol. 152, no. 1-2, pp. 1-32, 2015.
[364] X. J. Zhang, M. Kamgarpour, A. Georghiou, P. Goulart, and J. Lygeros, "Robust
optimal control with adjustable uncertainty sets," Automatica, vol. 75, pp. 249-
259, Jan 2017, doi: 10.1016/j.automatica.2016.09.016.
[365] I. R. Petersen and R. Tempo, "Robust control of uncertain systems: Classical
results and recent developments," Automatica, vol. 50, no. 5, pp. 1315-1335,
May 2014, doi: 10.1016/j.automatica.2014.02.042.
[366] C. Z. Wu, K. L. Teo, and S. Y. Wu, "Min-max optimal control of linear systems
with uncertainty and terminal state constraints," Automatica, vol. 49, no. 6, pp.
1809-1815, Jun 2013, doi: 10.1016/j.automatica.2013.02.052.
[367] M. E. Villanueva, R. Quirynen, M. Diehl, B. Chachuat, and B. Houska, "Robust
MPC via min-max differential inequalities," Automatica, vol. 77, pp. 311-321,
Mar 2017, doi: 10.1016/j.automatica.2016.11.022.
[368] D. Bertsimas and M. Sim, "The price of robustness," Oper. Res., vol. 52, no. 1,
pp. 35-53, Jan-Feb 2004, doi: 10.1287/opre.1030.0065.
[369] C. Ning and F. You, "Data-driven adaptive nested robust optimization: General
modeling framework and efficient computational algorithm for decision making
under uncertainty," AIChE J., vol. 63, no. 9, pp. 3790-3817, 2017, doi:
10.1002/aic.15717.
[370] İ. Yanıkoğlu, B. L. Gorissen, and D. den Hertog, "A Survey of Adjustable
Robust Optimization," Eur. J. Oper. Res., 2018, doi:
10.1016/j.ejor.2018.08.031.
[371] D. Bertsimas and I. Dunning, "Multistage Robust Mixed-Integer Optimization
with Adaptive Partitions," Oper. Res., vol. 64, no. 4, pp. 980-998, 2016, doi:
10.1287/opre.2016.1515.
[372] G. C. Calafiore, "Multi-period portfolio optimization with linear control
policies," Automatica, vol. 44, no. 10, pp. 2463-2473, Oct 2008, doi:
10.1016/j.automatica.2008.02.007.
[373] H. Bannister, B. Goldys, S. Penev, and W. Wu, "Multiperiod mean-standard-
deviation time consistent portfolio selection," Automatica, vol. 73, pp. 15-26,
Nov 2016, doi: 10.1016/j.automatica.2016.06.021.
[374] G. C. Calafiore, "Direct data-driven portfolio optimization with guaranteed
shortfall probability," Automatica, vol. 49, no. 2, pp. 370-380, Feb 2013, doi:
10.1016/j.automatica.2012.11.012.
[375] F. Oldewurtel, R. Gondhalekar, C. N. Jones, and M. Morari, "Blocking
Parameterizations for Improving the Computational Tractability of Affine
Disturbance Feedback MPC Problems," in Proceedings of the 48th IEEE
Conference on Decision and Control, held jointly with the 2009 28th Chinese
Control Conference, 2009, pp. 7381-7386.
[376] P. J. Goulart, E. C. Kerrigan, and J. A. Maciejowski, "Optimization over state
feedback policies for robust control with constraints," Automatica, vol. 42, no.
4, pp. 523-533, Apr 2006, doi: 10.1016/j.automatica.2005.08.023.
[377] K. Postek and D. den Hertog, "Multistage Adjustable Robust Mixed-Integer
Optimization via Iterative Splitting of the Uncertainty Set," INFORMS J.
Comput., vol. 28, no. 3, pp. 553-574, 2016, doi: 10.1287/ijoc.2016.0696.
[378] D. Bertsimas and F. de Ruiter, "Duality in Two-Stage Adaptive Linear
Optimization: Faster Computation and Stronger Bounds," INFORMS J. Comput.,
vol. 28, no. 3, pp. 500-511, 2016, doi: 10.1287/ijoc.2016.0689.
[379] G. C. Calafiore, "An affine control method for optimal dynamic asset allocation
with transaction costs," SIAM Journal on Control and Optimization, vol. 48, no.
4, pp. 2254-2274, 2009, doi: 10.1137/080723776.
[380] A. Lorca, X. A. Sun, E. Litvinov, and T. Zheng, "Multistage adaptive robust
optimization for the unit commitment problem," Oper. Res., vol. 64, no. 1, pp.
32-51, 2016.
[381] D. Bertsimas and A. Georghiou, "Design of near optimal decision rules in
multistage adaptive mixed-integer optimization," Oper. Res., vol. 63, no. 3, pp.
610-627, 2015, doi: 10.1287/opre.2015.1365.
[382] D. Bertsimas and V. Goyal, "On the power and limitations of affine policies in
two-stage adaptive optimization," Math. Program., vol. 134, no. 2, pp. 491-531,
2012, doi: 10.1007/s10107-011-0444-4.
[383] D. Bertsimas, D. A. Iancu, and P. A. Parrilo, "A Hierarchy of Near-Optimal
Policies for Multistage Adaptive Optimization," IEEE Trans. Autom. Control.,
vol. 56, no. 12, pp. 2803-2818, Dec 2011, doi: 10.1109/tac.2011.2162878.
[384] D. Bertsimas and C. Caramanis, "Finite Adaptability in Multistage Linear
Optimization," IEEE Trans. Autom. Control., vol. 55, no. 12, pp. 2751-2766,
2010, doi: 10.1109/TAC.2010.2049764.
[385] G. A. Hanasusanto, D. Kuhn, and W. Wiesemann, "K-Adaptability in Two-
Stage Robust Binary Programming," Oper. Res., vol. 63, no. 4, pp. 877-891,
2015, doi: 10.1287/opre.2015.1392.
[386] A. Ardestani-Jaafari and E. Delage, "Linearized robust counterparts of two-stage
robust optimization problem with applications in operations management,"
Manuscript, HEC Montreal, 2016.
[387] G. Xu and S. Burer, "A copositive approach for two-stage adjustable robust
optimization with uncertain right-hand sides," Comput. Optim. Appl., vol. 70,
no. 1, pp. 33-59, 2018, doi: 10.1007/s10589-017-9974-x.
[388] A. Takeda, S. Taguchi, and R. H. Tütüncü, "Adjustable Robust Optimization
Models for a Nonlinear Two-Period System," J. Optim. Theory Appl., vol. 136,
no. 2, pp. 275-295, 2008, doi: 10.1007/s10957-007-9288-8.
[389] A. Thiele, T. Terry, and M. Epelman, "Robust linear optimization with
recourse," Tech. Rep., pp. 4-37, 2009.
[390] A. Georghiou, A. Tsoukalas, and W. Wiesemann, "A Primal-Dual Lifting
Scheme for Two-Stage Robust Optimization," Optimization Online, 2017.
[391] M. Bodur and J. Luedtke, "Two-stage Linear Decision Rules for Multi-stage
Stochastic Programming," arXiv preprint arXiv:1701.04102, 2017.
[392] J. Zou, S. Ahmed, and X. A. Sun, "Stochastic dual dynamic integer
programming," Math. Program., 2018, doi: 10.1007/s10107-018-1249-5.
[393] C. Ning and F. You, "A Transformation-Proximal Bundle Algorithm for
Solving Multistage Adaptive Robust Optimization Problems," in 2018 IEEE
57th Conference on Decision and Control (CDC), Miami Beach, FL, USA,
2018, pp. 2439-2444.
[394] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization
algorithms I: Fundamentals. Springer science & business media, 2013.
[395] C. Lemarechal and C. Sagastizabal, "Practical aspects of the Moreau-Yosida
regularization: Theoretical preliminaries," SIAM J. Optim., vol. 7, no. 2, pp. 367-
385, May 1997, doi: 10.1137/s1052623494267127.
[396] X. Chen and Y. Zhang, "Uncertain Linear Programs: Extended Affinely
Adjustable Robust Counterparts," Oper. Res., vol. 57, no. 6, pp. 1469-1482,
Nov-Dec 2009, doi: 10.1287/opre.1080.0605.
[397] K. C. Kiwiel, "An Inexact Bundle Approach to Cutting-Stock Problems,"
INFORMS J. Comput., vol. 22, no. 1, pp. 131-143, 2010, doi:
10.1287/ijoc.1090.0326.
[398] W. van Ackooij, N. Lebbe, and J. Malick, "Regularized decomposition of large
scale block-structured robust optimization problems," Comput. Manag. Sci., vol.
14, no. 3, pp. 393-421, 2017, doi: 10.1007/s10287-017-0281-x.
[399] A. Ruszczynski and A. Swietanowski, "Accelerating the regularized
decomposition method for two stage stochastic linear problems," Eur. J. Oper.
Res., vol. 101, no. 2, pp. 328-342, Sep 1997, doi: 10.1016/s0377-
2217(96)00401-8.
[400] K. C. Kiwiel, "A Proximal Bundle Method with Approximate Subgradient
Linearizations," SIAM J. Optim., vol. 16, no. 4, pp. 1007-1023, 2006, doi:
10.1137/040603929.
[401] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust optimization. Princeton
University Press, 2009.
[402] L. A. Wolsey, Integer programming. Wiley, 1998.
[403] A. Belloni, "Lecture Notes for IAP 2005 Course Introduction to Bundle
Methods," Operations Research Center, MIT, version of February 2005.
[404] M. J. Hadjiyiannis, P. J. Goulart, and D. Kuhn, "A scenario approach for
estimating the suboptimality of linear decision rules in two-stage robust
optimization," in 2011 50th IEEE Conference on Decision and Control and
European Control Conference, 12-15 Dec. 2011 2011, pp. 7386-7391, doi:
10.1109/CDC.2011.6161342.
[405] A. Ardestani-Jaafari and E. Delage, "The Value of Flexibility in Robust
Location-Transportation Problems," Transp. Sci., vol. 52, no. 1, pp. 189-209,
Jan-Feb 2018, doi: 10.1287/trsc.2016.0728.
[406] A. Ben-Tal, B. Golany, and S. Shtern, "Robust multi-echelon multi-period
inventory control," Eur. J. Oper. Res., vol. 199, no. 3, pp. 922-935, Dec 2009,
doi: 10.1016/j.ejor.2009.01.058.
[407] D. Bertsimas and A. Thiele, "A robust optimization approach to inventory
theory," Oper. Res., vol. 54, no. 1, pp. 150-168, Jan-Feb 2006, doi:
10.1287/opre.1050.0238.
[408] J. D. Schwartz, W. L. Wang, and D. E. Rivera, "Simulation-based optimization
of process control policies for inventory management in supply chains,"
Automatica, vol. 42, no. 8, pp. 1311-1320, Aug 2006, doi:
10.1016/j.automatica.2006.03.019.
[409] A. Georghiou, A. Tsoukalas, and W. Wiesemann, "Robust Dual Dynamic
Programming," Available on Optimization Online, 2016.
[410] C.-T. See and M. Sim, "Robust Approximation to Multiperiod Inventory
Management," Oper. Res., vol. 58, no. 3, pp. 583-594, 2010, doi:
10.1287/opre.1090.0746.
[411] F. Maggioni, M. Bertocchi, F. Dabbene, and R. Tempo, "Sampling methods for
multistage robust convex optimization problems," arXiv preprint
251
[412] F. You and I. E. Grossmann, "Stochastic inventory management for tactical
process planning under uncertainties: MINLP models and algorithms," AIChE
J., vol. 57, no. 5, pp. 1250-1277, 2011, doi: 10.1002/aic.12338.
[413] C. Ning and F. You, "A Data-Driven Multistage Adaptive Robust Optimization
Framework for Planning and Scheduling under Uncertainty," AIChE J., vol. 63,
no. 10, pp. 4343-4369, 2017, doi: 10.1002/aic.15792.
[414] D. A. Van Dyk and X.-L. Meng, "The art of data augmentation," Journal of
Computational and Graphical Statistics, vol. 10, no. 1, pp. 1-50, 2001.
[415] S. Hauberg, O. Freifeld, A. B. L. Larsen, J. Fisher, and L. Hansen, "Dreaming
more data: Class-dependent distributions over diffeomorphisms for learned data
augmentation," in Artificial Intelligence and Statistics, 2016, pp. 342-350.