Sample-based Neural Approximation Approach for Probabilistic Constrained Programs

Xun Shen, Member, IEEE, Tinghui Ouyang, Nan Yang, and Jiancang Zhuang

Manuscript received XXX XX, 201X; revised XXX XX, 201X. This work was supported in part by The Graduate University for Advanced Studies, SOKENDAI.
X. Shen is with the Department of Statistical Sciences, The Graduate University for Advanced Studies, SOKENDAI, Tokyo 106-8569, Japan (e-mail: shenxun@ism.ac.jp).
T. Ouyang is with the National Institute of Advanced Industrial Science and Technology, Tokyo Bay Area Center, Koto-ku, Tokyo 135-0064, Japan (e-mail: ouyang.tinghui@aist.go.jp).
N. Yang is with the Hubei Provincial Collaborative Innovation Center for New Energy Microgrid, China Three Gorges University, Yichang 443002, China (e-mail: ynyyayy@ctgu.edu.cn).
J. Zhuang is with The Institute of Statistical Mathematics, Tokyo 190-8562, Japan (e-mail: zhuangjc@ism.ac.jp).
Abstract—This brief paper introduces a neural approximation-based method for solving continuous optimization problems with probabilistic constraints. After reformulating the probabilistic constraints as a quantile function, a sample-based neural network model is used to approximate the quantile function. The statistical guarantees of the neural approximation are discussed by showing the convergence and feasibility analysis. Then, by introducing the neural approximation, a simulated annealing-based algorithm is revised to solve the probabilistic constrained programs. The Interval Predictor Model (IPM) of wind power is investigated to validate the proposed method.

Index Terms—Probabilistic constraints, nonlinear optimization, sample average approximation, quantile function, neural network model.

I. INTRODUCTION

Probabilistic Constrained Programs (PCPs), or Chance Constrained Programs (CCPs), are mathematical programs that involve random variables in constraints which are required to be satisfied at a given probability level [1]. In this study, the appellation of probabilistic constrained programs is adopted. Probabilistic constrained programs have been used in various fields, such as decision-making and filtering in uncertain systems [2], motion planning with probabilistic constraints [3], and energy storage modeling for distribution systems in the electricity market [4].

Probabilistic constrained programs are of great value for practical applications. However, it is NP-hard to solve probabilistic constrained programs directly due to the existence of the probabilistic constraints. Thus, much research has been conducted to develop approximation approaches for solving them. For instance, [5] proposed the scenario approach, in which the probabilistic constraints are replaced by deterministic constraints imposed for a finite set of independently extracted samples of the random variables. The scenario approach ensures that the solution of the approximately formulated deterministic program satisfies the original probabilistic constraints with a determined bound of probability, where the bound is determined by the number of extracted samples. Afterwards, the scenario approach was improved to ensure tighter confidence bounds of probability by a sampling-and-discarding approach [6]. In the sampling-and-discarding scenario approach, a certain proportion of the extracted samples are used to define the deterministic constraints while the remaining ones are discarded. With fewer samples, the violation probability of the original probabilistic constraints is still preserved. However, the scenario approach has a fatal drawback: it gives a too conservative solution, which converges to the totally robust solution of the uncertain constraints rather than to the original probabilistic constraints. Namely, as the sample number goes to infinity, the solution will satisfy the uncertain constraints with probability 1 rather than at the given probability level.

On the other hand, a sample average approach has been proposed in [7] to satisfy the probabilistic constraints strictly. The sample-average program is formulated as an approximation of the probabilistic constrained program by replacing the probabilistic constraints with a measure that indicates the violation probability. Besides, an inner-outer approximation approach is proposed in [8] to obtain a tight sample average approximate program.

This paper extends the sample average approach by introducing neural network-based constraints to replace the probabilistic constraints. After reformulating the probabilistic constraints as a quantile function, a sample-based neural network model is used to approximate the quantile function. The statistical guarantees of the neural approximation are discussed by showing the convergence and feasibility analysis. Then, by introducing the neural approximation, a simulated annealing-based algorithm is revised to solve the probabilistic constrained programs. The Interval Predictor Model (IPM) of wind power is investigated to validate the proposed method.

This paper is organized as follows: Section II gives the problem formulation; Section III demonstrates the proposed neural approximation approach, and the proposed algorithms are introduced in Section IV; Section V presents how to use the proposed method to establish the IPM of wind power. Finally, Section VI concludes the whole paper.

II. PROBLEM FORMULATION

Consider the following optimization problem with probabilistic constraints:

min_{u∈U} J(u)
s.t. Pr{h(u, δ) ≤ 0} ≥ 1 − α, δ ∈ ∆, α ∈ (0, 1)   (1)
where u represents the decision variable with a closed feasible domain U ⊂ R^{n_u}, J : R^{n_u} → R is the objective function, δ represents an uncertain parameter vector with sample space ∆ ⊂ R^{n_δ}, the probability measure is well defined on B_∆ (the sigma algebra of ∆) and the probability density of δ is denoted as p_δ(δ), h : R^{n_u} × R^{n_δ} → R^{n_h} is a vector-valued function, and α is a risk parameter. The meaning of the probabilistic constraints is that the constraints represented by h(u, δ) ≤ 0 should be satisfied with a probability larger than 1 − α. Besides, this study focuses on problems in which J and h are continuous and differentiable ∀u ∈ U, ∀δ ∈ ∆. The uncertain variable h(u, δ) has a continuous cumulative distribution function (CDF) for all u ∈ U.

There are several difficulties in addressing the problem described by (1):
• The structural properties of the feasible domain defined by h(u, δ) ≤ 0 may not carry over to the domain defined by the constraint Pr{h(u, δ) ≤ 0} ≥ 1 − α. For instance, even if h is linear in u, the probabilistic constraint may not define a convex domain;
• Instead of knowing the distribution of δ, only samples of δ are known;
• A tractable analytical function of the probabilistic constraint does not exist even with knowledge of the distribution of δ.

This study aims to find a tractable analytical function H(u) that defines a feasible domain Ũ_f = {u ∈ U | H(u) ≤ 0} which equals the feasible domain U_f = {u ∈ U | Pr{h(u, δ) ≤ 0} ≥ 1 − α} with probability 1. Then, the problem with deterministic constraints

min_{u∈U} J(u)
s.t. H(u) ≤ 0   (2)

is the equivalent problem of (1). By solving (2), the optimal solution of (1) can be approximated.

III. NEURAL APPROXIMATION APPROACH

A. Quantile Function-based Problem Reformulation

The cumulative probability function of a random variable X is denoted as

F(x) = Pr{X ≤ x}.   (3)

The 1 − α level quantile of a random variable X is defined as [9]

Q^{1−α}(X) = inf{x ∈ R | Pr{X ≤ x} ≥ 1 − α}.   (4)

For the case n_h = 1, h : R^{n_u} × R^{n_δ} → R is a real-valued function and h(u, δ) is a scalar random variable. Thus, the cumulative probability function of h(u, δ) can be defined as

F(γ, u) = Pr{h(u, δ) ≤ γ}, γ ∈ R.   (5)

The quantile constraint Q^{1−α}(h(u, δ)) ≤ 0 is equivalent to F(0, u) ≥ 1 − α. For the case n_h > 1, Q^{1−α}(h(u, δ)) cannot be well defined since h(u, δ) is no longer a scalar random variable. Instead of h(u, δ), h̄(u, δ) = max_{j=1,...,n_h} h_j(u, δ) is used. Then, the cumulative probability function F(γ, u) takes the same form as Eq. (5), only with h(·, ·) replaced by h̄(·, ·), and the quantile constraint is written as Q^{1−α}(h̄(u, δ)) ≤ 0. Without loss of generality, Q^{1−α}(h̄(u, δ)) ≤ 0 can be used for n_h = 1 as well. Then, problem (1) is reformulated as

min_{u∈U} J(u)
s.t. Q^{1−α}(h̄(u, δ)) ≤ 0, δ ∈ ∆, α ∈ (0, 1).   (6)

The advantage of using the quantile is that the quantile function has dramatically reduced flatness compared to the probability function Pr{h(u, δ) ≤ 0}. A comparison of the quantile function and the probability function is shown in Fig. 1. By the quantile function-based reformulation, the feasible region is measured in the image of h̄(u, δ) instead of the bounded image [0, 1] of the probability function.

Fig. 1. Comparison of 1 − α − Pr{h(u, δ) ≤ 0} and Q^{1−α}(h(u, δ)), where h(u, δ) = 1.5u² − 3 + δ, δ ∼ N(0, 1), and α = 0.1.
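To see the flatness contrast numerically, the following Python sketch evaluates both curves of the Fig. 1 example in closed form (the function names are ours, chosen for illustration). Both expressions change sign together near |u| ≈ 1.07, but the probability margin saturates close to −0.1 inside and 0.9 outside the feasible set, while the quantile keeps growing quadratically.

```python
# A minimal sketch of the Fig. 1 example: h(u, δ) = 1.5u² − 3 + δ,
# δ ~ N(0, 1), α = 0.1. Both constraint reformulations are available in
# closed form here, which makes the flatness contrast easy to inspect.
from statistics import NormalDist

std_normal = NormalDist()   # standard normal N(0, 1)
alpha = 0.1

def probability_margin(u: float) -> float:
    # 1 − α − Pr{h(u, δ) ≤ 0}; negative values mean the constraint holds.
    prob = std_normal.cdf(3.0 - 1.5 * u**2)   # Pr{δ ≤ 3 − 1.5u²}
    return (1.0 - alpha) - prob

def quantile_value(u: float) -> float:
    # Q^{1−α}(h(u, δ)) = 1.5u² − 3 + Φ⁻¹(1 − α); same sign as the margin above.
    return 1.5 * u**2 - 3.0 + std_normal.inv_cdf(1.0 - alpha)

for u in [0.0, 0.5, 1.0, 1.07, 1.5, 2.0]:
    print(f"u={u:5.2f}  1-α-Pr={probability_margin(u):+8.4f}  "
          f"Q^0.9={quantile_value(u):+8.4f}")
```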
B. Neural Approximation of the Quantile Function

The sample set ∆^N = {δ_1, ..., δ_N} is obtained by extracting samples independently from ∆ according to the identical distribution p_δ(δ). For a given u, the empirical CDF of H^N = {h̄(u, δ_1), ..., h̄(u, δ_N)} is written as

F̃^N(t, u) = (1/N) Σ_{i=1}^{N} I(h̄(u, δ_i) ≤ t)   (7)

where I(h̄(u, δ_i) ≤ t) denotes the indicator function

I(h̄(u, δ_i) ≤ t) = { 0, if h̄(u, δ_i) > t
                      1, if h̄(u, δ_i) ≤ t.   (8)

The (1 − α)-empirical quantile at u can be obtained from a value t such that F̃^N(t, u) ≈ 1 − α, which is defined as

Q̃^{1−α}(h̄(u, δ)) = inf{ y | (1/N) Σ_{i=1}^{N} I(h̄(u, δ_i) ≤ y) ≥ 1 − α },   (9)

equivalently written as

Q̃^{1−α}(h̄(u, δ)) = h̄_{⌈M⌉}(u)   (10)

where M = ⌈(1 − α)N⌉ and h̄_{⌈M⌉}(u) denotes the M-th smallest observation of the values H^N for a given u.
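A direct sample-based evaluation of (9)-(10) is a sort plus an index lookup, as in the following sketch; the constraint function and distribution are the Fig. 1 example, used here as stand-ins.

```python
# A minimal sketch of the empirical quantile (10): sort the N observations
# of h̄(u, δ_i) and take the M-th smallest with M = ⌈(1 − α)N⌉.
import math
import random

def empirical_quantile(h_bar, u, deltas, alpha):
    """Q̃^{1-α}(h̄(u, δ)) from samples, per Eqs. (9)-(10)."""
    values = sorted(h_bar(u, d) for d in deltas)
    m = math.ceil((1.0 - alpha) * len(values))   # M = ⌈(1 − α)N⌉
    return values[m - 1]                          # M-th smallest (1-indexed)

# Example with the scalar constraint from Fig. 1: h(u, δ) = 1.5u² − 3 + δ.
random.seed(0)
deltas = [random.gauss(0.0, 1.0) for _ in range(5000)]   # δ ~ N(0, 1)
h_bar = lambda u, d: 1.5 * u**2 - 3.0 + d
print(empirical_quantile(h_bar, 1.0, deltas, alpha=0.1))  # ≈ −0.22 in theory
```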
Using the (1 − α)-empirical quantile as an approximation of the probabilistic constraints has two drawbacks:
• Q̃^{1−α}(h̄(u, δ)) is usually not differentiable, since M changes for different u. Thus, even if the constraint h(u, δ) is smooth for a fixed value of δ, Q̃^{1−α}(h̄(u, δ)) does not have to be smooth;
• For small N, the uncertainty of Q̃^{1−α}(h̄(u, δ)) is large and the inward kinks of the feasible boundary will be very non-smooth. A large N can bring some smoothness back; however, this also increases the computation burden.

Thus, it is necessary to look for a smooth approximation of the (1 − α)-empirical quantile.

Given the sample set ∆^N, the (1 − α)-empirical quantile Q̃^{1−α}(h̄(u, δ)) is essentially a function of u. Although Q̃^{1−α}(h̄(u, δ)) is not certain to be a differentiable function, a smooth function can be used to approximate it. In this study, a single-layer neural network model is used to approximate the (1 − α)-empirical quantile. The neural approximation of Q̃^{1−α}(h̄(u, δ)) with S hidden nodes and activation function g(·) is defined as

Ĥ_S(u) = Σ_{i=1}^{S} β_i g(u, a_i, b_i),   (11)

where β_i denotes the weight connecting the i-th hidden node and the output node, a_i = [a_{i,1}, ..., a_{i,k}] represents the weight vector towards u, and b_i is the scalar threshold of the i-th hidden node. The purpose is to achieve

Ĥ_S(u) ≈ Q̃^{1−α}(h̄(u, δ)).   (12)

Then, Ĥ_S(·) maps the decision variable u ∈ R^k into the image of h̄(u, δ), which is R. In order to find solutions for (1) and (6), it is equivalent to solve the following problem

min_{u∈U} J(u)
s.t. Ĥ_S(u) ≤ 0.   (13)

The above formulation is a sample-based neural approximation of the original probabilistic constrained problem, obtained by a two-layer approximation: sample approximation and neural approximation. The convergence and feasibility of the two-layer approximation should be analyzed.
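For concreteness, here is a minimal sketch of the forward pass of (11), reading g(u, a_i, b_i) as a sigmoid of a_i·u + b_i; this is a common choice, while the paper leaves g generic, and all parameter values would come from the training step of Section IV.

```python
# A sketch of the single-hidden-layer surrogate (11):
# Ĥ_S(u) = Σ β_i g(a_i·u + b_i), with a sigmoid activation assumed for g.
import math
from typing import Sequence

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def h_hat(u: Sequence[float],
          beta: Sequence[float],          # output weights β_i, length S
          a: Sequence[Sequence[float]],   # input weights a_i ∈ R^k, S rows
          b: Sequence[float]) -> float:   # thresholds b_i, length S
    """Forward pass of Ĥ_S(u); the surrogate constraint is Ĥ_S(u) ≤ 0."""
    total = 0.0
    for beta_i, a_i, b_i in zip(beta, a, b):
        pre = sum(w * x for w, x in zip(a_i, u)) + b_i   # a_i·u + b_i
        total += beta_i * sigmoid(pre)
    return total
```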


C. Convergence and Feasibility Analysis

We make the following assumption on h̄(u, δ).

Assumption 1. For every u ∈ U, the random variable h̄(u, δ) is measurable and has a continuous distribution. Besides, for every δ ∈ ∆, h̄(u, δ) is a continuous function of u.

The asymptotic convergence of Ĥ_S(u) ≤ 0 is summarized in the following theorem.

Theorem 1. Suppose Assumption 1 holds. As N → ∞, there exist S, a, b, β to assure that Problem (13) has the same feasible domain as Problem (1) w.p. 1.

Proof. We prove the equivalence of problem (13) and problem (1) by first showing uniform convergence of Ĥ_S(u) → Q^{1−α}(h̄(u, δ)) and Ĥ_S(u) → Q̃^{1−α}(h̄(u, δ)) for all u ∈ U, and then uniform convergence of F̃^N(0, u) → F(0, u) for all u ∈ U as N → ∞. Since F̃^N(0, u) → F(0, u) implies Q̃^{1−α}(h̄(u, δ)) → Q^{1−α}(h̄(u, δ)), the above two uniform convergences imply that Ĥ_S(u) → Q^{1−α}(h̄(u, δ)) as N → ∞. Therefore, the constraints of problem (13) and problem (1) are equivalent as N → ∞.

Since Assumption 1 holds, F(γ, u) is a continuous function of u. Thus, Q^{1−α}(h̄(u, δ)) is a continuous function of u. Then, according to the universal approximation theorem [10], [11], ∀ϵ_H > 0 and ∀Q^{1−α}(h̄(u, δ)), ∃S ∈ N_+, β, b ∈ R^S, a ∈ R^{S×k}, and ∃g : R^k × R^S × R^{S×k} → R, such that

∥Ĥ_S(u) − Q^{1−α}(h̄(u, δ))∥ < ϵ_H   (14)

for all u ∈ U.

By its definition, Q̃^{1−α}(h̄(u, δ)) is a piecewise continuous function of u ∈ U ⊂ R^{n_u} with discontinuity points c_j, j = 1, ..., n_c. Therefore, according to the results in [12], given any ϵ_H, there exists Ĥ_S(u) with the form

Ĥ_S(u) = Σ_{i=1}^{S−n_c} β_i g(u, a_i, b_i) + Σ_{j=1}^{n_c} β_{S−n_c+j} ϕ_j(u − c_j),   (15)

where ϕ_j(·) is the sigmoid jump approximation basis function, such that

∥Ĥ_S(u) − Q̃^{1−α}(h̄(u, δ))∥ < ϵ_H   (16)

for all u ∈ U.

Since I(h̄(u, δ) ≤ t) is measurable, ∥I(h̄(u, δ) ≤ t)∥ ≤ 1 for all u ∈ U, and the samples are identically and independently distributed, applying the Artstein-Vitale theorem (Theorem 7.53 in [13]) yields that F̃^N(0, u) converges to F(0, u) as N goes to ∞. ∎

For the finite-sample feasibility analysis of the approximate problem solutions, we make use of Hoeffding's inequality [14]:

Theorem 2. Denote Z_1, ..., Z_N independent random variables with bounded sample spaces, namely Pr{Z_i ∈ [z_{i,min}, z_{i,max}]} = 1, ∀i ∈ {1, ..., N}. Then, if s > 0,

Pr{ Σ_{i=1}^{N} (Z_i − E{Z_i}) ≥ sN } ≤ exp( −2N²s² / Σ_{i=1}^{N} (z_{i,max} − z_{i,min})² ).   (17)

Based on Hoeffding's inequality, a probabilistic feasibility guarantee of the sample approximation method is proved in [7] and summarized here as:

Theorem 3. Let u ∈ U be such that u ∉ U_f. Then,

Pr{F̃^N(−γ, u) ≥ 1 − β} ≤ e^{−2Nτ_u²}   (18)

where γ > 0, β ∈ [0, α], and τ_u > 0 is written as

τ_u = F(0, u) − F(−γ, u) + (α − β).   (19)

Theorem 3 shows an important property of approximating the cumulative probability function by samples. Setting γ ≈ 0 and β ≈ α, the probability Pr{F̃^N(−γ, u) ≥ 1 − β} goes to 0 as the sample number goes to ∞. Namely, with more samples, the solution of the sample-based neural approximation problem satisfies the original probabilistic constraint with a higher probability.
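To read (18) quantitatively: for an infeasible u with a hypothetical margin τ_u = 0.05 (a made-up value, purely for illustration), the bound already drops below 10⁻³ at N = 1500 across the sample sizes used later in Section V.

```python
# Evaluating the right-hand side of (18), exp(−2Nτ_u²), for an assumed
# margin τ_u = 0.05 at the sample sizes used in the experiments.
import math

tau_u = 0.05
for n in [150, 375, 750, 1500, 3750, 7500]:
    print(n, math.exp(-2 * n * tau_u**2))
```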
IV. PROPOSED ALGORITHMS FOR PCPs

Two algorithms are used to solve probabilistic constrained programs. First, a sample-based algorithm is summarized to train the deterministic constraints, which have a neural network form. Then, a simulated annealing algorithm is summarized to solve the deterministic program obtained by the neural approximation approach.

A. Algorithm for Training Ĥ_S(u)

Due to the discussion in Section III, the algorithm for training Ĥ_S(u) is summarized as follows (a sketch of the whole pipeline is given after the list):

1. Generate the sample set of the uncertain parameter vector ∆^N = {δ_1, ..., δ_N} by extracting samples independently from the sample space ∆ according to the identical distribution p_δ(δ);
2. Generate the sample set of the decision variable U^K = {u_1, ..., u_K} by extracting samples independently from the feasible domain U according to the uniform distribution;
3. For all u_k ∈ U^K, calculate the (1 − α)-empirical quantile at u_k, Q̃^{1−α}(h̄(u_k, δ)), using Eq. (9), and obtain the output sequence Y^K = {Q̃^{1−α}(h̄(u_1, δ)), ..., Q̃^{1−α}(h̄(u_K, δ))};
4. Train β, a, b, c in Eq. (15) based on U^K and Y^K.
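The following Python sketch strings the four steps together for the scalar Fig. 1 example. For brevity, the hidden-layer parameters (a_i, b_i) are drawn at random and only β is fit by least squares, an extreme-learning-machine-style shortcut rather than the full training of (15); the sizes N, K, S and the domain U = [−2, 2] are illustrative choices.

```python
# A sketch of the training pipeline of Section IV-A (simplified fitting).
import numpy as np

rng = np.random.default_rng(0)
alpha, N, K, S = 0.1, 2000, 200, 40

# Steps 1-2: sample δ ~ p_δ and u uniformly over U = [-2, 2].
deltas = rng.normal(0.0, 1.0, size=N)
us = rng.uniform(-2.0, 2.0, size=K)

# Step 3: (1 − α)-empirical quantile targets via Eq. (10).
M = int(np.ceil((1 - alpha) * N))
Y = np.array([np.sort(1.5 * u**2 - 3.0 + deltas)[M - 1] for u in us])

# Step 4 (shortcut): fix a random hidden layer (a_i, b_i), solve for β.
a = rng.normal(size=S)
b = rng.normal(size=S)
G = 1.0 / (1.0 + np.exp(-(np.outer(us, a) + b)))   # K×S hidden activations
beta, *_ = np.linalg.lstsq(G, Y, rcond=None)

def H_hat(u: float) -> float:
    """Trained surrogate Ĥ_S(u) ≈ Q̃^{1-α}(h̄(u, δ))."""
    return float(1.0 / (1.0 + np.exp(-(a * u + b))) @ beta)
```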
B. Simulated Annealing Algorithm for the Approximate Program

The simulated annealing algorithm has been widely used to address global optimization with constraints in many fields [15]-[19] and has also been proved practical for non-convex optimization [20], [21]. After using the neural approximation approach, we obtain the original problem with deterministic constraints, where the deterministic constraints have the neural-network formulation. The original edition of the simulated annealing algorithm is not presented here; it can be found in [16], [20]. After minor revision of the original algorithm, it can be used to solve the deterministic constrained problem. The algorithm is summarized as follows (a code sketch is given after the list):

1. Initialize a temperature T_0 and a decision value u*, and calculate the cost value as

E* = J(u*) + C(Ĥ_S(u*)),   (20)

where

C(Ĥ_S(u*)) = { 0, if Ĥ_S(u*) < 0
               C̄, if Ĥ_S(u*) ≥ 0.   (21)

Here, C̄ should be chosen as a very large value;

2. For iterations m = 1, 2, ..., M, do the following steps iteratively:

a) Randomly select another point u_m ∈ V_r(u*) = {u_m ∈ U | ∥u_m − u*∥ ≤ r}, where V_r(u*) is a neighbourhood of the previous point and r is the radius. Calculate the corresponding cost value E_m similarly to Eq. (20);

b) Calculate

∇E_m = E_m − E*;   (22)

c) Move to the new point by setting u* = u_m if a random variable µ, distributed uniformly over (0, 1), satisfies

µ ≤ e^{−∇E_m / T_{m−1}}   (23)

or, equivalently,

∇E_m ≤ −T_{m−1} log µ;   (24)

d) Update the temperature as

T_m = { T_{m−1},     if (23) holds
        ρ · T_{m−1}, if (23) does not hold;   (25)

e) Terminate the algorithm if T_m < T_min, where T_min is the lower boundary for the temperature, or if m = M.
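A compact sketch of the loop above for a scalar decision variable: J is the objective, H_hat the trained surrogate constraint (e.g., from the previous sketch), and T0, rho, r, C_bar, and the iteration budget are illustrative hyperparameter choices. Projection of candidates onto U is omitted for brevity.

```python
# A sketch of the revised simulated annealing of Section IV-B.
import math
import random

def anneal(J, H_hat, u0, r=0.5, T0=1.0, rho=0.95,
           T_min=1e-4, M=10000, C_bar=1e6):
    rng = random.Random(0)

    def cost(u):
        # E = J(u) + C(Ĥ_S(u)) with the penalty (21): 0 if feasible, C̄ otherwise.
        return J(u) + (0.0 if H_hat(u) < 0.0 else C_bar)

    u_star, E_star, T = u0, cost(u0), T0
    for m in range(1, M + 1):
        u_m = u_star + rng.uniform(-r, r)        # candidate in V_r(u*)
        E_m = cost(u_m)
        dE = E_m - E_star                        # ∇E_m, Eq. (22)
        # Acceptance test (23); dE <= 0 always passes since exp(-dE/T) >= 1.
        if dE <= 0.0 or rng.random() <= math.exp(-dE / T):
            u_star, E_star = u_m, E_m            # move to the new point
        else:
            T *= rho                             # cool down, Eq. (25)
            if T < T_min:                        # termination rule, step e)
                break
    return u_star, E_star
```

For instance, minimizing J(u) = u under the Fig. 1 constraint should drive u* toward the lower feasible boundary near −1.07.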
V. APPLICATION TO THE INTERVAL PREDICTOR MODEL OF WIND POWER

A. Wind-Turbine Power Curve and Dataset

A predictor model of wind power is used to predict the wind power based on wind speed measurements [22]. Fig. 2 illustrates the power curve constructed from the industrial data. The data come from a large wind farm located in Jiugongshan, Hubei, China. The data set was collected at turbines at a sampling interval of 10 min. In total, 56,618 data points were collected from March 17, 2009, to April 17, 2009. The unit of the active wind power is kW, and the value of power is normalized for an air density of 1.18 kg/m³.

Fig. 2. Power curve of an industrial wind turbine.

Piecewise models are often used to improve the prediction accuracy [23], [24]. The wind speed range is discretized into intervals, and the corresponding wind speed and power data make the partitions. Supposing the cut-out speed is v_co, the speed range [0, v_co] is divided into N_w equal-length intervals. The data points in each interval are defined as

D_i = {(v, p(v)) ∈ R² | v ∈ [v_i, v_{i+1})}, i = 1, ..., N_w   (26)

where D_i represents the point set of the i-th partition, v represents the wind speed, p(v) is the corresponding wind power, and v_i = ((i − 1)/N_w) v_co is the demarcation speed between the i-th and (i − 1)-th partitions. In this study, v_co is chosen as 20 and N_w is 10, as shown in Fig. 2. Besides, we do not establish models for all partitions in this study; as an example to demonstrate the proposed method, the data in partitions 4 and 5 are used as a unity, marked with red circles in Fig. 2. Notice that some under-power points or stopping points are wiped out by the data pre-processing method introduced in Section 3 of [22]. For convenience, denote the used dataset as D.
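As a sketch, the partition rule (26) with the study's choices v_co = 20 and N_w = 10 reduces to binning by wind speed; the `data` iterable of (v, p) pairs below is a stand-in for the raw dataset.

```python
# A minimal sketch of the partitioning rule (26) with v_co = 20, N_w = 10.
V_CO, N_W = 20.0, 10

def partition(data):
    """Split (v, p) pairs into D_1, ..., D_{N_w} by wind speed."""
    width = V_CO / N_W                       # each interval is 2 m/s wide
    D = [[] for _ in range(N_W)]
    for v, p in data:
        if 0.0 <= v < V_CO:
            D[int(v // width)].append((v, p))
    return D

# The experiments merge partitions 4 and 5 into one dataset D:
# parts = partition(data); used = parts[3] + parts[4]
```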
B. Interval Predictor Model of Wind Power

Differently from standard predictor models, interval predictor models return a prediction interval instead of a single prediction value [25]. As defined in [25], an interval predictor model is a set-valued map

I : φ → I(φ) ⊂ Y   (27)

where φ ∈ R^{n_φ} is a regression vector containing the input variables u ∈ R^{n_u} or decision variables on which the system output y depends, and I(φ) is the prediction interval. Given an observed φ, I(φ) is an informative interval containing y with a given guaranteed probability 1 − α, α ∈ [0, 1). Output intervals are obtained by considering the span of parametric families of functions. Here, we consider the family of linear regression functions defined by

M = {y = θ^T φ + e, θ ∈ Θ ⊂ R^{n_θ}, ∥e∥ ≤ ε ∈ R}.   (28)

Then, a parametric IPM, obtained by associating to each φ the set of all possible outputs given by θ^T φ + e as θ varies over Θ and e varies over E = {e ∈ R | ∥e∥ ≤ ε}, is defined as

I(φ) = {y : y = θ^T φ + e, ∀θ ∈ Θ, ∀e ∈ E}.   (29)

A possible choice for the set Θ is a ball with center c and radius r:

Θ = B_{c,r} = {θ ∈ R^{n_θ} : ∥θ − c∥ ≤ r}.   (30)

Then, the interval output of the IPM can be explicitly computed as

I(φ) = [c^T φ − (r∥φ∥ + ε), c^T φ + (r∥φ∥ + ε)].   (31)
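Given identified (c, r, ε), the interval (31) is cheap to evaluate. A direct transcription follows, using the regressor φ = [v, v²] chosen for the wind power problem below; the numeric values in the usage comment are placeholders, not identified parameters.

```python
# A sketch of the explicit interval (31) for the wind power IPM.
import math

def predict_interval(v, c, r, eps):
    """Return (lower, upper) of I(φ) = c^T φ ∓ (r‖φ‖ + ε) for φ = [v, v²]."""
    phi = (v, v * v)
    center = c[0] * phi[0] + c[1] * phi[1]    # c^T φ
    half_width = r * math.hypot(*phi) + eps   # r‖φ‖ + ε
    return center - half_width, center + half_width

# e.g. predict_interval(8.0, c=(30.0, 2.0), r=0.5, eps=10.0)
```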
For the wind power prediction problem, the input is the wind speed and the output is the wind power, i.e., u = v and y = p. Besides, φ is chosen as φ = [u u²]^T; hence φ = [v v²]^T in the wind power prediction problem. Identifying the IPM is essentially identifying (c, r, ε). Without loss of generality, we use I_{(c,r,ε)}(v) for I(φ) in the later part for the IPM identification problem of wind power prediction.

Due to the above discussions, the IPM identification problem of wind power prediction is to find the IPM I_{(c,r,ε)}(v) (or, equivalently, to find (c, r, ε)) such that

min_{(c,r,ε)} wr + ε   (32)

subject to

r, ε ≥ 0,
Pr{p ∈ I_{(c,r,ε)}(v)} ≥ 1 − α, α ∈ [0, 1).   (33)

Notice that w is a weight, often chosen as w = E{∥φ∥} [25]. In this study, w is chosen as 20.

Fig. 3. Estimation results of the quantile function by using neural networks.

Fig. 4. Results of interval predictions by the scenario approach using different choices of sample number.

Fig. 5. Results of interval predictions by the neural approximation approach using different choices of sample number. The limit on the violation probability is α = 0.05.
C. Results and Discussions

Fig. 3 shows one example of the estimation results of the quantile function by using neural networks. 5000 different combinations of (c, r, ε) are considered. The data set was divided into a training set and a test set. The red line shows the quantile function value given by the trained model. The blue line shows the empirical quantile function value calculated using the data of the test set. The mean value of the error is −0.0421 and the mean absolute value of the error is 12.5882.

We compare the proposed method with the scenario approach proposed in [5]. The required α is set as 0.05. Fig. 4 and Fig. 5 show the results of interval predictions by the scenario approach and the proposed method, respectively. The number of used data samples is 150, 750, or 7500. As the number of samples increases, the scenario approach gives more conservative results, while the proposed method is more robust to the sample number.

A more comprehensive validation was conducted through a posterior Monte Carlo analysis. In the validation, the number of data samples was increased from 150 to 7500. For each number of data samples, 500 rounds of sampling and calculation of (c, r, ε) were done. For instance, if the number of data samples is set as 150, then 500 rounds of extracting 150 samples of (v, p) from the training set were done, and for each extracted set of 150 samples, the corresponding (c, r, ε) was calculated by the scenario approach or the proposed method. The statistical analysis results are summarized in Table I and Table II. Apparently, the proposed method achieves a better trade-off between the probability constraint and the cost value.

TABLE I
THE PROBABILITY THAT THE CONSTRAINT FAILURE PROBABILITY α EXCEEDS 0.05, Pr{α > 0.05}

N         150    375    750    1500   3750   7500
SA        0.224  0.004  0      0      0      0
Proposed  0.156  0.07   0.006  0      0      0

TABLE II
THE MEAN VALUE OF THE COST FUNCTION

N         150    375    750    1500   3750   7500
SA        61.56  70.47  74.72  77.58  79.95  80.93
Proposed  50.25  50.45  50.69  50.97  51.33  51.72

VI. CONCLUSION

In this brief, a neural approximation-based method for solving probabilistic constrained problems has been proposed, and the statistical guarantees of the proposed method have been discussed. Furthermore, the interval predictor model of wind power is investigated with experimental data to validate the proposed method. In the validation, the proposed method is compared with the scenario approach. The results show that the proposed method performs better in the trade-off between satisfying the probabilistic constraints and minimizing the cost.

REFERENCES

[1] A. Charnes and W. W. Cooper, "Chance constrained programming," Management Science, vol. 6, no. 1, pp. 73-79, 1959.
[2] M. C. Campi and S. Garatti, Introduction to the Scenario Approach, MOS-SIAM Series on Optimization, Philadelphia, 2019.
[3] M. Guo and M. M. Zavlanos, "Probabilistic motion planning under temporal tasks and soft constraints," IEEE Transactions on Automatic Control, vol. 63, no. 12, pp. 4051-4066, 2018.
[4] P. Gautam, R. Karki, and P. Piya, "Probabilistic modeling of energy storage to quantify market constrained reliability value to active distribution systems," IEEE Transactions on Sustainable Energy, vol. 11, no. 2, pp. 1043-1053, 2020.
[5] G. Calafiore and M. C. Campi, "The scenario approach to robust control design," IEEE Transactions on Automatic Control, vol. 51, no. 5, pp. 742-753, 2006.
[6] M. C. Campi and S. Garatti, "A sampling-and-discarding approach to chance-constrained optimization: feasibility and optimality," Journal of Optimization Theory and Applications, vol. 148, no. 2, pp. 257-280, 2011.
[7] J. Luedtke and S. Ahmed, "A sample approximation approach for optimization with probabilistic constraints," SIAM Journal on Optimization, vol. 19, no. 2, pp. 674-699, 2008.
[8] A. Geletu, A. Hoffmann, M. Kloppel, and P. Li, "An inner-outer approximation approach to chance constrained optimization," SIAM Journal on Optimization, vol. 27, no. 3, pp. 1834-1857, 2017.
[9] G. Steinbrecher and W. T. Shaw, "Quantile mechanics," European Journal of Applied Mathematics, vol. 19, no. 2, pp. 87-112, 2008.
[10] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals and Systems, vol. 2, pp. 303-314, Dec. 1989.
[11] M. Hassoun, Fundamentals of Artificial Neural Networks, The MIT Press, 1995.
[12] R. Selmic and F. L. Lewis, "Neural network approximation of piecewise continuous functions: Application to friction compensation," IEEE Transactions on Neural Networks, vol. 13, no. 3, pp. 745-751, 2002.
[13] A. Shapiro, D. Dentcheva, and A. Ruszczynski, Lectures on Stochastic Programming: Modeling and Theory, 2nd ed., SIAM, 2014.
[14] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association, vol. 58, no. 301, pp. 13-30, 1963.
[15] S. Kirkpatrick, C. D. Gelatt, Jr., and M. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, 1983.
[16] S. P. Brooks and B. J. T. Morgan, "Optimization using simulated annealing," Journal of the Royal Statistical Society, Series D (The Statistician), vol. 44, no. 2, pp. 241-257, 1995.
[17] A. S. Azad, M. Islam, and S. Chakraborty, "A heuristic initialized stochastic memetic algorithm for MDPVRP with interdependent depot operations," IEEE Transactions on Cybernetics, vol. 47, no. 12, pp. 4302-4315, 2017.
[18] A. Albanna and H. Yousefi'zadeh, "Congestion minimization of LTE networks: a deep learning approach," IEEE/ACM Transactions on Networking, vol. 28, no. 1, pp. 347-359, 2020.
[19] A. Dekkers and E. Aarts, "Global optimization and simulated annealing," Mathematical Programming, vol. 50, pp. 367-393, 1991.
[20] H. Szu and R. Hartley, "Fast simulated annealing," Physics Letters A, vol. 122, no. 3-4, pp. 157-162, 1987.
[21] L. Ingber, "Simulated annealing: practice versus theory," Mathematical and Computer Modelling, vol. 18, no. 11, pp. 29-57, 1993.
[22] T. Ouyang, A. Kusiak, and Y. He, "Modeling wind-turbine power curve: a data partitioning and mining approach," Renewable Energy, vol. 102, Part A, pp. 1-8, 2017.
[23] M. Lydia, S. S. Kumar, and G. E. P. Kumar, "Advanced algorithms for wind turbine power curve modeling," IEEE Transactions on Sustainable Energy, vol. 4, no. 3, pp. 827-835, 2013.
[24] M. G. Khalfallah and A. M. Koliub, "Suggestions for improving wind turbines power curves," Desalination, vol. 209, no. 1, pp. 221-229, 2007.
[25] M. C. Campi, G. Calafiore, and S. Garatti, "Interval predictor models: identification and reliability," Automatica, vol. 45, no. 2, pp. 382-392, 2009.