2. NCC-SFC-LMT-KKT
NECESSARY & SUFFICIENT CONDITIONS FOR OPTIMALITY
Category   Objectives (M)   Inequality Constraints (J)   Equality Constraints (K)   Variables (n)   Method
Type-I     1                0                            0                          1               Taylor Series
Type-II    1                0                            0                          ≥2              Taylor Series
Type-III   1                0                            1                          ≥2              Lagrange Multiplier Theorem
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1
Taylor’s Series helps represent a function as an infinite sum of terms that are calculated from
the function value and the derivatives of the function at a single point
f(x) = f(x0 + h) = f(x0) + hf′(x0) + (h²/2!)f′′(x0) + (h³/3!)f′′′(x0) + … + (hᵖ/p!)f⁽ᵖ⁾(x0) + Rp+1

f(x) − f(x0) = hf′(x0) + (h²/2!)f′′(x0) + (h³/3!)f′′′(x0) + … + (hᵖ/p!)f⁽ᵖ⁾(x0) + Rp+1
For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0
hf′(x0) + (h²/2!)f′′(x0) + (h³/3!)f′′′(x0) + … + (hᵖ/p!)f⁽ᵖ⁾(x0) + Rp+1 ≥ 0

First-order Approximation: hf′(x0) ≥ 0
Suppose at x0: f′(x0) > 0. Then a negative h would violate the above condition (and f′(x0) < 0 would fail for a positive h), so the first-order necessary condition is f′(x0) = 0.
Second-order Approximation: (h²/2!)f′′(x0) ≥ 0

Suppose at x0: f′′(x0) > 0
∙ a negative/positive h would not impact the above condition, since h² ≥ 0 either way
∙ x0 is guaranteed to be a local minimum
A point x0 is a local minimum iff the first non-zero derivative in the sequence f⁽ᵏ⁾(x0), k = 1, 2, …, is positive and occurs at an even k.
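This even-order test is easy to check mechanically. Below is a minimal sketch in Python, assuming SymPy is available; the function f and the candidate x0 are illustrative choices, not from the slides.

    import sympy as sp

    x = sp.symbols('x')
    f = x**4              # illustrative function: at x0 = 0, f'(0) = f''(0) = f'''(0) = 0

    def classify(f, x, x0, max_order=8):
        # return the order k of the first non-zero derivative at x0 and a verdict
        for k in range(1, max_order + 1):
            dk = sp.diff(f, x, k).subs(x, x0)
            if dk != 0:
                if k % 2 == 0 and dk > 0:
                    return k, 'local minimum'
                if k % 2 == 0 and dk < 0:
                    return k, 'local maximum'
                return k, 'odd order: inflection, not an optimum'
        return None, 'inconclusive up to max_order'

    print(classify(f, x, 0))   # (4, 'local minimum')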
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2
n=1: f(x) = f(x0 + h) = f(x0) + hf′(x0) + (h²/2!)f′′(x0) + (h³/3!)f′′′(x0) + … + (hᵖ/p!)f⁽ᵖ⁾(x0) + Rp+1

n≥2: Unlike the case of n=1, where h was a scalar, for n ≥ 2, h = [h1, h2, …]ᵀ is a vector representing the component-wise difference between the known point and the unknown point.
f(X) = f(x, y) = f(x0 + hx, y0 + hy)
     = f(x0, y0) + [hx(∂f/∂x) + hy(∂f/∂y)]|x0,y0 + (1/2)[hx²(∂²f/∂x²) + 2hxhy(∂²f/∂x∂y) + hy²(∂²f/∂y²)]|x0,y0 + …
     = f(x0, y0) + [hx fx + hy fy]|x0,y0 + (1/2)[hx² fxx + 2hxhy fxy + hy² fyy]|x0,y0 + …
Again, h is a vector representing the component-wise difference between the known point and the unknown point. Indices x, y can't serve more than 2 dimensions; hence, indices 1, 2, …, n are used:

h = [h1, h2, …, hn]ᵀ = [x1 − x01, x2 − x02, …, xn − x0n]ᵀ
f(X) = f(x1, x2, …, xn) = f(x10 + h1, x20 + h2, …, xn0 + hn)
     = f(x10, x20, …, xn0) + Σⱼ₌₁ⁿ (∂f/∂xj)hj + (1/2) Σⱼ₌₁ⁿ Σₖ₌₁ⁿ (∂²f/∂xj∂xk)hj hk + …

f(X) = f(X0) + ∇fᵀ(X0)h + (1/2)hᵀHh + …

∇fᵀ(X0): Gradient of the function, evaluated at the known point X0
H: Hessian of the function, evaluated at the known point X0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2
f(X) = f(x1, x2, …, xn) = f(x10 + h1, x20 + h2, …, xn0 + hn) = f(x10, x20, …, xn0) + Σⱼ₌₁ⁿ (∂f/∂xj)hj + (1/2) Σⱼ₌₁ⁿ Σₖ₌₁ⁿ (∂²f/∂xj∂xk)hj hk + …

f(X) = f(X0) + ∇fᵀ(X0)h + (1/2)hᵀHh + …

∇fᵀ(X0): Gradient of the function, evaluated at the known point X0; its components are fi = ∂f/∂xi
H: Hessian of the function, evaluated at the known point X0; its components are Hij = ∂²f/∂xi∂xj

Truncating after the quadratic term gives the second-order approximation:

f(X) ≈ f(X0) + ∇fᵀ(X0)h + (1/2)hᵀHh
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2
Next steps, with reference to an example:
∙ assess how good this 2nd-order approximation is
∙ learn to compute the Gradient & Hessian

Obtain a second-order Taylor's expansion for the function f(X) = 3x1³x2 at X0 = (1,1)
f(X) ≈ f(X0) + ∇fᵀ(X0)h + (1/2)hᵀHh

f(X0) = 3    ∇f = [9x1²x2, 3x1³]ᵀ    H = [[18x1x2, 9x1²], [9x1², 0]]    h = [x1 − 1, x2 − 1]ᵀ

∇f(X0) = [9, 3]ᵀ    H(X0) = [[18, 9], [9, 0]]

f(X) ≈ 3 + [9 3][x1 − 1, x2 − 1]ᵀ + 0.5 [x1 − 1  x2 − 1] [[18, 9], [9, 0]] [x1 − 1, x2 − 1]ᵀ
Quality of Approximation
For a 30% change in the given point, the Taylor’s series approximation underestimates the original function by 4%
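The 4% figure can be reproduced numerically; here is a minimal check, assuming NumPy (the 30% step is applied to both coordinates).

    import numpy as np

    f  = lambda x1, x2: 3 * x1**3 * x2
    X0 = np.array([1.0, 1.0])
    g  = np.array([9.0, 3.0])                    # gradient of f at X0
    H  = np.array([[18.0, 9.0], [9.0, 0.0]])     # Hessian of f at X0

    h = 0.3 * X0                                 # a 30% change in the given point
    approx = f(*X0) + g @ h + 0.5 * h @ H @ h    # 8.22
    exact  = f(*(X0 + h))                        # 8.5683
    print((exact - approx) / exact)              # ≈ 0.041: a ~4% underestimate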
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient
hx = h cosθx, hy = h cosθy: direction cosines of the vector joining X0(x0, y0) and X(x, y). Call these ux and uy respectively, so that hx = h ux and hy = h uy, giving x = x0 + h ux and y = y0 + h uy.

The variation of a scalar function ϕ over a distance h is given by:

dϕ/dh = (∂ϕ/∂x)(dx/dh) + (∂ϕ/∂y)(dy/dh) + (∂ϕ/∂z)(dz/dh)

Since x = x0 + h ux, dx/dh = ux (and likewise for y and z), so:

dϕ/dh = [∂ϕ/∂x  ∂ϕ/∂y  ∂ϕ/∂z][ux, uy, uz]ᵀ = ∇ϕ ∙ u
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient
dϕ/dh = ∇ϕ ∙ u, where ∇ϕ = [∂ϕ/∂x, ∂ϕ/∂y, ∂ϕ/∂z]ᵀ and u ≡ [ux, uy, uz]ᵀ is a unit vector

dϕ/dh = ∇ϕ ∙ u = |∇ϕ| |u| cosθ∇ϕ/u

dϕ/dh |max = |∇ϕ| ……at θ∇ϕ/u = 0
dϕ/dh |min = −|∇ϕ| ……at θ∇ϕ/u = 180°

The rate of change of a scalar function over a distance h is maximum if one moves along the direction of the function's gradient.
Now the natural question is: What is the direction of the function's gradient?
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient
dϕ/dh = ∇ϕ ∙ u = |∇ϕ| |u| cosθ∇ϕ/u    |dϕ/dh|max = |∇ϕ| ……at θ∇ϕ/u = 0 or 180°

On a level surface ϕ = constant, dϕ/dh = 0 along any curve r(t) = {x(t), y(t), z(t)} lying in the surface:

0 = [∂ϕ/∂x  ∂ϕ/∂y  ∂ϕ/∂z][dx/dt, dy/dt, dz/dt]ᵀ = ∇ϕ ∙ r′(t)

Since r′(t) is tangent to the surface ϕ = constant, ∇ϕ must be normal to the level surface at that point.

The rate of change of a scalar function over a distance h is maximum if one moves along the direction of the function's gradient, that is, along the normal at that point.
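A small numerical illustration of dϕ/dh = ∇ϕ ∙ u, assuming NumPy; the gradient vector is an arbitrary assumed value, not one from the slides.

    import numpy as np

    grad = np.array([3.0, 4.0])               # assumed ∇ϕ at some point
    thetas = np.linspace(0, 2 * np.pi, 3601)
    rates = [grad @ np.array([np.cos(t), np.sin(t)]) for t in thetas]
    print(max(rates), np.linalg.norm(grad))   # both ≈ 5: the max rate equals |∇ϕ|
                                              # and occurs along the gradient direction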
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the HESSIAN
f(X) ≈ f(X0) + ∇fᵀ(X0)h + (1/2)hᵀHh, where hᵀHh = [h1 h2][[H11, H12], [H21, H22]][h1, h2]ᵀ = H11h1² + 2H12h1h2 + H22h2²

Example: f(x1, x2, x3) = 2x1² + 2x1x2 + 4x1x3 − 6x2² − 4x2x3 + 5x3² = [x1 x2 x3] S [x1, x2, x3]ᵀ, with

S = [[2, 1, 2], [1, −6, −2], [2, −2, 5]]    H = 2S = [[4, 2, 4], [2, −12, −4], [4, −4, 10]]    Q = XᵀSX = Xᵀ(H/2)X
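A quick symbolic check of these matrices, assuming SymPy; sp.hessian recovers H directly from f.

    import sympy as sp

    x1, x2, x3 = sp.symbols('x1 x2 x3')
    f = 2*x1**2 + 2*x1*x2 + 4*x1*x3 - 6*x2**2 - 4*x2*x3 + 5*x3**2

    X = sp.Matrix([x1, x2, x3])
    H = sp.hessian(f, (x1, x2, x3))           # constant matrix for a quadratic
    S = H / 2
    print(H)                                   # [[4,2,4],[2,-12,-4],[4,-4,10]]
    print(sp.expand((X.T * S * X)[0] - f))     # 0, confirming f = Xᵀ S X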
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the HESSIAN
Realize that the Necessary and Sufficient Conditions for n=1 were defined based on the sign of f′′(x0) in f(x) = f(x0 + h) = f(x0) + hf′(x0) + (h²/2!)f′′(x0). Similarly, the sign of the scalar quantity hᵀHh should be known; the theory of Quadratic functions helps.

A quadratic form Q(X) = XᵀSX is characterized through its Definition or through its Eigenvalues (λ).

n=1: f(x) = f(x0) + hf′(x0) + (h²/2!)f′′(x0); second-order NCC: f′′(x0) ≥ 0
n≥2: f(X) = f(X0) + ∇fᵀ(X0)h + (1/2)hᵀHh; second-order NCC: H is positive semi-definite (λ ≥ 0)
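A sketch of the eigenvalue check, assuming NumPy, applied to the Hessian from the quadratic-form example above (which turns out indefinite).

    import numpy as np

    H = np.array([[4.0,   2.0,  4.0],
                  [2.0, -12.0, -4.0],
                  [4.0,  -4.0, 10.0]])
    lam = np.linalg.eigvalsh(H)        # real eigenvalues of the symmetric matrix
    print(lam)                         # mixed signs => H is indefinite;
                                       # λ ≥ 0 for all would mean positive semi-definite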
Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(x1, x2) ≡ x1 + x2 − 2 = 0

Balancing act: move away from the unconstrained optimum U by only the minimum you need to become feasible; P is that point.

Method-II: Algebraic
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(x1, x2) ≡ x1 + x2 − 2 = 0

Substituting the constraint, x2 = 2 − x1, gives the single-variable problem f(x1) = (x1 − 1.5)² + (0.5 − x1)².

First-order Necessary Condition: df/dx1 = 0, leading to x1 = 1; x2 = 1
Second-order Necessary Condition: d²f/dx1² = 4 ≥ 0, hence satisfied
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation & Application of Necessary & Sufficient Conditions
Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(x1, x2) ≡ x1 + x2 − 2 = 0
With x2 = ϕ(x1):

df/dx1 = 0, implying ∂f/∂x1 + (∂f/∂x2)(dx2/dx1) = 0, that is, ∂f/∂x1 + (∂f/∂x2)(dϕ/dx1) = 0 … Eqn(1)

Since (x1*, x2*) satisfies h(x1*, x2*) = 0:

dh(x1*, x2*)/dx1 = 0, implying ∂h(x1*, x2*)/∂x1 + [∂h(x1*, x2*)/∂x2](dx2/dx1) = 0, that is, ∂h/∂x1 + (∂h/∂x2)(dϕ/dx1) = 0 … Eqn(2)
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
∂f/∂x1 + (∂f/∂x2)(dϕ/dx1) = 0 … Eqn(1)    ∂h/∂x1 + (∂h/∂x2)(dϕ/dx1) = 0 … Eqn(2)

−dϕ/dx1 = (∂f/∂x1)/(∂f/∂x2) = (∂h/∂x1)/(∂h/∂x2) … Eqn(3)

Rearranging: (∂f/∂x1)/(∂h/∂x1) = (∂f/∂x2)/(∂h/∂x2) = −λ, where λ is free in sign, so that

∂f/∂x1 + λ(∂h/∂x1) = 0 and ∂f/∂x2 + λ(∂h/∂x2) = 0 … Eqn(4)

To be a candidate optimum, a point X*, that is, (x1*, x2*), needs to satisfy:

∂f(x1*, x2*)/∂x1 + λ ∂h(x1*, x2*)/∂x1 = 0
∂f(x1*, x2*)/∂x2 + λ ∂h(x1*, x2*)/∂x2 = 0
h(x1*, x2*) = 0

In vector form, with λ acting as the ratio λ = −(∂f/∂xi)/(∂h/∂xi), i.e., −Grad f / Grad h:

∇f(X*) + λ∇h(X*) = 0 (λ free in sign)
h(x1*, x2*) = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Alternative Derivation)
At a stationary point X* ≡ {x1*, x2*} the total derivative of the function = 0, that is:

df = (∂f/∂x1)dx1 + (∂f/∂x2)dx2 = 0 ≡ ∇fᵀdx = 0 … Eqn(1)

Unlike the unconstrained problem, the infinitesimal vector dx ≡ {dx1, dx2} in a constrained problem cannot be arbitrary:

h(x1* + dx1, x2* + dx2) = h(x1*, x2*) + (∂h/∂x1)dx1 + (∂h/∂x2)dx2 = 0 ⟹ dh = (∂h/∂x1)dx1 + (∂h/∂x2)dx2 = 0 ≡ ∇hᵀdx = 0 … Eqn(2)

dx2 = −[(∂h/∂x1)/(∂h/∂x2)]dx1 … Eqn(3)

Plugging Eqn(3) in Eqn(1): df = (∂f/∂x1)dx1 − (∂f/∂x2)[(∂h/∂x1)/(∂h/∂x2)]dx1 = 0 … Eqn(4)

[∂f/∂x1 − (∂f/∂x2)(∂h/∂x1)/(∂h/∂x2)]dx1 = 0 ⟹ (∂f/∂x1)(∂h/∂x2) − (∂f/∂x2)(∂h/∂x1) = 0

Equivalently, det[[∂f/∂x1, ∂h/∂x1], [∂f/∂x2, ∂h/∂x2]] = 0. Setting −λ = (∂f/∂x2)/(∂h/∂x2) = (∂f/∂x1)/(∂h/∂x1) then gives:

∂f/∂x1 + λ(∂h/∂x1) = 0 and ∂f/∂x2 + λ(∂h/∂x2) = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Alternative Derivation)
At a stationary point X* ≡ {x1*, x2*} the total derivative of the function = 0, that is, df = (∂f/∂x1)dx1 + (∂f/∂x2)dx2 = 0 ≡ ∇fᵀdx = 0 … Eqn(1)

Unlike the unconstrained problem, the infinitesimal vector dX ≡ {dx1, dx2} in a constrained problem cannot be arbitrary.

Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(X) ≡ x1 + x2 − 2 = 0 is equivalent to: Minimize f(x1) = (x1 − 1.5)² + (2 − x1 − 1.5)²

Lagrange: one could multiply each constraint variation by a scalar & add it to the objective function:

df + Σₖ λk dhk ≡ Σᵢ (∂f/∂xi)dxi + Σₖ λk [Σᵢ (∂hk/∂xi)dxi] ≡ Σᵢ [∂f/∂xi + Σₖ λk(∂hk/∂xi)]dxi    (i = 1,…, n; k = 1,…, K)

This formulation jointly accounts for the variation in the objective function and the constraints. Now that the constraints do not need to be accounted for separately, the infinitesimal vector dx becomes independent and arbitrary, and Σᵢ βi dxi = 0 iff βi = 0 ∀ i = 1,…, n. Hence the expression equals 0 iff:

∂f/∂xi + Σₖ λk(∂hk/∂xi) = 0 ∀ i = 1,…, n ⟹ ∇f(X*) + λ∇h(X*) = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Example)
Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(X) ≡ x1 + x2 − 2 = 0

∇f(X*) + λ∇h(X*) = 0 with h(X) ≡ x1 + x2 − 2 = 0:
[2(x1 − 1.5), 2(x2 − 1.5)]ᵀ + λ[1, 1]ᵀ = 0 ⟹ x1 = x2; with x1 + x2 = 2 ⟹ x1 = x2 = 1 and λ = 1

With the constraint written as h(X) ≡ 2 − x1 − x2 = 0:
[2(x1 − 1.5), 2(x2 − 1.5)]ᵀ + λ[−1, −1]ᵀ = 0 ⟹ x1 = x2; with x1 + x2 = 2 ⟹ x1 = x2 = 1 and λ = −1
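The same system can be solved mechanically; a minimal sketch, assuming SymPy.

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam')
    f = (x1 - sp.Rational(3, 2))**2 + (x2 - sp.Rational(3, 2))**2
    h = x1 + x2 - 2

    eqs = [sp.diff(f, x1) + lam * sp.diff(h, x1),
           sp.diff(f, x2) + lam * sp.diff(h, x2),
           h]
    print(sp.solve(eqs, [x1, x2, lam]))   # {x1: 1, x2: 1, lam: 1}
    # Writing h as 2 - x1 - x2 flips the multiplier's sign (lam = -1); X* is unchanged.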
∇f(X*) + λ∇h(X*) = 0

A point X* satisfying the constraints hk(X*) = 0 is called a Regular Point of the Feasible Set if:
∙ f(X*) is differentiable
∙ the gradient vectors of all constraints at X* are Linearly Independent, that is, Σₖ₌₁ᴷ αk ∇hk(X*) = 0 iff αk = 0 ∀ k = 1, …, K

(A set of vectors is linearly dependent if and only if one of them is zero or a linear combination of the others.)
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
Lagrange Multiplier Theorem: if L(X, λ) = f(X) + Σₖ₌₁ᴷ λk hk(X), then at a local minimum X* (a regular point) there exist "unique" Lagrange Multipliers λk ∀ k = 1, …, K such that:
∙ ∇L_X(X*) = 0, that is, ∇f(X*) + Σₖ λk ∇hk(X*) = 0 (First-order Necessary Condition)
∙ hk(X*) = 0 ∀ k = 1, …, K
∙ dᵀ∇²L_X(X*)d should be greater than zero for every direction d in the constraint tangent subspace: dᵀ∇²L_X(X*)d > 0 (Second-order Sufficient Condition)
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
Lagrange Multiplier Theorem: Necessary Condition for Optimality - An Example
Minimize f(X) = x1 + x2 subject to h(X) ≡ x1² + x2² − 2 = 0
∙ Form the Lagrange function: L(x1, x2, λ) = x1 + x2 + λ(x1² + x2² − 2)
∙ Apply the NCC - Lagrange Multiplier Theorem:
∇L_X(X*) = 0: ∂L/∂x1 = 1 + 2λx1 = 0 and ∂L/∂x2 = 1 + 2λx2 = 0 ⟹ x1 = x2 = −1/(2λ)
∇L_λ(X*) = 0: x1² + x2² − 2 = 0 ⟹ λ = ±1/2, giving X* = (−1, −1) for λ = 1/2 and X* = (1, 1) for λ = −1/2

∙ Sufficient condition
Directions d = (d1, d2)ᵀ that satisfy the linearized constraint are given by:
∇hᵀd = 0: [2x1 2x2][d1, d2]ᵀ = 0; at the stationary points x1 = x2, so d1 + d2 = 0, that is, d2 = −d1

Hessian of the Lagrangian at the stationary points: ∇²L_X(X*) = [[2λ, 0], [0, 2λ]]

Consequently, the Hessian of the Lagrangian in the subspace defined by d is dᵀ∇²L_X(X*)d = [d1 −d1][[2λ, 0], [0, 2λ]][d1, −d1]ᵀ = 4λd1²

X* = [x1 x2]ᵀ = [−1 −1]ᵀ (λ = 1/2 > 0) satisfies both NCC and SFC in the subspace defined by the linearized constraint ∇hᵀd = 0
⟹ X* = [−1 −1]ᵀ is the optimum
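A sketch automating both checks for this example, assuming SymPy.

    import sympy as sp

    x1, x2, lam, d1 = sp.symbols('x1 x2 lam d1', real=True)
    L = x1 + x2 + lam * (x1**2 + x2**2 - 2)

    sols = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam], dict=True)
    for s in sols:
        H = sp.hessian(L, (x1, x2)).subs(s)        # ∇²L_X at the stationary point
        d = sp.Matrix([d1, -d1])                   # linearized constraint: d2 = -d1
        print(s, sp.simplify((d.T * H * d)[0]))    # 4*lam*d1²: positive only for lam = 1/2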
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
Design for minimum cost a cylindrical can, closed at both ends, with a volume V.
Treating the radius and height of the can as the design variables, namely r and b respectively, the problem can be formulated as:

Minimize f(r, b) = r² + rb subject to h ≡ πr²b − V = 0

∙ Form the Lagrange function: L(r, b, λ) = r² + rb + λ(πr²b − V)
∙ Apply the NCC - Lagrange Multiplier Theorem:
∇L_{r,b}(X*) = 0: ∂L/∂r = 2r + b + 2πrbλ = 0 … Eqn(1)    ∂L/∂b = r + πr²λ = 0 … Eqn(2)
∇L_λ(X*) = 0: ∂L/∂λ = πr²b − V = 0 … Eqn(3) ⟹ b = V/(πr²) … Eqn(4)

∙ Substituting Eqn(4) in Eqn(1): λ = −(2πr³ + V)/(2πrV) … Eqn(5)
∙ Substituting Eqn(4) in Eqn(2): λ = −1/(πr) … Eqn(6)

Equating λ in Eqn(5) and Eqn(6) leads to: r* = [V/(2π)]^(1/3) and λ* = −[2/(π²V)]^(1/3)
… and subsequently b* = [4V/π]^(1/3) and f* = 3[V/(2π)]^(2/3)
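A numeric sanity check of these closed forms (with V = 1 for concreteness), assuming NumPy.

    import numpy as np

    V = 1.0
    r = (V / (2 * np.pi)) ** (1 / 3)
    b = (4 * V / np.pi) ** (1 / 3)
    lam = -(2 / (np.pi**2 * V)) ** (1 / 3)

    print(2*r + b + 2*np.pi*r*b*lam)    # Eqn(1): ~0
    print(r + np.pi*r**2*lam)           # Eqn(2): ~0
    print(np.pi*r**2*b - V)             # Eqn(3): ~0
    print(b / r)                        # 2.0: the optimal can's height equals its diameter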
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
Minimize f(r, b) = r² + rb subject to h ≡ πr²b − V = 0
r* = [V/(2π)]^(1/3)    b* = [4V/π]^(1/3)    λ* = −[2/(π²V)]^(1/3)    f* = 3[V/(2π)]^(2/3)

L(r, b, λ) = r² + rb + λ(πr²b − V)    ∇h = [2πrb, πr²]ᵀ
∂²L/∂r∂r = 2 + 2πbλ    ∂²L/∂r∂b = 1 + 2πrλ    ∂²L/∂b∂b = 0

∙ Sufficient condition
Directions d = (d1, d2)ᵀ that satisfy the linearized constraint are given by ∇hᵀd = 0:
[2πr*b*  πr*²][d1, d2]ᵀ = 0 ⟹ 4d1 + d2 = 0 (using b* = 2r*) ⟹ d2 = −4d1

At the optimum, 2πr*λ* = −2 and b* = 2r*, so ∇²L_X(X*) = [[−2, −1], [−1, 0]], and
dᵀ∇²L_X(X*)d = −2d1² − 2d1(−4d1) = 6d1² > 0 for d1 ≠ 0 ⟹ the Second-order Sufficient Condition is met at X* = (r*, b*).
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
We have seen: Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(x1, x2) ≡ x1 + x2 − 2 = 0
Now consider: Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to g(x1, x2) ≡ x1 + x2 − 2 ≤ 0

Introducing a slack variable s converts the inequality into an equality: Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(x1, x2) ≡ x1 + x2 − 2 + s² = 0

Let the Lagrange Function be L(X, μ, s) ≡ L(x1, x2, μ, s) = (x1 − 1.5)² + (x2 − 1.5)² + μ(x1 + x2 − 2 + s²)
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
∇f(X*) + μ∇g(X*) = 0 ⟹ ∇f(X*) = μ[−∇g(X*)]
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)²    Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)²
subject to g(x1, x2) ≡ x1 + x2 − 2 ≤ 0            subject to g1(X) ≡ x1 + x2 − 2 ≤ 0 and g2(X) ≡ x1 − 1 ≤ 0

x1* = x2* = μ* = 1

Critical Revelations (S denotes a feasible direction at X*):
∇f + μ∇g = 0⃗ ⟹ Sᵀ[∇f + μ∇g] = Sᵀ ∙ 0⃗ = 0 ⟹ Sᵀ∇f + μSᵀ∇g = 0 … (a)

If X* is to be a local minimum, then along S the f value should increase or at worst remain the same; that is, the angle between ∇f and S SHOULD BE ACUTE ⟹ Sᵀ∇f ≥ 0.
By geometrical construct, ∇g and S are on opposite sides of the tangent at X* ⟹ the angle between them is OBTUSE ⟹ Sᵀ∇g < 0.

Sᵀ∇f + μSᵀ∇g = 0 ⟹ μ[−Sᵀ∇g] = Sᵀ∇f ⟹ μ[positive] = Sᵀ∇f ≥ 0 ⟹ μ ≥ 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Can the Lagrange Multiplier for an Inequality Constraint (μ) be negative, as for an Equality Constraint (λ)?
No: the Lagrange Multiplier in the case of an Inequality Constraint (μ) has to be NON-NEGATIVE (either zero or positive).

Here, when X* is not even known, how does one decide which of the constraints are to be treated as active, so they could be included in the Lagrange Function?
Well! Include ALL, because LMT will itself lead to μ = 0 for each inactive constraint.

For instance, with three constraints g1, g2, g3 ≤ 0:
∙ If X* = X*₁₃ (g1 and g3 active): then LMT shall also ⟹ μ1 ≥ 0; μ3 ≥ 0; μ2 = 0
∙ If X* = X*₂₃ (g2 and g3 active): then LMT shall also ⟹ μ2 ≥ 0; μ3 ≥ 0; μ1 = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Formulation
Let X* be a regular point of the feasible set that is a local minimum for:
Minimize f(X) subject to: gj(X) ≤ 0 ∀ j = 1, …, J and hk(X) = 0 ∀ k = 1, …, K

Then there exist unique Lagrange Multipliers μ⃗ (J-vector: μj ≥ 0) and λ⃗ (K-vector) such that the Lagrange function is stationary w.r.t. X⃗, s⃗, μ⃗, and λ⃗, where the Lagrange function is of the form:

L(X, μ, λ, s) = f(X) + Σⱼ₌₁ᴶ μj[gj(X) + sj²] + Σₖ₌₁ᴷ λk hk(X)

Stationary w.r.t. X: ∇L_X(X*) = 0 ⟹ ∇f(X*) + Σⱼ₌₁ᴶ μj ∇gj(X*) + Σₖ₌₁ᴷ λk ∇hk(X*) = 0, where μj ≥ 0

Stationary w.r.t. s: ∇L_s(X*) = 0 ⟹ 2μj sj = 0 ∀ j = 1, …, J (switching conditions: J of them; solution cases: 2ᴶ)

Stationary w.r.t. μ and λ: ∇L_μ(X*) = 0 and ∇L_λ(X*) = 0 recover feasibility, gj(X*) + sj² = 0 and hk(X*) = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-I
Minimize f(X) = x1² + x2² − 3x1x2 subject to g(X) ≡ x1² + x2² − 6 ≤ 0

L(X, μ, s) = x1² + x2² − 3x1x2 + μ(x1² + x2² − 6 + s²)

Stationary w.r.t. X: ∇L_X(X*) = 0
∂L/∂x1 = 2x1 − 3x2 + 2μx1 = 0 ⟹ μ = (3x2 − 2x1)/(2x1) … Eqn(1)
∂L/∂x2 = 2x2 − 3x1 + 2μx2 = 0 ⟹ μ = (3x1 − 2x2)/(2x2) … Eqn(2)

Stationary w.r.t. μ: ∇L_μ(X*) = 0
∂L/∂μ = x1² + x2² − 6 + s² = 0 … Eqn(3)

Candidate points: the inactive case μ = 0 gives X* = (0, 0) with f = 0; the active case s = 0 gives x1 = x2 = ±√3 with μ = 0.5 and f = −3.
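The switching logic can be scripted; a sketch, assuming SymPy.

    import sympy as sp

    x1, x2, mu = sp.symbols('x1 x2 mu', real=True)
    f = x1**2 + x2**2 - 3*x1*x2
    g = x1**2 + x2**2 - 6

    # Case mu = 0 (constraint inactive): plain stationarity of f
    print(sp.solve([sp.diff(f, x1), sp.diff(f, x2)], [x1, x2]))   # {x1: 0, x2: 0}

    # Case s = 0 (constraint active): stationarity of L together with g = 0
    L = f + mu * g
    print(sp.solve([sp.diff(L, x1), sp.diff(L, x2), g], [x1, x2, mu], dict=True))
    # x1 = x2 = ±√3 with mu = 1/2 (f = -3, kept since mu ≥ 0);
    # the x1 = -x2 branch appears with mu = -5/2 < 0 and is rejected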
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-II
Maximize f(X) = 2x1 + 2x2 − x1² − x2² − 2 subject to g1(X) ≡ 2x1 + x2 ≥ 4 and g2(X) ≡ x1 + 2x2 ≥ 4

Converted to the standard form:
Minimize f(X) = x1² + x2² − 2x1 − 2x2 + 2 subject to g1(X) ≡ 4 − 2x1 − x2 ≤ 0 and g2(X) ≡ 4 − x1 − 2x2 ≤ 0

L(X, μ1, s1, μ2, s2) = x1² + x2² − 2x1 − 2x2 + 2 + μ1[4 − 2x1 − x2 + s1²] + μ2[4 − x1 − 2x2 + s2²]

Stationary w.r.t. X: ∇L_X(X*) = 0
∂L/∂x1 = 2x1 − 2 − 2μ1 − μ2 = 0 … Eqn(1)
∂L/∂x2 = 2x2 − 2 − μ1 − 2μ2 = 0 … Eqn(2)

Stationary w.r.t. μ: ∇L_μ(X*) = 0
∂L/∂μ1 = 4 − 2x1 − x2 + s1² = 0 … Eqn(3)
∂L/∂μ2 = 4 − x1 − 2x2 + s2² = 0 … Eqn(4)

Stationary w.r.t. s: ∇L_s(X*) = 0
∂L/∂s1 = 2μ1s1 = 0 … Eqn(5) ⟹ μ1 = 0 or s1 = 0
∂L/∂s2 = 2μ2s2 = 0 … Eqn(6) ⟹ μ2 = 0 or s2 = 0

Switching conditions: J = 2; solution cases: 2ᴶ = 4
           μ1 = 0      s1 = 0
μ2 = 0     Case-I      Case-II
s2 = 0     Case-III    Case-IV

Case-I (μ1 = 0 & μ2 = 0): Eqn(1) ⟹ x1 = 1; Eqn(2) ⟹ x2 = 1; Eqn(3) ⟹ s1² = −1; Eqn(4) ⟹ s2² = −1 ⟹ Infeasible
Case-II (s1 = 0 & μ2 = 0) and Case-III (μ1 = 0 & s2 = 0) likewise turn out infeasible.

Case-IV (s1 = 0 & s2 = 0):
Eqn(3) ⟹ 2x1 + x2 = 4 … Eqn(13)
Eqn(4) ⟹ x1 + 2x2 = 4 … Eqn(14)
⟹ x1 = x2 = 4/3. Plugging x1 & x2 in Eqn(1) and (2) ⟹ μ1 = μ2 = 2/9 > 0

Regularity check: X* = [4/3, 4/3]ᵀ; ∇g1(X*) = [−2, −1]ᵀ; ∇g2(X*) = [−1, −2]ᵀ. Since ∇g1 ≠ α∇g2, the KKT conditions are met and X* is a candidate optimum.
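Case-IV reduces to a linear system; a quick check, assuming SymPy.

    import sympy as sp

    x1, x2, mu1, mu2 = sp.symbols('x1 x2 mu1 mu2')
    eqs = [2*x1 - 2 - 2*mu1 - mu2,   # Eqn(1)
           2*x2 - 2 - mu1 - 2*mu2,   # Eqn(2)
           4 - 2*x1 - x2,            # Eqn(3) with s1 = 0
           4 - x1 - 2*x2]            # Eqn(4) with s2 = 0
    print(sp.solve(eqs, [x1, x2, mu1, mu2]))
    # {x1: 4/3, x2: 4/3, mu1: 2/9, mu2: 2/9}: both multipliers non-negative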
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Alternative Formulation
Let X* be a regular point of the feasible set that is a local minimum for:
Minimize f(X) subject to: gj(X) ≤ 0 ∀ j = 1, …, J and hk(X) = 0 ∀ k = 1, …, K

Then there exist unique Lagrange Multipliers μ⃗ (a J-vector: μj ≥ 0) and λ⃗ (a K-vector) such that the Lagrange function is stationary w.r.t. X⃗, s⃗, μ⃗, and λ⃗, where the Lagrange function can be written without explicit slacks:

L(X, μ, λ, s) = f(X) + Σⱼ₌₁ᴶ μj[gj(X) + sj²] + Σₖ₌₁ᴷ λk hk(X)  ⟺  L(X, μ, λ) = f(X) + Σⱼ₌₁ᴶ μj gj(X) + Σₖ₌₁ᴷ λk hk(X)

∇L_X(X*) = 0 ⟹ ∇f(X*) + Σⱼ μj ∇gj(X*) + Σₖ λk ∇hk(X*) = 0 (same in either form)

Example fragment (an objective with unconstrained minimum at (10, 8); the constraint definitions g1, g2 did not survive extraction):
Stationary w.r.t. X: ∇L_X(X*) = 0
∂L/∂x = 2(x − 10) + μ1 + μ2 = 0 … Eqn(1)
∂L/∂y = 2(y − 8) + μ1 = 0 … Eqn(2)

Go straight to the switching conditions μ1g1 = 0 and μ2g2 = 0; solution cases: 2ᴶ = 4
           μ1 = 0      g1 = 0
μ2 = 0     Case-I      Case-II
g2 = 0     Case-III    Case-IV
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Attention! Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-IV
[Figure-based example spanning three slides: a geometry with 2 m dimensions; the figures did not survive extraction.]
SOME CONCEPTS
FACILITATING
AN INTERESTING INTERPRETATION
OF KKT CONDITIONS
….. AHEAD
VECTOR SPACE: SPANNING SET
A set of vectors X1, X2, …, XK is said to span the vector space V if any vector X ∈ V can be expressed through this set of K vectors, in that X = Σᵢ αᵢXᵢ.

Consider these vectors in R²: X1 = [1, 0]ᵀ, X2 = [1, 1]ᵀ, X3 = [0, 1]ᵀ, X4 = [−1, 0]ᵀ, X5 = [1, −1]ᵀ
Any vector X ∈ R² can be expressed as a linear combination of X1, X2, …, X5.

Notably:
∙ Different Bases for the same V may exist (but all will have the same cardinality)
∙ For a given Basis, its elements need not be Orthogonal (though an orthogonal basis is the most convenient one)

Example: Standard basis for Rⁿ: {e1, e2, …, en}, where ei is an n-dimensional vector with all elements = 0 except for the i-th element = 1.
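Linear independence (and hence whether a set can serve as a basis) is a rank question; a minimal check, assuming NumPy.

    import numpy as np

    vectors = np.array([[1.0, 0.0],
                        [1.0, 1.0],
                        [0.0, 1.0]])          # X1, X2, X3 as rows
    print(np.linalg.matrix_rank(vectors))     # 2: they span R², but any 3 vectors
                                              # in R² are linearly dependent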
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Proof & Physical Interpretation of 1st and 2nd Order KKT Conditions for Optimality
Minimize f (X ) subject to gj(x) ≤ 0 ∀ j = 1,…, J; hk(x) = 0 ∀ k = 1,…, K; X ∈ Rn
Equality constraints:
∙ Both sides of the Equality Constraint surface are infeasible: if you move along or opposite to ∇h, you become infeasible.
∙ Each h reduces the domain (Rⁿ) to a lower-dimensional subspace (hypersurface) which has a tangent plane at every point on it.
∙ The feasible directions d are given by ∇hᵀd = 0: d belongs to the plane tangent to the constraint surface (normal to ∇h).

Inequality constraints:
∙ Only one side of the Inequality Constraint surface is infeasible: if you move along ∇g you become infeasible, but you remain feasible if you go opposite to ∇g.
∙ Each g does not reduce the domain (Rⁿ) to a lower-dimensional subspace (hypersurface), but to a subset of the n-dimensional space.
∙ The feasible directions d are given by ∇gᵀd ≤ 0: d belongs to the cone of feasibility.
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Each h reduces the domain (Rⁿ) to a lower-dimensional subspace (hypersurface) with a defined tangent plane at every point on it; each g reduces it not to a lower-dimensional subspace but to a subset of the n-dimensional space.

Whether you can find a feasible direction or not depends on whether or not there are linearly independent vectors to form the basis (B) for THG ≡ Tp ∩ CF. Let rB = Rank(B):
rB = Rank(B) = n − K − |JA| ⟹ n = rB + K + |JA|

A direction d is a descent direction if it can offer f(X) < f(X*). A necessary condition to prevent a descent direction at X* is that −∇f(X*) (which promises f reduction) should have 0 component in the tangent plane ⟹ γ(B) = 0.

X* ∈ Rⁿ ⟹ ∇f(X*) ∈ Rⁿ ⟹ −∇f(X*) = γ(B) + λ(∇h(X*)) + μA(∇gA(X*))

That is, −∇f(X*) can be expressed as a linear combination of:
∙ the linearly independent vectors constituting B
∙ Gradients of Equality Constraints
∙ Gradients of Active Inequality Constraints

With γ(B) = 0: −∇f(X*) = λ(∇h(X*)) + μA(∇gA(X*)) ⟹ ∇f(X*) + λ(∇h(X*)) + μA(∇gA(X*)) = 0
⟹ ∇f(X*) + λ(∇h(X*)) + μA(∇gA(X*)) + μI(∇gI(X*)) = 0 (with the insistence that μI = 0 where gI ≠ 0)

Complementarity Condition:
∙ When gI ≠ 0 (inactive): μI = 0
∙ When gA = 0 (active): μA ≥ 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
μA ≥ 0 ensures that a component of −∇f(X*) can exist only along +∇gA(X*), which, being infeasible, is inconsequential to the local optimality of X*. Likewise, even if ∃ a component of −∇f(X*) along ±∇h(X*), it is inconsequential to the local optimality of X*, since the directions marked by ∇h are infeasible.
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Necessary Condition of Optimality for a point X*: at X* you CAN NOT FIND a direction d which is BOTH a FEASIBLE direction and a DESCENT (USABLE) direction, that is:
∙ −∇f(X*) is orthogonal to TP
∙ μA ≥ 0

Second-order Necessary Condition: the second-order change (1/2)dᵀ∇²L_XX d along d ∈ THG ≡ {TP ∩ CF} is ≥ 0
Second-order Sufficient Condition: the second-order change (1/2)dᵀ∇²L_XX d along d ∈ THG ≡ {TP ∩ CF} is > 0 … with one additional qualifier
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Second-order checks for Example-I (f = x1² + x2² − 3x1x2, g ≡ x1² + x2² − 6 ≤ 0):

Case X* = [0, 0]ᵀ; μ = 0; g(X*) < 0:
∇gᵀ(X*)d = [0 0][d1, d2]ᵀ = 0 ⟹ no relation can be induced between d1 and d2
dᵀ∇²L_X(X*)d = [d1 d2][[2, −3], [−3, 2]][d1, d2]ᵀ = 2(d1² − 3d1d2 + d2²) = 2[(d1 − d2)² − d1d2] ≱ 0 conclusively ⟹ SONCC is not met

Case X* = [√3, √3]ᵀ; μ = 0.5; g(X*) = 0:
∇gᵀ(X*)d = [2√3 2√3][d1, d2]ᵀ = 0 ⟹ d2 = −d1
dᵀ∇²L_X(X*)d = [d1 d2][[3, −3], [−3, 3]][d1, d2]ᵀ with d2 = −d1 gives 3(d1 − d2)² = 12d1² ≥ 0 and also > 0 (for d1 ≠ 0) ⟹ both SONCC and SOSFC are met

Case-I(b): X* = [−√3, −√3]ᵀ; μ = 0.5; g = 0: repeat the process to prove that SONCC & SOSFC are met.
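These subspace quadratic-form checks can be reproduced symbolically; a sketch, assuming SymPy.

    import sympy as sp

    d1, d2 = sp.symbols('d1 d2', real=True)

    # X* = (0, 0), mu = 0: ∇g = 0, so d is unrestricted
    H0 = sp.Matrix([[2, -3], [-3, 2]])
    print(sp.expand((sp.Matrix([d1, d2]).T * H0 * sp.Matrix([d1, d2]))[0]))
    # 2d1² - 6d1d2 + 2d2²: sign depends on d => SONCC not met

    # X* = (√3, √3), mu = 1/2: ∇gᵀd = 0 forces d2 = -d1
    H1 = sp.Matrix([[3, -3], [-3, 3]])
    print(sp.expand((sp.Matrix([d1, -d1]).T * H1 * sp.Matrix([d1, -d1]))[0]))
    # 12d1²: strictly positive for d1 ≠ 0 => SONCC and SOSFC met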
Thank You