
NECESSARY & SUFFICIENT CONDITIONS FOR OPTIMALITY

Dhish Kumar Saxena


Professor
Department of Mechanical & Industrial Engineering
(Joint Faculty, MFSDSAI)
IIT Roorkee
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY

Category   Objectives (M)   Inequality Constraints (J)   Equality Constraints (K)   Variables (n)   Approach
Type-I     1                0                            0                          1               Taylor Series
Type-II    1                0                            0                          ≥2              Taylor Series
Type-III   1                0                            1                          ≥2              Lagrange Multiplier Theorem
Type-IV    1                ≥1                           ≥1                         ≥2              Karush-Kuhn-Tucker (KKT) Conditions
Type-V     ≥2               ≥1                           ≥1                         ≥2              Karush-Kuhn-Tucker (KKT) Conditions

2
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1
Taylor's Series represents a function as an infinite sum of terms calculated from the
function value and the derivatives of the function at a single point.

Expanding about the known point x0 with step h (x = x0 + h):
f(x) = f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)

f(x) − f(x0) = h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)

For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0:
h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1) ≥ 0

First-order approximation: h f′(x0) ≥ 0
Suppose at x0: f′(x0) > 0 : a negative h would violate the above condition
               f′(x0) < 0 : a positive h would violate the above condition
For h f′(x0) ≥ 0 to hold for every h: f′(x0) = 0

First-order Necessary Condition for Optimality (FO-NCC): f′(x0) = 0
(if x0 satisfies the FO-NCC, it is a stationary point)
























3
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1
For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0:
h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1) ≥ 0

Say x0 satisfies the FO-NCC, that is, it is a stationary point: f′(x0) = 0
You need more conditions to infer about x0.

Second-order approximation: (h²/2!) f′′(x0) ≥ 0
Suppose at x0: f′′(x0) > 0 : a negative/positive h does not impact the above condition
                           : x0 is guaranteed to be a local minimum
               f′′(x0) < 0 : a negative/positive h does not impact the above condition
                           : x0 is guaranteed to be a local maximum
               f′′(x0) = 0 : nothing conclusive can be inferred about x0
                           : it may be a local minimum, a local maximum, or neither

Second-order Necessary Condition for Optimality (SO-NCC): f′′(x0) ≥ 0
Second-order Sufficient Condition for Optimality (SO-SFC): f′′(x0) > 0













4






NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1
For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0:
h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1) ≥ 0,   with f′(x0) = 0 and f′′(x0) = 0

Third-order approximation: (h³/3!) f′′′(x0) ≥ 0
Suppose at x0: f′′′(x0) > 0 : a negative h would violate the above condition
               f′′′(x0) < 0 : a positive h would violate the above condition
For (h³/3!) f′′′(x0) ≥ 0 to hold for every h: f′′′(x0) = 0
Third-order Necessary Condition for Optimality (TO-NCC): f′′′(x0) = 0

Fourth-order approximation: (h⁴/4!) f^IV(x0) ≥ 0
Suppose at x0: f^IV(x0) > 0 : a negative/positive h does not impact the above condition
                            : x0 is guaranteed to be a local minimum
               f^IV(x0) < 0 : a negative/positive h does not impact the above condition
                            : x0 is guaranteed to be a local maximum
               f^IV(x0) = 0 : nothing conclusive can be inferred about x0
                            : it may be a local minimum or maximum
Fourth-order Necessary Condition for Optimality (FrO-NCC): f^IV(x0) ≥ 0
Fourth-order Sufficient Condition for Optimality (FrO-SFC): f^IV(x0) > 0






















5





NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1

Taylor's Series represents a function as an infinite sum of terms calculated from the
function value and the derivatives of the function at a single point.

f(x) = f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)
f(x) − f(x0) = h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)
For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0.

First-order Necessary Condition for Optimality (FO-NCC): f′(x0) = 0
Second-order Necessary Condition for Optimality (SO-NCC): f′′(x0) ≥ 0
Second-order Sufficient Condition for Optimality (SO-SFC): f′′(x0) > 0
Third-order Necessary Condition for Optimality (TO-NCC): f′′′(x0) = 0
Fourth-order Necessary Condition for Optimality (FrO-NCC): f^IV(x0) ≥ 0
Fourth-order Sufficient Condition for Optimality (FrO-SFC): f^IV(x0) > 0

A point x0 is a local minimum iff the first non-zero element in the sequence of derivatives
f^(k)(x0) is positive and occurs at an even order k.
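As a quick illustration of this rule (a minimal sketch, not part of the original slides, assuming Python with sympy is available), the first non-vanishing derivative at a candidate point can be found and classified programmatically:

import sympy as sp

x = sp.symbols('x')

def classify(f, x0, max_order=8):
    """Classify x0 using the first non-zero derivative of f at x0."""
    for k in range(1, max_order + 1):
        dk = sp.diff(f, x, k).subs(x, x0)
        if dk != 0:
            if k % 2 == 1:   # first non-zero derivative at odd order
                return 'not stationary' if k == 1 else 'no extremum (inflection-like point)'
            return 'local minimum' if dk > 0 else 'local maximum'
    return 'inconclusive'

print(classify(x**4, 0))          # local minimum (first non-zero derivative at k = 4 is +24)
print(classify(x**3, 0))          # odd first non-zero derivative -> no extremum
print(classify(-(x - 1)**2, 1))   # local maximum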




















6
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2

For n = 1, expanding about the known point x0 with scalar step h:
f(x) = f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)

For n ≥ 2: unlike the case of n = 1 where h was a scalar, h is now a vector representing the
component-wise difference between the known point and the unknown point: h = [h1, h2]^T

Two variables (X = (x, y), known point (x0, y0), steps hx, hy):
f(X) = f(x, y) = f(x0 + hx, y0 + hy)
     = f(x0, y0) + [hx ∂f/∂x + hy ∂f/∂y]|(x0,y0) + (1/2)[hx² ∂²f/∂x² + 2 hx hy ∂²f/∂x∂y + hy² ∂²f/∂y²]|(x0,y0) + …
     = f(x0, y0) + [hx fx + hy fy]|(x0,y0) + (1/2)[hx² fxx + 2 hx hy fxy + hy² fyy]|(x0,y0) + …

Again, h is a vector representing the component-wise difference between the known point and the
unknown point. Indices x, y can't serve more than 2 dimensions; hence indices 1, 2, …, n are used:
h = [h1, h2, …, hn]^T = [x1 − x01, x2 − x02, …, xn − x0n]^T

n variables:
f(X) = f(x1, x2, …, xn) = f(x10 + h1, x20 + h2, …, xn0 + hn)
     = f(x10, x20, …, xn0) + Σ_{j=1}^{n} (∂f/∂xj) hj + (1/2) Σ_{j=1}^{n} Σ_{k=1}^{n} (∂²f/∂xj∂xk) hj hk + …

In matrix form:
f(X) = f(X0) + ∇f^T(X0) h + (1/2) h^T H h + …

∇f^T(X0): Gradient of the function, evaluated at the known point X0
H: Hessian of the function, evaluated at the known point X0






7
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2
f(X) = f(x1, x2, …, xn) = f(x10 + h1, x20 + h2, …, xn0 + hn)
     = f(x10, x20, …, xn0) + Σ_{j=1}^{n} (∂f/∂xj) hj + (1/2) Σ_{j=1}^{n} Σ_{k=1}^{n} (∂²f/∂xj∂xk) hj hk + …

f(X) = f(X0) + ∇f^T(X0) h + (1/2) h^T H h + …

∇f^T(X0): Gradient of the function, evaluated at the known point X0
H: Hessian of the function, evaluated at the known point X0

∇f = [f1, f2, …, fn]^T with fi = ∂f/∂xi          H = [Hij] with Hij = ∂²f/∂xi∂xj

Truncating after the quadratic term gives the second-order approximation:
f(X) ≃ f(X0) + ∇f^T(X0) h + (1/2) h^T H h
2

8
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2
Next steps: • learn to compute the Gradient & Hessian
            • with reference to an example, assess how good this 2nd-order approximation is
            • understand more about the Gradient and the Hessian

Obtain a second-order Taylor expansion for the function f(X) = 3 x1³ x2 at X0 = (1, 1)

f(X0) = 3     ∇f = [9 x1² x2, 3 x1³]^T     H = [[18 x1 x2, 9 x1²], [9 x1², 0]]     h = [x1 − 1, x2 − 1]^T
∇f(X0) = [9, 3]^T     H(X0) = [[18, 9], [9, 0]]

f(X) ≃ 3 + [9 3] [x1 − 1, x2 − 1]^T + 0.5 [x1 − 1  x2 − 1] [[18, 9], [9, 0]] [x1 − 1, x2 − 1]^T
f(X) ≃ 9 x1² + 9 x1 x2 − 18 x1 − 6 x2 + 9

Quality of Approximation
At X0 = (1, 1) and X = (1.3, 1.3):
  exact:        f(X) = 3 x1³ x2 = 8.5683
  approximate:  f(X) ≃ 9 x1² + 9 x1 x2 − 18 x1 − 6 x2 + 9 = 8.220

For a 30% change in the given point, the Taylor series approximation underestimates the original function by about 4%.
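The quality-of-approximation figures above can be reproduced with a short numerical check (a sketch assuming numpy is available; the gradient and Hessian values are the ones evaluated at X0 = (1, 1) above):

import numpy as np

def f(x):
    x1, x2 = x
    return 3 * x1**3 * x2

X0 = np.array([1.0, 1.0])
grad = np.array([9.0, 3.0])                 # [9*x1^2*x2, 3*x1^3] at (1, 1)
H = np.array([[18.0, 9.0], [9.0, 0.0]])     # Hessian at (1, 1)

def taylor2(x):
    h = np.asarray(x) - X0
    return f(X0) + grad @ h + 0.5 * h @ H @ h

X = np.array([1.3, 1.3])
print(f(X), taylor2(X))             # 8.5683 vs 8.22
print((f(X) - taylor2(X)) / f(X))   # ~0.04, i.e. about a 4% underestimate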
9
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient

Let h be the distance from X0(x0, y0) to X(x, y), and θx, θy the angles of the displacement with the axes:
hx = h cosθx ≡ h ux        hy = h cosθy ≡ h uy        (ux, uy: direction cosines of the vector joining X0 and X)
x = x0 + hx = x0 + h ux    y = y0 + hy = y0 + h uy

The variation of a scalar function ϕ over a distance h is given by:
dϕ/dh = (∂ϕ/∂x)(dx/dh) + (∂ϕ/∂y)(dy/dh) + (∂ϕ/∂z)(dz/dh)

Since x = x0 + h ux implies dx/dh = ux (and similarly for y and z):
dϕ/dh = [∂ϕ/∂x  ∂ϕ/∂y  ∂ϕ/∂z] [ux, uy, uz]^T = ∇ϕ ∙ u

∇ϕ: gradient of the scalar function        u: direction cosines of the displacement vector

10
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient

dϕ/dh = ∇ϕ ∙ u,  where ∇ϕ = [∂ϕ/∂x, ∂ϕ/∂y, ∂ϕ/∂z]^T and u = [ux, uy, uz]^T

Rate of change of a scalar function over a distance h = the dot product between the
function's gradient and the direction cosines of the displacement vector

dϕ/dh = ∇ϕ ∙ u = |∇ϕ| |u| cosθ(∇ϕ, u)

(dϕ/dh)_max = |∇ϕ|   …at θ(∇ϕ, u) = 0
(dϕ/dh)_min = −|∇ϕ|  …at θ(∇ϕ, u) = 180°

Rate of change of a scalar function over a distance h is maximum if one moves along the
direction of the function's gradient.

Now the natural question is: what is the direction of the function's gradient?

11
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient

dϕ/dh = ∇ϕ ∙ u = |∇ϕ| |u| cosθ(∇ϕ, u)        (dϕ/dh)_max = |∇ϕ| at θ(∇ϕ, u) = 0

Rate of change of a scalar function over a distance h is maximum
if one moves along the direction of the function's gradient.

Now the natural question is: what is the direction of the function's gradient?

Consider a contour ϕ = constant, and a curve r(t) = {x(t), y(t), z(t)} lying on it.
Along the contour, dϕ/dh = 0, and the chain rule gives:
dϕ/dt = [∂ϕ/∂x  ∂ϕ/∂y  ∂ϕ/∂z] [dx/dt, dy/dt, dz/dt]^T = ∇ϕ ∙ r′(t)

∇ϕ ∙ r′(t) = 0: the direction of the function's gradient at a point is
perpendicular to the tangent at that point.

Rate of change of a scalar function over a distance h is maximum if one moves along the
direction of the function's gradient, that is, along the normal at that point.

12
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the HESSIAN

f(X) = f(X0) + ∇f^T(X0) h + (1/2) h^T H h
h^T H h = [h1  h2] [[H11, H12], [H21, H22]] [h1, h2]^T = H11 h1² + 2 H12 h1 h2 + H22 h2²
Scalar function (in Quadratic Form)

Any quadratic function can be expressed as: Q = X^T M X

f(x1, x2, x3) = 2x1² + 2x1x2 + 4x1x3 − 6x2² − 4x2x3 + 5x3² = [x1 x2 x3] [[2, a, b], [d, −6, c], [e, f, 5]] [x1, x2, x3]^T

The off-diagonal entries a, b, c, d, e, f can be anything as long as a + d = 2; b + e = 4; c + f = −4.

A special case for M is when it is symmetric:
Q = X^T S X ⟹ a = d; b = e; c = f ⟹ S = [[2, 1, 2], [1, −6, −2], [2, −2, 5]]

The Hessian matrix of the same function is
H = [[4, 2, 4], [2, −12, −4], [4, −4, 10]],  so that  Q = X^T S X = X^T (H/2) X

13
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the HESSIAN
Recall that for n = 1, f(x) = f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f′′(x0), and the Necessary and
Sufficient Conditions were defined based on the sign of f′′(x0).
Similarly, the sign of the scalar function h^T H h should be known. The theory of quadratic functions helps.

Qf(X) = X^T S X is              Definition                          Characterization through Eigenvalues (λ)
Positive definite (PD)          Qf(X) > 0 ∀ X ≠ 0                   λi > 0 ∀ i = 1…n
Positive semi-definite (PSD)    Qf(X) ≥ 0 ∀ X ≠ 0                   λi ≥ 0 ∀ i = 1…n
Negative definite (ND)          Qf(X) < 0 ∀ X ≠ 0                   λi < 0 ∀ i = 1…n
Negative semi-definite (NSD)    Qf(X) ≤ 0 ∀ X ≠ 0                   λi ≤ 0 ∀ i = 1…n
Indefinite (ID)                 Qf(X) > 0 for some X and            λi > 0 for some i and
                                Qf(X) < 0 for some X                λi < 0 for some i
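The eigenvalue characterization in the table can be applied directly (a minimal sketch assuming numpy; the tolerance and the example matrices are illustrative choices, not from the slides):

import numpy as np

def definiteness(S, tol=1e-10):
    lam = np.linalg.eigvalsh(S)          # eigenvalues of a symmetric matrix
    if np.all(lam > tol):   return 'positive definite'
    if np.all(lam >= -tol): return 'positive semi-definite'
    if np.all(lam < -tol):  return 'negative definite'
    if np.all(lam <= tol):  return 'negative semi-definite'
    return 'indefinite'

print(definiteness(np.array([[2., 0.], [0., 3.]])))      # positive definite
print(definiteness(np.array([[2., -3.], [-3., 2.]])))    # indefinite (eigenvalues -1 and 5)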



14


NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the HESSIAN

n = 1:  f(x) = f(x0) + h f′(x0) + (h²/2!) f′′(x0)            n ≥ 2:  f(X) = f(X0) + ∇f^T(X0) h + (1/2) h^T H h

For x0 (or X0) to be a local minimum, f(x) − f(x0) ≥ 0:
        h f′(x0) + (h²/2!) f′′(x0) ≥ 0                               ∇f^T(X0) h + (1/2) h^T H h ≥ 0

First-order NCC:    f′(x0) = 0                                       ∇f(X0) = 0⃗
Second-order NCC:   f′′(x0) ≥ 0                                      H is positive semi-definite (λ ≥ 0)
Second-order SFC:   f′′(x0) > 0                                      H is positive definite (λ > 0)






15





NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-I: Geometrical

Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(x1, x2) ≡ x1 + x2 − 2 = 0

The best value of objective function is 0, at U(1.5, 1.5) - an infeasible point

The farther you move away from U:


the objective function value undesirably increases
you get closer to the feasible region

Balancing act: Move away from U but only the minimum you need to
become feasible - P is that point

Method-II: Algebraic

At P, the line joining U(1.5, 1.5) to P is perpendicular to the constraint line (which has slope = −1).
Product of slopes of two perpendicular lines = −1:
((x2 − 1.5)/(x1 − 1.5)) (−1) = −1  ⟹  x1 = x2 …Eqn(1)
x1 + x2 = 2 …Eqn(2)
⟹ x1 = 1; x2 = 1

16
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2

Method-III: Variable Reduction; Unconstrained Optimization

Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(x1, x2) ≡ x1 + x2 − 2 = 0

x2 = 2 − x1    (use of the equality constraint for variable reduction)

Minimize f(x1) = (x1 − 1.5)² + (2 − x1 − 1.5)²
Minimize f(x1) = (x1 − 1.5)² + (0.5 − x1)²    (unconstrained optimization problem)

First-order Necessary Condition: df/dx1 = 0, leading to x1 = 1; x2 = 1
Second-order Necessary Condition: d²f/dx1² = 4 ≥ 0, hence satisfied
Second-order Sufficiency Condition: d²f/dx1² = 4 > 0, hence satisfied
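A hedged sketch of Method-III with sympy (assumed available; the symbol names are illustrative, the steps mirror the hand calculation above):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = (x1 - 1.5)**2 + (x2 - 1.5)**2
f_red = f.subs(x2, 2 - x1)                      # eliminate x2 with h: x1 + x2 - 2 = 0

print(sp.solve(sp.diff(f_red, x1), x1))         # FO-NCC: [1.0], so x1 = 1 and x2 = 1
print(sp.diff(f_red, x1, 2))                    # second derivative = 4 > 0, SO-SFC satisfied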

17
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation & Application of Necessary & Sufficient Conditions
Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(x1, x2) ≡ x1 + x2 − 2 = 0

Assume x2 is expressible as a function of x1: x2 = ϕ(x1)

Apply the first-order Necessary Condition (unconstrained, single-variable problem) at the point (x1*, x2*):
df(x1*, x2*)/dx1 = 0, implying ∂f(x1*, x2*)/∂x1 + (∂f(x1*, x2*)/∂x2)(dx2/dx1) = 0

Forgoing (x1*, x2*) for ease of representation, and using x2 = ϕ(x1):
∂f/∂x1 + (∂f/∂x2)(dϕ/dx1) = 0 … Eqn(1)

Since (x1*, x2*) satisfies h(x1*, x2*) = 0:
dh(x1*, x2*)/dx1 = 0, implying ∂h(x1*, x2*)/∂x1 + (∂h(x1*, x2*)/∂x2)(dx2/dx1) = 0

Forgoing (x1*, x2*) for ease of representation:
∂h/∂x1 + (∂h/∂x2)(dϕ/dx1) = 0 … Eqn(2)

18
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
∂f/∂x1 + (∂f/∂x2)(dϕ/dx1) = 0 … Eqn(1)          ∂h/∂x1 + (∂h/∂x2)(dϕ/dx1) = 0 … Eqn(2)

−dϕ/dx1 = (∂f/∂x1)/(∂f/∂x2) = (∂h/∂x1)/(∂h/∂x2) … Eqn(3)

Collect the partial derivatives w.r.t. x1 and x2 together:
(∂f/∂x1)/(∂h/∂x1) = (∂f/∂x2)/(∂h/∂x2) = −λ        (λ: free in sign)

Eqn(4)…   ∂f/∂x1 + λ ∂h/∂x1 = 0
          ∂f/∂x2 + λ ∂h/∂x2 = 0

To be a candidate optimum, a point X*, that is (x1*, x2*), needs to satisfy:
∂f(x1*, x2*)/∂x1 + λ ∂h(x1*, x2*)/∂x1 = 0
∂f(x1*, x2*)/∂x2 + λ ∂h(x1*, x2*)/∂x2 = 0
h(x1*, x2*) = 0

In vector form:   ∇f(X*) + λ ∇h(X*) = 0     (λ = −Grad f / Grad h : free in sign)
                  h(x1*, x2*) = 0

19
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Alternative Derivation)
At a stationary point X* ≡ {x1*, x2*} the total derivative of the function = 0, that is:
df = (∂f/∂x1) dx1 + (∂f/∂x2) dx2 = 0 ≡ ∇f^T dx = 0 …Eqn(1)

Unlike an unconstrained problem, the infinitesimal vector dx ≡ {dx1, dx2} in a constrained problem cannot be arbitrary.
Any variation dx1 and dx2 in (x1*, x2*) is permissible only when h(x1* + dx1, x2* + dx2) = 0:
h(x1* + dx1, x2* + dx2) = h(x1*, x2*) + (∂h(x1*, x2*)/∂x1) dx1 + (∂h(x1*, x2*)/∂x2) dx2 = 0
⟹ dh = (∂h/∂x1) dx1 + (∂h/∂x2) dx2 = 0 ≡ ∇h^T dx = 0 ……Eqn(2)

dx2 = −[(∂h/∂x1)/(∂h/∂x2)] dx1 …Eqn(3)

Plugging Eqn(3) in Eqn(1):
df = (∂f/∂x1) dx1 − (∂f/∂x2)[(∂h/∂x1)/(∂h/∂x2)] dx1 = 0 …Eqn(4)
⟹ (∂f/∂x1 − (∂f/∂x2)(∂h/∂x1)/(∂h/∂x2)) dx1 = 0
⟹ (∂f/∂x1)(∂h/∂x2) − (∂f/∂x2)(∂h/∂x1) = 0

Defining −λ = (∂f/∂x2)/(∂h/∂x2) = (∂f/∂x1)/(∂h/∂x1):
∂f/∂x1 + λ ∂h/∂x1 = 0          ∂f/∂x2 + λ ∂h/∂x2 = 0

Equivalently, the determinant | ∂f/∂x1  ∂h/∂x1 ; ∂f/∂x2  ∂h/∂x2 | = 0: the columns representing the
gradients of f and h are proportional to each other, ∇f + λ ∇h = 0   (λ = −Grad f / Grad h : free in sign)

20
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Alternative Derivation)
∂f ∂f
At a stationary point X* ≡ {x* , x*} the total derivative of the function = 0, that is df =
1 2
d x1 + d x2 = 0 ≡ ∇f T d x = 0…Eqn(1)
∂x1 ∂x2

Unlike unconstrained problem, the infinitesimal vector d X ≡ {d x1, d x2} in a constrained problem can not be arbitrary .

Any variation d x1 and d x2 in (x*, x*) is permissible only whenh(x*


1 2 1
+ d x1, x*
2
+ d x2) = 0
∂h(x*
1 , x*
2) ∂h(x*
1 , x*
2)
∂h ∂h
h(x* + d x1, x* + d x2) = h(x* , x*) + d x1 + d x2 = 0 dh = d x1 + d x2 = 0 ≡ ∇h T d x = 0……Eqn(2)
1 2 1 2 ∂x1 ∂x2 ∂x1 ∂x2
Recall that the gradient of a function is orthogonal to its contours. Thus, since the displacement dX satisfies the constraint contour,
(the straight line in this case), it follows that dX is orthogonal to ∇h .

Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(X ) ≡ x1 + x2 − 2 = 0 Minimize f (x1) = (x1 − 1.5)2 + (2 − x1 − 1.5)2

Lagrange: one could multiply each constraint variation by a scalar and add it to the objective function:
df + Σ_{k=1}^{K} λk dhk ≡ Σ_{i=1}^{n} (∂f/∂xi) dxi + Σ_{k=1}^{K} λk (Σ_{i=1}^{n} (∂hk/∂xi) dxi) ≡ Σ_{i=1}^{n} (∂f/∂xi + Σ_{k=1}^{K} λk ∂hk/∂xi) dxi

This formulation jointly accounts for the variation in the objective function and the constraints.
Now that the constraints do not need to be accounted for separately, the infinitesimal vector dx becomes
independent and arbitrary. Since Σ_{i=1}^{n} βi dxi = 0 for arbitrary dxi iff βi = 0 ∀ i = 1,…, n:

∂f/∂xi + Σ_{k=1}^{K} λk ∂hk/∂xi = 0   ∀ i = 1,…, n

∇f(X*) + λ ∇h(X*) = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Example)
Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(X ) ≡ x1 + x2 − 2 = 0

∇f(X*) + λ ∇h(X*) = 0
h(X) ≡ x1 + x2 − 2 = 0

With h written as x1 + x2 − 2 = 0:
[2(x1 − 1.5), 2(x2 − 1.5)]^T + λ [1, 1]^T = 0  ⟹  x1 = x2, x1 + x2 = 2  ⟹  x1 = 1, x2 = 1, λ = 1

With h written as 2 − x1 − x2 = 0:
[2(x1 − 1.5), 2(x2 − 1.5)]^T + λ [−1, −1]^T = 0  ⟹  x1 = x2, x1 + x2 = 2  ⟹  x1 = 1, x2 = 1, λ = −1

In ∇f(X*) + λ ∇h(X*) = 0, due to the + sign:
λ is +ve when the two gradients point in opposite directions
λ is −ve when the two gradients point in the same direction
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=2 n≥2
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
K equality constraints in n variables: form a Lagrange function L(X, λ) = f(X) + Σ_{k=1}^{K} λk hk(X)

L = f + λ1 h1 + λ2 h2 + ……. + λK hK        ∇L_X ≡ ∇f + λ1 ∇h1 + … + λK ∇hK        ∇L_λk ≡ hk

∇f(X*) + λ1 ∇h1(X*) + λ2 ∇h2(X*) + … + λK ∇hK(X*) = 0 ≡ ∇L_X(X*) = 0        hk(X*) = 0 ≡ ∇L_λk(X*) = 0

To avoid degeneracy of this equation to ∇f = 0, the gradients ∇h1, ∇h2, …, ∇hK should be linearly independent.

Formalization through a "Regular Point"
Given a problem: Min f(X) subject to hk(X) = 0 ∀ k = 1 … K
A point X* satisfying the constraints hk(X*) = 0 is called a Regular Point of the feasible set if:
∙ f(X*) is differentiable
∙ the gradient vectors of all constraints at X* are linearly independent, that is Σ_{k=1}^{K} αk ∇hk(X*) = 0 iff αk = 0 ∀ k = 1 … K

(a set of vectors is linearly dependent if and only if one of them is zero or a linear combination of the others)

25
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2

Lagrange Multiplier Theorem: First-order Necessary Condition for Optimality

Given a problem: Min f(X) subject to hk(X) = 0 ∀ k = 1 … K
If X* is a Regular Point that is a local minimum for the problem, then there exist "unique" Lagrange
Multipliers λk ∀ k = 1,…, K such that, with L(X, λ) = f(X) + Σ_{k=1}^{K} λk hk(X):
∙ ∇f(X*) + λ1 ∇h1(X*) + λ2 ∇h2(X*) + … + λK ∇hK(X*) = 0        ≡ ∇L_X(X*) = 0
∙ hk(X*) = 0 ∀ k = 1,…, K                                       ≡ ∇L_λk(X*) = 0 ∀ k = 1,…, K

The Lagrange Multiplier Theorem is a first-order Necessary Condition. Second-order conditions?
Geometrically, if we move away from a stationary point (X*, λ*) along a direction d that satisfies the
linearized constraints, ∇hk^T(X) d = 0 ∀ k = 1,…, K, then the Hessian of the Lagrangian in the subspace defined by d:
∙ should be non-negative:       d^T ∇²L_X(X*) d ≥ 0    Second-order Necessary Condition
∙ should be greater than zero:  d^T ∇²L_X(X*) d > 0    Second-order Sufficient Condition

(For feasibility: h(X) = 0 and h(X + d) = 0. Also h(X + d) = h(X) + ∇h^T d + R2; hence the linearized
constraint is given by ∇h^T d = 0.)

26
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
Lagrange Multiplier Theorem: Necessary Condition for Optimality - An Example
Minimize f (X ) = x1 + x2 subject to h(X ) ≡ x12 + x22 − 2 = 0
∙ Form the Lagrange function: L(x1, x2, λ) = x1 + x2 + λ(x1² + x2² − 2)
∙ Apply the NCC - Lagrange Multiplier Theorem:

∇L_X(X*) = 0 ⟹ 1 + 2λx1 = 0 and 1 + 2λx2 = 0 ⟹ x1 = x2 = −1/(2λ)
∇L_λ(X*) = 0 ⟹ x1² + x2² − 2 = 0 ⟹ λ = ±1/2

∙ Sufficient condition
Directions d = (d1, d2)^T that satisfy the linearized constraints are given by:
∇h^T d = 0:  [2x1  2x2] [d1, d2]^T = 0 ⟹ d1 + d2 = 0 (since x1 = x2) ⟹ d2 = −d1

Hessian of the Lagrangian at the stationary points: ∇²L_X(X*) = [[2λ, 0], [0, 2λ]]
Consequently, the Hessian of the Lagrangian in the subspace defined by d is:
d^T ∇²L_X(X*) d = [d1  −d1] [[2λ, 0], [0, 2λ]] [d1, −d1]^T = 4λ d1²

  λ        x1    x2    d^T ∇²L_X d    Nature
  1/2      −1    −1    2 d1²          PD
  −1/2      1     1    −2 d1²         ND

Summary: we seek a Positive Definite Hessian matrix
∙ of the Lagrange function, ∇²L_X(X*)
∙ in the subspace defined by the linearized constraints: ∇h^T d = 0

X = [x1 x2]^T = [−1 −1]^T satisfies both NCC and SFC ⟹ X = [−1 −1]^T is the optimum.
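A sketch of the same computation with sympy (assumed available; symbol names are illustrative): it solves the stationarity system and evaluates the Lagrangian Hessian along the permitted direction d = (d1, −d1):

import sympy as sp

x1, x2, lam, d1 = sp.symbols('x1 x2 lam d1', real=True)
L = x1 + x2 + lam * (x1**2 + x2**2 - 2)

eqs = [sp.diff(L, x1), sp.diff(L, x2), sp.diff(L, lam)]
for sol in sp.solve(eqs, [x1, x2, lam], dict=True):
    H = sp.hessian(L, (x1, x2)).subs(sol)
    d = sp.Matrix([d1, -d1])                     # directions with grad(h)^T d = 0
    proj = sp.simplify((d.T * H * d)[0])
    print(sol, proj)                             # 4*lam*d1**2 -> 2*d1**2 (PD) or -2*d1**2 (ND)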

27
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2

Lagrange Multiplier Theorem: An Application

Design for minimum cost, a cylindrical can closed at both the ends, with a Volume V

Treating the radius and height of the can as the design variables, namely, r and b, respectively
the problem can be formulated as: Minimize f (r, b) = r 2 + rb subject to h ≡ πr 2b − V = 0
∙ Form the Lagrange function: L(r, b, λ) = r 2 + rb + λ(πr 2b − V )
∙ Apply the NCC - Lagrange Multiplier Theorem:
∇L_{r,b}(X*) = 0:
∂L/∂r = 2r + b + 2πrbλ = 0 … Eqn(1)          ∂L/∂b = r + πr²λ = 0 … Eqn(2)
∇L_λ(X*) = 0:
∂L/∂λ = πr²b − V = 0 … Eqn(3)   ⟹   b = V/(πr²) … Eqn(4)

∙ Substituting Eqn(4) in Eqn(1): λ = −(2πr³ + V)/(2πrV) … Eqn(5)
∙ Eqn(2) directly gives: λ = −1/(πr) … Eqn(6)

Equating λ in Eqn(5) and Eqn(6) leads to:
r* = [V/(2π)]^(1/3)        λ* = −[2/(π²V)]^(1/3)
… and subsequently
b* = [4V/π]^(1/3)          f* = 3[V/(2π)]^(2/3)

28
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
Minimize f(r, b) = r² + rb subject to h ≡ πr²b − V = 0
r* = [V/(2π)]^(1/3)    b* = [4V/π]^(1/3)    λ* = −[2/(π²V)]^(1/3)    f* = 3[V/(2π)]^(2/3)

L(r, b, λ) = r² + rb + λ(πr²b − V)
∇h = [2πrb, πr²]^T        ∂²L/∂r² = 2 + 2πbλ        ∂²L/∂r∂b = 1 + 2πrλ        ∂²L/∂b² = 0

∙ Sufficient condition
Directions d = (d1, d2)^T that satisfy the linearized constraints are given by ∇h^T d = 0:
[2πr*b*  πr*²] [d1, d2]^T = 0  ⟹  2b* d1 + r* d2 = 0  ⟹  d2 = −4 d1    (since b*/r* = 2)

Hessian of the Lagrangian at the stationary point (using πr*λ* = −1 and b*/r* = 2):
∇²L_X(X*) = [[2 − 2b*/r*, 1 + 2πr*λ*], [1 + 2πr*λ*, 0]] = [[−2, −1], [−1, 0]]

Consequently, the Hessian of the Lagrangian in the subspace defined by d is:
d^T ∇²L_X(X*) d = [d1  −4d1] [[−2, −1], [−1, 0]] [d1, −4d1]^T = 6 d1² > 0

The point r* = [V/(2π)]^(1/3), b* = [4V/π]^(1/3), with λ* = −[2/(π²V)]^(1/3) and f* = 3[V/(2π)]^(2/3),
satisfies both the Necessary and Sufficient Conditions: hence it is the optimum.
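A quick numerical sanity check of the closed-form optimum (a sketch assuming numpy, taking V = 1 purely for illustration):

import numpy as np

V = 1.0
r = (V / (2 * np.pi)) ** (1 / 3)
b = (4 * V / np.pi) ** (1 / 3)
lam = -(2 / (np.pi**2 * V)) ** (1 / 3)

print(np.isclose(2*r + b + 2*np.pi*r*b*lam, 0))            # dL/dr = 0
print(np.isclose(r + np.pi*r**2*lam, 0))                   # dL/db = 0
print(np.isclose(np.pi*r**2*b - V, 0))                     # volume constraint satisfied
print(np.isclose(r**2 + r*b, 3*(V/(2*np.pi))**(2/3)))      # matches f* = 3[V/(2*pi)]^(2/3)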

29
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
We have seen: Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(x1, x2) ≡ x1 + x2 − 2 = 0
Now consider: Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to g(x1, x2) ≡ x1 + x2 − 2 ≤ 0

If we can convert this to an equality, the LMT can be applied.
How about h(x1, x2) ≡ x1 + x2 − 2 + s = 0 where s ≥ 0?
An attempt to eliminate one inequality has given rise to another inequality.
This could be avoided through: h(x1, x2) ≡ x1 + x2 − 2 + s² = 0

Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(x1, x2) ≡ x1 + x2 − 2 + s² = 0
Let the Lagrange function be L(X, μ, s) ≡ L(x1, x2, μ, s) = (x1 − 1.5)² + (x2 − 1.5)² + μ(x1 + x2 − 2 + s²)

LMT ⟹ ∇L_X(X, μ, s) = 0; ∇L_μ(X, μ, s) = 0; and ∇L_s(X, μ, s) = 0

n equations (stationarity w.r.t. X):
  ∂L/∂x1 = 0 ⟹ 2(x1 − 1.5) + μ = 0 … Eqn(1)
  ∂L/∂x2 = 0 ⟹ 2(x2 − 1.5) + μ = 0 … Eqn(2)
J equations (stationarity w.r.t. μ): ∂L/∂μ = 0 ⟹ x1 + x2 − 2 + s² = 0 … Eqn(3)
J equations (stationarity w.r.t. s): ∂L/∂s = 0 ⟹ 2μs = 0 … Eqn(4)
K equations, if there were also K equality constraints

In general (n variables, J inequality and K equality constraints) this gives n + 2J + K equations in
n + 2J + K unknowns. The equations w.r.t. the slack variables (s) are called "switching equations",
leading to 2^J solution cases.
30
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2

One switching condition ⟹ 2 solution cases: μ = 0 or s = 0.
(Both μ and s can be zero simultaneously; however, that case is already covered by either μ = 0 or s = 0.)

Solution case-I: s = 0 (no slack required implies g = 0)        Solution case-II: μ = 0
Eqn(1) and Eqn(2) ⟹ x1 = x2                                     Eqn(1) and Eqn(2) ⟹ x1 = x2 = 1.5
Eqn(3) ⟹ x1 + x2 = 2         } x1 = x2 = 1                      Eqn(3) ⟹ s² = −1
Eqn(1) or Eqn(2) ⟹ μ = 1                                        Infeasible solution case

Graphical insights in the case of a single inequality constraint
LMT: ∇L_X(X, μ, s) = 0; ∇L_μ(X, μ, s) = 0; and ∇L_s(X, μ, s) = 0
∇f(X) + μ ∇g(X) = 0 ⟹ ∇f(X) = μ[−∇g(X)]
⟹ ∇f(X) is a scalar multiple of −∇g(X)
In the specific situation above, μ happens to be positive.
In general: would μ always be positive, or can it be negative or zero?
31
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Problem A: Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to g(x1, x2) ≡ x1 + x2 − 2 ≤ 0
Problem B: Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to g1(X) ≡ x1 + x2 − 2 ≤ 0 and g2(X) ≡ x1 − 1 ≤ 0

Problem A (single constraint): x1* = x2* = μ* = 1
LMT ⟹ ∇f(X) is a scalar multiple of −∇g(X)
Graphically: the scalar multiple μ could only be positive.
In general: would μ always be positive?

Problem B (two constraints):
LMT: x1* = x2* = 1; μ1* = 1 with g1 = 0 (≡ s1 = 0); μ2* = 0 with g2 = 0 (≡ s2 = 0)

Critical Revelations
∙ μ could also be zero besides being positive
∙ Active constraints may have either positive or zero multipliers
∙ The switching condition μs = 0 led to two solution cases, μ = 0 or s = 0, though both can
  simultaneously be zero (no need to worry, as the case of both being simultaneously zero
  arrives from either μ = 0 or s = 0)

In general: would μ always be non-negative (positive or zero), or could it be negative too?


32
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Can the Lagrange Multiplier for an Inequality Constraint (μ) be negative, as for an Equality Constraint (λ)?

∇f + μ ∇g = 0⃗ ⟹ S^T[∇f + μ ∇g] = S ⋅ 0⃗ = 0
⟹ S^T ∇f + μ S^T ∇g = 0 …(a)

If X* is to be a local minimum, then along any feasible        By geometrical construct: ∇g and S are on
direction S the f value should increase or at worst remain     opposite sides of the tangent at X*
the same, that is, the angle between ∇f and S should be        ⟹ the angle between them is OBTUSE
acute (or right) ⟹ S^T ∇f ≥ 0 …(b)                             ⟹ S^T ∇g < 0 …(c)

S^T ∇f + μ S^T ∇g = 0 ⟹ μ[−S^T ∇g] = S^T ∇f
⟹ μ[positive] = S^T ∇f ≥ 0 ⟹ μ ≥ 0

33
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Can the Lagrange Multiplier for an Inequality Constraint (μ) be negative, as for an Equality Constraint (λ)?

At the constraint boundary, two characteristics:
For g ≤ 0: ∇g always points away from the feasible region ⟹ S^T ∇g < 0
S^T ∇f = μ[positive]

If μ < 0: S^T ∇f < 0, so at X* a feasible direction S may exist along which a reduction in f is possible
If μ ≥ 0: S^T ∇f ≥ 0, so at X* NO feasible direction S exists along which a reduction in f is possible
34
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2

The Lagrange Multiplier in the case of an Inequality Constraint (μ) has to be NON-NEGATIVE (either zero or positive).

Candidate optimum "inside" the feasible region: μ = 0
Candidate optimum "at" the constraint boundary: μ > 0 or μ = 0

Note: the LMT could be applied for two tasks:
∙ to test whether a given point is a local minimum or not
∙ to find a candidate optimum

Here, when X* is not even known, how does one decide which of the constraints are to be treated as
active, so that they can be included in the Lagrange function?
Well, include ALL of them, because the LMT will itself lead to μ = 0 for an inactive constraint.

With three constraints g1 ≤ 0, g2 ≤ 0, g3 ≤ 0, if the LMT leads to:
∙ X*12 (on g1 and g2): then the LMT shall also give μ1 ≥ 0; μ2 ≥ 0; μ3 = 0
∙ X*13 (on g1 and g3): then the LMT shall also give μ1 ≥ 0; μ3 ≥ 0; μ2 = 0
∙ X*23 (on g2 and g3): then the LMT shall also give μ2 ≥ 0; μ3 ≥ 0; μ1 = 0

35
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Formulation
Let X* be a regular point of the feasible set that is a local minimum for
Minimize f(X) subject to: gj(X) ≤ 0 ∀ j = 1, …, J and hk(X) = 0 ∀ k = 1, …, K
Then there exist unique Lagrange Multipliers μ⃗ (a J-vector: μj ≥ 0) and λ⃗ (a K-vector) such that the Lagrange
function is stationary w.r.t. X⃗, s⃗, μ⃗, and λ⃗, where the Lagrange function is of the form:
L(X, μ, λ, s) = f(X) + Σ_{j=1}^{J} μj [gj(X) + sj²] + Σ_{k=1}^{K} λk hk(X)

Stationary w.r.t. X:  ∇L_X(X*) = 0    ∇f(X*) + Σ_{j=1}^{J} μj ∇gj(X*) + Σ_{k=1}^{K} λk ∇hk(X*) = 0, where μj ≥ 0
Stationary w.r.t. s:  ∇L_s(X*) = 0    2 μj sj = 0 ∀ j = 1,…, J    (switching conditions: J; solution cases: 2^J)
Stationary w.r.t. μ:  ∇L_μ(X*) = 0    gj + sj² = 0 ⟹ gj ≤ 0, sj² ≥ 0    (constraint satisfaction / feasibility check)
Stationary w.r.t. λ:  ∇L_λ(X*) = 0    hk = 0
Regularity check: at X* the gradient vectors of all the ACTIVE constraints (gj = 0 and hk = 0) should be linearly independent
36
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-I
Minimize f(X ) = x12 + x22 − 3x1x2 subject to g(X ) ≡ x12 + x22 − 6 ≤ 0
L(X, μ, s) = x1² + x2² − 3x1x2 + μ(x1² + x2² − 6 + s²)

Stationary w.r.t. X:  ∇L_X(X*) = 0
  ∂L/∂x1 = 2x1 − 3x2 + 2μx1 = 0 ⟹ μ = (3x2 − 2x1)/(2x1) … Eqn(1)
  ∂L/∂x2 = 2x2 − 3x1 + 2μx2 = 0 ⟹ μ = (3x1 − 2x2)/(2x2) … Eqn(2)
Stationary w.r.t. μ:  ∇L_μ(X*) = 0    ∂L/∂μ = x1² + x2² − 6 + s² = 0 … Eqn(3)
Stationary w.r.t. s:  ∇L_s(X*) = 0    ∂L/∂s = 2μs = 0 … Eqn(4)
Switching conditions: J = 1; solution cases: 2^J = 2   (I: s = 0, II: μ = 0)

Solution Case-I (s = 0):
Eqn(3): x1² + x2² = 6 … Eqn(5)
Equating μ in Eqns(1) and (2) ⟹ x1² = x2² … Eqn(6)
⟹ x1 = ±√3; x2 = ±√3

Solving cases I(a) to I(d) using Eqns(1) and (2):
  I(a): x1 = √3,   x2 = √3     μ = 0.5
  I(b): x1 = −√3,  x2 = √3     μ = −2.5
  I(c): x1 = √3,   x2 = −√3    μ = −2.5
  I(d): x1 = −√3,  x2 = −√3    μ = 0.5

Two candidate optima from Case-I: x1 = x2 = √3 with μ = 0.5, and x1 = x2 = −√3 with μ = 0.5.

37
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-I
Minimize f(X ) = x12 + x22 − 3x1x2 subject to g(X ) ≡ x12 + x22 − 6 ≤ 0
L(X, μ, s) = x1² + x2² − 3x1x2 + μ(x1² + x2² − 6 + s²)
Stationarity gives Eqns(1)-(4) as on the previous slide; the switching condition gives cases I (s = 0) and II (μ = 0).

Solution Case-II (μ = 0):
Eqn(1) ⟹ 2x1 = 3x2 … Eqn(7)
Eqn(2) ⟹ 3x1 = 2x2 … Eqn(8)
⟹ x1 = 0; x2 = 0; s² = 6 (feasible)    Candidate optimum with f = 0

Compare with the Case-I candidates, for which f = −3.
Regularity check is trivial here (only one constraint).
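The 2^J switching-case enumeration of Example-I can also be mechanized (a hedged sketch assuming sympy; the variable names and the loop structure are illustrative, not from the slides):

import sympy as sp

x1, x2, mu, s = sp.symbols('x1 x2 mu s', real=True)
f = x1**2 + x2**2 - 3*x1*x2
g = x1**2 + x2**2 - 6
L = f + mu*(g + s**2)

stationarity = [sp.diff(L, v) for v in (x1, x2, mu)]     # dL/dmu restores g + s^2 = 0

for case in ({s: 0}, {mu: 0}):                           # the 2^J = 2 switching cases
    eqs = [e.subs(case) for e in stationarity]
    unknowns = [v for v in (x1, x2, mu, s) if v not in case]
    for sol in sp.solve(eqs, unknowns, dict=True):
        mu_val = sol.get(mu, case.get(mu, sp.Integer(0)))
        print(case, sol, 'candidate' if mu_val >= 0 else 'rejected (mu < 0)')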

38
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-II
Maximize f (X ) = 2x1 + 2x2 − x12 − x22 − 2 subject to g1(X ) ≡ 2x1 + x2 ≥ 4; and g2(X ) ≡ x1 + 2x2 ≥ 4

Minimize f (X ) = x12 + x22 − 2x1 − 2x2 + 2 subject to g1(X ) ≡ 4 − 2x1 − x2 ≤ 0; and g2(X ) ≡ 4 − x1 − 2x2 ≤ 0

L(X, μ1, s1, μ2, s2) = x1² + x2² − 2x1 − 2x2 + 2 + μ1[4 − 2x1 − x2 + s1²] + μ2[4 − x1 − 2x2 + s2²]

Stationary w.r.t. X:  ∇L_X(X*) = 0
  ∂L/∂x1 = 2x1 − 2 − 2μ1 − μ2 = 0 … Eqn(1)
  ∂L/∂x2 = 2x2 − 2 − μ1 − 2μ2 = 0 … Eqn(2)
Stationary w.r.t. μ:  ∇L_μ(X*) = 0
  ∂L/∂μ1 = 4 − 2x1 − x2 + s1² = 0 … Eqn(3)
  ∂L/∂μ2 = 4 − x1 − 2x2 + s2² = 0 … Eqn(4)
Stationary w.r.t. s:  ∇L_s(X*) = 0
  ∂L/∂s1 = 2μ1s1 = 0 … Eqn(5)  ⟹ μ1 = 0 or s1 = 0
  ∂L/∂s2 = 2μ2s2 = 0 … Eqn(6)  ⟹ μ2 = 0 or s2 = 0

Switching conditions: J = 2; solution cases: 2^J = 4
  Case-I:   μ1 = 0, μ2 = 0      Case-II:  s1 = 0, μ2 = 0
  Case-III: μ1 = 0, s2 = 0      Case-IV:  s1 = 0, s2 = 0

Case-I (μ1 = 0 & μ2 = 0):
  Eqn(1) ⟹ x1 = 1; Eqn(2) ⟹ x2 = 1; Eqn(3) ⟹ s1² = −1; Eqn(4) ⟹ s2² = −1    Infeasible

Case-II (s1 = 0 & μ2 = 0):
  Eqn(1) ⟹ 2x1 − 2 − 2μ1 = 0 … Eqn(7)
  Eqn(2) ⟹ 2x2 − 2 − μ1 = 0 … Eqn(8)
  Eqn(3) ⟹ 4 − 2x1 − x2 = 0 … Eqn(9)
  ⟹ x1 = 1.4, x2 = 1.2, μ1 = 0.4; Eqn(4) then gives s2² < 0    Infeasible

Case-III (μ1 = 0 & s2 = 0):
  Eqn(1) ⟹ 2x1 − 2 − μ2 = 0 … Eqn(10)
  Eqn(2) ⟹ 2x2 − 2 − 2μ2 = 0 … Eqn(11)
  Eqn(4) ⟹ 4 − x1 − 2x2 = 0 … Eqn(12)
  ⟹ x1 = 1.2, x2 = 1.4, μ2 = 0.4; Eqn(3) then gives s1² < 0    Infeasible
39
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-II
Maximize f (X ) = 2x1 + 2x2 − x12 − x22 − 2 subject to g1(X ) ≡ 2x1 + x2 ≥ 4; and g2(X ) ≡ x1 + 2x2 ≥ 4

Minimize f (X ) = x12 + x22 − 2x1 − 2x2 + 2 subject to g1(X ) ≡ 4 − 2x1 − x2 ≤ 0; and g2(X ) ≡ 4 − x1 − 2x2 ≤ 0

L(X, μ1, s1, μ2, s2) = x1² + x2² − 2x1 − 2x2 + 2 + μ1[4 − 2x1 − x2 + s1²] + μ2[4 − x1 − 2x2 + s2²]
Stationarity gives Eqns(1)-(6) as on the previous slide; switching conditions: J = 2, solution cases: 2^J = 4.
Cases I, II and III: infeasible (previous slide).

Case-IV (s1 = 0 & s2 = 0):
  Eqn(3) ⟹ 2x1 + x2 = 4 … Eqn(13)
  Eqn(4) ⟹ x1 + 2x2 = 4 … Eqn(14)
  ⟹ x1 = x2 = 4/3
  Plugging x1 and x2 in Eqn(1) and (2) ⟹ μ1 = μ2 = 2/9 > 0

Regularity check:  X* = [4/3, 4/3]^T    ∇g1(X*) = [−2, −1]^T    ∇g2(X*) = [−1, −2]^T
Since ∇g1 ≠ α ∇g2, the KKT conditions are met and X* is a candidate optimum.
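As a cross-check of Example-II (a sketch assuming scipy is available; with constraints present, scipy.optimize.minimize defaults to the SLSQP method), a numerical solver recovers the same point:

import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2 - 2*x[0] - 2*x[1] + 2
# scipy's 'ineq' constraints require fun(x) >= 0, so g <= 0 is rewritten with the sign flipped
cons = ({'type': 'ineq', 'fun': lambda x: 2*x[0] + x[1] - 4},   # g1(X) = 4 - 2x1 - x2 <= 0
        {'type': 'ineq', 'fun': lambda x: x[0] + 2*x[1] - 4})   # g2(X) = 4 - x1 - 2x2 <= 0

res = minimize(f, x0=[0.0, 0.0], constraints=cons)
print(res.x)    # approximately [1.333, 1.333], matching x1* = x2* = 4/3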
40
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Alternative Formulation
Let X* be a regular point of the feasible set that is a local minimum for
Minimize f(X) subject to: gj(X) ≤ 0 ∀ j = 1, …, J and hk(X) = 0 ∀ k = 1, …, K
Then there exist unique Lagrange Multipliers μ⃗ (a J-vector: μj ≥ 0) and λ⃗ (a K-vector) such that the Lagrange
function is stationary w.r.t. its arguments. The two equivalent forms of the Lagrange function are:

With slack variables:     L(X, μ, λ, s) = f(X) + Σ_{j=1}^{J} μj [gj(X) + sj²] + Σ_{k=1}^{K} λk hk(X)
Without slack variables:  L(X, μ, λ) = f(X) + Σ_{j=1}^{J} μj gj(X) + Σ_{k=1}^{K} λk hk(X)

Stationary w.r.t. X:  ∇f(X*) + Σ_{j=1}^{J} μj ∇gj(X*) + Σ_{k=1}^{K} λk ∇hk(X*) = 0, where μj ≥ 0    } same in both forms
Stationary w.r.t. s:  2 μj sj = 0 ∀ j = 1,…, J   becomes   μj gj = 0 ∀ j = 1,…, J
                      (2μj sj = 0 ≡ 2μj sj² = 0 ≡ −2μj gj = 0, since gj + sj² = 0)
Stationary w.r.t. μ:  gj + sj² = 0 ⟹ gj ≤ 0, sj² ≥ 0   becomes   gj ≤ 0
Stationary w.r.t. λ:  hk = 0    } same in both forms
Regularity check: at X* the gradient vectors of all the ACTIVE constraints should be linearly independent    } same in both forms
41
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-III with Alternative Formulation
Minimize f (X ) = (x − 10)2 + (y − 8)2 subject to g1(X ) ≡ x + y − 12 ≤ 0; and g2(X ) ≡ x − 8 ≤ 0

L(X, μ1, μ2) = (x − 10)² + (y − 8)² + μ1(x + y − 12) + μ2(x − 8)

Stationary w.r.t. X:  ∇L_X(X*) = 0
  ∂L/∂x = 2(x − 10) + μ1 + μ2 = 0 … Eqn(1)
  ∂L/∂y = 2(y − 8) + μ1 = 0 … Eqn(2)
Straight to the switching conditions: μ1g1 = 0 and μ2g2 = 0; solution cases: 2^J = 4
  Case-I:   μ1 = 0, μ2 = 0      Case-II:  g1 = 0, μ2 = 0
  Case-III: μ1 = 0, g2 = 0      Case-IV:  g1 = 0, g2 = 0

Case-I (μ1 = 0 & μ2 = 0):
  Eqn(1) ⟹ x = 10; Eqn(2) ⟹ y = 8    } g1(X) = 6 ≰ 0    Infeasible

Case-III (μ1 = 0 & g2 = 0):
  g2(X) = 0 ⟹ x = 8; μ1 = 0 ⟹ from Eqn(2): y = 8    } g1(X) = 4 ≰ 0    Infeasible

Case-II (g1 = 0 & μ2 = 0):
  g1(X) = 0 ⟹ x + y = 12 … Eqn(3)
  μ2 = 0 ⟹ from Eqn(1): μ1 = 20 − 2x, and from Eqn(2): μ1 = 16 − 2y
  20 − 2x = 16 − 2y ⟹ x − y = 2 … Eqn(4)
  Eqn(3) and Eqn(4) ⟹ x = 7, y = 5 ⟹ μ1 = 6, g2 = −1 < 0
  It is a candidate optimum; regularity is ensured since only one constraint is active.

Case-IV (g1 = 0 & g2 = 0):
  g2(X) = 0 ⟹ x = 8; g1(X) = 0 ⟹ y = 4
  Eqn(2) ⟹ μ1 = 8; Eqn(1) ⟹ μ2 = −4 ≱ 0    Infeasible
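The slack-free enumeration above can be written as a small active-set loop (a hedged sketch assuming sympy; the helper structure is illustrative, not from the slides): for each guess of the active set, impose gj = 0 for active j and μj = 0 otherwise, then check feasibility (g ≤ 0) and multiplier signs (μ ≥ 0).

import sympy as sp
from itertools import product

x, y, m1, m2 = sp.symbols('x y mu1 mu2', real=True)
f = (x - 10)**2 + (y - 8)**2
g = [x + y - 12, x - 8]
mu = [m1, m2]
grad_L = [sp.diff(f + m1*g[0] + m2*g[1], v) for v in (x, y)]

for active in product([False, True], repeat=2):            # 2^J = 4 cases
    eqs = list(grad_L)
    eqs += [g[j] if a else mu[j] for j, a in enumerate(active)]
    for sol in sp.solve(eqs, [x, y, m1, m2], dict=True):
        ok = all(sp.simplify(gj.subs(sol)) <= 0 for gj in g) and all(sol[mj] >= 0 for mj in mu)
        print(active, sol, 'candidate' if ok else 'infeasible')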

42
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Attention! Type-IV M=1 J≥1 K≥1 n≥2

43
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Attention! Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-IV

(a) Infer the optimum graphically
(b) Validate the optimality of the above point using KKT conditions
[Problem figure (with 2 m dimensions) not recoverable from the extraction]

44
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Attention!
Karush Kuhn Tucker Conditions: Example-IV


45
SOME CONCEPTS
FACILITATING
AN INTERESTING INTERPRETATION
OF KKT CONDITIONS

….. AHEAD
VECTOR SPACE: SPANNING SET
A set of vectors X1, X2, …, XK is said to span the vector space V if any vector X ∈ V can be expressed through this
set of K vectors, in that: X = Σ_{i=1}^{K} αi Xi

Consider these vectors in R²: X1 = [1, 0]^T, X2 = [1, 1]^T, X3 = [0, 1]^T, X4 = [−1, 0]^T, X5 = [−1, −1]^T
Any vector X ∈ R² can be expressed as a linear combination of X1, X2, …, X5.
Hence, X1, X2, …, X5 can be said to span the vector space V.

Notably, X2 = X1 + X3; X4 = −X1; X5 = −X1 − X3
This spanning set seems to be overdefined because up to three vectors can be
expressed in terms of just two of them. Only {X1, X3} are enough.

For a spanning set to be efficient, linear independence of its members is important

VECTOR SPACE: BASIS

A set of vectors X1, X2, …, XK is said to be a "basis" for a vector space V if X1, X2, …, XK span V and are linearly independent.

Notably:
∙ Different bases for the same V may exist (but all will have the same cardinality)
∙ For a given basis, its elements need not be orthogonal (though an orthogonal basis is the most convenient one)

Example: the standard basis for R^n is {e1, e2, …, en}, where ei is an n-dimensional vector
with all elements = 0, except for the i-th element = 1.
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Proof & Physical Interpretation of 1st and IInd Order KKT Conditions for Optimality
Minimize f (X ) subject to gj(x) ≤ 0 ∀ j = 1,…, J; hk(x) = 0 ∀ k = 1,…, K; X ∈ Rn

Difference in the nature of Equality and Inequality Constraints

Equality: hk(X) = 0, X ∈ R^n
∙ Both sides of the equality constraint surface are infeasible
∙ If you move along or opposite to ∇h, you become infeasible
∙ Each h reduces the domain (R^n) to a lower-dimensional subspace (hypersurface), which has a tangent plane at every point on it
∙ The feasible directions d are given by ∇h^T d = 0; d belongs to the tangent plane (orthogonal to ∇h)

Inequality: gj(X) ≤ 0, X ∈ R^n
∙ Only one side of the inequality constraint surface is infeasible
∙ If you move along ∇g you become infeasible, but you remain feasible if you go opposite to ∇g
∙ Each g does not reduce the domain (R^n) to a lower-dimensional subspace (hypersurface), but to a subset of the n-dimensional space
∙ The feasible directions d are given by ∇g^T d ≤ 0; d belongs to the cone of feasibility
48
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Each h reduces the domain (R^n) to a lower-dimensional subspace (hypersurface) with a defined tangent plane at every point on it.
Each g does not reduce the domain (R^n) to a lower-dimensional subspace (hypersurface), but to a subset of the n-dimensional space.

Tangent plane
TP ≡ {d : ∇hk^T d = 0 ∀ k = 1,…, K}
The TP at any point is the "orthogonal complement" of the gradient of the "vector function" h(X):
∇h(X) ≡ [∇h1(X) ∇h2(X) …. ∇hK(X)], an (n × K) matrix; the TP is orthogonal to each ∇hk(X).

Constraint Qualification
∇h(X) ≡ [∇h1(X) ∇h2(X) …. ∇hK(X)], the (n × K) matrix, is full rank: all column vectors are linearly independent.

Regular point
If a feasible point X* satisfies the constraint qualification, it is a regular point (a feasible point where all the
gradients of active constraints are linearly independent). It implies that in the local neighborhood of X* all
the constraints are applicable/relevant.

Tangent plane at a feasible, regular point
TP ≡ {d : ∇h^T d = 0} ⟹ {d : ∇hk^T d = 0 ∀ k = 1,…, K}
d cannot be along any of the K gradients ∇hk; hence DoF(d) = n − K:
the maximum number of linearly independent vectors that could constitute the TP is n − K.

Active (gA) & inactive (gI) inequality constraints
Constraint satisfaction/violation is:
∙ not altered if X2* is perturbed; hence g(X2*) is Inactive
∙ altered if X1* is perturbed; hence g(X1*) is Active
Collection of inequality constraints' indices: JA = {j : gj = 0}, JI = {j : gj ≠ 0}
Active gs collectively: gA; inactive gs collectively: gI

Active (gA) inequality constraint ≡ equality constraint
∙ Like the direction d constituting the TP cannot be along ∇h, d cannot be along ∇gA
∙ Unlike the direction d constituting the TP, which cannot be along −∇h either, d can still be along −∇gA (in the cone of feasibility)

Cone of feasibility
CF ≡ {d : ∇gA^T d ≤ 0} ⟹ {d : ∇gj^T d ≤ 0 ∀ j ∈ JA}
Two strategies to handle inequality constraints: active set (keep a gA list) or slack variables (just values, like X).
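A sketch (assuming numpy and scipy are available; the constraint gradients below are illustrative) of how the tangent plane can be computed in practice: TP is the null space of the stacked constraint gradients, so its dimension is n − K.

import numpy as np
from scipy.linalg import null_space

grad_h = np.array([[1.0, 1.0, 0.0],     # grad h1^T  (illustrative constraints)
                   [0.0, 1.0, 1.0]])    # grad h2^T
B = null_space(grad_h)                  # columns of B form a basis for TP
print(B.shape[1])                       # n - K = 3 - 2 = 1 independent direction
print(np.allclose(grad_h @ B, 0))       # True: every basis vector satisfies grad_h^T d = 0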
49
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Necessary Condition of Optimality for a point X*: at X* you CAN NOT FIND a direction d which is BOTH
a FEASIBLE direction and a DESCENT (USABLE) direction.

Whether you can find a feasible direction or not depends on whether or not there are linearly independent
vectors to form the basis (B) for THG ≡ TP ∩ CF. Let rB = Rank(B):
rB = Rank(B) = n − K − |JA|  ⟹  n = rB + K + |JA|

A direction would be a descent direction if it can offer f(X) < f(X*).
A necessary condition to prevent a descent direction at X* is that −∇f(X*) (which promises a reduction in f)
should have NO component in the tangent plane ⟹ γ(B) = 0.

X* ∈ R^n ⟹ ∇f(X*) ∈ R^n
⟹ −∇f(X*) = γ(B) + λ(∇h(X*)) + μA(∇gA(X*)),
i.e. −∇f(X*) can be expressed as a linear combination of:
∙ the linearly independent vectors constituting B
∙ the gradients of the equality constraints
∙ the gradients of the active inequality constraints

With γ(B) = 0:
−∇f(X*) = λ(∇h(X*)) + μA(∇gA(X*))
⟹ ∇f(X*) + λ(∇h(X*)) + μA(∇gA(X*)) = 0
⟹ ∇f(X*) + λ(∇h(X*)) + μA(∇gA(X*)) + μI(∇gI(X*)) = 0    (with the insistence that μI = 0 where gI ≠ 0)
⟹ ∇f(X*) + λ ∇h(X*) + μ ∇g(X*) = 0, where μ = [μA μI]^T and ∇g(X*) = [∇gA(X*) ∇gI(X*)]^T

Complementary condition:
∙ When gI ≠ 0: μI = 0
∙ When gA = 0: μA ≥ 0

50
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2

A necessary condition to prevent a descent direction at X* is that −∇f(X*) (which promises a reduction in f)
should have NO component in the tangent plane, THG ≡ TP ∩ CF ⟹ γ(B) = 0.

μA ≥ 0 ensures that a component of −∇f(X*) can exist only along +∇gA(X*), which, being infeasible,
is inconsequential to the local optimality of X*.
Even if a component of −∇f(X*) exists along ±∇h(X*), it is inconsequential to the local optimality of X*,
since the directions marked by ∇h are infeasible.
51
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2

μA ≥ 0

52
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Necessary Condition of Optimality for a point X*: at X* you CAN NOT FIND a direction d which is BOTH
a FEASIBLE direction and a DESCENT (USABLE) direction.

Whether such a d exists depends on whether or not there are linearly independent vectors to form the
basis (B) of THG ≡ TP ∩ CF:
∙ If n − K − |JA| = 0: FO-NCC met
∙ If n − K − |JA| ≥ 1: check for descent: −∇f(X*) is orthogonal to the TP, and μA ≥ 0

What about the second-order change, (1/2) d^T ∇²L_XX d ?

Second-order Necessary Condition: the second-order change along d ∈ THG ≡ {TP ∩ CF} must be ≥ 0
Second-order Sufficient Condition: the second-order change along d ∈ THG ≡ {TP ∩ CF} must be > 0 … with one additional qualifier

53
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY

                     Equality Constrained                              Inequality + Equality Constrained
First-order NCC      LMT                                               KKT
Second-order NCC     d ∈ TH ≡ {d : ∇hk(X*)^T d = 0 ∀ k}                d ∈ THG(X*) = {d ∈ R^n : ∇hk^T d = 0 ∀ k; ∇gj^T d ≤ 0, j ∈ JA}
                     d^T ∇²L_X(X*) d ≥ 0                               d^T ∇²L_X(X*) d ≥ 0
Second-order SFC     d ∈ TH ≡ {d : ∇hk(X*)^T d = 0 ∀ k}                d ∈ THG+(X*, μ*) = {d ∈ R^n : ∇hk^T d = 0 ∀ k; ∇gj^T d ≤ 0, j ∈ JA+}
                     d^T ∇²L_X(X*) d > 0                               d^T ∇²L_X(X*) d > 0
(Second-order NCC: the Hessian of the Lagrangian in the tangent space to the constrained hypersurface is positive semi-definite;
 Second-order SFC: it is positive definite.)

NCC for X*: at X* you CAN NOT FIND a direction d which is BOTH a feasible and a descent direction.
rB = Rank(B) = n − K − |JA| ⟹ the NCC is automatically satisfied if the number of equality + active constraints equals the number of variables.

Need for THG+ over THG:
∙ The possibility of finding d in the tangent plane depends on Rank(B) = n − K − |JA|
∙ Consider gj = 0 with μj = 0: such a gj does not influence the local optimality of X*, YET it reduces the possibility of finding a d
∙ JA+ (the active constraints with μj > 0) is less constrained than JA ⟹ more d qualify for THG+(X*, μ*)

KKT (FO-NCC) ⟹ Second-order NCC if n − K − |JA| = 0
KKT (FO-NCC) ⟹ Second-order SFC if n − K − |JA+| = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Minimize f(X) = x1² + x2² − 3x1x2 subject to g(X) ≡ x1² + x2² − 6 ≤ 0
L(X, μ, s) = x1² + x2² − 3x1x2 + μ(x1² + x2² − 6 + s²)
Here n − K − |JA| = n − K − |JA+| = 1 ≠ 0 ⟹ KKT (FO-NCC) alone is not the SONCC or the SOSFC; they must be checked.

Second-order NCC: d^T ∇²L_X(X*) d ≥ 0 for d ∈ THG(X*) = {d ∈ R^n : ∇hk^T d = 0 ∀ k; ∇gj^T d ≤ 0, j ∈ JA}
Second-order SFC: d^T ∇²L_X(X*) d > 0 for d ∈ THG+(X*, μ*) = {d ∈ R^n : ∇hk^T d = 0 ∀ k; ∇gj^T d ≤ 0, j ∈ JA+}

∇f = [2x1 − 3x2, 2x2 − 3x1]^T        ∇²f = [[2, −3], [−3, 2]]
∇L = [2x1 − 3x2 + 2μx1, 2x2 − 3x1 + 2μx2]^T        ∇²L = [[2 + 2μ, −3], [−3, 2 + 2μ]]        ∇g = [2x1, 2x2]^T

Case-II: X* = [0, 0]^T; μ = 0; g ≠ 0 (inactive)
∇g(X*) = [0, 0]^T        ∇²L(X*) = [[2, −3], [−3, 2]]
∇g^T(X*) d = [0  0] [d1, d2]^T = 0 ⟹ no relation can be induced between d1 and d2
d^T ∇²L_X(X*) d = 2(d1² − 3d1d2 + d2²) = 2[(d1 − d2)² − d1d2] ≱ 0 conclusively ⟹ SONCC is not met

Case-I(a): X* = [√3, √3]^T; μ = 0.5; g = 0 (active)
∇g(X*) = [2√3, 2√3]^T        ∇²L(X*) = [[3, −3], [−3, 3]]
∇g^T(X*) d = [2√3  2√3] [d1, d2]^T = 0 ⟹ d2 = −d1
d^T ∇²L_X(X*) d = 3(d1 − d2)² = 12d1² ≥ 0 and also > 0 ⟹ both SONCC and SOSFC are met

Case-I(d): X* = [−√3, −√3]^T; μ = 0.5; g = 0. Repeat the process to prove that the SONCC & SOSFC are met.
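A compact numerical version of this second-order check (a sketch assuming numpy/scipy; the projected-Hessian helper is illustrative, not from the slides): restrict the Lagrangian Hessian to directions with ∇g^T d = 0 and inspect the sign of the result.

import numpy as np
from scipy.linalg import null_space

def projected_hessian(H_L, grad_active=None):
    """Eigenvalues of Z^T H_L Z, where Z spans {d : grad_active @ d = 0}."""
    if grad_active is None:                          # no active constraint: no restriction on d
        Z = np.eye(H_L.shape[0])
    else:
        Z = null_space(np.atleast_2d(grad_active))
    return np.linalg.eigvalsh(Z.T @ H_L @ Z)

# Case I(a): X* = (sqrt(3), sqrt(3)), mu = 0.5, g active; diagonal of Hessian is 2 + 2*mu = 3
H_L = np.array([[3.0, -3.0], [-3.0, 3.0]])
print(projected_hessian(H_L, np.array([2*np.sqrt(3), 2*np.sqrt(3)])))   # [6.] > 0 -> SOSFC met

# Case II: X* = (0, 0), mu = 0, g inactive
print(projected_hessian(np.array([[2.0, -3.0], [-3.0, 2.0]])))          # [-1., 5.] -> SONCC fails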

55
Thank You
