
NECESSARY & SUFFICIENT CONDITIONS FOR OPTIMALITY

Dhish Kumar Saxena


Professor
Department of Mechanical & Industrial Engineering
(Joint Faculty, MFSDSAI)
IIT Roorkee
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY

Category   Objectives (M)   Inequality Constraints (J)   Equality Constraints (K)   Variables (n)   Approach
Type-I     1                0                            0                          1               Taylor Series
Type-II    1                0                            0                          ≥2              Taylor Series
Type-III   1                0                            1                          ≥2              Lagrange Multiplier Theorem
Type-IV    1                ≥1                           ≥1                         ≥2              Karush-Kuhn-Tucker (KKT) Conditions
Type-V     ≥2               ≥1                           ≥1                         ≥2              Karush-Kuhn-Tucker (KKT) Conditions

2
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1
Taylor's Series represents a function as an infinite sum of terms calculated from the
function value and the derivatives of the function at a single point.

Expanding about the known point x0 with step h (x = x0 + h):
f(x) = f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)

f(x) − f(x0) = h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)

For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0:
h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1) ≥ 0

First-order approximation: h f′(x0) ≥ 0
Suppose at x0: f′(x0) > 0 : a negative h would violate the above condition
               f′(x0) < 0 : a positive h would violate the above condition
For h f′(x0) ≥ 0 to hold for every h: f′(x0) = 0

First-order Necessary Condition for Optimality (FO-NCC): f′(x0) = 0
(if x0 satisfies the FO-NCC, it is a stationary point)
























3
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1
For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0:
h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1) ≥ 0

Say x0 satisfies the FO-NCC, that is, it is a stationary point: f′(x0) = 0
You need more conditions to infer about x0.

Second-order approximation: (h²/2!) f′′(x0) ≥ 0
Suppose at x0: f′′(x0) > 0 : a negative/positive h does not impact the above condition
                           : x0 is guaranteed to be a local minimum
               f′′(x0) < 0 : a negative/positive h does not impact the above condition
                           : x0 is guaranteed to be a local maximum
               f′′(x0) = 0 : nothing conclusive can be inferred about x0
                           : it may be a local minimum, a local maximum, or neither

Second-order Necessary Condition for Optimality (SO-NCC): f′′(x0) ≥ 0
Second-order Sufficient Condition for Optimality (SO-SFC): f′′(x0) > 0













4






NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1
For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0:
h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1) ≥ 0,   with f′(x0) = 0 and f′′(x0) = 0

Third-order approximation: (h³/3!) f′′′(x0) ≥ 0
Suppose at x0: f′′′(x0) > 0 : a negative h would violate the above condition
               f′′′(x0) < 0 : a positive h would violate the above condition
For (h³/3!) f′′′(x0) ≥ 0 to hold for every h: f′′′(x0) = 0
Third-order Necessary Condition for Optimality (TO-NCC): f′′′(x0) = 0

Fourth-order approximation: (h⁴/4!) f^IV(x0) ≥ 0
Suppose at x0: f^IV(x0) > 0 : a negative/positive h does not impact the above condition
                            : x0 is guaranteed to be a local minimum
               f^IV(x0) < 0 : a negative/positive h does not impact the above condition
                            : x0 is guaranteed to be a local maximum
               f^IV(x0) = 0 : nothing conclusive can be inferred about x0
                            : it may be a local minimum or maximum
Fourth-order Necessary Condition for Optimality (FrO-NCC): f^IV(x0) ≥ 0
Fourth-order Sufficient Condition for Optimality (FrO-SFC): f^IV(x0) > 0






















5





NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-I M=1 J=0 K=0 n=1

Taylor's Series represents a function as an infinite sum of terms calculated from the
function value and the derivatives of the function at a single point.

f(x) = f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)
f(x) − f(x0) = h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)
For x0 to be a local minimum, LHS ≥ 0, implying RHS ≥ 0.

First-order Necessary Condition for Optimality (FO-NCC): f′(x0) = 0
Second-order Necessary Condition for Optimality (SO-NCC): f′′(x0) ≥ 0
Second-order Sufficient Condition for Optimality (SO-SFC): f′′(x0) > 0
Third-order Necessary Condition for Optimality (TO-NCC): f′′′(x0) = 0
Fourth-order Necessary Condition for Optimality (FrO-NCC): f^IV(x0) ≥ 0
Fourth-order Sufficient Condition for Optimality (FrO-SFC): f^IV(x0) > 0

A point x0 is a local minimum iff the first non-zero element in the sequence of derivatives
f^(k)(x0) is positive and occurs at an even order k.
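As a quick illustration of this rule (a minimal sketch, not part of the original slides, assuming Python with sympy is available), the first non-vanishing derivative at a candidate point can be found and classified programmatically:

import sympy as sp

x = sp.symbols('x')

def classify(f, x0, max_order=8):
    """Classify x0 using the first non-zero derivative of f at x0."""
    for k in range(1, max_order + 1):
        dk = sp.diff(f, x, k).subs(x, x0)
        if dk != 0:
            if k % 2 == 1:   # first non-zero derivative at odd order
                return 'not stationary' if k == 1 else 'no extremum (inflection-like point)'
            return 'local minimum' if dk > 0 else 'local maximum'
    return 'inconclusive'

print(classify(x**4, 0))          # local minimum (first non-zero derivative at k = 4 is +24)
print(classify(x**3, 0))          # odd first non-zero derivative -> no extremum
print(classify(-(x - 1)**2, 1))   # local maximum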




















6
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2

For n = 1, expanding about the known point x0 with scalar step h:
f(x) = f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f′′(x0) + (h³/3!) f′′′(x0) + … + (h^p/p!) f^(p)(x0) + R_(p+1)

For n ≥ 2: unlike the case of n = 1 where h was a scalar, h is now a vector representing the
component-wise difference between the known point and the unknown point: h = [h1, h2]^T

Two variables (X = (x, y), known point (x0, y0), steps hx, hy):
f(X) = f(x, y) = f(x0 + hx, y0 + hy)
     = f(x0, y0) + [hx ∂f/∂x + hy ∂f/∂y]|(x0,y0) + (1/2)[hx² ∂²f/∂x² + 2 hx hy ∂²f/∂x∂y + hy² ∂²f/∂y²]|(x0,y0) + …
     = f(x0, y0) + [hx fx + hy fy]|(x0,y0) + (1/2)[hx² fxx + 2 hx hy fxy + hy² fyy]|(x0,y0) + …

Again, h is a vector representing the component-wise difference between the known point and the
unknown point. Indices x, y can't serve more than 2 dimensions; hence indices 1, 2, …, n are used:
h = [h1, h2, …, hn]^T = [x1 − x01, x2 − x02, …, xn − x0n]^T

n variables:
f(X) = f(x1, x2, …, xn) = f(x10 + h1, x20 + h2, …, xn0 + hn)
     = f(x10, x20, …, xn0) + Σ_{j=1}^{n} (∂f/∂xj) hj + (1/2) Σ_{j=1}^{n} Σ_{k=1}^{n} (∂²f/∂xj∂xk) hj hk + …

In matrix form:
f(X) = f(X0) + ∇f^T(X0) h + (1/2) h^T H h + …

∇f^T(X0): Gradient of the function, evaluated at the known point X0
H: Hessian of the function, evaluated at the known point X0






7
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2
f(X) = f(x1, x2, …, xn) = f(x10 + h1, x20 + h2, …, xn0 + hn)
     = f(x10, x20, …, xn0) + Σ_{j=1}^{n} (∂f/∂xj) hj + (1/2) Σ_{j=1}^{n} Σ_{k=1}^{n} (∂²f/∂xj∂xk) hj hk + …

f(X) = f(X0) + ∇f^T(X0) h + (1/2) h^T H h + …

∇f^T(X0): Gradient of the function, evaluated at the known point X0
H: Hessian of the function, evaluated at the known point X0

∇f = [f1, f2, …, fn]^T with fi = ∂f/∂xi          H = [Hij] with Hij = ∂²f/∂xi∂xj

Truncating after the quadratic term gives the second-order approximation:
f(X) ≃ f(X0) + ∇f^T(X0) h + (1/2) h^T H h
2

8
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2
Next steps: • learn to compute the Gradient & Hessian
            • with reference to an example, assess how good this 2nd-order approximation is
            • understand more about the Gradient and the Hessian

Obtain a second-order Taylor expansion for the function f(X) = 3 x1³ x2 at X0 = (1, 1)

f(X0) = 3     ∇f = [9 x1² x2, 3 x1³]^T     H = [[18 x1 x2, 9 x1²], [9 x1², 0]]     h = [x1 − 1, x2 − 1]^T
∇f(X0) = [9, 3]^T     H(X0) = [[18, 9], [9, 0]]

f(X) ≃ 3 + [9 3] [x1 − 1, x2 − 1]^T + 0.5 [x1 − 1  x2 − 1] [[18, 9], [9, 0]] [x1 − 1, x2 − 1]^T
f(X) ≃ 9 x1² + 9 x1 x2 − 18 x1 − 6 x2 + 9

Quality of Approximation
At X0 = (1, 1) and X = (1.3, 1.3):
  exact:        f(X) = 3 x1³ x2 = 8.5683
  approximate:  f(X) ≃ 9 x1² + 9 x1 x2 − 18 x1 − 6 x2 + 9 = 8.220

For a 30% change in the given point, the Taylor series approximation underestimates the original function by about 4%.
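The quality-of-approximation figures above can be reproduced with a short numerical check (a sketch assuming numpy is available; the gradient and Hessian values are the ones evaluated at X0 = (1, 1) above):

import numpy as np

def f(x):
    x1, x2 = x
    return 3 * x1**3 * x2

X0 = np.array([1.0, 1.0])
grad = np.array([9.0, 3.0])                 # [9*x1^2*x2, 3*x1^3] at (1, 1)
H = np.array([[18.0, 9.0], [9.0, 0.0]])     # Hessian at (1, 1)

def taylor2(x):
    h = np.asarray(x) - X0
    return f(X0) + grad @ h + 0.5 * h @ H @ h

X = np.array([1.3, 1.3])
print(f(X), taylor2(X))             # 8.5683 vs 8.22
print((f(X) - taylor2(X)) / f(X))   # ~0.04, i.e. about a 4% underestimate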
9
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient

Let h be the distance from X0(x0, y0) to X(x, y), and θx, θy the angles of the displacement with the axes:
hx = h cosθx ≡ h ux        hy = h cosθy ≡ h uy        (ux, uy: direction cosines of the vector joining X0 and X)
x = x0 + hx = x0 + h ux    y = y0 + hy = y0 + h uy

The variation of a scalar function ϕ over a distance h is given by:
dϕ/dh = (∂ϕ/∂x)(dx/dh) + (∂ϕ/∂y)(dy/dh) + (∂ϕ/∂z)(dz/dh)

Since x = x0 + h ux implies dx/dh = ux (and similarly for y and z):
dϕ/dh = [∂ϕ/∂x  ∂ϕ/∂y  ∂ϕ/∂z] [ux, uy, uz]^T = ∇ϕ ∙ u

∇ϕ: gradient of the scalar function        u: direction cosines of the displacement vector

10
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient

dϕ/dh = ∇ϕ ∙ u,  where ∇ϕ = [∂ϕ/∂x, ∂ϕ/∂y, ∂ϕ/∂z]^T and u = [ux, uy, uz]^T

Rate of change of a scalar function over a distance h = the dot product between the
function's gradient and the direction cosines of the displacement vector

dϕ/dh = ∇ϕ ∙ u = |∇ϕ| |u| cosθ(∇ϕ, u)

(dϕ/dh)_max = |∇ϕ|   …at θ(∇ϕ, u) = 0
(dϕ/dh)_min = −|∇ϕ|  …at θ(∇ϕ, u) = 180°

Rate of change of a scalar function over a distance h is maximum if one moves along the
direction of the function's gradient.

Now the natural question is: what is the direction of the function's gradient?

11
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the Gradient

dϕ/dh = ∇ϕ ∙ u = |∇ϕ| |u| cosθ(∇ϕ, u)        (dϕ/dh)_max = |∇ϕ| at θ(∇ϕ, u) = 0

Rate of change of a scalar function over a distance h is maximum
if one moves along the direction of the function's gradient.

Now the natural question is: what is the direction of the function's gradient?

Consider a contour ϕ = constant, and a curve r(t) = {x(t), y(t), z(t)} lying on it.
Along the contour, dϕ/dh = 0, and the chain rule gives:
dϕ/dt = [∂ϕ/∂x  ∂ϕ/∂y  ∂ϕ/∂z] [dx/dt, dy/dt, dz/dt]^T = ∇ϕ ∙ r′(t)

∇ϕ ∙ r′(t) = 0: the direction of the function's gradient at a point is
perpendicular to the tangent at that point.

Rate of change of a scalar function over a distance h is maximum if one moves along the
direction of the function's gradient, that is, along the normal at that point.

12
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the HESSIAN

f(X) = f(X0) + ∇f^T(X0) h + (1/2) h^T H h
h^T H h = [h1  h2] [[H11, H12], [H21, H22]] [h1, h2]^T = H11 h1² + 2 H12 h1 h2 + H22 h2²
Scalar function (in Quadratic Form)

Any quadratic function can be expressed as: Q = X^T M X

f(x1, x2, x3) = 2x1² + 2x1x2 + 4x1x3 − 6x2² − 4x2x3 + 5x3² = [x1 x2 x3] [[2, a, b], [d, −6, c], [e, f, 5]] [x1, x2, x3]^T

The off-diagonal entries a, b, c, d, e, f can be anything as long as a + d = 2; b + e = 4; c + f = −4.

A special case for M is when it is symmetric:
Q = X^T S X ⟹ a = d; b = e; c = f ⟹ S = [[2, 1, 2], [1, −6, −2], [2, −2, 5]]

The Hessian matrix of the same function is
H = [[4, 2, 4], [2, −12, −4], [4, −4, 10]],  so that  Q = X^T S X = X^T (H/2) X

13
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the HESSIAN
Recall that for n = 1, f(x) = f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f′′(x0), and the Necessary and
Sufficient Conditions were defined based on the sign of f′′(x0).
Similarly, the sign of the scalar function h^T H h should be known. The theory of quadratic functions helps.

Qf(X) = X^T S X is              Definition                          Characterization through Eigenvalues (λ)
Positive definite (PD)          Qf(X) > 0 ∀ X ≠ 0                   λi > 0 ∀ i = 1…n
Positive semi-definite (PSD)    Qf(X) ≥ 0 ∀ X ≠ 0                   λi ≥ 0 ∀ i = 1…n
Negative definite (ND)          Qf(X) < 0 ∀ X ≠ 0                   λi < 0 ∀ i = 1…n
Negative semi-definite (NSD)    Qf(X) ≤ 0 ∀ X ≠ 0                   λi ≤ 0 ∀ i = 1…n
Indefinite (ID)                 Qf(X) > 0 for some X and            λi > 0 for some i and
                                Qf(X) < 0 for some X                λi < 0 for some i
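The eigenvalue characterization in the table can be applied directly (a minimal sketch assuming numpy; the tolerance and the example matrices are illustrative choices, not from the slides):

import numpy as np

def definiteness(S, tol=1e-10):
    lam = np.linalg.eigvalsh(S)          # eigenvalues of a symmetric matrix
    if np.all(lam > tol):   return 'positive definite'
    if np.all(lam >= -tol): return 'positive semi-definite'
    if np.all(lam < -tol):  return 'negative definite'
    if np.all(lam <= tol):  return 'negative semi-definite'
    return 'indefinite'

print(definiteness(np.array([[2., 0.], [0., 3.]])))      # positive definite
print(definiteness(np.array([[2., -3.], [-3., 2.]])))    # indefinite (eigenvalues -1 and 5)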



14


NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-II M=1 J=0 K=0 n≥2 Understanding the HESSIAN

n = 1:  f(x) = f(x0) + h f′(x0) + (h²/2!) f′′(x0)            n ≥ 2:  f(X) = f(X0) + ∇f^T(X0) h + (1/2) h^T H h

For x0 (or X0) to be a local minimum, f(x) − f(x0) ≥ 0:
        h f′(x0) + (h²/2!) f′′(x0) ≥ 0                               ∇f^T(X0) h + (1/2) h^T H h ≥ 0

First-order NCC:    f′(x0) = 0                                       ∇f(X0) = 0⃗
Second-order NCC:   f′′(x0) ≥ 0                                      H is positive semi-definite (λ ≥ 0)
Second-order SFC:   f′′(x0) > 0                                      H is positive definite (λ > 0)






15





NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-I: Geometrical

Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(x1, x2) ≡ x1 + x2 − 2 = 0

The best value of objective function is 0, at U(1.5, 1.5) - an infeasible point

The farther you move away from U:


the objective function value undesirably increases
you get closer to the feasible region

Balancing act: Move away from U but only the minimum you need to
become feasible - P is that point

Method-II: Algebraic

At P, the line joining U(1.5, 1.5) to P is perpendicular to the constraint line (which has slope = −1).
Product of slopes of two perpendicular lines = −1:
((x2 − 1.5)/(x1 − 1.5)) (−1) = −1  ⟹  x1 = x2 …Eqn(1)
x1 + x2 = 2 …Eqn(2)
⟹ x1 = 1; x2 = 1

16
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2

Method-III: Variable Reduction; Unconstrained Optimization

Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(x1, x2) ≡ x1 + x2 − 2 = 0

x2 = 2 − x1    (use of the equality constraint for variable reduction)

Minimize f(x1) = (x1 − 1.5)² + (2 − x1 − 1.5)²
Minimize f(x1) = (x1 − 1.5)² + (0.5 − x1)²    (unconstrained optimization problem)

First-order Necessary Condition: df/dx1 = 0, leading to x1 = 1; x2 = 1
Second-order Necessary Condition: d²f/dx1² = 4 ≥ 0, hence satisfied
Second-order Sufficiency Condition: d²f/dx1² = 4 > 0, hence satisfied
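A hedged sketch of Method-III with sympy (assumed available; the symbol names are illustrative, the steps mirror the hand calculation above):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = (x1 - 1.5)**2 + (x2 - 1.5)**2
f_red = f.subs(x2, 2 - x1)                      # eliminate x2 with h: x1 + x2 - 2 = 0

print(sp.solve(sp.diff(f_red, x1), x1))         # FO-NCC: [1.0], so x1 = 1 and x2 = 1
print(sp.diff(f_red, x1, 2))                    # second derivative = 4 > 0, SO-SFC satisfied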

17
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation & Application of Necessary & Sufficient Conditions
Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(x1, x2) ≡ x1 + x2 − 2 = 0

Assume x2 is expressible as a function of x1: x2 = ϕ(x1)

Apply the first-order Necessary Condition (unconstrained, single-variable problem) at the point (x1*, x2*):
df(x1*, x2*)/dx1 = 0, implying ∂f(x1*, x2*)/∂x1 + (∂f(x1*, x2*)/∂x2)(dx2/dx1) = 0

Forgoing (x1*, x2*) for ease of representation, and using x2 = ϕ(x1):
∂f/∂x1 + (∂f/∂x2)(dϕ/dx1) = 0 … Eqn(1)

Since (x1*, x2*) satisfies h(x1*, x2*) = 0:
dh(x1*, x2*)/dx1 = 0, implying ∂h(x1*, x2*)/∂x1 + (∂h(x1*, x2*)/∂x2)(dx2/dx1) = 0

Forgoing (x1*, x2*) for ease of representation:
∂h/∂x1 + (∂h/∂x2)(dϕ/dx1) = 0 … Eqn(2)

18
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
∂f/∂x1 + (∂f/∂x2)(dϕ/dx1) = 0 … Eqn(1)          ∂h/∂x1 + (∂h/∂x2)(dϕ/dx1) = 0 … Eqn(2)

−dϕ/dx1 = (∂f/∂x1)/(∂f/∂x2) = (∂h/∂x1)/(∂h/∂x2) … Eqn(3)

Collect the partial derivatives w.r.t. x1 and x2 together:
(∂f/∂x1)/(∂h/∂x1) = (∂f/∂x2)/(∂h/∂x2) = −λ        (λ: free in sign)

Eqn(4)…   ∂f/∂x1 + λ ∂h/∂x1 = 0
          ∂f/∂x2 + λ ∂h/∂x2 = 0

To be a candidate optimum, a point X*, that is (x1*, x2*), needs to satisfy:
∂f(x1*, x2*)/∂x1 + λ ∂h(x1*, x2*)/∂x1 = 0
∂f(x1*, x2*)/∂x2 + λ ∂h(x1*, x2*)/∂x2 = 0
h(x1*, x2*) = 0

In vector form:   ∇f(X*) + λ ∇h(X*) = 0     (λ = −Grad f / Grad h : free in sign)
                  h(x1*, x2*) = 0

19
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Alternative Derivation)
At a stationary point X* ≡ {x1*, x2*} the total derivative of the function = 0, that is:
df = (∂f/∂x1) dx1 + (∂f/∂x2) dx2 = 0 ≡ ∇f^T dx = 0 …Eqn(1)

Unlike an unconstrained problem, the infinitesimal vector dx ≡ {dx1, dx2} in a constrained problem cannot be arbitrary.
Any variation dx1 and dx2 in (x1*, x2*) is permissible only when h(x1* + dx1, x2* + dx2) = 0:
h(x1* + dx1, x2* + dx2) = h(x1*, x2*) + (∂h(x1*, x2*)/∂x1) dx1 + (∂h(x1*, x2*)/∂x2) dx2 = 0
⟹ dh = (∂h/∂x1) dx1 + (∂h/∂x2) dx2 = 0 ≡ ∇h^T dx = 0 ……Eqn(2)

dx2 = −[(∂h/∂x1)/(∂h/∂x2)] dx1 …Eqn(3)

Plugging Eqn(3) in Eqn(1):
df = (∂f/∂x1) dx1 − (∂f/∂x2)[(∂h/∂x1)/(∂h/∂x2)] dx1 = 0 …Eqn(4)
⟹ (∂f/∂x1 − (∂f/∂x2)(∂h/∂x1)/(∂h/∂x2)) dx1 = 0
⟹ (∂f/∂x1)(∂h/∂x2) − (∂f/∂x2)(∂h/∂x1) = 0

Defining −λ = (∂f/∂x2)/(∂h/∂x2) = (∂f/∂x1)/(∂h/∂x1):
∂f/∂x1 + λ ∂h/∂x1 = 0          ∂f/∂x2 + λ ∂h/∂x2 = 0

Equivalently, the determinant | ∂f/∂x1  ∂h/∂x1 ; ∂f/∂x2  ∂h/∂x2 | = 0: the columns representing the
gradients of f and h are proportional to each other, ∇f + λ ∇h = 0   (λ = −Grad f / Grad h : free in sign)

20
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Alternative Derivation)
∂f ∂f
At a stationary point X* ≡ {x* , x*} the total derivative of the function = 0, that is df =
1 2
d x1 + d x2 = 0 ≡ ∇f T d x = 0…Eqn(1)
∂x1 ∂x2

Unlike unconstrained problem, the infinitesimal vector d X ≡ {d x1, d x2} in a constrained problem can not be arbitrary .

Any variation d x1 and d x2 in (x*, x*) is permissible only whenh(x*


1 2 1
+ d x1, x*
2
+ d x2) = 0
∂h(x*
1 , x*
2) ∂h(x*
1 , x*
2)
∂h ∂h
h(x* + d x1, x* + d x2) = h(x* , x*) + d x1 + d x2 = 0 dh = d x1 + d x2 = 0 ≡ ∇h T d x = 0……Eqn(2)
1 2 1 2 ∂x1 ∂x2 ∂x1 ∂x2
Recall that the gradient of a function is orthogonal to its contours. Thus, since the displacement dX satisfies the constraint contour,
(the straight line in this case), it follows that dX is orthogonal to ∇h .

Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(X ) ≡ x1 + x2 − 2 = 0 Minimize f (x1) = (x1 − 1.5)2 + (2 − x1 − 1.5)2

Lagrange: one could multiply each constraint variation by a scalar and add it to the objective function:
df + Σ_{k=1}^{K} λk dhk ≡ Σ_{i=1}^{n} (∂f/∂xi) dxi + Σ_{k=1}^{K} λk (Σ_{i=1}^{n} (∂hk/∂xi) dxi) ≡ Σ_{i=1}^{n} (∂f/∂xi + Σ_{k=1}^{K} λk ∂hk/∂xi) dxi

This formulation jointly accounts for the variation in the objective function and the constraints.
Now that the constraints do not need to be accounted for separately, the infinitesimal vector dx becomes
independent and arbitrary. Since Σ_{i=1}^{n} βi dxi = 0 for arbitrary dxi iff βi = 0 ∀ i = 1,…, n:

∂f/∂xi + Σ_{k=1}^{K} λk ∂hk/∂xi = 0   ∀ i = 1,…, n

∇f(X*) + λ ∇h(X*) = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=1 n≥2
Method-IV: Formulation of Necessary & Sufficient Conditions (Example)
Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(X ) ≡ x1 + x2 − 2 = 0

∇f(X*) + λ ∇h(X*) = 0
h(X) ≡ x1 + x2 − 2 = 0

With h written as x1 + x2 − 2 = 0:
[2(x1 − 1.5), 2(x2 − 1.5)]^T + λ [1, 1]^T = 0  ⟹  x1 = x2, x1 + x2 = 2  ⟹  x1 = 1, x2 = 1, λ = 1

With h written as 2 − x1 − x2 = 0:
[2(x1 − 1.5), 2(x2 − 1.5)]^T + λ [−1, −1]^T = 0  ⟹  x1 = x2, x1 + x2 = 2  ⟹  x1 = 1, x2 = 1, λ = −1

In ∇f(X*) + λ ∇h(X*) = 0, due to the + sign:
λ is +ve when the two gradients point in opposite directions
λ is −ve when the two gradients point in the same direction
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K=2 n≥2
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
K equality constraints in n variables: form a Lagrange function L(X, λ) = f(X) + Σ_{k=1}^{K} λk hk(X)

L = f + λ1 h1 + λ2 h2 + ……. + λK hK        ∇L_X ≡ ∇f + λ1 ∇h1 + … + λK ∇hK        ∇L_λk ≡ hk

∇f(X*) + λ1 ∇h1(X*) + λ2 ∇h2(X*) + … + λK ∇hK(X*) = 0 ≡ ∇L_X(X*) = 0        hk(X*) = 0 ≡ ∇L_λk(X*) = 0

To avoid degeneracy of this equation to ∇f = 0, the gradients ∇h1, ∇h2, …, ∇hK should be linearly independent.

Formalization through a "Regular Point"
Given a problem: Min f(X) subject to hk(X) = 0 ∀ k = 1 … K
A point X* satisfying the constraints hk(X*) = 0 is called a Regular Point of the feasible set if:
∙ f(X*) is differentiable
∙ the gradient vectors of all constraints at X* are linearly independent, that is Σ_{k=1}^{K} αk ∇hk(X*) = 0 iff αk = 0 ∀ k = 1 … K

(a set of vectors is linearly dependent if and only if one of them is zero or a linear combination of the others)

25
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2

Lagrange Multiplier Theorem: First-order Necessary Condition for Optimality

Given a problem: Min f(X) subject to hk(X) = 0 ∀ k = 1 … K
If X* is a Regular Point that is a local minimum for the problem, then there exist "unique" Lagrange
Multipliers λk ∀ k = 1,…, K such that, with L(X, λ) = f(X) + Σ_{k=1}^{K} λk hk(X):
∙ ∇f(X*) + λ1 ∇h1(X*) + λ2 ∇h2(X*) + … + λK ∇hK(X*) = 0        ≡ ∇L_X(X*) = 0
∙ hk(X*) = 0 ∀ k = 1,…, K                                       ≡ ∇L_λk(X*) = 0 ∀ k = 1,…, K

The Lagrange Multiplier Theorem is a first-order Necessary Condition. Second-order conditions?
Geometrically, if we move away from a stationary point (X*, λ*) along a direction d that satisfies the
linearized constraints, ∇hk^T(X) d = 0 ∀ k = 1,…, K, then the Hessian of the Lagrangian in the subspace defined by d:
∙ should be non-negative:       d^T ∇²L_X(X*) d ≥ 0    Second-order Necessary Condition
∙ should be greater than zero:  d^T ∇²L_X(X*) d > 0    Second-order Sufficient Condition

(For feasibility: h(X) = 0 and h(X + d) = 0. Also h(X + d) = h(X) + ∇h^T d + R2; hence the linearized
constraint is given by ∇h^T d = 0.)

26
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
Lagrange Multiplier Theorem: Necessary Condition for Optimality - An Example
Minimize f (X ) = x1 + x2 subject to h(X ) ≡ x12 + x22 − 2 = 0
∙ Form the Lagrange function: L(x1, x2, λ) = x1 + x2 + λ(x1² + x2² − 2)
∙ Apply the NCC - Lagrange Multiplier Theorem:

∇L_X(X*) = 0 ⟹ 1 + 2λx1 = 0 and 1 + 2λx2 = 0 ⟹ x1 = x2 = −1/(2λ)
∇L_λ(X*) = 0 ⟹ x1² + x2² − 2 = 0 ⟹ λ = ±1/2

∙ Sufficient condition
Directions d = (d1, d2)^T that satisfy the linearized constraints are given by:
∇h^T d = 0:  [2x1  2x2] [d1, d2]^T = 0 ⟹ d1 + d2 = 0 (since x1 = x2) ⟹ d2 = −d1

Hessian of the Lagrangian at the stationary points: ∇²L_X(X*) = [[2λ, 0], [0, 2λ]]
Consequently, the Hessian of the Lagrangian in the subspace defined by d is:
d^T ∇²L_X(X*) d = [d1  −d1] [[2λ, 0], [0, 2λ]] [d1, −d1]^T = 4λ d1²

  λ        x1    x2    d^T ∇²L_X d    Nature
  1/2      −1    −1    2 d1²          PD
  −1/2      1     1    −2 d1²         ND

Summary: we seek a Positive Definite Hessian matrix
∙ of the Lagrange function, ∇²L_X(X*)
∙ in the subspace defined by the linearized constraints: ∇h^T d = 0

X = [x1 x2]^T = [−1 −1]^T satisfies both NCC and SFC ⟹ X = [−1 −1]^T is the optimum.
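A sketch of the same computation with sympy (assumed available; symbol names are illustrative): it solves the stationarity system and evaluates the Lagrangian Hessian along the permitted direction d = (d1, −d1):

import sympy as sp

x1, x2, lam, d1 = sp.symbols('x1 x2 lam d1', real=True)
L = x1 + x2 + lam * (x1**2 + x2**2 - 2)

eqs = [sp.diff(L, x1), sp.diff(L, x2), sp.diff(L, lam)]
for sol in sp.solve(eqs, [x1, x2, lam], dict=True):
    H = sp.hessian(L, (x1, x2)).subs(sol)
    d = sp.Matrix([d1, -d1])                     # directions with grad(h)^T d = 0
    proj = sp.simplify((d.T * H * d)[0])
    print(sol, proj)                             # 4*lam*d1**2 -> 2*d1**2 (PD) or -2*d1**2 (ND)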

27
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2

Lagrange Multiplier Theorem: An Application

Design for minimum cost, a cylindrical can closed at both the ends, with a Volume V

Treating the radius and height of the can as the design variables, namely, r and b, respectively
the problem can be formulated as: Minimize f (r, b) = r 2 + rb subject to h ≡ πr 2b − V = 0
∙ Form the Lagrange function: L(r, b, λ) = r 2 + rb + λ(πr 2b − V )
∙ Apply the NCC - Lagrange Multiplier Theorem:
∇L_{r,b}(X*) = 0:
∂L/∂r = 2r + b + 2πrbλ = 0 … Eqn(1)          ∂L/∂b = r + πr²λ = 0 … Eqn(2)
∇L_λ(X*) = 0:
∂L/∂λ = πr²b − V = 0 … Eqn(3)   ⟹   b = V/(πr²) … Eqn(4)

∙ Substituting Eqn(4) in Eqn(1): λ = −(2πr³ + V)/(2πrV) … Eqn(5)
∙ Eqn(2) directly gives: λ = −1/(πr) … Eqn(6)

Equating λ in Eqn(5) and Eqn(6) leads to:
r* = [V/(2π)]^(1/3)        λ* = −[2/(π²V)]^(1/3)
… and subsequently
b* = [4V/π]^(1/3)          f* = 3[V/(2π)]^(2/3)

28
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-III M=1 J=0 K≥1 n≥2
Minimize f(r, b) = r² + rb subject to h ≡ πr²b − V = 0
r* = [V/(2π)]^(1/3)    b* = [4V/π]^(1/3)    λ* = −[2/(π²V)]^(1/3)    f* = 3[V/(2π)]^(2/3)

L(r, b, λ) = r² + rb + λ(πr²b − V)
∇h = [2πrb, πr²]^T        ∂²L/∂r² = 2 + 2πbλ        ∂²L/∂r∂b = 1 + 2πrλ        ∂²L/∂b² = 0

∙ Sufficient condition
Directions d = (d1, d2)^T that satisfy the linearized constraints are given by ∇h^T d = 0:
[2πr*b*  πr*²] [d1, d2]^T = 0  ⟹  2b* d1 + r* d2 = 0  ⟹  d2 = −4 d1    (since b*/r* = 2)

Hessian of the Lagrangian at the stationary point (using πr*λ* = −1 and b*/r* = 2):
∇²L_X(X*) = [[2 − 2b*/r*, 1 + 2πr*λ*], [1 + 2πr*λ*, 0]] = [[−2, −1], [−1, 0]]

Consequently, the Hessian of the Lagrangian in the subspace defined by d is:
d^T ∇²L_X(X*) d = [d1  −4d1] [[−2, −1], [−1, 0]] [d1, −4d1]^T = 6 d1² > 0

The point r* = [V/(2π)]^(1/3), b* = [4V/π]^(1/3), with λ* = −[2/(π²V)]^(1/3) and f* = 3[V/(2π)]^(2/3),
satisfies both the Necessary and Sufficient Conditions: hence it is the optimum.
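A quick numerical sanity check of the closed-form optimum (a sketch assuming numpy, taking V = 1 purely for illustration):

import numpy as np

V = 1.0
r = (V / (2 * np.pi)) ** (1 / 3)
b = (4 * V / np.pi) ** (1 / 3)
lam = -(2 / (np.pi**2 * V)) ** (1 / 3)

print(np.isclose(2*r + b + 2*np.pi*r*b*lam, 0))            # dL/dr = 0
print(np.isclose(r + np.pi*r**2*lam, 0))                   # dL/db = 0
print(np.isclose(np.pi*r**2*b - V, 0))                     # volume constraint satisfied
print(np.isclose(r**2 + r*b, 3*(V/(2*np.pi))**(2/3)))      # matches f* = 3[V/(2*pi)]^(2/3)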

29
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
We have seen: Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to h(x1, x2) ≡ x1 + x2 − 2 = 0
Now consider: Minimize f (x1, x2) = (x1 − 1.5)2 + (x2 − 1.5)2 subject to g(x1, x2) ≡ x1 + x2 − 2 ≤ 0

If we can convert this to an equality, the LMT can be applied.
How about h(x1, x2) ≡ x1 + x2 − 2 + s = 0 where s ≥ 0?
An attempt to eliminate one inequality has given rise to another inequality.
This could be avoided through: h(x1, x2) ≡ x1 + x2 − 2 + s² = 0

Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to h(x1, x2) ≡ x1 + x2 − 2 + s² = 0
Let the Lagrange function be L(X, μ, s) ≡ L(x1, x2, μ, s) = (x1 − 1.5)² + (x2 − 1.5)² + μ(x1 + x2 − 2 + s²)

LMT ⟹ ∇L_X(X, μ, s) = 0; ∇L_μ(X, μ, s) = 0; and ∇L_s(X, μ, s) = 0

n equations (stationarity w.r.t. X):
  ∂L/∂x1 = 0 ⟹ 2(x1 − 1.5) + μ = 0 … Eqn(1)
  ∂L/∂x2 = 0 ⟹ 2(x2 − 1.5) + μ = 0 … Eqn(2)
J equations (stationarity w.r.t. μ): ∂L/∂μ = 0 ⟹ x1 + x2 − 2 + s² = 0 … Eqn(3)
J equations (stationarity w.r.t. s): ∂L/∂s = 0 ⟹ 2μs = 0 … Eqn(4)
K equations, if there were also K equality constraints

In general (n variables, J inequality and K equality constraints) this gives n + 2J + K equations in
n + 2J + K unknowns. The equations w.r.t. the slack variables (s) are called "switching equations",
leading to 2^J solution cases.
30
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2

One switching condition ⟹ 2 solution cases: μ = 0 or s = 0.
(Both μ and s can be zero simultaneously; however, that case is already covered by either μ = 0 or s = 0.)

Solution case-I: s = 0 (no slack required implies g = 0)        Solution case-II: μ = 0
Eqn(1) and Eqn(2) ⟹ x1 = x2                                     Eqn(1) and Eqn(2) ⟹ x1 = x2 = 1.5
Eqn(3) ⟹ x1 + x2 = 2         } x1 = x2 = 1                      Eqn(3) ⟹ s² = −1
Eqn(1) or Eqn(2) ⟹ μ = 1                                        Infeasible solution case

Graphical insights in the case of a single inequality constraint
LMT: ∇L_X(X, μ, s) = 0; ∇L_μ(X, μ, s) = 0; and ∇L_s(X, μ, s) = 0
∇f(X) + μ ∇g(X) = 0 ⟹ ∇f(X) = μ[−∇g(X)]
⟹ ∇f(X) is a scalar multiple of −∇g(X)
In the specific situation above, μ happens to be positive.
In general: would μ always be positive, or can it be negative or zero?
31
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Problem A: Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to g(x1, x2) ≡ x1 + x2 − 2 ≤ 0
Problem B: Minimize f(x1, x2) = (x1 − 1.5)² + (x2 − 1.5)² subject to g1(X) ≡ x1 + x2 − 2 ≤ 0 and g2(X) ≡ x1 − 1 ≤ 0

Problem A (single constraint): x1* = x2* = μ* = 1
LMT ⟹ ∇f(X) is a scalar multiple of −∇g(X)
Graphically: the scalar multiple μ could only be positive.
In general: would μ always be positive?

Problem B (two constraints):
LMT: x1* = x2* = 1; μ1* = 1 with g1 = 0 (≡ s1 = 0); μ2* = 0 with g2 = 0 (≡ s2 = 0)

Critical Revelations
∙ μ could also be zero besides being positive
∙ Active constraints may have either positive or zero multipliers
∙ The switching condition μs = 0 led to two solution cases, μ = 0 or s = 0, though both can
  simultaneously be zero (no need to worry, as the case of both being simultaneously zero
  arrives from either μ = 0 or s = 0)

In general: would μ always be non-negative (positive or zero), or could it be negative too?


32
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Can the Lagrange Multiplier for an Inequality Constraint (μ) be negative, as for an Equality Constraint (λ)?

∇f + μ ∇g = 0⃗ ⟹ S^T[∇f + μ ∇g] = S ⋅ 0⃗ = 0
⟹ S^T ∇f + μ S^T ∇g = 0 …(a)

If X* is to be a local minimum, then along any feasible        By geometrical construct: ∇g and S are on
direction S the f value should increase or at worst remain     opposite sides of the tangent at X*
the same, that is, the angle between ∇f and S should be        ⟹ the angle between them is OBTUSE
acute (or right) ⟹ S^T ∇f ≥ 0 …(b)                             ⟹ S^T ∇g < 0 …(c)

S^T ∇f + μ S^T ∇g = 0 ⟹ μ[−S^T ∇g] = S^T ∇f
⟹ μ[positive] = S^T ∇f ≥ 0 ⟹ μ ≥ 0

33
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Can the Lagrange Multiplier for an Inequality Constraint (μ) be negative, as for an Equality Constraint (λ)?

At the constraint boundary, two characteristics:
For g ≤ 0: ∇g always points away from the feasible region ⟹ S^T ∇g < 0
S^T ∇f = μ[positive]

If μ < 0: S^T ∇f < 0, so at X* a feasible direction S may exist along which a reduction in f is possible
If μ ≥ 0: S^T ∇f ≥ 0, so at X* NO feasible direction S exists along which a reduction in f is possible
34
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2

The Lagrange Multiplier in the case of an Inequality Constraint (μ) has to be NON-NEGATIVE (either zero or positive).

Candidate optimum "inside" the feasible region: μ = 0
Candidate optimum "at" the constraint boundary: μ > 0 or μ = 0

Note: the LMT could be applied for two tasks:
∙ to test whether a given point is a local minimum or not
∙ to find a candidate optimum

Here, when X* is not even known, how does one decide which of the constraints are to be treated as
active, so that they can be included in the Lagrange function?
Well, include ALL of them, because the LMT will itself lead to μ = 0 for an inactive constraint.

With three constraints g1 ≤ 0, g2 ≤ 0, g3 ≤ 0, if the LMT leads to:
∙ X*12 (on g1 and g2): then the LMT shall also give μ1 ≥ 0; μ2 ≥ 0; μ3 = 0
∙ X*13 (on g1 and g3): then the LMT shall also give μ1 ≥ 0; μ3 ≥ 0; μ2 = 0
∙ X*23 (on g2 and g3): then the LMT shall also give μ2 ≥ 0; μ3 ≥ 0; μ1 = 0

35
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Formulation
Let X* be a regular point of the feasible set that is a local minimum for
Minimize f(X) subject to: gj(X) ≤ 0 ∀ j = 1, …, J and hk(X) = 0 ∀ k = 1, …, K
Then there exist unique Lagrange Multipliers μ⃗ (a J-vector: μj ≥ 0) and λ⃗ (a K-vector) such that the Lagrange
function is stationary w.r.t. X⃗, s⃗, μ⃗, and λ⃗, where the Lagrange function is of the form:
L(X, μ, λ, s) = f(X) + Σ_{j=1}^{J} μj [gj(X) + sj²] + Σ_{k=1}^{K} λk hk(X)

Stationary w.r.t. X:  ∇L_X(X*) = 0    ∇f(X*) + Σ_{j=1}^{J} μj ∇gj(X*) + Σ_{k=1}^{K} λk ∇hk(X*) = 0, where μj ≥ 0
Stationary w.r.t. s:  ∇L_s(X*) = 0    2 μj sj = 0 ∀ j = 1,…, J    (switching conditions: J; solution cases: 2^J)
Stationary w.r.t. μ:  ∇L_μ(X*) = 0    gj + sj² = 0 ⟹ gj ≤ 0, sj² ≥ 0    (constraint satisfaction / feasibility check)
Stationary w.r.t. λ:  ∇L_λ(X*) = 0    hk = 0
Regularity check: at X* the gradient vectors of all the ACTIVE constraints (gj = 0 and hk = 0) should be linearly independent
36
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-I
Minimize f(X ) = x12 + x22 − 3x1x2 subject to g(X ) ≡ x12 + x22 − 6 ≤ 0
L(X, μ, s) = x1² + x2² − 3x1x2 + μ(x1² + x2² − 6 + s²)

Stationary w.r.t. X:  ∇L_X(X*) = 0
  ∂L/∂x1 = 2x1 − 3x2 + 2μx1 = 0 ⟹ μ = (3x2 − 2x1)/(2x1) … Eqn(1)
  ∂L/∂x2 = 2x2 − 3x1 + 2μx2 = 0 ⟹ μ = (3x1 − 2x2)/(2x2) … Eqn(2)
Stationary w.r.t. μ:  ∇L_μ(X*) = 0    ∂L/∂μ = x1² + x2² − 6 + s² = 0 … Eqn(3)
Stationary w.r.t. s:  ∇L_s(X*) = 0    ∂L/∂s = 2μs = 0 … Eqn(4)
Switching conditions: J = 1; solution cases: 2^J = 2   (I: s = 0, II: μ = 0)

Solution Case-I (s = 0):
Eqn(3): x1² + x2² = 6 … Eqn(5)
Equating μ in Eqns(1) and (2) ⟹ x1² = x2² … Eqn(6)
⟹ x1 = ±√3; x2 = ±√3

Solving cases I(a) to I(d) using Eqns(1) and (2):
  I(a): x1 = √3,   x2 = √3     μ = 0.5
  I(b): x1 = −√3,  x2 = √3     μ = −2.5
  I(c): x1 = √3,   x2 = −√3    μ = −2.5
  I(d): x1 = −√3,  x2 = −√3    μ = 0.5

Two candidate optima from Case-I: x1 = x2 = √3 with μ = 0.5, and x1 = x2 = −√3 with μ = 0.5.

37
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-I
Minimize f(X ) = x12 + x22 − 3x1x2 subject to g(X ) ≡ x12 + x22 − 6 ≤ 0
L(X, μ, s) = x1² + x2² − 3x1x2 + μ(x1² + x2² − 6 + s²)
Stationarity gives Eqns(1)-(4) as on the previous slide; the switching condition gives cases I (s = 0) and II (μ = 0).

Solution Case-II (μ = 0):
Eqn(1) ⟹ 2x1 = 3x2 … Eqn(7)
Eqn(2) ⟹ 3x1 = 2x2 … Eqn(8)
⟹ x1 = 0; x2 = 0; s² = 6 (feasible)    Candidate optimum with f = 0

Compare with the Case-I candidates, for which f = −3.
Regularity check is trivial here (only one constraint).
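The 2^J switching-case enumeration of Example-I can also be mechanized (a hedged sketch assuming sympy; the variable names and the loop structure are illustrative, not from the slides):

import sympy as sp

x1, x2, mu, s = sp.symbols('x1 x2 mu s', real=True)
f = x1**2 + x2**2 - 3*x1*x2
g = x1**2 + x2**2 - 6
L = f + mu*(g + s**2)

stationarity = [sp.diff(L, v) for v in (x1, x2, mu)]     # dL/dmu restores g + s^2 = 0

for case in ({s: 0}, {mu: 0}):                           # the 2^J = 2 switching cases
    eqs = [e.subs(case) for e in stationarity]
    unknowns = [v for v in (x1, x2, mu, s) if v not in case]
    for sol in sp.solve(eqs, unknowns, dict=True):
        mu_val = sol.get(mu, case.get(mu, sp.Integer(0)))
        print(case, sol, 'candidate' if mu_val >= 0 else 'rejected (mu < 0)')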

38
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-II
Maximize f (X ) = 2x1 + 2x2 − x12 − x22 − 2 subject to g1(X ) ≡ 2x1 + x2 ≥ 4; and g2(X ) ≡ x1 + 2x2 ≥ 4

Minimize f (X ) = x12 + x22 − 2x1 − 2x2 + 2 subject to g1(X ) ≡ 4 − 2x1 − x2 ≤ 0; and g2(X ) ≡ 4 − x1 − 2x2 ≤ 0

L(X, μ1, s1, μ2, s2) = x1² + x2² − 2x1 − 2x2 + 2 + μ1[4 − 2x1 − x2 + s1²] + μ2[4 − x1 − 2x2 + s2²]

Stationary w.r.t. X:  ∇L_X(X*) = 0
  ∂L/∂x1 = 2x1 − 2 − 2μ1 − μ2 = 0 … Eqn(1)
  ∂L/∂x2 = 2x2 − 2 − μ1 − 2μ2 = 0 … Eqn(2)
Stationary w.r.t. μ:  ∇L_μ(X*) = 0
  ∂L/∂μ1 = 4 − 2x1 − x2 + s1² = 0 … Eqn(3)
  ∂L/∂μ2 = 4 − x1 − 2x2 + s2² = 0 … Eqn(4)
Stationary w.r.t. s:  ∇L_s(X*) = 0
  ∂L/∂s1 = 2μ1s1 = 0 … Eqn(5)  ⟹ μ1 = 0 or s1 = 0
  ∂L/∂s2 = 2μ2s2 = 0 … Eqn(6)  ⟹ μ2 = 0 or s2 = 0

Switching conditions: J = 2; solution cases: 2^J = 4
  Case-I:   μ1 = 0, μ2 = 0      Case-II:  s1 = 0, μ2 = 0
  Case-III: μ1 = 0, s2 = 0      Case-IV:  s1 = 0, s2 = 0

Case-I (μ1 = 0 & μ2 = 0):
  Eqn(1) ⟹ x1 = 1; Eqn(2) ⟹ x2 = 1; Eqn(3) ⟹ s1² = −1; Eqn(4) ⟹ s2² = −1    Infeasible

Case-II (s1 = 0 & μ2 = 0):
  Eqn(1) ⟹ 2x1 − 2 − 2μ1 = 0 … Eqn(7)
  Eqn(2) ⟹ 2x2 − 2 − μ1 = 0 … Eqn(8)
  Eqn(3) ⟹ 4 − 2x1 − x2 = 0 … Eqn(9)
  ⟹ x1 = 1.4, x2 = 1.2, μ1 = 0.4; Eqn(4) then gives s2² < 0    Infeasible

Case-III (μ1 = 0 & s2 = 0):
  Eqn(1) ⟹ 2x1 − 2 − μ2 = 0 … Eqn(10)
  Eqn(2) ⟹ 2x2 − 2 − 2μ2 = 0 … Eqn(11)
  Eqn(4) ⟹ 4 − x1 − 2x2 = 0 … Eqn(12)
  ⟹ x1 = 1.2, x2 = 1.4, μ2 = 0.4; Eqn(3) then gives s1² < 0    Infeasible
39
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-II
Maximize f (X ) = 2x1 + 2x2 − x12 − x22 − 2 subject to g1(X ) ≡ 2x1 + x2 ≥ 4; and g2(X ) ≡ x1 + 2x2 ≥ 4

Minimize f (X ) = x12 + x22 − 2x1 − 2x2 + 2 subject to g1(X ) ≡ 4 − 2x1 − x2 ≤ 0; and g2(X ) ≡ 4 − x1 − 2x2 ≤ 0

L(X, μ1, s1, μ2, s2) = x1² + x2² − 2x1 − 2x2 + 2 + μ1[4 − 2x1 − x2 + s1²] + μ2[4 − x1 − 2x2 + s2²]
Stationarity gives Eqns(1)-(6) as on the previous slide; switching conditions: J = 2, solution cases: 2^J = 4.
Cases I, II and III: infeasible (previous slide).

Case-IV (s1 = 0 & s2 = 0):
  Eqn(3) ⟹ 2x1 + x2 = 4 … Eqn(13)
  Eqn(4) ⟹ x1 + 2x2 = 4 … Eqn(14)
  ⟹ x1 = x2 = 4/3
  Plugging x1 and x2 in Eqn(1) and (2) ⟹ μ1 = μ2 = 2/9 > 0

Regularity check:  X* = [4/3, 4/3]^T    ∇g1(X*) = [−2, −1]^T    ∇g2(X*) = [−1, −2]^T
Since ∇g1 ≠ α ∇g2, the KKT conditions are met and X* is a candidate optimum.
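As a cross-check of Example-II (a sketch assuming scipy is available; with constraints present, scipy.optimize.minimize defaults to the SLSQP method), a numerical solver recovers the same point:

import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2 - 2*x[0] - 2*x[1] + 2
# scipy's 'ineq' constraints require fun(x) >= 0, so g <= 0 is rewritten with the sign flipped
cons = ({'type': 'ineq', 'fun': lambda x: 2*x[0] + x[1] - 4},   # g1(X) = 4 - 2x1 - x2 <= 0
        {'type': 'ineq', 'fun': lambda x: x[0] + 2*x[1] - 4})   # g2(X) = 4 - x1 - 2x2 <= 0

res = minimize(f, x0=[0.0, 0.0], constraints=cons)
print(res.x)    # approximately [1.333, 1.333], matching x1* = x2* = 4/3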
40
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Alternative Formulation
Let X* be a regular point of the feasible set that is a local minimum for
Minimize f(X) subject to: gj(X) ≤ 0 ∀ j = 1, …, J and hk(X) = 0 ∀ k = 1, …, K
Then there exist unique Lagrange Multipliers μ⃗ (a J-vector: μj ≥ 0) and λ⃗ (a K-vector) such that the Lagrange
function is stationary w.r.t. its arguments. The two equivalent forms of the Lagrange function are:

With slack variables:     L(X, μ, λ, s) = f(X) + Σ_{j=1}^{J} μj [gj(X) + sj²] + Σ_{k=1}^{K} λk hk(X)
Without slack variables:  L(X, μ, λ) = f(X) + Σ_{j=1}^{J} μj gj(X) + Σ_{k=1}^{K} λk hk(X)

Stationary w.r.t. X:  ∇f(X*) + Σ_{j=1}^{J} μj ∇gj(X*) + Σ_{k=1}^{K} λk ∇hk(X*) = 0, where μj ≥ 0    } same in both forms
Stationary w.r.t. s:  2 μj sj = 0 ∀ j = 1,…, J   becomes   μj gj = 0 ∀ j = 1,…, J
                      (2μj sj = 0 ≡ 2μj sj² = 0 ≡ −2μj gj = 0, since gj + sj² = 0)
Stationary w.r.t. μ:  gj + sj² = 0 ⟹ gj ≤ 0, sj² ≥ 0   becomes   gj ≤ 0
Stationary w.r.t. λ:  hk = 0    } same in both forms
Regularity check: at X* the gradient vectors of all the ACTIVE constraints should be linearly independent    } same in both forms
41
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-III with Alternative Formulation
Minimize f (X ) = (x − 10)2 + (y − 8)2 subject to g1(X ) ≡ x + y − 12 ≤ 0; and g2(X ) ≡ x − 8 ≤ 0

L(X, μ1, μ2) = (x − 10)² + (y − 8)² + μ1(x + y − 12) + μ2(x − 8)

Stationary w.r.t. X:  ∇L_X(X*) = 0
  ∂L/∂x = 2(x − 10) + μ1 + μ2 = 0 … Eqn(1)
  ∂L/∂y = 2(y − 8) + μ1 = 0 … Eqn(2)
Straight to the switching conditions: μ1g1 = 0 and μ2g2 = 0; solution cases: 2^J = 4
  Case-I:   μ1 = 0, μ2 = 0      Case-II:  g1 = 0, μ2 = 0
  Case-III: μ1 = 0, g2 = 0      Case-IV:  g1 = 0, g2 = 0

Case-I (μ1 = 0 & μ2 = 0):
  Eqn(1) ⟹ x = 10; Eqn(2) ⟹ y = 8    } g1(X) = 6 ≰ 0    Infeasible

Case-III (μ1 = 0 & g2 = 0):
  g2(X) = 0 ⟹ x = 8; μ1 = 0 ⟹ from Eqn(2): y = 8    } g1(X) = 4 ≰ 0    Infeasible

Case-II (g1 = 0 & μ2 = 0):
  g1(X) = 0 ⟹ x + y = 12 … Eqn(3)
  μ2 = 0 ⟹ from Eqn(1): μ1 = 20 − 2x, and from Eqn(2): μ1 = 16 − 2y
  20 − 2x = 16 − 2y ⟹ x − y = 2 … Eqn(4)
  Eqn(3) and Eqn(4) ⟹ x = 7, y = 5 ⟹ μ1 = 6, g2 = −1 < 0
  It is a candidate optimum; regularity is ensured since only one constraint is active.

Case-IV (g1 = 0 & g2 = 0):
  g2(X) = 0 ⟹ x = 8; g1(X) = 0 ⟹ y = 4
  Eqn(2) ⟹ μ1 = 8; Eqn(1) ⟹ μ2 = −4 ≱ 0    Infeasible
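The slack-free enumeration above can be written as a small active-set loop (a hedged sketch assuming sympy; the helper structure is illustrative, not from the slides): for each guess of the active set, impose gj = 0 for active j and μj = 0 otherwise, then check feasibility (g ≤ 0) and multiplier signs (μ ≥ 0).

import sympy as sp
from itertools import product

x, y, m1, m2 = sp.symbols('x y mu1 mu2', real=True)
f = (x - 10)**2 + (y - 8)**2
g = [x + y - 12, x - 8]
mu = [m1, m2]
grad_L = [sp.diff(f + m1*g[0] + m2*g[1], v) for v in (x, y)]

for active in product([False, True], repeat=2):            # 2^J = 4 cases
    eqs = list(grad_L)
    eqs += [g[j] if a else mu[j] for j, a in enumerate(active)]
    for sol in sp.solve(eqs, [x, y, m1, m2], dict=True):
        ok = all(sp.simplify(gj.subs(sol)) <= 0 for gj in g) and all(sol[mj] >= 0 for mj in mu)
        print(active, sol, 'candidate' if ok else 'infeasible')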

42
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Attention! Type-IV M=1 J≥1 K≥1 n≥2

43
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Attention! Type-IV M=1 J≥1 K≥1 n≥2
Karush Kuhn Tucker Conditions: Example-IV

(a) Infer the optimum graphically
(b) Validate the optimality of the above point using KKT conditions
[Problem figure (with 2 m dimensions) not recoverable from the extraction]

44
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Attention!
Karush Kuhn Tucker Conditions: Example-IV


45
SOME CONCEPTS
FACILITATING
AN INTERESTING INTERPRETATION
OF KKT CONDITIONS

….. AHEAD
VECTOR SPACE: SPANNING SET
A set of vectors X1, X2, …, XK is said to span the vector space V if any vector X ∈ V can be expressed through this
set of K vectors, in that: X = Σ_{i=1}^{K} αi Xi

Consider these vectors in R²: X1 = [1, 0]^T, X2 = [1, 1]^T, X3 = [0, 1]^T, X4 = [−1, 0]^T, X5 = [−1, −1]^T
Any vector X ∈ R² can be expressed as a linear combination of X1, X2, …, X5.
Hence, X1, X2, …, X5 can be said to span the vector space V.

Notably, X2 = X1 + X3; X4 = −X1; X5 = −X1 − X3
This spanning set seems to be overdefined because up to three vectors can be
expressed in terms of just two of them. Only {X1, X3} are enough.

For a spanning set to be efficient, linear independence of its members is important

VECTOR SPACE: BASIS

A set of vectors X1, X2, …, XK is said to be a "basis" for a vector space V if X1, X2, …, XK span V and are linearly independent.

Notably:
∙ Different bases for the same V may exist (but all will have the same cardinality)
∙ For a given basis, its elements need not be orthogonal (though an orthogonal basis is the most convenient one)

Example: the standard basis for R^n is {e1, e2, …, en}, where ei is an n-dimensional vector
with all elements = 0, except for the i-th element = 1.
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Proof & Physical Interpretation of 1st and IInd Order KKT Conditions for Optimality
Minimize f (X ) subject to gj(x) ≤ 0 ∀ j = 1,…, J; hk(x) = 0 ∀ k = 1,…, K; X ∈ Rn

Difference in the nature of Equality and Inequality Constraints

Equality: hk(X) = 0, X ∈ R^n
∙ Both sides of the equality constraint surface are infeasible
∙ If you move along or opposite to ∇h, you become infeasible
∙ Each h reduces the domain (R^n) to a lower-dimensional subspace (hypersurface), which has a tangent plane at every point on it
∙ The feasible directions d are given by ∇h^T d = 0; d belongs to the tangent plane (orthogonal to ∇h)

Inequality: gj(X) ≤ 0, X ∈ R^n
∙ Only one side of the inequality constraint surface is infeasible
∙ If you move along ∇g you become infeasible, but you remain feasible if you go opposite to ∇g
∙ Each g does not reduce the domain (R^n) to a lower-dimensional subspace (hypersurface), but to a subset of the n-dimensional space
∙ The feasible directions d are given by ∇g^T d ≤ 0; d belongs to the cone of feasibility
48
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Each h reduces the domain (R^n) to a lower-dimensional subspace (hypersurface) with a defined tangent plane at every point on it.
Each g does not reduce the domain (R^n) to a lower-dimensional subspace (hypersurface), but to a subset of the n-dimensional space.

Tangent plane
TP ≡ {d : ∇hk^T d = 0 ∀ k = 1,…, K}
The TP at any point is the "orthogonal complement" of the gradient of the "vector function" h(X):
∇h(X) ≡ [∇h1(X) ∇h2(X) …. ∇hK(X)], an (n × K) matrix; the TP is orthogonal to each ∇hk(X).

Constraint Qualification
∇h(X) ≡ [∇h1(X) ∇h2(X) …. ∇hK(X)], the (n × K) matrix, is full rank: all column vectors are linearly independent.

Regular point
If a feasible point X* satisfies the constraint qualification, it is a regular point (a feasible point where all the
gradients of active constraints are linearly independent). It implies that in the local neighborhood of X* all
the constraints are applicable/relevant.

Tangent plane at a feasible, regular point
TP ≡ {d : ∇h^T d = 0} ⟹ {d : ∇hk^T d = 0 ∀ k = 1,…, K}
d cannot be along any of the K gradients ∇hk; hence DoF(d) = n − K:
the maximum number of linearly independent vectors that could constitute the TP is n − K.

Active (gA) & inactive (gI) inequality constraints
Constraint satisfaction/violation is:
∙ not altered if X2* is perturbed; hence g(X2*) is Inactive
∙ altered if X1* is perturbed; hence g(X1*) is Active
Collection of inequality constraints' indices: JA = {j : gj = 0}, JI = {j : gj ≠ 0}
Active gs collectively: gA; inactive gs collectively: gI

Active (gA) inequality constraint ≡ equality constraint
∙ Like the direction d constituting the TP cannot be along ∇h, d cannot be along ∇gA
∙ Unlike the direction d constituting the TP, which cannot be along −∇h either, d can still be along −∇gA (in the cone of feasibility)

Cone of feasibility
CF ≡ {d : ∇gA^T d ≤ 0} ⟹ {d : ∇gj^T d ≤ 0 ∀ j ∈ JA}
Two strategies to handle inequality constraints: active set (keep a gA list) or slack variables (just values, like X).
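A sketch (assuming numpy and scipy are available; the constraint gradients below are illustrative) of how the tangent plane can be computed in practice: TP is the null space of the stacked constraint gradients, so its dimension is n − K.

import numpy as np
from scipy.linalg import null_space

grad_h = np.array([[1.0, 1.0, 0.0],     # grad h1^T  (illustrative constraints)
                   [0.0, 1.0, 1.0]])    # grad h2^T
B = null_space(grad_h)                  # columns of B form a basis for TP
print(B.shape[1])                       # n - K = 3 - 2 = 1 independent direction
print(np.allclose(grad_h @ B, 0))       # True: every basis vector satisfies grad_h^T d = 0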
49
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Necessary Condition of Optimality for a point X*: at X* you CAN NOT FIND a direction d which is BOTH
a FEASIBLE direction and a DESCENT (USABLE) direction.

Whether you can find a feasible direction or not depends on whether or not there are linearly independent
vectors to form the basis (B) for THG ≡ TP ∩ CF. Let rB = Rank(B):
rB = Rank(B) = n − K − |JA|  ⟹  n = rB + K + |JA|

A direction would be a descent direction if it can offer f(X) < f(X*).
A necessary condition to prevent a descent direction at X* is that −∇f(X*) (which promises a reduction in f)
should have NO component in the tangent plane ⟹ γ(B) = 0.

X* ∈ R^n ⟹ ∇f(X*) ∈ R^n
⟹ −∇f(X*) = γ(B) + λ(∇h(X*)) + μA(∇gA(X*)),
i.e. −∇f(X*) can be expressed as a linear combination of:
∙ the linearly independent vectors constituting B
∙ the gradients of the equality constraints
∙ the gradients of the active inequality constraints

With γ(B) = 0:
−∇f(X*) = λ(∇h(X*)) + μA(∇gA(X*))
⟹ ∇f(X*) + λ(∇h(X*)) + μA(∇gA(X*)) = 0
⟹ ∇f(X*) + λ(∇h(X*)) + μA(∇gA(X*)) + μI(∇gI(X*)) = 0    (with the insistence that μI = 0 where gI ≠ 0)
⟹ ∇f(X*) + λ ∇h(X*) + μ ∇g(X*) = 0, where μ = [μA μI]^T and ∇g(X*) = [∇gA(X*) ∇gI(X*)]^T

Complementary condition:
∙ When gI ≠ 0: μI = 0
∙ When gA = 0: μA ≥ 0

50
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2

A necessary condition to prevent a descent direction at X* is that −∇f(X*) (which promises a reduction in f)
should have NO component in the tangent plane, THG ≡ TP ∩ CF ⟹ γ(B) = 0.

μA ≥ 0 ensures that a component of −∇f(X*) can exist only along +∇gA(X*), which, being infeasible,
is inconsequential to the local optimality of X*.
Even if a component of −∇f(X*) exists along ±∇h(X*), it is inconsequential to the local optimality of X*,
since the directions marked by ∇h are infeasible.
51
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2

μA ≥ 0

52
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Necessary Condition of Optimality for a point X*: at X* you CAN NOT FIND a direction d which is BOTH
a FEASIBLE direction and a DESCENT (USABLE) direction.

Whether such a d exists depends on whether or not there are linearly independent vectors to form the
basis (B) of THG ≡ TP ∩ CF:
∙ If n − K − |JA| = 0: FO-NCC met
∙ If n − K − |JA| ≥ 1: check for descent: −∇f(X*) is orthogonal to the TP, and μA ≥ 0

What about the second-order change, (1/2) d^T ∇²L_XX d ?

Second-order Necessary Condition: the second-order change along d ∈ THG ≡ {TP ∩ CF} must be ≥ 0
Second-order Sufficient Condition: the second-order change along d ∈ THG ≡ {TP ∩ CF} must be > 0 … with one additional qualifier

53
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY

                     Equality Constrained                              Inequality + Equality Constrained
First-order NCC      LMT                                               KKT
Second-order NCC     d ∈ TH ≡ {d : ∇hk(X*)^T d = 0 ∀ k}                d ∈ THG(X*) = {d ∈ R^n : ∇hk^T d = 0 ∀ k; ∇gj^T d ≤ 0, j ∈ JA}
                     d^T ∇²L_X(X*) d ≥ 0                               d^T ∇²L_X(X*) d ≥ 0
Second-order SFC     d ∈ TH ≡ {d : ∇hk(X*)^T d = 0 ∀ k}                d ∈ THG+(X*, μ*) = {d ∈ R^n : ∇hk^T d = 0 ∀ k; ∇gj^T d ≤ 0, j ∈ JA+}
                     d^T ∇²L_X(X*) d > 0                               d^T ∇²L_X(X*) d > 0
(Second-order NCC: the Hessian of the Lagrangian in the tangent space to the constrained hypersurface is positive semi-definite;
 Second-order SFC: it is positive definite.)

NCC for X*: at X* you CAN NOT FIND a direction d which is BOTH a feasible and a descent direction.
rB = Rank(B) = n − K − |JA| ⟹ the NCC is automatically satisfied if the number of equality + active constraints equals the number of variables.

Need for THG+ over THG:
∙ The possibility of finding d in the tangent plane depends on Rank(B) = n − K − |JA|
∙ Consider gj = 0 with μj = 0: such a gj does not influence the local optimality of X*, YET it reduces the possibility of finding a d
∙ JA+ (the active constraints with μj > 0) is less constrained than JA ⟹ more d qualify for THG+(X*, μ*)

KKT (FO-NCC) ⟹ Second-order NCC if n − K − |JA| = 0
KKT (FO-NCC) ⟹ Second-order SFC if n − K − |JA+| = 0
NECESSARY & SUFFICIENT CONDITIONS OF OPTIMALITY
Type-IV M=1 J≥1 K≥1 n≥2
Minimize f(X) = x1² + x2² − 3x1x2 subject to g(X) ≡ x1² + x2² − 6 ≤ 0
L(X, μ, s) = x1² + x2² − 3x1x2 + μ(x1² + x2² − 6 + s²)
Here n − K − |JA| = n − K − |JA+| = 1 ≠ 0 ⟹ KKT (FO-NCC) alone is not the SONCC or the SOSFC; they must be checked.

Second-order NCC: d^T ∇²L_X(X*) d ≥ 0 for d ∈ THG(X*) = {d ∈ R^n : ∇hk^T d = 0 ∀ k; ∇gj^T d ≤ 0, j ∈ JA}
Second-order SFC: d^T ∇²L_X(X*) d > 0 for d ∈ THG+(X*, μ*) = {d ∈ R^n : ∇hk^T d = 0 ∀ k; ∇gj^T d ≤ 0, j ∈ JA+}

∇f = [2x1 − 3x2, 2x2 − 3x1]^T        ∇²f = [[2, −3], [−3, 2]]
∇L = [2x1 − 3x2 + 2μx1, 2x2 − 3x1 + 2μx2]^T        ∇²L = [[2 + 2μ, −3], [−3, 2 + 2μ]]        ∇g = [2x1, 2x2]^T

Case-II: X* = [0, 0]^T; μ = 0; g ≠ 0 (inactive)
∇g(X*) = [0, 0]^T        ∇²L(X*) = [[2, −3], [−3, 2]]
∇g^T(X*) d = [0  0] [d1, d2]^T = 0 ⟹ no relation can be induced between d1 and d2
d^T ∇²L_X(X*) d = 2(d1² − 3d1d2 + d2²) = 2[(d1 − d2)² − d1d2] ≱ 0 conclusively ⟹ SONCC is not met

Case-I(a): X* = [√3, √3]^T; μ = 0.5; g = 0 (active)
∇g(X*) = [2√3, 2√3]^T        ∇²L(X*) = [[3, −3], [−3, 3]]
∇g^T(X*) d = [2√3  2√3] [d1, d2]^T = 0 ⟹ d2 = −d1
d^T ∇²L_X(X*) d = 3(d1 − d2)² = 12d1² ≥ 0 and also > 0 ⟹ both SONCC and SOSFC are met

Case-I(d): X* = [−√3, −√3]^T; μ = 0.5; g = 0. Repeat the process to prove that the SONCC & SOSFC are met.
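A compact numerical version of this second-order check (a sketch assuming numpy/scipy; the projected-Hessian helper is illustrative, not from the slides): restrict the Lagrangian Hessian to directions with ∇g^T d = 0 and inspect the sign of the result.

import numpy as np
from scipy.linalg import null_space

def projected_hessian(H_L, grad_active=None):
    """Eigenvalues of Z^T H_L Z, where Z spans {d : grad_active @ d = 0}."""
    if grad_active is None:                          # no active constraint: no restriction on d
        Z = np.eye(H_L.shape[0])
    else:
        Z = null_space(np.atleast_2d(grad_active))
    return np.linalg.eigvalsh(Z.T @ H_L @ Z)

# Case I(a): X* = (sqrt(3), sqrt(3)), mu = 0.5, g active; diagonal of Hessian is 2 + 2*mu = 3
H_L = np.array([[3.0, -3.0], [-3.0, 3.0]])
print(projected_hessian(H_L, np.array([2*np.sqrt(3), 2*np.sqrt(3)])))   # [6.] > 0 -> SOSFC met

# Case II: X* = (0, 0), mu = 0, g inactive
print(projected_hessian(np.array([[2.0, -3.0], [-3.0, 2.0]])))          # [-1., 5.] -> SONCC fails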

55
Thank You
