Karush–Kuhn–Tucker conditions explained

In mathematical optimization, the Karush–Kuhn–Tucker (KKT) conditions, also known as the Kuhn–Tucker conditions, are first derivative tests (sometimes called first-order necessary conditions) for a solution in nonlinear programming to be optimal, provided that some regularity conditions are satisfied.

Allowing inequality constraints, the KKT approach to nonlinear programming generalizes the method of Lagrange multipliers, which allows only equality constraints. Similar to the Lagrange approach, the constrained maximization (minimization) problem is rewritten as a Lagrange function whose optimal point is a saddle point, i.e., a global maximum (minimum) over the domain of the choice variables and a global minimum (maximum) over the multipliers. For this reason the Karush–Kuhn–Tucker theorem is sometimes referred to as the saddle-point theorem.[1]

The KKT conditions were originally named after Harold W. Kuhn and Albert W. Tucker, who first published the conditions in 1951.[2] Later scholars discovered that the necessary conditions for this problem had been stated by William Karush in his master's thesis in 1939.[3] [4]

Nonlinear optimization problem

Consider the following nonlinear optimization problem in standard form:

minimize $f(\mathbf{x})$

subject to $g_i(\mathbf{x}) \leq 0$ for $i = 1, \ldots, m$, and $h_j(\mathbf{x}) = 0$ for $j = 1, \ldots, \ell$,

where $\mathbf{x} \in \mathbf{X}$ is the optimization variable chosen from a convex subset of $\mathbb{R}^n$, $f$ is the objective or utility function, $g_i$ ($i = 1, \ldots, m$) are the inequality constraint functions and $h_j$ ($j = 1, \ldots, \ell$) are the equality constraint functions. The numbers of inequalities and equalities are denoted by $m$ and $\ell$ respectively. Corresponding to the constrained optimization problem one can form the Lagrangian function

$$\mathcal{L}(\mathbf{x}, \boldsymbol{\mu}, \boldsymbol{\lambda}) = f(\mathbf{x}) + \boldsymbol{\mu}^\top \mathbf{g}(\mathbf{x}) + \boldsymbol{\lambda}^\top \mathbf{h}(\mathbf{x}) = L(\mathbf{x}, \boldsymbol{\alpha}) = f(\mathbf{x}) + \boldsymbol{\alpha}^\top \begin{pmatrix} \mathbf{g}(\mathbf{x}) \\ \mathbf{h}(\mathbf{x}) \end{pmatrix}$$

where

$$\mathbf{g}(\mathbf{x}) = \begin{bmatrix} g_1(\mathbf{x}) \\ \vdots \\ g_i(\mathbf{x}) \\ \vdots \\ g_m(\mathbf{x}) \end{bmatrix}, \quad \mathbf{h}(\mathbf{x}) = \begin{bmatrix} h_1(\mathbf{x}) \\ \vdots \\ h_j(\mathbf{x}) \\ \vdots \\ h_\ell(\mathbf{x}) \end{bmatrix}, \quad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \vdots \\ \mu_i \\ \vdots \\ \mu_m \end{bmatrix}, \quad \boldsymbol{\lambda} = \begin{bmatrix} \lambda_1 \\ \vdots \\ \lambda_j \\ \vdots \\ \lambda_\ell \end{bmatrix} \quad \text{and} \quad \boldsymbol{\alpha} = \begin{bmatrix} \boldsymbol{\mu} \\ \boldsymbol{\lambda} \end{bmatrix}.$$

The Karush–Kuhn–Tucker theorem then states the following.

Since the idea of this approach is to find a supporting hyperplane on the feasible set $\mathbf{\Gamma} = \left\{ \mathbf{x} \in \mathbf{X} : g_i(\mathbf{x}) \leq 0,\ i = 1, \ldots, m \right\}$, the proof of the Karush–Kuhn–Tucker theorem makes use of the hyperplane separation theorem.[5]

The system of equations and inequalities corresponding to the KKT conditions is usually not solved directly, except in the few special cases where a closed-form solution can be derived analytically. In general, many optimization algorithms can be interpreted as methods for numerically solving the KKT system of equations and inequalities.[6]
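As a hedged illustration of one such special case, consider an equality-constrained quadratic program, where the KKT system is linear and can be solved in one step. The sketch below uses Python with NumPy; the function name `solve_eq_qp` and the data `Q`, `c`, `A`, `b` are assumptions made for this example and do not come from the cited sources.

```python
import numpy as np

def solve_eq_qp(Q, c, A, b):
    """Solve  min 0.5*x^T Q x + c^T x  s.t.  A x = b  through its KKT system.

    Stationarity:        Q x + c + A^T lam = 0
    Primal feasibility:  A x = b
    Assumes A has full row rank and Q is positive definite on the
    null space of A, so that the KKT matrix is invertible.
    """
    n, p = Q.shape[0], A.shape[0]
    # Stack both conditions into one linear system in (x, lam).
    kkt_matrix = np.block([[Q, A.T], [A, np.zeros((p, p))]])
    rhs = np.concatenate([-c, b])
    solution = np.linalg.solve(kkt_matrix, rhs)
    return solution[:n], solution[n:]

# Hypothetical data: minimize x1^2 + x2^2 subject to x1 + x2 = 1.
Q = 2.0 * np.eye(2)
c = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x_star, lam_star = solve_eq_qp(Q, c, A, b)
```

For this toy problem the system can also be solved by hand: stationarity gives $2x_1 + \lambda = 0$ and $2x_2 + \lambda = 0$, and feasibility then forces $x_1 = x_2 = 1/2$ and $\lambda = -1$. With inequality constraints the system is no longer linear because of complementary slackness, which is one reason iterative methods are used in general.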

Necessary conditions

Suppose that the objective function $f \colon \mathbb{R}^n \to \mathbb{R}$ and the constraint functions $g_i \colon \mathbb{R}^n \to \mathbb{R}$ and $h_j \colon \mathbb{R}^n \to \mathbb{R}$ have subderivatives at a point $x^* \in \mathbb{R}^n$. If $x^*$ is a local optimum and the optimization problem satisfies some regularity conditions (see below), then there exist constants $\mu_i$ ($i = 1, \ldots, m$) and $\lambda_j$ ($j = 1, \ldots, \ell$), called KKT multipliers, such that the following four groups of conditions hold:[7]
Stationarity
  • For minimizing $f(x)$: $\partial f(x^*) + \sum_{j=1}^{\ell} \lambda_j \partial h_j(x^*) + \sum_{i=1}^{m} \mu_i \partial g_i(x^*) \ni 0$
  • For maximizing $f(x)$: $-\partial f(x^*) + \sum_{j=1}^{\ell} \lambda_j \partial h_j(x^*) + \sum_{i=1}^{m} \mu_i \partial g_i(x^*) \ni 0$
Primal feasibility
  • $h_j(x^*) = 0$, for $j = 1, \ldots, \ell$
  • $g_i(x^*) \le 0$, for $i = 1, \ldots, m$
Dual feasibility
  • $\mu_i \ge 0$, for $i = 1, \ldots, m$
Complementary slackness
  • $\sum_{i=1}^{m} \mu_i g_i(x^*) = 0$.

The last condition is sometimes written in the equivalent form: $\mu_i g_i(x^*) = 0$, for $i = 1, \ldots, m$.

In the particular case $m = 0$, i.e., when there are no inequality constraints, the KKT conditions turn into the Lagrange conditions, and the KKT multipliers are called Lagrange multipliers.
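The four groups of conditions can be checked numerically at a candidate point. The following is a minimal sketch for the smooth minimization case, where subderivatives reduce to gradients; the helper name `check_kkt`, the tolerance, and the toy problem (minimize $(x_1 - 2)^2 + (x_2 - 1)^2$ subject to $x_1 + x_2 - 2 \le 0$) are assumptions chosen for illustration.

```python
import numpy as np

def check_kkt(grad_f, g_vals, grads_g, mu, h_vals, grads_h, lam, tol=1e-8):
    """Report which of the four KKT groups hold at a candidate point.

    Minimization form, smooth case. grads_g and grads_h hold one
    gradient per row; pass empty lists when a constraint type is absent.
    """
    grad_f = np.asarray(grad_f, dtype=float)
    g_vals, mu = np.asarray(g_vals, dtype=float), np.asarray(mu, dtype=float)
    h_vals, lam = np.asarray(h_vals, dtype=float), np.asarray(lam, dtype=float)
    grads_g = np.asarray(grads_g, dtype=float).reshape(len(mu), len(grad_f))
    grads_h = np.asarray(grads_h, dtype=float).reshape(len(lam), len(grad_f))
    # Stationarity residual: grad f + sum_i mu_i grad g_i + sum_j lam_j grad h_j
    residual = grad_f + grads_g.T @ mu + grads_h.T @ lam
    return {
        "stationarity": bool(np.linalg.norm(residual) <= tol),
        "primal_feasibility": bool(np.all(g_vals <= tol)
                                   and np.all(np.abs(h_vals) <= tol)),
        "dual_feasibility": bool(np.all(mu >= -tol)),
        "complementary_slackness": bool(np.all(np.abs(mu * g_vals) <= tol)),
    }

# Candidate x = (1.5, 0.5) with multiplier mu = 1 (the constraint is active).
x = np.array([1.5, 0.5])
report = check_kkt(
    grad_f=[2 * (x[0] - 2), 2 * (x[1] - 1)],
    g_vals=[x[0] + x[1] - 2], grads_g=[[1.0, 1.0]], mu=[1.0],
    h_vals=[], grads_h=[], lam=[],
)

# The unconstrained minimizer (2, 1) is stationary with mu = 0 but infeasible.
x_bad = np.array([2.0, 1.0])
report_bad = check_kkt(
    grad_f=[2 * (x_bad[0] - 2), 2 * (x_bad[1] - 1)],
    g_vals=[x_bad[0] + x_bad[1] - 2], grads_g=[[1.0, 1.0]], mu=[0.0],
    h_vals=[], grads_h=[], lam=[],
)
```

Such a check only verifies a given pair of point and multipliers; it does not find them, and passing it is a necessary rather than a sufficient test unless further assumptions hold.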

Interpretation: KKT conditions as balancing constraint-forces in state space

The primal problem can be interpreted as moving a particle in the space of $x$, and subjecting it to three kinds of force fields:
  • $f$ is a potential field that the particle is minimizing. The force generated by $f$ is $-\partial f$.
  • $g_i$ are one-sided constraint surfaces. The particle is allowed to move inside $g_i \leq 0$, but whenever it touches $g_i = 0$, it is pushed inwards.
  • $h_j$ are two-sided constraint surfaces. The particle is allowed to move only on the surface $h_j = 0$.

Primal stationarity states that the "force" of $\partial f(x^*)$ is exactly balanced by a linear sum of forces $\partial h_j(x^*)$ and $\partial g_i(x^*)$.

Dual feasibility additionally states that all the $\partial g_i(x^*)$ forces must be one-sided, pointing inwards into the feasible set for $x$.

Complementary slackness states that if $g_i(x^*) < 0$, then the force coming from $\partial g_i(x^*)$ must be zero, i.e., $\mu_i = 0$: since the particle is not on the boundary, the one-sided constraint force cannot activate.
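The force picture can be worked through on a one-dimensional toy problem. The problem below is an assumption chosen for illustration (it is not taken from the cited sources): minimize $f(x) = x^2$ under a single one-sided constraint, once with the constraint active and once with it inactive.

```latex
% Case 1: active constraint. Minimize f(x) = x^2 subject to g(x) = 1 - x \le 0.
% The minimizer is x^* = 1, which lies on the boundary g = 0.
\begin{align}
  -f'(x^*) &= -2
    && \text{(potential force, pulling toward } x = 0\text{)} \\
  -\mu\, g'(x^*) &= \mu
    && \text{(constraint force, pushing into } x \ge 1\text{)} \\
  -2 + \mu = 0 \;&\Rightarrow\; \mu = 2 \ge 0
    && \text{(force balance and dual feasibility)}
\end{align}
% Case 2: inactive constraint. Same f, but g(x) = -1 - x \le 0.
% The minimizer is x^* = 0 with g(x^*) = -1 < 0, so the constraint exerts no force:
\begin{align}
  \mu = 0, \qquad -f'(x^*) = 0 .
\end{align}
```

In the first case the multiplier measures how hard the boundary has to push back; in the second the particle rests at the unconstrained minimum and complementary slackness switches the multiplier off.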

Matrix representation

The necessary conditions can be written with Jacobian matrices of the constraint functions. Let $\mathbf{g}(x) : \mathbb{R}^n \to \mathbb{R}^m$ be defined as $\mathbf{g}(x) = \left( g_1(x), \ldots, g_m(x) \right)^\top$ and let $\mathbf{h}(x) : \mathbb{R}^n \to \mathbb{R}^\ell$ be defined as $\mathbf{h}(x) = \left( h_1(x), \ldots, h_\ell(x) \right)^\top$. Let $\boldsymbol{\mu} = \left( \mu_1, \ldots, \mu_m \right)^\top$ and $\boldsymbol{\lambda} = \left( \lambda_1, \ldots, \lambda_\ell \right)^\top$. Then the necessary conditions can be written as:

Stationarity
  • For maximizing $f(x)$: $\partial f(x^*) - D\mathbf{g}(x^*)^\top \boldsymbol{\mu} - D\mathbf{h}(x^*)^\top \boldsymbol{\lambda} = \mathbf{0}$
  • For minimizing $f(x)$: $\partial f(x^*) + D\mathbf{g}(x^*)^\top \boldsymbol{\mu} + D\mathbf{h}(x^*)^\top \boldsymbol{\lambda} = \mathbf{0}$
Primal feasibility
  • $\mathbf{g}(x^*) \le \mathbf{0}$
  • $\mathbf{h}(x^*) = \mathbf{0}$
Dual feasibility
  • $\boldsymbol{\mu} \ge \mathbf{0}$
Complementary slackness
  • $\boldsymbol{\mu}^\top \mathbf{g}(x^*) = 0$.

Regularity conditions (or constraint qualifications)

One can ask whether a minimizer point $x^*$ of the original, constrained optimization problem (assuming one exists) has to satisfy the above KKT conditions. This is similar to asking under what conditions the minimizer $x^*$ of a function $f(x)$ in an unconstrained problem has to satisfy the condition $\nabla f(x^*) = 0$. For the constrained case, the situation is more complicated, and one can state a variety of (increasingly complicated) "regularity" conditions under which a constrained minimizer also satisfies the KKT conditions. Some common examples of conditions that guarantee this are tabulated in the following, with the LICQ the most frequently used one:
| Constraint | Acronym | Statement |
| --- | --- | --- |
| Linearity constraint qualification | LCQ | If $g_i$ and $h_j$ are affine functions, then no other condition is needed. |
| Linear independence constraint qualification | LICQ | The gradients of the active inequality constraints and the gradients of the equality constraints are linearly independent at $x^*$. |
| Mangasarian-Fromovitz constraint qualification | MFCQ | The gradients of the equality constraints are linearly independent at $x^*$ and there exists a vector $d \in \mathbb{R}^n$ such that $\nabla g_i(x^*)^\top d < 0$ for all active inequality constraints and $\nabla h_j(x^*)^\top d = 0$ for all equality constraints.[8] |
| Constant rank constraint qualification | CRCQ | For each subset of the gradients of the active inequality constraints and the gradients of the equality constraints, the rank in a vicinity of $x^*$ is constant. |
| Constant positive linear dependence constraint qualification | CPLD | For each subset of gradients of active inequality constraints and gradients of equality constraints, if the subset of vectors is linearly dependent at $x^*$ with non-negative scalars associated with the inequality constraints, then it remains linearly dependent in a neighborhood of $x^*$. |
| Quasi-normality constraint qualification | QNCQ | If the gradients of the active inequality constraints and the gradients of the equality constraints are linearly dependent at $x^*$ with associated multipliers $\lambda_j$ for equalities and $\mu_i \geq 0$ for inequalities, then there is no sequence $x_k \to x^*$ such that $\lambda_j \neq 0 \Rightarrow \lambda_j h_j(x_k) > 0$ and $\mu_i \neq 0 \Rightarrow \mu_i g_i(x_k) > 0$. |
| Slater's condition | SC | For a convex problem (i.e., assuming minimization, $f, g_i$ are convex and $h_j$ is affine), there exists a point $x$ such that $h_j(x) = 0$ and $g_i(x) < 0$. |

The following strict implications can be shown:

LICQ ⇒ MFCQ ⇒ CPLD ⇒ QNCQ

and

LICQ ⇒ CRCQ ⇒ CPLD ⇒ QNCQ

In practice weaker constraint qualifications are preferred since they apply to a broader selection of problems.
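Of the qualifications above, LICQ is the easiest to test numerically, since it reduces to a rank check on the stacked constraint gradients. The following is a minimal sketch; the helper name `licq_holds` and both example problems are assumptions made for illustration.

```python
import numpy as np

def licq_holds(active_ineq_grads, eq_grads, n):
    """True if the gradients of the active inequality constraints and of
    the equality constraints (one gradient per entry, each of length n)
    are linearly independent."""
    rows = list(active_ineq_grads) + list(eq_grads)
    if not rows:
        return True  # no active constraints: LICQ holds vacuously
    J = np.array(rows, dtype=float).reshape(len(rows), n)
    return bool(np.linalg.matrix_rank(J) == len(rows))

# Hypothetical example 1: g1(x) = x2 - (1 - x1)^3 <= 0 and g2(x) = -x2 <= 0,
# both active at x* = (1, 0). The gradients (0, 1) and (0, -1) are
# anti-parallel, so LICQ fails there.
degenerate = licq_holds([[0.0, 1.0], [0.0, -1.0]], [], n=2)

# Hypothetical example 2: g1(x) = x1 + x2 - 2 <= 0 and g2(x) = -x1 <= 0,
# both active at x* = (0, 2), with gradients (1, 1) and (-1, 0).
regular = licq_holds([[1.0, 1.0], [-1.0, 0.0]], [], n=2)
```

A rank test in floating point depends on the tolerance used by `matrix_rank`, so nearly dependent gradients may need closer inspection. When LICQ fails, one of the weaker qualifications in the table may still apply.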

Sufficient conditions

In some cases, the necessary conditions are also sufficient for optimality. In general, the necessary conditions are not sufficient for optimality and additional information is required, such as the second-order sufficient conditions (SOSC). For smooth functions, the SOSC involve the second derivatives, which explains their name.

The necessary conditions are sufficient for optimality if the objective function $f$ of a maximization problem is a differentiable concave function, the inequality constraints $g_i$ are differentiable convex functions, the equality constraints $h_j$ are affine functions, and Slater's condition holds.[9] Similarly, if the objective function $f$ of a minimization problem is a differentiable convex function, the necessary conditions are also sufficient for optimality under the same assumptions on the constraints.

It was shown by Martin in 1985 that the broader class of functions for which the KKT conditions guarantee global optimality are the so-called Type 1 invex functions.[10] [11]
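A short worked example shows why convexity matters for sufficiency. The problem is an assumption chosen for illustration: a concave objective is minimized over an interval, so a point can satisfy every KKT condition and still be the worst feasible point.

```latex
% Minimize f(x) = -x^2 subject to g(x) = x^2 - 1 \le 0, i.e. x \in [-1, 1].
% Candidate x = 0 with \mu = 0 satisfies all four groups of conditions:
\begin{align}
  f'(0) + \mu\, g'(0) &= 0 + 0 = 0 && \text{(stationarity)} \\
  g(0) &= -1 \le 0                 && \text{(primal feasibility)} \\
  \mu &= 0 \ge 0                   && \text{(dual feasibility)} \\
  \mu\, g(0) &= 0                  && \text{(complementary slackness)}
\end{align}
% Yet f(0) = 0 is the maximum of f on [-1, 1]. The minimizers are x = \pm 1,
% where stationarity -2x + 2\mu x = 0 holds with \mu = 1.
```

Because $f$ is not convex here, the sufficiency statement does not apply, and the KKT points have to be compared or tested with second-order information.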

Second-order sufficient conditions

For smooth, non-linear optimization problems, a second-order sufficient condition is given as follows.

The solution $x^*, \lambda^*, \mu^*$ found in the above section is a constrained local minimum if for the Lagrangian,

$$L(x, \lambda, \mu) = f(x) + \sum_{i=1}^{m} \mu_i g_i(x) + \sum_{j=1}^{\ell} \lambda_j h_j(x)$$

then,

$$s^T \nabla_{xx}^2 L(x^*, \lambda^*, \mu^*) s \ge 0$$

where $s \ne 0$ is a vector satisfying the following,

$$\left[ \nabla_x g_i(x^*), \nabla_x h_j(x^*) \right]^T s = 0_{\mathbb{R}^2}$$

where only those active inequality constraints $g_i(x)$ corresponding to strict complementarity (i.e. where $\mu_i > 0$) are applied. The solution is a strict constrained local minimum in the case the inequality is also strict.

If $s^T \nabla_{xx}^2 L(x^*, \lambda^*, \mu^*) s = 0$, the third-order Taylor expansion of the Lagrangian should be used to verify if $x^*$ is a local minimum. The minimization of $f(x_1, x_2) = (x_2 - x_1^2)(x_2 - 3x_1^2)$ is a good counter-example, see also Peano surface.
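The strict version of this test can be sketched numerically by restricting the Hessian of the Lagrangian to the null space of the relevant constraint gradients. The code below is an illustration under stated assumptions: the helper names `null_space_basis` and `sosc_holds`, the tolerance, and the toy problem (minimize $-x_1 x_2$ subject to $x_1 + x_2 - 2 = 0$, with $x^* = (1, 1)$ and $\lambda^* = 1$) are not from the cited sources.

```python
import numpy as np

def null_space_basis(J, tol=1e-10):
    """Orthonormal basis (as columns) of {s : J s = 0}, from the SVD of J."""
    J = np.atleast_2d(np.asarray(J, dtype=float))
    _, singular_values, vt = np.linalg.svd(J)
    rank = int(np.sum(singular_values > tol))
    return vt[rank:].T

def sosc_holds(hess_L, J_active, tol=1e-10):
    """Strict second-order test: s^T H s > 0 for all s != 0 with J s = 0.

    J_active stacks, one per row, the gradients of the equality constraints
    and of the active inequality constraints with positive multiplier.
    """
    Z = null_space_basis(J_active)
    if Z.shape[1] == 0:
        return True  # only s = 0 satisfies J s = 0, nothing left to check
    reduced_hessian = Z.T @ np.asarray(hess_L, dtype=float) @ Z
    return bool(np.min(np.linalg.eigvalsh(reduced_hessian)) > tol)

# Toy problem: minimize -x1*x2 subject to x1 + x2 - 2 = 0.
# The constraint is linear, so the Hessian of the Lagrangian is that of f.
hess_min = np.array([[0.0, -1.0], [-1.0, 0.0]])
J = np.array([[1.0, 1.0]])
is_strict_min = sosc_holds(hess_min, J)

# Flipping the sign of the objective (minimize +x1*x2) gives the same KKT
# point with lambda = -1, but the reduced Hessian becomes negative.
hess_flipped = np.array([[0.0, 1.0], [1.0, 0.0]])
is_strict_min_flipped = sosc_holds(hess_flipped, J)
```

Note that the full Hessian in the first case is indefinite; only its restriction to directions along the constraint is positive, which is exactly what the condition asks for. A zero eigenvalue of the reduced Hessian corresponds to the inconclusive case discussed above.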

Economics

See also: Profit maximization.

Often in mathematical economics the KKT approach is used in theoretical models in order to obtain qualitative results. For example,[12] consider a firm that maximizes its sales revenue subject to a minimum profit constraint. Letting $Q$ be the quantity of output produced (to be chosen), $R(Q)$ be sales revenue with a positive first derivative and with a zero value at zero output, $C(Q)$ be production costs with a positive first derivative and with a non-negative value at zero output, and $G_{\min}$ be the positive minimal acceptable level of profit, then the problem is a meaningful one if the revenue function levels off so it eventually is less steep than the cost function. The problem expressed in the previously given minimization form is

Minimize $-R(Q)$

subject to

$$G_{\min} \le R(Q) - C(Q)$$

$$Q \ge 0,$$

and the KKT conditions are

\begin{align} &\left( \frac{dR}{dQ} \right)(1 + \mu) - \mu \left( \frac{dC}{dQ} \right) \le 0, \\[5pt] &Q \ge 0, \\[5pt] &Q \left[ \left( \frac{dR}{dQ} \right)(1 + \mu) - \mu \left( \frac{dC}{dQ} \right) \right] = 0, \\[5pt] &R(Q) - C(Q) - G_{\min} \ge 0, \\[5pt] &\mu \ge 0, \\[5pt] &\mu [R(Q) - C(Q) - G_{\min}] = 0. \end{align}

Since $Q = 0$ would violate the minimum profit constraint, we have $Q > 0$ and hence the third condition implies that the first condition holds with equality. Solving that equality gives

$$\frac{dR}{dQ} = \frac{\mu}{1 + \mu} \left( \frac{dC}{dQ} \right).$$

Because it was given that $dR/dQ$ and $dC/dQ$ are strictly positive, this equality along with the non-negativity condition on $\mu$ guarantees that $\mu$ is positive and so the revenue-maximizing firm operates at a level of output at which marginal revenue $dR/dQ$ is less than marginal cost $dC/dQ$. This result is of interest because it contrasts with the behavior of a profit-maximizing firm, which operates at a level at which they are equal.
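The qualitative result can be checked on a concrete instance. The functional forms below, $R(Q) = 10 \ln(1 + Q)$ and $C(Q) = 1 + Q$ with $G_{\min} = 5$, are hypothetical choices that satisfy the stated assumptions; they are not from the cited source. Profit peaks at $Q = 9$, and because revenue keeps rising with output, the revenue maximizer expands past that point until the profit constraint binds.

```python
import math

G_MIN = 5.0  # hypothetical minimum acceptable profit

def revenue(q):
    return 10.0 * math.log(1.0 + q)   # R(0) = 0, R'(Q) > 0, levels off

def cost(q):
    return 1.0 + q                    # C(0) >= 0, C'(Q) > 0

def marginal_revenue(q):
    return 10.0 / (1.0 + q)

def marginal_cost(q):
    return 1.0

def profit_slack(q):
    """R(Q) - C(Q) - G_min, which must be non-negative."""
    return revenue(q) - cost(q) - G_MIN

def binding_output(lo=9.0, hi=40.0, iterations=100):
    """Bisection for the largest Q with zero slack.

    Assumes slack > 0 at lo (the profit-maximizing output) and
    slack < 0 at hi, which holds for the functions above.
    """
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if profit_slack(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q_star = binding_output()
mr, mc = marginal_revenue(q_star), marginal_cost(q_star)
# From (dR/dQ)(1 + mu) = mu (dC/dQ):  mu = MR / (MC - MR)
mu_star = mr / (mc - mr)
```

In this instance the computed multiplier is positive and marginal revenue falls below marginal cost at the chosen output, in line with the argument above. Other functional forms would change the numbers but not the sign pattern, provided the stated assumptions still hold.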

Value function

Reconsider the optimization problem as a maximization problem with constant inequality constraints:

Maximize $f(x)$

subject to

$$g_i(x) \le a_i, \quad h_j(x) = 0.$$

The value function is defined as

$$V(a_1, \ldots, a_m) = \sup_{x} f(x)$$

subject to

$$g_i(x) \le a_i, \quad h_j(x) = 0, \quad j \in \{1, \ldots, \ell\}, \quad i \in \{1, \ldots, m\},$$

so the domain of $V$ is $\{ a \in \mathbb{R}^m \mid \text{for some } x \in X,\ g_i(x) \leq a_i,\ i \in \{1, \ldots, m\} \}$.

Given this definition, each coefficient $\mu_i$ is the rate at which the value function increases as $a_i$ increases. Thus if each $a_i$ is interpreted as a resource constraint, the coefficients tell you how much increasing a resource will increase the optimum value of our function $f$. This interpretation is especially important in economics and is used, for instance, in utility maximization problems.
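The sensitivity interpretation can be verified on a toy problem with a closed-form solution. The sketch below is an illustration under assumptions: the problem (maximize $-(x - 2)^2$ subject to $x \le a$) and the helper names are chosen here and do not come from the cited sources. The multiplier is computed from the maximization form of stationarity, $-f'(x^*) + \mu = 0$, and compared with a finite-difference slope of $V$.

```python
def optimal_x(a):
    """Maximizer of -(x - 2)^2 subject to x <= a."""
    return min(a, 2.0)

def value(a):
    """V(a) = sup { -(x - 2)^2 : x <= a }."""
    return -(optimal_x(a) - 2.0) ** 2

def multiplier(a):
    """KKT multiplier of the constraint x <= a: mu = f'(x*), or 0 if slack."""
    return max(0.0, -2.0 * (optimal_x(a) - 2.0))

def finite_difference_slope(a, step=1e-6):
    """Central-difference estimate of dV/da."""
    return (value(a + step) - value(a - step)) / (2.0 * step)

# Binding constraint (a = 1): relaxing it raises the optimum at rate mu.
mu_binding = multiplier(1.0)
slope_binding = finite_difference_slope(1.0)

# Slack constraint (a = 3): more of the resource is worth nothing.
mu_slack = multiplier(3.0)
slope_slack = finite_difference_slope(3.0)
```

For the binding case both quantities agree up to discretization error, and for the slack case both vanish, which matches complementary slackness. At the kink $a = 2$ the value function is still differentiable in this example, but in general $V$ need not be differentiable where the active set changes.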

Generalizations

With an extra multiplier $\mu_0 \geq 0$, which may be zero (as long as $(\mu_0, \mu, \lambda) \neq 0$), in front of $\nabla f(x^*)$ the KKT stationarity conditions turn into

\begin{align} &\mu_0 \nabla f(x^*) + \sum_{i=1}^{m} \mu_i \nabla g_i(x^*) + \sum_{j=1}^{\ell} \lambda_j \nabla h_j(x^*) = 0, \\[4pt] &\mu_i g_i(x^*) = 0, \quad i = 1, \ldots, m, \end{align}

which are called the Fritz John conditions. These optimality conditions hold without constraint qualifications and are equivalent to the optimality condition "KKT or (not-MFCQ)".
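A small worked example shows the role of the extra multiplier. The problem is an assumption chosen for illustration: the feasible set is a single point at which no constraint qualification from the table holds, so the KKT conditions fail while the Fritz John conditions are satisfied with $\mu_0 = 0$.

```latex
% Minimize f(x) = x subject to g(x) = x^2 \le 0.
% The only feasible point is x^* = 0, so it is the minimizer.
\begin{align}
  \text{KKT stationarity:} \quad
    & f'(x^*) + \mu\, g'(x^*) = 1 + \mu \cdot 0 = 1 \neq 0
    && \text{(no multiplier } \mu \text{ exists)} \\
  \text{Fritz John:} \quad
    & \mu_0 f'(x^*) + \mu\, g'(x^*) = \mu_0 \cdot 1 + \mu \cdot 0 = 0
    && \text{(holds with } \mu_0 = 0,\ \mu = 1\text{)} \\
  & \mu\, g(x^*) = 1 \cdot 0 = 0
    && \text{(complementary slackness)}
\end{align}
% MFCQ fails here: it would need a direction d with g'(x^*)\, d = 0 \cdot d < 0.
```

This is the "not-MFCQ" branch of the equivalence: the Fritz John conditions hold, but only with the objective's multiplier set to zero.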

The KKT conditions belong to a wider class of the first-order necessary conditions (FONC), which allow for non-smooth functions using subderivatives.


Notes and References

1. Tabak, Daniel; Kuo, Benjamin C. (1971). Optimal Control by Mathematical Programming. Englewood Cliffs, NJ: Prentice-Hall. pp. 19–20. ISBN 0-13-638106-5.
2. Kuhn, H. W.; Tucker, A. W. (1951). "Nonlinear programming". Proceedings of 2nd Berkeley Symposium. Berkeley: University of California Press. pp. 481–492. MR 47303.
3. Karush, W. (1939). Minima of Functions of Several Variables with Inequalities as Side Constraints (M.Sc. thesis). Dept. of Mathematics, Univ. of Chicago, Chicago, Illinois.
4. Kjeldsen, Tinne Hoff (2000). "A contextualized historical analysis of the Kuhn-Tucker theorem in nonlinear programming: the impact of World War II". Historia Math. 27 (4): 331–361. doi:10.1006/hmat.2000.2289. MR 1800317.
5. Kemp, Murray C.; Kimura, Yoshio (1978). Introduction to Mathematical Economics. New York: Springer. pp. 38–44. ISBN 0-387-90304-6.
6. Boyd, Stephen; Vandenberghe, Lieven (2004). Convex Optimization. Cambridge: Cambridge University Press. p. 244. ISBN 0-521-83378-7. MR 2061575.
7. Ruszczyński, Andrzej (2006). Nonlinear Optimization. Princeton, NJ. ISBN 978-0691119151. MR 2199043.
8. Bertsekas, Dimitri (1999). Nonlinear Programming (2nd ed.). Athena Scientific. pp. 329–330. ISBN 9781886529007.
9. Boyd, Stephen; Vandenberghe, Lieven (2004). Convex Optimization. Cambridge: Cambridge University Press. p. 244. ISBN 0-521-83378-7. MR 2061575.
10. Martin, D. H. (1985). "The Essence of Invexity". J. Optim. Theory Appl. 47 (1): 65–76. doi:10.1007/BF00941316. S2CID 122906371.
11. Hanson, M. A. (1999). "Invexity and the Kuhn-Tucker Theorem". J. Math. Anal. Appl. 236 (2): 594–604. doi:10.1006/jmaa.1999.6484.
12. Chiang, Alpha C. (1984). Fundamental Methods of Mathematical Economics (3rd ed.). pp. 750–752.