A self-concordant function is a function satisfying a certain differential inequality, which makes it particularly well suited to optimization by Newton's method.[1] A self-concordant barrier is a self-concordant function that is also a barrier function for a particular convex set. Self-concordant barriers are important ingredients in interior-point methods for optimization.
Here is the general definition of a self-concordant function.[2]
Let C be a nonempty open convex set in Rn, and let f be a three-times continuously differentiable function defined on C. We say that f is self-concordant on C if it satisfies the following properties:
1. Barrier property: on any sequence of points in C that converges to a boundary point of C, f converges to ∞.
2. Differential inequality: for every point x in C, and any direction h in Rn, let gh be the function f restricted to the direction h, that is: gh(t) = f(x+t*h). Then the one-dimensional function gh should satisfy the following differential inequality:
|g_h'''(t)| \leq 2\, g_h''(t)^{3/2}.

Equivalently:[3]

\left. \frac{d}{d\alpha} \nabla^2 f(x+\alpha y) \right|_{\alpha=0} \preceq 2 \sqrt{y^{\mathsf T} \nabla^2 f(x)\, y}\; \nabla^2 f(x)
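To make the directional condition concrete, here is a minimal Python sketch (the test function f(x) = -log x1 - log x2, the point x, and the random directions are our own choices) that checks |g_h'''(0)| ≤ 2 g_h''(0)^{3/2} using the analytic derivatives of g_h(t) = f(x + t·h):

```python
import numpy as np

# Check the directional self-concordance inequality for f(x) = -log(x1) - log(x2)
# at a fixed interior point, along many random directions h.
rng = np.random.default_rng(1)
x = np.array([0.7, 2.5])                 # a point in the domain {x > 0}
ok = True
for _ in range(1000):
    h = rng.standard_normal(2)           # random direction
    g2 = np.sum(h**2 / x**2)             # g_h''(0)  for g_h(t) = f(x + t*h)
    g3 = -2.0 * np.sum(h**3 / x**3)      # g_h'''(0)
    ok = ok and abs(g3) <= 2.0 * g2**1.5 * (1 + 1e-9)
print(ok)                                # True: the inequality holds in every direction
```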
For a univariate function f : R → R, self-concordance on R means that

|f'''(x)| \leq 2 f''(x)^{3/2}.

Equivalently: wherever f''(x) > 0, f satisfies

\left| \frac{d}{dx} \frac{1}{\sqrt{f''(x)}} \right| \leq 1,

and it satisfies f'''(x) = 0 elsewhere.
Some examples of self-concordant functions:
- Linear and convex quadratic functions are self-concordant, since their third derivative is zero.
- Any function f(x) = -log(-g(x)) - log x, where g(x) is defined and convex for all x > 0 and satisfies |g'''(x)| ≤ 3g''(x)/x, is self-concordant on its domain, which is {x | x > 0, g(x) < 0}. Some examples of such functions g are:
  - g(x) = -x^p for 0 < p ≤ 1;
  - g(x) = -log x;
  - g(x) = x^p for -1 ≤ p ≤ 0;
  - g(x) = (ax+b)^2/x;
  - for any function g satisfying the conditions, the function g(x) + ax^2 + bx + c with a ≥ 0 also satisfies the conditions.
Some functions that are not self-concordant:
- f(x) = e^x
- f(x) = 1/x^p for x > 0, p > 0
- f(x) = |x|^p for p > 2
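These positive and negative examples are easy to test numerically. The following minimal Python sketch (the helper name and the sample grids are our own choices) checks the univariate inequality |f'''(x)| ≤ 2 f''(x)^{3/2} for f(x) = -log x, where it holds with equality, and for f(x) = e^x, where it fails for sufficiently negative x:

```python
import numpy as np

def is_self_concordant(f2, f3, xs):
    """True if |f'''(x)| <= 2 f''(x)^(3/2) at every sample point (tiny relative tolerance)."""
    return all(abs(f3(x)) <= 2.0 * f2(x) ** 1.5 * (1 + 1e-9) for x in xs)

# f(x) = -log x on x > 0:  f''(x) = 1/x^2,  f'''(x) = -2/x^3  -> equality holds.
print(is_self_concordant(lambda x: 1 / x**2, lambda x: -2 / x**3,
                         np.linspace(0.1, 10.0, 1000)))                  # True

# f(x) = exp(x):  f'' = f''' = exp(x).  The inequality exp(x) <= 2 exp(3x/2)
# fails for x < -2 log 2, so exp is not self-concordant on all of R.
print(is_self_concordant(np.exp, np.exp, np.linspace(-5.0, 5.0, 1000)))  # False
```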
Here is the general definition of a self-concordant barrier (SCB).
Let C be a closed convex set in Rn with a non-empty interior. Let f be a function from interior(C) to R. Let M > 0 be a real parameter. We say that f is an M-self-concordant barrier for C if it satisfies the following:
1. f is a self-concordant function on interior(C).
2. For every point x in interior(C), and any direction h in Rn, let gh be the function f restricted to the direction h, that is: gh(t) = f(x+t*h). Then the one-dimensional function gh should satisfy the following differential inequality:
|g_h'(t)| \leq M^{1/2} \cdot g_h''(t)^{1/2}.
Due to the importance of SCBs in interior-point methods, it is important to know how to construct SCBs for various domains.
In theory, it can be proved that every closed convex domain in Rn has a self-concordant barrier with parameter O(n). But this “universal barrier” is given by some multivariate integrals, and it is too complicated for actual computations. Hence, the main goal is to construct SCBs that are efficiently computable.[4]
SCBs can be constructed from some basic SCBs, that are combined to produce SCBs for more complex domains, using several combination rules.
Every constant is a self-concordant barrier for all of Rn, with parameter M = 0. It is the only self-concordant barrier for the entire space, and the only self-concordant barrier with M < 1. (Note that linear and quadratic functions are self-concordant functions, but they are not self-concordant barriers.)
For the positive half-line R+ (that is, x > 0), f(x) = -ln x is a self-concordant barrier with parameter M = 1.
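As a quick sanity check (our own computation), both defining conditions hold for this barrier, in fact with equality:

f'(x) = -\frac{1}{x}, \qquad f''(x) = \frac{1}{x^2}, \qquad f'''(x) = -\frac{2}{x^3},

so that |f'''(x)| = 2 f''(x)^{3/2} and |f'(x)| = 1^{1/2} \cdot f''(x)^{1/2}.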
Let G be a closed convex domain in Rn, and g an M-SCB for G. Let x = Ay+b be an affine mapping from Rk to Rn whose image intersects the interior of G. Let H be the inverse image of G under the mapping: H = {y in Rk : Ay+b ∈ G}. Let h be the composite function h(y) := g(Ay+b). Then, h is an M-SCB for H.
For example, take n = 1, G the positive half-line, and g(x) = -ln x. Then, for any vector a in Rk and scalar b, h(y) = -ln(aTy + b) is a 1-SCB for the half-space {y : aTy + b > 0}; similarly, h(y) = -ln(b - aTy) is a 1-SCB for the half-space {y : aTy < b}.
The substitution rule can be extended from affine mappings to a certain class of "appropriate" mappings, and to quadratic mappings.
For all i in 1,...,m, let Gi be a closed convex domain in Rni, and let gi be an Mi-SCB for Gi. Let G be the cartesian product of all Gi. Let g(x1,...,xm) := sumi gi(xi). Then, g is a SCB for G, with parameter sumi Mi.
For example, take all Gi to be the positive half-line, so that G is the positive orthant R^m_+, with the m-SCB

g(x) = -\sum_{i=1}^m \ln x_i.
We can now apply the substitution rule. We get that, for the polytope defined by the linear inequalities ajTx ≤ bj for j in 1,...,m, if it satisfies Slater's condition, then

f(x) = -\sum_{j=1}^m \ln(b_j - a_j^{\mathsf T} x)

is an m-SCB. The linear functions b_j - a_j^{\mathsf T} x can also be replaced by suitable quadratic functions, using the extended substitution rule.
Let G1,...,Gm be closed convex domains in Rn. For each i in 1,...,m, let gi be an Mi-SCB for Gi, and ri ≥ 1 a real number. Let G be the intersection of all Gi, and suppose its interior is nonempty. Let g := sumi ri*gi. Then, g is a SCB for G, with parameter sumi ri*Mi.
Therefore, if G is defined by a list of constraints, we can find a SCB for each constraint separately, and then simply sum them to get a SCB for G.
For example, suppose the domain is defined by m linear constraints of the form ajTx ≤ bj, for j in 1,...,m. Then we can use the Intersection rule to construct the m-SCB
f(x) = -\sum_{j=1}^m \ln(b_j - a_j^{\mathsf T} x).
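For illustration, here is a small Python sketch (the helper name log_barrier and the box example are our own choices) that evaluates this polytope barrier together with its gradient and Hessian, the quantities an interior-point method actually needs:

```python
import numpy as np

def log_barrier(A, b, x):
    """f(x) = -sum_j log(b_j - a_j^T x), with gradient and Hessian, for {x : Ax <= b}."""
    s = b - A @ x                            # slacks b_j - a_j^T x, must be positive
    if np.any(s <= 0):
        return np.inf, None, None            # x is not in the interior of the polytope
    f = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)                   # sum_j a_j / s_j
    hess = A.T @ np.diag(1.0 / s**2) @ A     # sum_j a_j a_j^T / s_j^2
    return f, grad, hess

# Example: the box -1 <= x_i <= 1 in R^2, written as A x <= b (m = 4 constraints).
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
print(log_barrier(A, b, np.zeros(2))[0])     # 0.0 at the center of the box
```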
The epigraph of a function f(x) is the area above the graph of the function, that is,
{(x,t) ∈ R2 : t ≥ f(x)}.
Let g(t) be a 3-times continuously differentiable concave function on the ray t > 0, such that t ⋅ |g'''(t)|/|g''(t)| ≤ 3b for some constant b. Let G = closure({(x,t) ∈ R2 : t > 0, x ≤ g(t)}). Then G admits an efficiently computable SCB.
Examples:
- G1 = {(x,t) ∈ R2 : (x+)^p ≤ t}, where x+ = max(x,0) and p ≥ 1;
- G2 = {(x,t) ∈ R2 : ([-x]+)^p ≤ t};
- G = G1 ∩ G2 = {(x,t) ∈ R2 : |x|^p ≤ t};
- G = {(x,t) ∈ R2 : e^x ≤ t}.
This can be used, for example, for p-norm approximation problems of the form

\min_x \sum_{j=1}^n |v_j - x^{\mathsf T} u_j|^p,

which can be rewritten as

\min_x \sum_{j=1}^n t_j \quad \text{subject to} \quad t_j \geq |v_j - x^{\mathsf T} u_j|^p \text{ for all } j,

so that each constraint describes a set of the above form (composed with an affine mapping), for which an SCB is available.
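The reformulation can be written down directly in a modeling language. The sketch below assumes the CVXPY package and uses made-up data U, v and exponent p; it states exactly the epigraph constraints t_j ≥ |v_j - xᵀu_j|^p above (whether a particular p is accepted depends on CVXPY's DCP rules and the installed solvers):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((20, 3))     # rows are the vectors u_j (synthetic data)
v = rng.standard_normal(20)
p = 1.5

x = cp.Variable(3)
t = cp.Variable(20)
residual = v - U @ x
constraints = [cp.power(cp.abs(residual), p) <= t]   # t_j >= |v_j - x^T u_j|^p
problem = cp.Problem(cp.Minimize(cp.sum(t)), constraints)
problem.solve()
print(problem.value, x.value)
```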
Similarly, let g be a 3-times continuously differentiable convex function on the ray x > 0, such that x ⋅ |g'''(x)|/|g''(x)| ≤ 3b for some constant b. Then G = closure({(x,t) ∈ R2 : x > 0, g(x) ≤ t}) admits an efficiently computable SCB. Examples:
- G1 = {(x,t) ∈ R2 : x^{-p} ≤ t, x ≥ 0}, for p > 0;
- G = {(x,t) ∈ R2 : x ln x ≤ t, x ≥ 0}.
Self-concordant barriers are also known for several other important sets:
- For the second-order cone {(x,y) ∈ Rn-1 × R : ‖x‖ ≤ y}, the function f(x,y) = -log(y^2 - x^{\mathsf T}x) is a self-concordant barrier.
- For the cone of positive semidefinite matrices, f(A) = -log det A is a self-concordant barrier.
- For the region {x : φ(x) > 0}, where φ(x) = α + ⟨a,x⟩ - (1/2)⟨Ax,x⟩ with A = A^{\mathsf T} ⪰ 0, the function f(x) = -log φ(x) is a self-concordant barrier with parameter M = 2.
- For the exponential cone {(x,y,z) ∈ R3 : y e^{x/y} ≤ z, y > 0}, the function f(x,y,z) = -log(y log(z/y) - x) - log z - log y is a self-concordant barrier.
- For the power cone {(x1,x2,y) ∈ R+^2 × R : |y| ≤ x1^α x2^{1-α}}, the function f(x1,x2,y) = -log(x1^{2α} x2^{2(1-α)} - y^2) - log x1 - log x2 is a self-concordant barrier.
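As a small illustration (function names and test points are our own choices), here is how two of these barriers can be evaluated, returning +∞ outside the corresponding cone:

```python
import numpy as np

def soc_barrier(x, y):
    """-log(y^2 - x^T x), the barrier for the second-order cone ||x|| <= y."""
    gap = y**2 - x @ x
    return -np.log(gap) if gap > 0 else np.inf

def psd_barrier(A):
    """-log det A, the barrier for the cone of positive definite matrices."""
    try:
        L = np.linalg.cholesky(A)                 # fails unless A is positive definite
    except np.linalg.LinAlgError:
        return np.inf
    return -2.0 * np.sum(np.log(np.diag(L)))      # det A = prod(diag(L))^2

print(soc_barrier(np.array([0.3, 0.4]), 1.0))           # finite: ||x|| = 0.5 < 1
print(psd_barrier(np.array([[2.0, 0.5], [0.5, 1.0]])))  # finite: matrix is positive definite
```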
As mentioned in the "Bibliography Comments"[5] of their 1994 book,[6] self-concordant functions were introduced in 1988 by Yurii Nesterov[7][8] and further developed with Arkadi Nemirovski.[9] As explained in,[10] their basic observation was that the Newton method is affine invariant, in the sense that if for a function f(x) we have the Newton steps

x_{k+1} = x_k - [f''(x_k)]^{-1} f'(x_k),

then for a function \phi(y) = f(Ay), with A a non-degenerate linear transformation, starting from y_0 = A^{-1} x_0 we have the Newton steps y_k = A^{-1} x_k, which can be shown inductively:

y_{k+1} = y_k - [\phi''(y_k)]^{-1} \phi'(y_k) = y_k - [A^{\mathsf T} f''(Ay_k) A]^{-1} A^{\mathsf T} f'(Ay_k) = A^{-1} x_k - A^{-1} [f''(x_k)]^{-1} f'(x_k) = A^{-1} x_{k+1}.
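This affine invariance is easy to verify numerically. The sketch below (the test function f(x) = -Σ_i log x_i and the matrix A are our own choices) checks that the Newton step for φ(y) = f(Ay) is the image under A^{-1} of the Newton step for f:

```python
import numpy as np

def newton_step(grad, hess, z):
    return z - np.linalg.solve(hess(z), grad(z))

f_grad = lambda x: -1.0 / x                 # gradient of f(x) = -sum_i log(x_i)
f_hess = lambda x: np.diag(1.0 / x**2)

A = np.array([[2.0, 1.0], [0.0, 3.0]])      # non-degenerate linear map x = A y
phi_grad = lambda y: A.T @ f_grad(A @ y)    # phi(y) = f(Ay)
phi_hess = lambda y: A.T @ f_hess(A @ y) @ A

x0 = np.array([1.0, 2.0])
y0 = np.linalg.solve(A, x0)                 # y0 = A^{-1} x0

x1 = newton_step(f_grad, f_hess, x0)
y1 = newton_step(phi_grad, phi_hess, y0)
print(np.allclose(A @ y1, x1))              # True: y1 = A^{-1} x1
```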
However, the standard analysis of the Newton method supposes that the Hessian of f is Lipschitz continuous, that is, \|f''(x) - f''(y)\| \leq M \|x - y\| for some constant M. If we suppose that f is three times continuously differentiable, then this is equivalent to

|\langle f'''(x)[u] v, v \rangle| \leq M \|u\| \|v\|^2 \quad \text{for all } u, v \in R^n,

where f'''(x)[u] = \lim_{\alpha \to 0} \alpha^{-1} [f''(x + \alpha u) - f''(x)]. The left-hand side of this inequality is invariant under the affine change of variables f(x) \to \phi(y) = f(Ay), u \to A^{-1} u, v \to A^{-1} v, but the right-hand side is not.

The authors note that the right-hand side can also be made invariant if we replace the Euclidean metric by the scalar product defined by the Hessian of f, that is, \|w\|_{f''(x)} = \langle f''(x) w, w \rangle^{1/2} for w \in R^n. They then arrive at the definition of a self-concordant function:

|\langle f'''(x)[u] u, u \rangle| \leq M \langle f''(x) u, u \rangle^{3/2}.
Linear combination: if f1 and f2 are self-concordant with constants M1 and M2, and α, β > 0, then αf1 + βf2 is self-concordant with constant max(α^{-1/2} M1, β^{-1/2} M2).

Affine transformation: if f is self-concordant with constant M and Ax + b is an affine transformation of Rn, then φ(x) = f(Ax + b) is also self-concordant with constant M.

Convex conjugate: if f is self-concordant, then its convex conjugate f* is also self-concordant.

Non-singular Hessian: if f is self-concordant and its domain contains no straight line (infinite in both directions), then f'' is non-singular. Conversely, if for some x in the domain of f and some u ∈ Rn, u ≠ 0, we have ⟨f''(x)u, u⟩ = 0, then ⟨f''(x + αu)u, u⟩ = 0 for every α for which x + αu is in the domain of f; hence f(x + αu) is linear in α, and since f tends to ∞ near any boundary point of its domain, the whole line x + αu, α ∈ R, must be contained in the domain of f.
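As a numerical illustration of the linear-combination property (the test functions and weights are our own choices), the following sketch checks the predicted constant for αf1 + βf2, with f1(x) = -log x and f2(x) = -log(1-x), both self-concordant with M1 = M2 = 2:

```python
import numpy as np

alpha, beta = 0.25, 4.0
xs = np.linspace(0.01, 0.99, 500)                  # common domain (0, 1)

# second and third derivatives of h = alpha*f1 + beta*f2
h2 = alpha / xs**2 + beta / (1 - xs)**2
h3 = -2 * alpha / xs**3 + 2 * beta / (1 - xs)**3

M = max(alpha**-0.5 * 2, beta**-0.5 * 2)           # predicted max(alpha^{-1/2} M1, beta^{-1/2} M2)
print(np.all(np.abs(h3) <= M * h2**1.5 * (1 + 1e-9)))   # True on this grid
```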
Among other things, self-concordant functions are useful in the analysis of Newton's method. Self-concordant barrier functions are used to develop the barrier functions used in interior point methods for convex and nonlinear optimization. The usual analysis of the Newton method would not work for barrier functions, as their second derivative cannot be Lipschitz continuous; otherwise they would be bounded on any compact subset of Rn.

Self-concordant barrier functions combine both properties: they are barriers for convex feasible sets and they are self-concordant, so they can be minimized by (damped) Newton steps with provable convergence bounds, which is exactly what interior point methods require.
A self-concordant function may be minimized with a modified Newton method where we have a bound on the number of steps required for convergence. We suppose here that f is a standard self-concordant function, that is, it is self-concordant with parameter M = 2.
We define the Newton decrement λf(x) of f at x as the size of the Newton step [f''(x)]^{-1} f'(x) in the local norm defined by the Hessian of f at x:

\lambda_f(x) = \langle f''(x) [f''(x)]^{-1} f'(x), [f''(x)]^{-1} f'(x) \rangle^{1/2} = \langle [f''(x)]^{-1} f'(x), f'(x) \rangle^{1/2}.

Then, for x in the domain of f, if λf(x) < 1, it is possible to prove that the Newton iterate

x_+ = x - [f''(x)]^{-1} f'(x)

will also be in the domain of f. This is because, based on the self-concordance of f, it is possible to give finite bounds on the value of f(x_+). We further have

\lambda_f(x_+) \leq \left( \frac{\lambda_f(x)}{1 - \lambda_f(x)} \right)^2.
Then, if

\lambda_f(x) < \bar\lambda = \frac{3 - \sqrt{5}}{2},

it is also guaranteed that λf(x_+) < λf(x), so that we can continue to use the Newton method until convergence. Note that for λf(x_+) < β with some β ∈ (0, \bar\lambda) we have quadratic convergence of λf to 0, since

\lambda_f(x_+) \leq (1 - \beta)^{-2} \lambda_f(x)^2.

Quadratic convergence of f(xk) to f(x*) and of xk to x*, where x* = arg min f(x), then follows, because for λf(x) < 1 we have

\omega(\lambda_f(x)) \leq f(x) - f(x^*) \leq \omega_*(\lambda_f(x)),

\omega'(\lambda_f(x)) \leq \|x - x^*\|_x \leq \omega_*'(\lambda_f(x)),

where \omega(t) = t - \log(1 + t), \omega_*(t) = -t - \log(1 - t), and \|u\|_x = \langle f''(x) u, u \rangle^{1/2}.
If we start the Newton method from some x0 with λf(x0) ≥ \bar\lambda, then we first have to use a damped Newton method of the form

x_{k+1} = x_k - \frac{1}{1 + \lambda_f(x_k)} [f''(x_k)]^{-1} f'(x_k).

For this it can be shown that f(x_{k+1}) ≤ f(x_k) - ω(λf(x_k)), with ω as defined above. Since ω(t) is an increasing function for t > 0, we have ω(t) ≥ ω(\bar\lambda) for all t ≥ \bar\lambda, so the value of f is guaranteed to decrease by at least a fixed amount in each iteration; this also proves that x_{k+1} is in the domain of f.
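Putting the two phases together, here is a minimal Python sketch of this scheme (the test objective, a linear term plus the log-barrier of a box, and all names are our own choices; it is an illustration rather than a production implementation):

```python
import numpy as np

# f(x) = c^T x - sum_i [log(1 - x_i) + log(1 + x_i)] is a standard self-concordant
# function (M = 2) on the open box (-1, 1)^n, so the analysis above applies.

def grad(x, c):
    return c + 1.0 / (1.0 - x) - 1.0 / (1.0 + x)

def hess(x):
    return np.diag(1.0 / (1.0 - x) ** 2 + 1.0 / (1.0 + x) ** 2)

def minimize_self_concordant(c, x0, tol=1e-10, max_iter=100):
    lam_bar = (3.0 - np.sqrt(5.0)) / 2.0          # threshold between the two phases
    x = x0.copy()
    for _ in range(max_iter):
        g, H = grad(x, c), hess(x)
        step = np.linalg.solve(H, g)
        lam = np.sqrt(g @ step)                   # Newton decrement lambda_f(x)
        if lam < tol:
            break
        if lam >= lam_bar:
            x = x - step / (1.0 + lam)            # damped phase
        else:
            x = x - step                          # quadratically convergent phase
    gap_bound = -lam - np.log(1.0 - lam)          # omega_*(lam) bounds f(x) - min f once lam < 1
    return x, lam, gap_bound

x_min, lam, gap = minimize_self_concordant(np.array([1.0, -2.0, 0.5]), np.zeros(3))
print(x_min, lam, gap)
```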