A self-concordant function is a function satisfying a certain differential inequality, which makes it particularly well suited to optimization by Newton's method.[1] A self-concordant barrier is a self-concordant function that is also a barrier function for a particular convex set. Self-concordant barriers are important ingredients in interior-point methods for optimization.
Here is the general definition of a self-concordant function.[2]
Let C be a nonempty open convex set in Rn, and let f be a three-times continuously differentiable function defined on C. We say that f is self-concordant on C if it satisfies the following properties:
1. Barrier property: on any sequence of points in C that converges to a boundary point of C, f converges to ∞.
2. Differential inequality: for every point x in C, and any direction h in Rn, let gh be the function f restricted to the direction h, that is: gh(t) = f(x+t*h). Then the one-dimensional function gh should satisfy the following differential inequality:
|g_h'''(t)| \leq 2\, g_h''(t)^{3/2}.

Equivalently:[3]

\left. \frac{d}{d\alpha} \nabla^2 f(x+\alpha y) \right|_{\alpha=0} \preceq 2 \sqrt{y^{\mathsf T} \nabla^2 f(x)\, y}\; \nabla^2 f(x)
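To make the directional condition concrete, here is a minimal Python sketch (the test function f(x) = -log x1 - log x2, the point x, and the random directions are our own choices) that checks |g_h'''(0)| ≤ 2 g_h''(0)^{3/2} using the analytic derivatives of g_h(t) = f(x + t·h):

```python
import numpy as np

# Check the directional self-concordance inequality for f(x) = -log(x1) - log(x2)
# at a fixed interior point, along many random directions h.
rng = np.random.default_rng(1)
x = np.array([0.7, 2.5])                 # a point in the domain {x > 0}
ok = True
for _ in range(1000):
    h = rng.standard_normal(2)           # random direction
    g2 = np.sum(h**2 / x**2)             # g_h''(0)  for g_h(t) = f(x + t*h)
    g3 = -2.0 * np.sum(h**3 / x**3)      # g_h'''(0)
    ok = ok and abs(g3) <= 2.0 * g2**1.5 * (1 + 1e-9)
print(ok)                                # True: the inequality holds in every direction
```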
For a univariate function f : R → R, self-concordance on R means that

|f'''(x)| \leq 2 f''(x)^{3/2}.

Equivalently: wherever f''(x) > 0, f satisfies

\left| \frac{d}{dx} \frac{1}{\sqrt{f''(x)}} \right| \leq 1,

and it satisfies f'''(x) = 0 elsewhere.
Some examples of self-concordant functions:
- Linear and convex quadratic functions are self-concordant, since their third derivative is zero.
- Any function f(x) = -log(-g(x)) - log x, where g(x) is defined and convex for all x > 0 and satisfies |g'''(x)| ≤ 3g''(x)/x, is self-concordant on its domain, which is {x | x > 0, g(x) < 0}. Some examples of such functions g are:
  - g(x) = -x^p for 0 < p ≤ 1;
  - g(x) = -log x;
  - g(x) = x^p for -1 ≤ p ≤ 0;
  - g(x) = (ax+b)^2/x;
  - for any function g satisfying the conditions, the function g(x) + ax^2 + bx + c with a ≥ 0 also satisfies the conditions.
Some functions that are not self-concordant:
- f(x) = e^x
- f(x) = 1/x^p for x > 0, p > 0
- f(x) = |x|^p for p > 2
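These positive and negative examples are easy to test numerically. The following minimal Python sketch (the helper name and the sample grids are our own choices) checks the univariate inequality |f'''(x)| ≤ 2 f''(x)^{3/2} for f(x) = -log x, where it holds with equality, and for f(x) = e^x, where it fails for sufficiently negative x:

```python
import numpy as np

def is_self_concordant(f2, f3, xs):
    """True if |f'''(x)| <= 2 f''(x)^(3/2) at every sample point (tiny relative tolerance)."""
    return all(abs(f3(x)) <= 2.0 * f2(x) ** 1.5 * (1 + 1e-9) for x in xs)

# f(x) = -log x on x > 0:  f''(x) = 1/x^2,  f'''(x) = -2/x^3  -> equality holds.
print(is_self_concordant(lambda x: 1 / x**2, lambda x: -2 / x**3,
                         np.linspace(0.1, 10.0, 1000)))                  # True

# f(x) = exp(x):  f'' = f''' = exp(x).  The inequality exp(x) <= 2 exp(3x/2)
# fails for x < -2 log 2, so exp is not self-concordant on all of R.
print(is_self_concordant(np.exp, np.exp, np.linspace(-5.0, 5.0, 1000)))  # False
```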
Here is the general definition of a self-concordant barrier (SCB).
Let C be a closed convex set in Rn with a non-empty interior. Let f be a function from interior(C) to R. Let M > 0 be a real parameter. We say that f is an M-self-concordant barrier for C if it satisfies the following:
1. f is a self-concordant function on interior(C).
2. For every point x in interior(C), and any direction h in Rn, let gh be the function f restricted to the direction h, that is: gh(t) = f(x+t*h). Then the one-dimensional function gh should satisfy the following differential inequality:
|g_h'(t)| \leq M^{1/2} \cdot g_h''(t)^{1/2}.
Due to the importance of SCBs in interior-point methods, it is important to know how to construct SCBs for various domains.
In theory, it can be proved that every closed convex domain in Rn has a self-concordant barrier with parameter O(n). But this “universal barrier” is given by some multivariate integrals, and it is too complicated for actual computations. Hence, the main goal is to construct SCBs that are efficiently computable.[4]
SCBs can be constructed from some basic SCBs, that are combined to produce SCBs for more complex domains, using several combination rules.
Every constant is a self-concordant barrier for all of Rn, with parameter M = 0. It is the only self-concordant barrier for the entire space, and the only self-concordant barrier with M < 1. (Note that linear and quadratic functions are self-concordant functions, but they are not self-concordant barriers.)
For the positive half-line R+ (that is, x > 0), f(x) = -ln x is a self-concordant barrier with parameter M = 1.
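As a quick sanity check (our own computation), both defining conditions hold for this barrier, in fact with equality:

f'(x) = -\frac{1}{x}, \qquad f''(x) = \frac{1}{x^2}, \qquad f'''(x) = -\frac{2}{x^3},

so that |f'''(x)| = 2 f''(x)^{3/2} and |f'(x)| = 1^{1/2} \cdot f''(x)^{1/2}.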
Let G be a closed convex domain in Rn, and g an M-SCB for G. Let x = Ay+b be an affine mapping from Rk to Rn whose image intersects the interior of G. Let H be the inverse image of G under the mapping: H = {y in Rk : Ay+b ∈ G}. Let h be the composite function h(y) := g(Ay+b). Then, h is an M-SCB for H.
For example, take n = 1, G the positive half-line, and g(x) = -ln x. Then, for any vector a in Rk and scalar b, h(y) = -ln(aTy + b) is a 1-SCB for the half-space {y : aTy + b > 0}; similarly, h(y) = -ln(b - aTy) is a 1-SCB for the half-space {y : aTy < b}.
The substitution rule can be extended from affine mappings to a certain class of "appropriate" mappings, and to quadratic mappings.
For all i in 1,...,m, let Gi be a closed convex domain in Rni, and let gi be an Mi-SCB for Gi. Let G be the cartesian product of all Gi. Let g(x1,...,xm) := sumi gi(xi). Then, g is a SCB for G, with parameter sumi Mi.
For example, take all Gi to be the positive half-line, so that G is the positive orthant R^m_+, with the m-SCB

g(x) = -\sum_{i=1}^m \ln x_i.
We can now apply the substitution rule. We get that, for the polytope defined by the linear inequalities ajTx ≤ bj for j in 1,...,m, if it satisfies Slater's condition, then

f(x) = -\sum_{j=1}^m \ln(b_j - a_j^{\mathsf T} x)

is an m-SCB. The linear functions b_j - a_j^{\mathsf T} x can also be replaced by suitable quadratic functions, using the extended substitution rule.
Let G1,...,Gm be closed convex domains in Rn. For each i in 1,...,m, let gi be an Mi-SCB for Gi, and ri ≥ 1 a real number. Let G be the intersection of all Gi, and suppose its interior is nonempty. Let g := sumi ri*gi. Then, g is a SCB for G, with parameter sumi ri*Mi.
Therefore, if G is defined by a list of constraints, we can find a SCB for each constraint separately, and then simply sum them to get a SCB for G.
For example, suppose the domain is defined by m linear constraints of the form ajTx ≤ bj, for j in 1,...,m. Then we can use the Intersection rule to construct the m-SCB
f(x) = -\sum_{j=1}^m \ln(b_j - a_j^{\mathsf T} x).
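For illustration, here is a small Python sketch (the helper name log_barrier and the box example are our own choices) that evaluates this polytope barrier together with its gradient and Hessian, the quantities an interior-point method actually needs:

```python
import numpy as np

def log_barrier(A, b, x):
    """f(x) = -sum_j log(b_j - a_j^T x), with gradient and Hessian, for {x : Ax <= b}."""
    s = b - A @ x                            # slacks b_j - a_j^T x, must be positive
    if np.any(s <= 0):
        return np.inf, None, None            # x is not in the interior of the polytope
    f = -np.sum(np.log(s))
    grad = A.T @ (1.0 / s)                   # sum_j a_j / s_j
    hess = A.T @ np.diag(1.0 / s**2) @ A     # sum_j a_j a_j^T / s_j^2
    return f, grad, hess

# Example: the box -1 <= x_i <= 1 in R^2, written as A x <= b (m = 4 constraints).
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
print(log_barrier(A, b, np.zeros(2))[0])     # 0.0 at the center of the box
```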
The epigraph of a function f(x) is the area above the graph of the function, that is,
{(x,t) ∈ R2 : t ≥ f(x)}.
Let g(t) be a 3-times continuously differentiable concave function on the ray t > 0, such that t ⋅ |g'''(t)|/|g''(t)| ≤ 3b for some constant b. Let G = closure({(x,t) ∈ R2 : t > 0, x ≤ g(t)}). Then G admits an efficiently computable SCB.
Examples:
- G1 = {(x,t) ∈ R2 : (x+)^p ≤ t}, where x+ = max(x,0) and p ≥ 1;
- G2 = {(x,t) ∈ R2 : ([-x]+)^p ≤ t};
- G = G1 ∩ G2 = {(x,t) ∈ R2 : |x|^p ≤ t};
- G = {(x,t) ∈ R2 : e^x ≤ t}.
This can be used, for example, for p-norm approximation problems of the form

\min_x \sum_{j=1}^n |v_j - x^{\mathsf T} u_j|^p,

which can be rewritten as

\min_x \sum_{j=1}^n t_j \quad \text{subject to} \quad t_j \geq |v_j - x^{\mathsf T} u_j|^p \text{ for all } j,

so that each constraint describes a set of the above form (composed with an affine mapping), for which an SCB is available.
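The reformulation can be written down directly in a modeling language. The sketch below assumes the CVXPY package and uses made-up data U, v and exponent p; it states exactly the epigraph constraints t_j ≥ |v_j - xᵀu_j|^p above (whether a particular p is accepted depends on CVXPY's DCP rules and the installed solvers):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
U = rng.standard_normal((20, 3))     # rows are the vectors u_j (synthetic data)
v = rng.standard_normal(20)
p = 1.5

x = cp.Variable(3)
t = cp.Variable(20)
residual = v - U @ x
constraints = [cp.power(cp.abs(residual), p) <= t]   # t_j >= |v_j - x^T u_j|^p
problem = cp.Problem(cp.Minimize(cp.sum(t)), constraints)
problem.solve()
print(problem.value, x.value)
```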
Similarly, let g be a 3-times continuously differentiable convex function on the ray x > 0, such that x ⋅ |g'''(x)|/|g''(x)| ≤ 3b for some constant b. Then G = closure({(x,t) ∈ R2 : x > 0, g(x) ≤ t}) admits an efficiently computable SCB. Examples:
- G1 = {(x,t) ∈ R2 : x^{-p} ≤ t, x ≥ 0}, for p > 0;
- G = {(x,t) ∈ R2 : x ln x ≤ t, x ≥ 0}.
Self-concordant barriers are also known for several other important sets:
- For the second-order cone {(x,y) ∈ Rn-1 × R : ‖x‖ ≤ y}, the function f(x,y) = -log(y^2 - x^{\mathsf T}x) is a self-concordant barrier.
- For the cone of positive semidefinite matrices, f(A) = -log det A is a self-concordant barrier.
- For the region {x : φ(x) > 0}, where φ(x) = α + ⟨a,x⟩ - (1/2)⟨Ax,x⟩ with A = A^{\mathsf T} ⪰ 0, the function f(x) = -log φ(x) is a self-concordant barrier with parameter M = 2.
- For the exponential cone {(x,y,z) ∈ R3 : y e^{x/y} ≤ z, y > 0}, the function f(x,y,z) = -log(y log(z/y) - x) - log z - log y is a self-concordant barrier.
- For the power cone {(x1,x2,y) ∈ R+^2 × R : |y| ≤ x1^α x2^{1-α}}, the function f(x1,x2,y) = -log(x1^{2α} x2^{2(1-α)} - y^2) - log x1 - log x2 is a self-concordant barrier.
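As a small illustration (function names and test points are our own choices), here is how two of these barriers can be evaluated, returning +∞ outside the corresponding cone:

```python
import numpy as np

def soc_barrier(x, y):
    """-log(y^2 - x^T x), the barrier for the second-order cone ||x|| <= y."""
    gap = y**2 - x @ x
    return -np.log(gap) if gap > 0 else np.inf

def psd_barrier(A):
    """-log det A, the barrier for the cone of positive definite matrices."""
    try:
        L = np.linalg.cholesky(A)                 # fails unless A is positive definite
    except np.linalg.LinAlgError:
        return np.inf
    return -2.0 * np.sum(np.log(np.diag(L)))      # det A = prod(diag(L))^2

print(soc_barrier(np.array([0.3, 0.4]), 1.0))           # finite: ||x|| = 0.5 < 1
print(psd_barrier(np.array([[2.0, 0.5], [0.5, 1.0]])))  # finite: matrix is positive definite
```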
As mentioned in the "Bibliography Comments"[5] of their 1994 book,[6] self-concordant functions were introduced in 1988 by Yurii Nesterov[7][8] and further developed with Arkadi Nemirovski.[9] As explained in,[10] their basic observation was that the Newton method is affine invariant, in the sense that if for a function f(x) we have the Newton steps

x_{k+1} = x_k - [f''(x_k)]^{-1} f'(x_k),

then for a function \phi(y) = f(Ay), with A a non-degenerate linear transformation, starting from y_0 = A^{-1} x_0 we have the Newton steps y_k = A^{-1} x_k, which can be shown inductively:

y_{k+1} = y_k - [\phi''(y_k)]^{-1} \phi'(y_k) = y_k - [A^{\mathsf T} f''(Ay_k) A]^{-1} A^{\mathsf T} f'(Ay_k) = A^{-1} x_k - A^{-1} [f''(x_k)]^{-1} f'(x_k) = A^{-1} x_{k+1}.
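This affine invariance is easy to verify numerically. The sketch below (the test function f(x) = -Σ_i log x_i and the matrix A are our own choices) checks that the Newton step for φ(y) = f(Ay) is the image under A^{-1} of the Newton step for f:

```python
import numpy as np

def newton_step(grad, hess, z):
    return z - np.linalg.solve(hess(z), grad(z))

f_grad = lambda x: -1.0 / x                 # gradient of f(x) = -sum_i log(x_i)
f_hess = lambda x: np.diag(1.0 / x**2)

A = np.array([[2.0, 1.0], [0.0, 3.0]])      # non-degenerate linear map x = A y
phi_grad = lambda y: A.T @ f_grad(A @ y)    # phi(y) = f(Ay)
phi_hess = lambda y: A.T @ f_hess(A @ y) @ A

x0 = np.array([1.0, 2.0])
y0 = np.linalg.solve(A, x0)                 # y0 = A^{-1} x0

x1 = newton_step(f_grad, f_hess, x0)
y1 = newton_step(phi_grad, phi_hess, y0)
print(np.allclose(A @ y1, x1))              # True: y1 = A^{-1} x1
```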
However, the standard analysis of the Newton method supposes that the Hessian of f is Lipschitz continuous, that is, \|f''(x) - f''(y)\| \leq M \|x - y\| for some constant M. If we suppose that f is three times continuously differentiable, then this is equivalent to

|\langle f'''(x)[u] v, v \rangle| \leq M \|u\| \|v\|^2 \quad \text{for all } u, v \in R^n,

where f'''(x)[u] = \lim_{\alpha \to 0} \alpha^{-1} [f''(x + \alpha u) - f''(x)]. The left-hand side of this inequality is invariant under the affine change of variables f(x) \to \phi(y) = f(Ay), u \to A^{-1} u, v \to A^{-1} v, but the right-hand side is not.

The authors note that the right-hand side can also be made invariant if we replace the Euclidean metric by the scalar product defined by the Hessian of f, that is, \|w\|_{f''(x)} = \langle f''(x) w, w \rangle^{1/2} for w \in R^n. They then arrive at the definition of a self-concordant function:

|\langle f'''(x)[u] u, u \rangle| \leq M \langle f''(x) u, u \rangle^{3/2}.
Linear combination: if f1 and f2 are self-concordant with constants M1 and M2, and α, β > 0, then αf1 + βf2 is self-concordant with constant max(α^{-1/2} M1, β^{-1/2} M2).

Affine transformation: if f is self-concordant with constant M and Ax + b is an affine transformation of Rn, then φ(x) = f(Ax + b) is also self-concordant with constant M.

Convex conjugate: if f is self-concordant, then its convex conjugate f* is also self-concordant.

Non-singular Hessian: if f is self-concordant and its domain contains no straight line (infinite in both directions), then f'' is non-singular. Conversely, if for some x in the domain of f and some u ∈ Rn, u ≠ 0, we have ⟨f''(x)u, u⟩ = 0, then ⟨f''(x + αu)u, u⟩ = 0 for every α for which x + αu is in the domain of f; hence f(x + αu) is linear in α, and since f tends to ∞ near any boundary point of its domain, the whole line x + αu, α ∈ R, must be contained in the domain of f.
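As a numerical illustration of the linear-combination property (the test functions and weights are our own choices), the following sketch checks the predicted constant for αf1 + βf2, with f1(x) = -log x and f2(x) = -log(1-x), both self-concordant with M1 = M2 = 2:

```python
import numpy as np

alpha, beta = 0.25, 4.0
xs = np.linspace(0.01, 0.99, 500)                  # common domain (0, 1)

# second and third derivatives of h = alpha*f1 + beta*f2
h2 = alpha / xs**2 + beta / (1 - xs)**2
h3 = -2 * alpha / xs**3 + 2 * beta / (1 - xs)**3

M = max(alpha**-0.5 * 2, beta**-0.5 * 2)           # predicted max(alpha^{-1/2} M1, beta^{-1/2} M2)
print(np.all(np.abs(h3) <= M * h2**1.5 * (1 + 1e-9)))   # True on this grid
```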
Among other things, self-concordant functions are useful in the analysis of Newton's method. Self-concordant barrier functions are used to develop the barrier functions used in interior point methods for convex and nonlinear optimization. The usual analysis of the Newton method would not work for barrier functions, as their second derivative cannot be Lipschitz continuous; otherwise they would be bounded on any compact subset of Rn.

Self-concordant barrier functions combine both properties: they are barriers for convex feasible sets and they are self-concordant, so they can be minimized by (damped) Newton steps with provable convergence bounds, which is exactly what interior point methods require.
A self-concordant function may be minimized with a modified Newton method where we have a bound on the number of steps required for convergence. We suppose here that f is a standard self-concordant function, that is, it is self-concordant with parameter M = 2.
We define the Newton decrement λf(x) of f at x as the size of the Newton step [f''(x)]^{-1} f'(x) in the local norm defined by the Hessian of f at x:

\lambda_f(x) = \langle f''(x) [f''(x)]^{-1} f'(x), [f''(x)]^{-1} f'(x) \rangle^{1/2} = \langle [f''(x)]^{-1} f'(x), f'(x) \rangle^{1/2}.

Then, for x in the domain of f, if λf(x) < 1, it is possible to prove that the Newton iterate

x_+ = x - [f''(x)]^{-1} f'(x)

will also be in the domain of f. This is because, based on the self-concordance of f, it is possible to give finite bounds on the value of f(x_+). We further have

\lambda_f(x_+) \leq \left( \frac{\lambda_f(x)}{1 - \lambda_f(x)} \right)^2.
Then, if

\lambda_f(x) < \bar\lambda = \frac{3 - \sqrt{5}}{2},

it is also guaranteed that λf(x_+) < λf(x), so that we can continue to use the Newton method until convergence. Note that for λf(x_+) < β with some β ∈ (0, \bar\lambda) we have quadratic convergence of λf to 0, since

\lambda_f(x_+) \leq (1 - \beta)^{-2} \lambda_f(x)^2.

Quadratic convergence of f(xk) to f(x*) and of xk to x*, where x* = arg min f(x), then follows, because for λf(x) < 1 we have

\omega(\lambda_f(x)) \leq f(x) - f(x^*) \leq \omega_*(\lambda_f(x)),

\omega'(\lambda_f(x)) \leq \|x - x^*\|_x \leq \omega_*'(\lambda_f(x)),

where \omega(t) = t - \log(1 + t), \omega_*(t) = -t - \log(1 - t), and \|u\|_x = \langle f''(x) u, u \rangle^{1/2}.
If we start the Newton method from some x0 with λf(x0) ≥ \bar\lambda, then we first have to use a damped Newton method of the form

x_{k+1} = x_k - \frac{1}{1 + \lambda_f(x_k)} [f''(x_k)]^{-1} f'(x_k).

For this it can be shown that f(x_{k+1}) ≤ f(x_k) - ω(λf(x_k)), with ω as defined above. Since ω(t) is an increasing function for t > 0, we have ω(t) ≥ ω(\bar\lambda) for all t ≥ \bar\lambda, so the value of f is guaranteed to decrease by at least a fixed amount in each iteration; this also proves that x_{k+1} is in the domain of f.
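Putting the two phases together, here is a minimal Python sketch of this scheme (the test objective, a linear term plus the log-barrier of a box, and all names are our own choices; it is an illustration rather than a production implementation):

```python
import numpy as np

# f(x) = c^T x - sum_i [log(1 - x_i) + log(1 + x_i)] is a standard self-concordant
# function (M = 2) on the open box (-1, 1)^n, so the analysis above applies.

def grad(x, c):
    return c + 1.0 / (1.0 - x) - 1.0 / (1.0 + x)

def hess(x):
    return np.diag(1.0 / (1.0 - x) ** 2 + 1.0 / (1.0 + x) ** 2)

def minimize_self_concordant(c, x0, tol=1e-10, max_iter=100):
    lam_bar = (3.0 - np.sqrt(5.0)) / 2.0          # threshold between the two phases
    x = x0.copy()
    for _ in range(max_iter):
        g, H = grad(x, c), hess(x)
        step = np.linalg.solve(H, g)
        lam = np.sqrt(g @ step)                   # Newton decrement lambda_f(x)
        if lam < tol:
            break
        if lam >= lam_bar:
            x = x - step / (1.0 + lam)            # damped phase
        else:
            x = x - step                          # quadratically convergent phase
    gap_bound = -lam - np.log(1.0 - lam)          # omega_*(lam) bounds f(x) - min f once lam < 1
    return x, lam, gap_bound

x_min, lam, gap = minimize_self_concordant(np.array([1.0, -2.0, 0.5]), np.zeros(3))
print(x_min, lam, gap)
```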