Hilbert projection theorem explained

In mathematics, the Hilbert projection theorem is a famous result of convex analysis that says that for every vector x in a Hilbert space H and every nonempty closed convex subset C \subseteq H, there exists a unique vector m \in C for which \|c - x\| is minimized over the vectors c \in C; that is, such that \|m - x\| \leq \|c - x\| for every c \in C.
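As a concrete numerical illustration (not part of the original article), the following Python sketch projects a point of \R^2 onto the closed unit disk, a nonempty closed convex set whose metric projection happens to have a closed form, and checks the minimizing inequality against randomly sampled points of the disk. The function names are illustrative.

```python
import math
import random

def project_to_unit_disk(x):
    """Metric projection onto the closed unit disk C = {c : ||c|| <= 1} in R^2.

    For this particular closed convex set the unique minimizer has a closed
    form: radially rescale x whenever it lies outside the disk.
    """
    norm = math.hypot(x[0], x[1])
    if norm <= 1.0:
        return x
    return (x[0] / norm, x[1] / norm)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

x = (3.0, 4.0)                # a point outside the disk, with ||x|| = 5
m = project_to_unit_disk(x)   # the unique minimizer guaranteed by the theorem

# Sanity check: ||m - x|| <= ||c - x|| for many sampled points c in C.
random.seed(0)
for _ in range(10_000):
    r, t = random.random(), random.uniform(0.0, 2.0 * math.pi)
    c = (r * math.cos(t), r * math.sin(t))
    assert dist(m, x) <= dist(c, x) + 1e-12

print(m)  # (0.6, 0.8), i.e. x / ||x||
```

Here the distance from x to the disk is \|x\| - 1 = 4, attained only at x / \|x\|.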

Finite dimensional case

Some intuition for the theorem can be obtained by considering the first-order condition of the optimization problem.

Consider a finite dimensional real Hilbert space H with a subspace C and a point x. If m \in C is a minimum point of the function N : C \to \R defined by N(c) := \|c - x\| (which is the same as the minimum point of c \mapsto \|c - x\|^2), then the derivative must be zero at m.

In matrix derivative notation[1]
\partial \|x - c\|^2 = \partial \langle c - x, c - x \rangle = 2 \langle c - x, \partial c \rangle.
Since \partial c is a vector in C that represents an arbitrary tangent direction, it follows that m - x must be orthogonal to every vector in C.
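This first-order condition can be checked numerically. The Python sketch below (illustrative, not from the source) projects a point onto a one-dimensional subspace of \R^2 using the formula obtained by setting the derivative to zero, and verifies that the residual m - x is orthogonal to the subspace.

```python
import math

def project_to_line(x, v):
    """Orthogonal projection of x onto the subspace C = span{v} in R^2.

    Setting the derivative of c -> ||c - x||^2 to zero along the line gives
    the usual formula m = (<x, v> / <v, v>) v.
    """
    scale = (x[0] * v[0] + x[1] * v[1]) / (v[0] ** 2 + v[1] ** 2)
    return (scale * v[0], scale * v[1])

x = (0.0, 1.0)
v = (1.0, 2.0)          # C is the line of slope 2 through the origin
m = project_to_line(x, v)

# First-order condition: m - x is orthogonal to every tangent direction in C,
# i.e. to v itself (and hence to every scalar multiple of v).
residual = (m[0] - x[0], m[1] - x[1])
inner = residual[0] * v[0] + residual[1] * v[1]
assert abs(inner) < 1e-12
print(m)  # (0.4, 0.8), i.e. (2/5)(1, 2)
```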

Statement

Hilbert projection theorem: For every vector x in a Hilbert space H and every nonempty closed convex subset C \subseteq H, there exists a unique vector m \in C such that \|x - m\| \leq \|x - c\| for all c \in C. If, in addition, C is a vector subspace of H, then this minimizer m is the unique element of C such that x - m is orthogonal to C.

Proof by reduction to a special case

It suffices to prove the theorem in the case of x = 0 because the general case follows from the statement below by replacing C with C - x.

Consequences

Proposition: If C is a closed vector subspace of a Hilbert space H, then H = C \oplus C^{\bot}; that is, every x \in H can be written uniquely as x = m + p with m \in C and p \in C^{\bot}. The three arguments below establish this.

Proof that C \cap C^{\bot} = \{ 0 \}: If c \in C \cap C^{\bot} then 0 = \langle c, c \rangle = \|c\|^2, which implies c = 0. \blacksquare

Proof that C^{\bot} is a closed vector subspace of H: Let P := \prod_{c \in C} F where F is the underlying scalar field of H and define
L : H \to P by h \mapsto \left( \langle h, c \rangle \right)_{c \in C},
which is continuous and linear because this is true of each of its coordinates h \mapsto \langle h, c \rangle. The set C^{\bot} = L^{-1}(0) = L^{-1}\left( \{ 0 \} \right) is closed in H because \{ 0 \} is closed in P and L : H \to P is continuous. The kernel of any linear map is a vector subspace of its domain, which is why C^{\bot} = \ker L is a vector subspace of H. \blacksquare

Proof that H = C + C^{\bot}: Let x \in H. The Hilbert projection theorem guarantees the existence of a unique m \in C such that \|x - m\| \leq \|x - c\| for all c \in C (or equivalently, for all x - c \in x - C). Let p := x - m so that x = m + p \in C + p and it remains to show that p \in C^{\bot}. The inequality above can be rewritten as:
\|p\| \leq \|z\| \quad \text{for all } z \in x - C.
Because m \in C and C is a vector space, m + C = C and C = -C, which implies x - C = x + C = p + m + C = p + C. The previous inequality thus becomes
\|p\| \leq \|z\| \quad \text{for all } z \in p + C,
or equivalently,
\|p\| \leq \|p + c\| \quad \text{for all } c \in C.
But this last statement is true if and only if \langle p, c \rangle = 0 for every c \in C (to see this, expand \|p + c\|^2 = \|p\|^2 + 2 \operatorname{Re} \langle p, c \rangle + \|c\|^2 and replace c with its scalar multiples t c as t \to 0). Thus p \in C^{\bot}. \blacksquare
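The equivalence between having minimal norm in p + C and being orthogonal to C can be illustrated numerically. The following Python sketch (helper names are illustrative) computes p = x - m for a line C in \R^2 and checks both properties.

```python
import math
import random

def project_to_span(x, v):
    """Orthogonal projection of x onto the closed subspace C = span{v} in R^2."""
    t = (x[0] * v[0] + x[1] * v[1]) / (v[0] ** 2 + v[1] ** 2)
    return (t * v[0], t * v[1])

x = (2.0, 0.0)
v = (1.0, 1.0)                   # C = span{v}, a closed subspace of R^2
m = project_to_span(x, v)        # m = (1, 1)
p = (x[0] - m[0], x[1] - m[1])   # p = x - m = (1, -1), so x = m + p

# p is orthogonal to C ...
assert abs(p[0] * v[0] + p[1] * v[1]) < 1e-12

# ... which, as the proof shows, is equivalent to p having minimal norm
# among all vectors of p + C.
random.seed(1)
for _ in range(1000):
    t = random.uniform(-100.0, 100.0)
    c = (t * v[0], t * v[1])
    assert math.hypot(p[0], p[1]) <= math.hypot(p[0] + c[0], p[1] + c[1]) + 1e-9

print("x = m + p with m =", m, "and p =", p)
```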

Properties

Expression as a global minimum

The statement and conclusion of the Hilbert projection theorem can be expressed in terms of global minima of the following functions. Their notation will also be used to simplify certain statements.

Given a non-empty subset C \subseteq H and some x \in H, define a function
d_{C,x} : C \to [0, \infty) \quad \text{by } c \mapsto \|x - c\|.
A global minimum point of d_{C,x}, if one exists, is any point m in \operatorname{domain} d_{C,x} = C such that
d_{C,x}(m) \leq d_{C,x}(c) \quad \text{for all } c \in C,
in which case d_{C,x}(m) = \|m - x\| is equal to the global minimum value of the function d_{C,x}, which is:
\inf_{c \in C} d_{C,x}(c) = \inf_{c \in C} \|x - c\|.

Effects of translations and scalings

When this global minimum point m exists and is unique then denote it by \min(C, x); explicitly, the defining properties of \min(C, x) (if it exists) are:
\min(C, x) \in C \quad \text{and} \quad \left\| x - \min(C, x) \right\| \leq \|x - c\| \quad \text{for all } c \in C.
The Hilbert projection theorem guarantees that this unique minimum point exists whenever C is a non-empty closed and convex subset of a Hilbert space. However, such a minimum point can also exist in non-convex or non-closed subsets; for instance, as long as C is non-empty, if x \in C then \min(C, x) = x.

If C \subseteq H is a non-empty subset, s is any scalar, and x, x_0 \in H are any vectors then
\min\left( s C + x_0, s x + x_0 \right) = s \min(C, x) + x_0
which implies:
\min(s C, s x) = s \min(C, x)
\min(-C, -x) = -\min(C, x)
\min\left( C + x_0, x + x_0 \right) = \min(C, x) + x_0
\min\left( C - x_0, x - x_0 \right) = \min(C, x) - x_0
\min(C, -x) = \min(C + x, 0) - x
\min(C, 0) + x = \min(C + x, x)
\min(C - x, 0) = \min(C, x) - x
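The first identity can be verified numerically for the closed unit disk in \R^2, whose projection map has a closed form. The Python sketch below (with illustrative helper names, not from the source) compares \min(s C + x_0, s x + x_0) against s \min(C, x) + x_0.

```python
import math

def min_point_disk(x):
    """min(C, x) for the closed unit disk C in R^2 (closed and convex, so unique)."""
    n = math.hypot(x[0], x[1])
    return x if n <= 1.0 else (x[0] / n, x[1] / n)

def min_point_scaled_shifted_disk(s, x0, x):
    """min(sC + x0, x) computed directly: sC + x0 is the disk of radius |s|
    centred at x0 (the unit disk is symmetric, so sC = |s|C)."""
    d = (x[0] - x0[0], x[1] - x0[1])
    n = math.hypot(d[0], d[1])
    r = abs(s)
    if n <= r:
        return x
    return (x0[0] + r * d[0] / n, x0[1] + r * d[1] / n)

s, x0, x = 2.0, (5.0, -1.0), (3.0, 4.0)

# Identity: min(sC + x0, sx + x0) = s * min(C, x) + x0
lhs = min_point_scaled_shifted_disk(s, x0, (s * x[0] + x0[0], s * x[1] + x0[1]))
m = min_point_disk(x)
rhs = (s * m[0] + x0[0], s * m[1] + x0[1])
assert all(math.isclose(a, b) for a, b in zip(lhs, rhs))
print(lhs)
```

Here \min(C, x) = (0.6, 0.8), so both sides equal (6.2, 0.6).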

Examples

The following counter-example demonstrates a continuous linear isomorphism A : H \to H for which \min(A(C), A(x)) \neq A(\min(C, x)).

Endow H := \R^2 with the dot product, let x_0 := (0, 1), and for every real s \in \R, let L_s := \{ (x, s x) : x \in \R \} be the line of slope s through the origin, where it is readily verified that
\min\left( L_s, x_0 \right) = \frac{s}{1 + s^2} (1, s).

Pick a real number r \neq 0 and define A : \R^2 \to \R^2 by A(x, y) := (r x, y) (so this map scales the x-coordinate by r while leaving the y-coordinate unchanged). Then A : \R^2 \to \R^2 is an invertible continuous linear operator that satisfies A\left( L_s \right) = L_{s/r} and A\left( x_0 \right) = x_0, so that
\min\left( A\left( L_s \right), A\left( x_0 \right) \right) = \frac{s}{r^2 + s^2} (r, s)
and
A\left( \min\left( L_s, x_0 \right) \right) = \frac{s}{1 + s^2} (r, s).

Consequently, if C := L_s with s \neq 0 and if r \neq \pm 1 (so that the two coefficients s / (r^2 + s^2) and s / (1 + s^2) differ), then \min(A(C), A\left( x_0 \right)) \neq A\left( \min\left( C, x_0 \right) \right).
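The counter-example can be reproduced numerically. With r = 2 and s = 1, the following Python sketch (illustrative, not from the source) computes both sides and confirms they differ.

```python
import math

def min_point_line(slope, x):
    """min(L, x): orthogonal projection of x onto the line y = slope * x."""
    v = (1.0, slope)
    t = (x[0] * v[0] + x[1] * v[1]) / (v[0] ** 2 + v[1] ** 2)
    return (t * v[0], t * v[1])

r, s = 2.0, 1.0
x0 = (0.0, 1.0)
A = lambda p: (r * p[0], p[1])      # scales the x-coordinate by r

# A maps the line of slope s to the line of slope s / r, and fixes x0.
lhs = min_point_line(s / r, A(x0))   # min(A(L_s), A(x0)) = (1/5)(2, 1)
rhs = A(min_point_line(s, x0))       # A(min(L_s, x0))   = (1/2)(2, 1)
print(lhs, rhs)  # (0.4, 0.2) (1.0, 0.5)
assert not all(math.isclose(a, b) for a, b in zip(lhs, rhs))
```

Both points are multiples of (r, s) = (2, 1), but with different coefficients, matching the formulas above.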


Notes and References

  1. Petersen, Kaare. "The Matrix Cookbook" (web site). Retrieved 9 January 2021.