In mathematics and economics, transportation theory or transport theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781.[1]
In the 1920s, A. N. Tolstoi was one of the first to study the transportation problem mathematically. In 1930, in the collection Transportation Planning Volume I for the National Commissariat of Transportation of the Soviet Union, he published a paper "Methods of Finding the Minimal Kilometrage in Cargo-transportation in space".[2][3]
Major advances were made in the field during World War II by the Soviet mathematician and economist Leonid Kantorovich.[4] Consequently, the problem as it is stated is sometimes known as the Monge–Kantorovich transportation problem.[5] The linear programming formulation of the transportation problem is also known as the Hitchcock–Koopmans transportation problem.[6]
Suppose that we have a collection of $m$ mines mining iron ore, and a collection of $n$ factories which use the iron ore that the mines produce. Suppose for the sake of argument that these mines and factories form two disjoint subsets $M$ and $F$ of the Euclidean plane $\mathbb{R}^2$. Suppose also that we have a cost function $c : \mathbb{R}^2 \times \mathbb{R}^2 \to [0, \infty)$, so that $c(x, y)$ is the cost of transporting one shipment of iron from $x$ to $y$. For simplicity, we ignore the time taken to do the transporting. We also assume that each mine can supply only one factory (no splitting of shipments) and that each factory requires precisely one shipment to be in operation (factories cannot work at half- or double-capacity). Having made the above assumptions, a transport plan is a bijection $T : M \to F$. In other words, each mine $m \in M$ supplies precisely one target factory $T(m) \in F$, and each factory is supplied by precisely one mine. We wish to find the optimal transport plan, the plan $T$ whose total cost

$$c(T) := \sum_{m \in M} c(m, T(m))$$

is the least of all possible transport plans from $M$ to $F$. This motivating special case of the transportation problem is an instance of the assignment problem.
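Because it is a finite assignment problem, the mines-and-factories example can be solved exactly with a bipartite matching routine. Below is a minimal Python sketch using SciPy's Hungarian-algorithm solver; the coordinates and the Euclidean cost are illustrative assumptions, not data from the text.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical planar coordinates; a bijective plan needs m == n.
mines = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
factories = np.array([[2.0, 0.0], [0.0, 3.0], [4.0, 2.0]])

# Cost matrix: c(x, y) = Euclidean distance from mine x to factory y.
cost = np.linalg.norm(mines[:, None, :] - factories[None, :, :], axis=2)

# The Hungarian algorithm finds the bijection T minimizing sum_m c(m, T(m)).
rows, cols = linear_sum_assignment(cost)
print("optimal plan T:", dict(zip(rows.tolist(), cols.tolist())))
print("total cost c(T):", cost[rows, cols].sum())
```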
The following simple example illustrates the importance of the cost function in determining the optimal transport plan. Suppose that we have $n$ books of equal width on a shelf (the real line), arranged in a single contiguous block. We wish to rearrange them into another contiguous block, but shifted one book-width to the right. Two obvious candidates for the optimal transport plan present themselves:

1. move all $n$ books one book-width to the right ("many small moves");
2. move the left-most book $n$ book-widths to the right and leave all other books fixed ("one big move").

If the cost function is proportional to Euclidean distance ($c(x, y) = \alpha \|x - y\|$ for some $\alpha > 0$), then these two candidates are both optimal. If, on the other hand, we choose the strictly convex cost function proportional to the square of Euclidean distance ($c(x, y) = \alpha \|x - y\|^2$ for some $\alpha > 0$), then the "many small moves" option becomes the unique minimizer.
Note that the above cost functions consider only the horizontal distance traveled by the books, not the horizontal distance traveled by a device used to pick each book up and move the book into position. If the latter is considered instead, then, of the two transport plans, the second is always optimal for the Euclidean distance, while, provided there are at least 3 books, the first transport plan is optimal for the squared Euclidean distance.
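As a quick numerical check of this example, the following Python sketch compares the two candidate plans under both cost functions (book positions in units of one book-width, with $\alpha = 1$; the specific $n$ is an arbitrary choice):

```python
import numpy as np

n = 5
start = np.arange(n)        # books at positions 0, 1, ..., n-1
# Plan 1 ("many small moves"): every book shifts one width to the right.
plan1 = start + 1
# Plan 2 ("one big move"): the left-most book jumps n widths, the rest stay.
plan2 = start.copy()
plan2[0] = n

for p, label in [(1, "linear cost"), (2, "quadratic cost")]:
    c1 = np.sum(np.abs(plan1 - start) ** p)
    c2 = np.sum(np.abs(plan2 - start) ** p)
    print(f"{label}: many small moves = {c1}, one big move = {c2}")
# Linear cost: both plans cost n. Quadratic cost: n versus n**2,
# so "many small moves" is the unique minimizer.
```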
The following transportation problem formulation is credited to F. L. Hitchcock:[7]
Suppose there are $m$ sources $x_1, \ldots, x_m$ for a commodity, with $a(x_i)$ units of supply at $x_i$, and $n$ sinks $y_1, \ldots, y_n$ for the commodity, with the demand $b(y_j)$ at $y_j$. If $a(x_i, y_j)$ is the unit cost of shipment from $x_i$ to $y_j$, find a flow that satisfies demand from supplies and minimizes the flow cost.
Tjalling Koopmans is also credited with formulations of transport economics and allocation of resources.
The transportation problem as it is stated in modern or more technical literature looks somewhat different because of the development of Riemannian geometry and measure theory. The mines-factories example, simple as it is, is a useful reference point when thinking of the abstract case. In this setting, we allow the possibility that we may not wish to keep all mines and factories open for business, and allow mines to supply more than one factory, and factories to accept iron from more than one mine.
Let $X$ and $Y$ be two separable metric spaces such that any probability measure on $X$ (or $Y$) is a Radon measure (i.e., they are Radon spaces). Let $c : X \times Y \to [0, \infty]$ be a Borel-measurable function. Given probability measures $\mu$ on $X$ and $\nu$ on $Y$, Monge's formulation of the optimal transportation problem is to find a transport map $T : X \to Y$ that realizes the infimum

$$\inf\left\{ \left. \int_X c(x, T(x)) \, \mathrm{d}\mu(x) \;\right|\; T_*(\mu) = \nu \right\},$$

where $T_*(\mu)$ denotes the push forward of $\mu$ by $T$. A map $T$ that attains this infimum (i.e., makes it a minimum instead of an infimum) is called an "optimal transport map".
Monge's formulation of the optimal transportation problem can be ill-posed, because sometimes there is no $T$ satisfying $T_*(\mu) = \nu$: this happens, for example, when $\mu$ is a Dirac measure but $\nu$ is not.
We can improve on this by adopting Kantorovich's formulation of the optimal transportation problem, which is to find a probability measure $\gamma$ on $X \times Y$ that attains the infimum

$$\inf\left\{ \left. \int_{X \times Y} c(x, y) \, \mathrm{d}\gamma(x, y) \;\right|\; \gamma \in \Gamma(\mu, \nu) \right\},$$

where $\Gamma(\mu, \nu)$ denotes the collection of all probability measures on $X \times Y$ with marginals $\mu$ on $X$ and $\nu$ on $Y$. It can be shown that a minimizer for this problem always exists when the cost function $c$ is lower semi-continuous and $\Gamma(\mu, \nu)$ is a tight collection of measures (which is guaranteed for Radon spaces $X$ and $Y$). (Compare this formulation with the definition of the Wasserstein metric $W_p$ on the space of probability measures.)
The minimum of the Kantorovich problem is equal to
$$\sup\left( \int_X \varphi(x) \, \mathrm{d}\mu(x) + \int_Y \psi(y) \, \mathrm{d}\nu(y) \right),$$

where the supremum runs over all pairs of bounded and continuous functions $\varphi : X \to \mathbb{R}$ and $\psi : Y \to \mathbb{R}$ such that

$$\varphi(x) + \psi(y) \leq c(x, y).$$
The economic interpretation is clearer if signs are flipped. Let $x \in X$ stand for the vector of characteristics of a worker, $y \in Y$ for the vector of characteristics of a firm, and $\Phi(x, y)$ for the economic output generated by worker $x$ matched with firm $y$. Setting $\Phi(x, y) = -c(x, y)$ and $u(x) = -\varphi(x)$, $v(y) = -\psi(y)$, the Monge–Kantorovich problem rewrites:

$$\sup\left\{ \left. \int_{X \times Y} \Phi(x, y) \, \mathrm{d}\gamma(x, y) \;\right|\; \gamma \in \Gamma(\mu, \nu) \right\},$$

which has dual:

$$\inf\left\{ \left. \int_X u(x) \, \mathrm{d}\mu(x) + \int_Y v(y) \, \mathrm{d}\nu(y) \;\right|\; u(x) + v(y) \geq \Phi(x, y) \right\},$$

where the infimum runs over bounded and continuous functions $u : X \to \mathbb{R}$ and $v : Y \to \mathbb{R}$. If the dual problem has a solution, one can see that:

$$v(y) = \sup_x \left\{ \Phi(x, y) - u(x) \right\},$$

so that $u(x)$ interprets as the equilibrium wage of a worker of type $x$, and $v(y)$ interprets as the equilibrium profit of a firm of type $y$.
For $1 \leq p < \infty$, let $\mathcal{P}_p(\mathbb{R})$ denote the collection of probability measures on $\mathbb{R}$ that have finite $p$-th moment. Let $\mu, \nu \in \mathcal{P}_p(\mathbb{R})$ and let $c(x, y) = h(x - y)$, where $h : \mathbb{R} \to [0, \infty)$ is a convex function. If $\mu$ has no atom, i.e., if the cumulative distribution function $F_\mu : \mathbb{R} \to [0, 1]$ of $\mu$ is a continuous function, then $F_\nu^{-1} \circ F_\mu : \mathbb{R} \to \mathbb{R}$ is an optimal transport map. It is the unique optimal transport map if $h$ is strictly convex. Moreover,

$$\min_{\gamma \in \Gamma(\mu, \nu)} \int_{\mathbb{R}^2} c(x, y) \, \mathrm{d}\gamma(x, y) = \int_0^1 c\left( F_\mu^{-1}(s), F_\nu^{-1}(s) \right) \mathrm{d}s.$$
The proof of this solution appears in Rachev & Rüschendorf (1998).[12]
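For empirical measures with equally many atoms, the quantile formula above reduces to sorting: the $i$-th smallest sample of $\mu$ is matched with the $i$-th smallest sample of $\nu$. A minimal Python sketch (the sample distributions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=1000)   # samples from mu (hypothetical)
y = rng.normal(3.0, 2.0, size=1000)   # samples from nu (hypothetical)

# With equally many samples, the empirical quantile functions F_mu^{-1} and
# F_nu^{-1} are realized by sorting, so the integral over s in (0, 1)
# becomes an average over matched order statistics.
cost = np.mean((np.sort(x) - np.sort(y)) ** 2)  # h(z) = z^2, a convex cost
print("empirical optimal cost for c(x, y) = (x - y)^2:", cost)
```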
In the case where the margins $\mu$ and $\nu$ are discrete, let $\mu_x$ and $\nu_y$ be the probability masses respectively assigned to $x \in X$ and $y \in Y$, and let $\gamma_{xy}$ be the probability of an $xy$ assignment. The objective function in the primal Kantorovich problem is then

$$\sum_{x \in X, y \in Y} \gamma_{xy} c_{xy},$$

and the constraint $\gamma \in \Gamma(\mu, \nu)$ expresses as

$$\sum_{y \in Y} \gamma_{xy} = \mu_x, \quad \forall x \in X$$

and

$$\sum_{x \in X} \gamma_{xy} = \nu_y, \quad \forall y \in Y.$$
In order to input this in a linear programming problem, we need to vectorize the matrix $\gamma$ by either stacking its columns or its rows; we call this operation $\operatorname{vec}$. In the column-major order, the constraints above rewrite as

$$\left( 1_{1 \times |Y|} \otimes I_{|X|} \right) \operatorname{vec}(\gamma) = \mu \quad \text{and} \quad \left( I_{|Y|} \otimes 1_{1 \times |X|} \right) \operatorname{vec}(\gamma) = \nu,$$

where $\otimes$ is the Kronecker product, $1_{n \times m}$ is a matrix of size $n \times m$ with all entries of ones, and $I_n$ is the identity matrix of size $n$. As a result, setting $z = \operatorname{vec}(\gamma)$, the linear programming formulation of the problem is

$$\begin{aligned}
& \text{Minimize} && \operatorname{vec}(c)^\top z \\
& \text{subject to:} && z \geq 0, \\
&&& \begin{pmatrix} 1_{1 \times |Y|} \otimes I_{|X|} \\ I_{|Y|} \otimes 1_{1 \times |X|} \end{pmatrix} z = \binom{\mu}{\nu},
\end{aligned}$$
which can be readily passed to a large-scale linear programming solver (see chapter 3.4 of Galichon (2016)[13]).
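As an illustration, the following Python sketch builds exactly these Kronecker-product constraints and hands them to SciPy's linear programming solver; the masses and the cost matrix are made-up example data:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical problem data.
mu = np.array([0.5, 0.5])              # supplies mu_x, size |X|
nu = np.array([0.25, 0.25, 0.5])       # demands nu_y, size |Y|
c = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 2.0]])        # costs c_xy, shape |X| x |Y|

nx, ny = c.shape
# Column-major vectorization, matching the Kronecker-product constraints above.
A_eq = np.vstack([
    np.kron(np.ones((1, ny)), np.eye(nx)),   # row sums of gamma equal mu
    np.kron(np.eye(ny), np.ones((1, nx))),   # column sums of gamma equal nu
])
b_eq = np.concatenate([mu, nu])

res = linprog(c.flatten(order="F"), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
gamma = res.x.reshape((nx, ny), order="F")
print("optimal coupling gamma:\n", gamma)
print("optimal cost:", res.fun)
```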
In the semi-discrete case, $X = Y = \mathbb{R}^d$ and $\mu$ is a continuous distribution over $\mathbb{R}^d$, while $\nu = \sum_{j=1}^J \nu_j \delta_{y_j}$ is a discrete distribution which assigns probability mass $\nu_j$ to site $y_j \in \mathbb{R}^d$. In this case, we can see[14] that the primal and dual Kantorovich problems respectively boil down to:

$$\inf\left\{ \left. \int_{\mathbb{R}^d} c(x, y) \, \mathrm{d}\gamma(x, y) \;\right|\; \gamma \in \Gamma(\mu, \nu) \right\} = \inf \sum_{j=1}^J \int_{\mathbb{R}^d} c(x, y_j) \, \mathrm{d}\gamma_j(x)$$

for the primal, where $\gamma \in \Gamma(\mu, \nu)$ means that $\sum_j \gamma_j = \mu$ and $\int_{\mathbb{R}^d} \mathrm{d}\gamma_j(x) = \nu_j$, and:

$$\sup\left\{ \int_{\mathbb{R}^d} \varphi(x) \, \mathrm{d}\mu(x) + \sum_{j=1}^J \psi_j \nu_j \;:\; \varphi(x) + \psi_j \leq c(x, y_j) \right\}$$

for the dual, which can be rewritten as:

$$\sup_{\psi \in \mathbb{R}^J} \left\{ \int_{\mathbb{R}^d} \min_j \left( c(x, y_j) - \psi_j \right) \mathrm{d}\mu(x) + \sum_{j=1}^J \psi_j \nu_j \right\},$$

which is a finite-dimensional convex optimization problem that can be solved by standard techniques, such as gradient descent.
In the case when $c(x, y) = |x - y|^2 / 2$, one can show that the set of $x$ assigned to a particular site $j$ is a convex polyhedron. The resulting configuration is called a power diagram.[15]
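As a sketch of the gradient-descent route mentioned above: the partial derivative of the dual objective in $\psi_j$ is $\nu_j$ minus the $\mu$-mass of the cell of points currently assigned to site $j$, which can be estimated by Monte Carlo. The following Python code implements this stochastic ascent for a Gaussian $\mu$, random sites, and the quadratic cost; all problem data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, J = 2, 5
sites = rng.normal(size=(J, d))      # sites y_j (hypothetical data)
nu = np.full(J, 1.0 / J)             # target masses nu_j
psi = np.zeros(J)                    # dual variables psi_j

for it in range(2000):
    x = rng.normal(size=(256, d))    # minibatch drawn from mu = N(0, I)
    # Assign each sample to the site minimizing c(x, y_j) - psi_j,
    # with the quadratic cost c(x, y) = |x - y|^2 / 2.
    scores = 0.5 * ((x[:, None, :] - sites[None, :, :]) ** 2).sum(-1) - psi
    cells = scores.argmin(axis=1)
    mass = np.bincount(cells, minlength=J) / len(x)
    # Monte Carlo gradient of the dual objective: nu_j - mu(cell_j).
    psi += 0.5 / np.sqrt(1.0 + it) * (nu - mass)

print("cell masses after ascent:", mass, "target:", nu)
```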
Assume the particular case $\mu = \mathcal{N}(0, \Sigma_X)$, $\nu = \mathcal{N}(0, \Sigma_Y)$, and $c(x, y) = |y - Ax|^2 / 2$, where $A$ is invertible. One then has

$$\varphi(x) = -x^\top \Sigma_X^{-1/2} \left( \Sigma_X^{1/2} A^\top \Sigma_Y A \Sigma_X^{1/2} \right)^{1/2} \Sigma_X^{-1/2} x / 2$$

$$\psi(y) = -y^\top A \Sigma_X^{1/2} \left( \Sigma_X^{1/2} A^\top \Sigma_Y A \Sigma_X^{1/2} \right)^{-1/2} \Sigma_X^{1/2} A y / 2$$

$$T(x) = (A^\top)^{-1} \Sigma_X^{-1/2} \left( \Sigma_X^{1/2} A^\top \Sigma_Y A \Sigma_X^{1/2} \right)^{1/2} \Sigma_X^{-1/2} x$$
The proof of this solution appears in Galichon (2016).
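To illustrate the formula, the following Python sketch evaluates the map $T$ in the special case $A = I$, where it simplifies to $T(x) = \Sigma_X^{-1/2} \left( \Sigma_X^{1/2} \Sigma_Y \Sigma_X^{1/2} \right)^{1/2} \Sigma_X^{-1/2} x$, and checks that $T$ pushes $\mathcal{N}(0, \Sigma_X)$ to $\mathcal{N}(0, \Sigma_Y)$; the covariance matrices are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import inv, sqrtm

# Hypothetical covariances for mu = N(0, Sigma_X) and nu = N(0, Sigma_Y).
sigma_x = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
sigma_y = np.array([[1.0, -0.3],
                    [-0.3, 3.0]])

rx = sqrtm(sigma_x).real              # Sigma_X^{1/2}
mid = sqrtm(rx @ sigma_y @ rx).real   # (Sigma_X^{1/2} Sigma_Y Sigma_X^{1/2})^{1/2}
T = inv(rx) @ mid @ inv(rx)           # the optimal map is x -> T x

# Sanity check: a linear map T pushes N(0, Sigma_X) to N(0, T Sigma_X T^T),
# which should equal Sigma_Y for the optimal map.
print(np.allclose(T @ sigma_x @ T.T, sigma_y))  # True
```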
Let $X$ be a separable Hilbert space. Let $\mathcal{P}_p(X)$ denote the collection of probability measures on $X$ that have finite $p$-th moment; let $\mathcal{P}_p^r(X)$ denote those elements $\mu \in \mathcal{P}_p(X)$ that are Gaussian regular: if $g$ is any strictly positive Gaussian measure on $X$ and $g(N) = 0$, then $\mu(N) = 0$ also.

Let $\mu \in \mathcal{P}_p^r(X)$, $\nu \in \mathcal{P}_p(X)$, and $c(x, y) = |x - y|^p / p$ for $p \in (1, \infty)$, $p^{-1} + q^{-1} = 1$. Then the Monge–Kantorovich problem has a unique solution $\kappa$, and this solution is induced by an optimal transport map: i.e., there exists a Borel map $r \in L^p(X, \mu; X)$ such that

$$\kappa = (\mathrm{id}_X \times r)_*(\mu) \in \Gamma(\mu, \nu).$$

Moreover, if $\nu$ has bounded support, then

$$r(x) = x - |\nabla \varphi(x)|^{q-2} \, \nabla \varphi(x)$$

for $\mu$-almost all $x \in X$, for some locally Lipschitz, $c$-concave and maximal Kantorovich potential $\varphi$. (Here $\nabla \varphi$ denotes the Gateaux derivative of $\varphi$.)
Consider a variant of the discrete problem above, where we have added an entropic regularization term to the objective function of the primal problem
$$\begin{aligned}
& \text{Minimize} \quad \sum_{x \in X, y \in Y} \gamma_{xy} c_{xy} + \varepsilon \gamma_{xy} \ln \gamma_{xy} \\
& \text{subject to:} \\
& \gamma \geq 0 \\
& \sum_{y \in Y} \gamma_{xy} = \mu_x, \quad \forall x \in X \\
& \sum_{x \in X} \gamma_{xy} = \nu_y, \quad \forall y \in Y
\end{aligned}$$
One can show that the dual regularized problem is
$$\max_{\varphi, \psi} \sum_{x \in X} \varphi_x \mu_x + \sum_{y \in Y} \psi_y \nu_y - \varepsilon \sum_{x \in X, y \in Y} \exp\left( \frac{\varphi_x + \psi_y - c_{xy}}{\varepsilon} \right),$$
where, compared with the unregularized version, the "hard" constraint in the former dual ($\varphi_x + \psi_y \leq c_{xy}$) has been replaced by a "soft" penalization of that constraint (the sum of the terms $\varepsilon \exp\left( (\varphi_x + \psi_y - c_{xy}) / \varepsilon \right)$). The optimality conditions in the dual problem can be expressed as
$$\mu_x = \sum_{y \in Y} \exp\left( \frac{\varphi_x + \psi_y - c_{xy}}{\varepsilon} \right) \quad \forall x \in X$$

$$\nu_y = \sum_{x \in X} \exp\left( \frac{\varphi_x + \psi_y - c_{xy}}{\varepsilon} \right) \quad \forall y \in Y$$
Denoting by $A$ the matrix of terms $A_{xy} = \exp\left( -c_{xy} / \varepsilon \right)$, solving the dual is therefore equivalent to looking for two diagonal positive matrices $D_1$ and $D_2$ of respective sizes $|X|$ and $|Y|$, such that $D_1 A D_2 \mathbf{1} = \mu$ and $(D_1 A D_2)^\top \mathbf{1} = \nu$, where $\mathbf{1}$ denotes a vector of ones of the appropriate size. The existence of such matrices generalizes Sinkhorn's theorem, and the matrices can be computed using the Sinkhorn–Knopp algorithm,[16] which simply consists of iteratively looking for $\varphi_x$ to solve the first optimality condition, and $\psi_y$ to solve the second. The Sinkhorn–Knopp algorithm is therefore a coordinate descent algorithm on the dual regularized problem.
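A compact Python implementation of these iterations, writing $u_x = \exp(\varphi_x / \varepsilon)$ and $v_y = \exp(\psi_y / \varepsilon)$ so that each update enforces one marginal exactly (the problem data and the value of $\varepsilon$ are illustrative assumptions):

```python
import numpy as np

def sinkhorn(mu, nu, c, eps=0.05, iters=1000):
    """Sinkhorn-Knopp iterations for the entropy-regularized problem."""
    K = np.exp(-c / eps)                 # Gibbs kernel, A_xy = exp(-c_xy / eps)
    u = np.ones_like(mu)                 # u_x = exp(phi_x / eps)
    v = np.ones_like(nu)                 # v_y = exp(psi_y / eps)
    for _ in range(iters):
        u = mu / (K @ v)                 # enforce the mu-marginal exactly
        v = nu / (K.T @ u)               # enforce the nu-marginal exactly
    return u[:, None] * K * v[None, :]   # gamma = D_1 A D_2

# Hypothetical problem data.
mu = np.array([0.5, 0.5])
nu = np.array([0.25, 0.25, 0.5])
c = np.array([[1.0, 2.0, 3.0],
              [4.0, 1.0, 2.0]])

gamma = sinkhorn(mu, nu, c)
print(gamma.sum(axis=1), gamma.sum(axis=0))  # marginals: approx mu and nu
```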
The Monge–Kantorovich optimal transport has found applications in a wide range of fields. Among them are: