In computer vision, triangulation refers to the process of determining a point in 3D space given its projections onto two or more images. To solve this problem it is necessary to know the parameters of the camera projection function from 3D to 2D for the cameras involved, in the simplest case represented by the camera matrices. Triangulation is sometimes also referred to as reconstruction or intersection.
The triangulation problem is in principle trivial. Each point in an image corresponds to a line in 3D space, such that all points on that line project to the same image point. If a pair of corresponding points in two or more images can be found, they must be the projections of a common 3D point x. The set of lines generated by the image points must then intersect at x, and the coordinates of x can be computed algebraically in a variety of ways, as presented below.
In practice, however, the coordinates of image points cannot be measured with arbitrary accuracy. Instead, various types of noise, such as geometric noise from lens distortion or interest point detection error, lead to inaccuracies in the measured image coordinates. As a consequence, the lines generated by the corresponding image points do not always intersect in 3D space. The problem, then, is to find a 3D point which optimally fits the measured image points. In the literature there are multiple proposals for how to define optimality and how to find the optimal 3D point. Since they are based on different optimality criteria, the various methods produce different estimates of the 3D point x when noise is involved.
In the following, it is assumed that triangulation is made on corresponding image points from two views generated by pinhole cameras.
The image to the left illustrates the epipolar geometry of a pair of stereo cameras of pinhole model. A point x in 3D space is projected onto each image plane along a line (green) which goes through the camera's focal point, O_1 and O_2 respectively, resulting in the two corresponding image points y_1 and y_2. If y_1 and y_2 are given and the geometry of the two cameras is known, the two projection lines can be determined, and they must intersect at the point x. Using basic linear algebra, that intersection point can be determined in a straightforward way.
The image to the right shows the real case. The positions of the image points y_1 and y_2 cannot be measured exactly, due to a combination of factors such as lens distortion and interest point detection error. As a consequence, the measured image points are y'_1 and y'_2 instead of y_1 and y_2. Their projection lines (blue) do not have to intersect in 3D space or pass close to x. In fact, these lines intersect if and only if y'_1 and y'_2 satisfy the epipolar constraint defined by the fundamental matrix. Given the measurement noise in y'_1 and y'_2, it is rather likely that the epipolar constraint is not satisfied and that the projection lines do not intersect.
This observation leads to the problem which is solved in triangulation: which 3D point x_est is the best estimate of x given y'_1, y'_2 and the geometry of the cameras? The answer is often found by defining an error measure which depends on x_est and then minimizing this error.
All triangulation methods produce x_est = x in the case that y_1 = y'_1 and y_2 = y'_2, that is, when the epipolar constraint is satisfied (except for singular points, see below). It is what happens when the constraint is not satisfied which differs between the methods.
A triangulation method can be described in terms of a function τ such that

\mathbf{x} \sim \tau(\mathbf{y}'_{1}, \mathbf{y}'_{2}, \mathbf{C}_{1}, \mathbf{C}_{2})

where y'_1, y'_2 are the homogeneous coordinates of the detected image points and C_1, C_2 are the camera matrices. The resulting x is the homogeneous representation of the estimated 3D point. The \sim sign implies that τ is only required to produce a vector which is equal to x up to multiplication by a non-zero scalar, since homogeneous vectors are involved.
Before looking at the specific methods, that is, specific functions τ, there are some general concepts related to the methods that need to be explained. Which triangulation method is chosen for a particular problem depends to some extent on these characteristics.
Some of the methods fail to correctly compute an estimate of x if it lies in a certain subset of 3D space, corresponding to some combination of y'_1, y'_2, C_1, C_2. A point in this subset is then a singularity of the triangulation method.
In some applications, it is desirable that the triangulation is independent of the coordinate system used to represent 3D points; if the triangulation problem is formulated in one coordinate system and then transformed into another the resulting estimate xest should transform in the same way. This property is commonly referred to as invariance. Not every triangulation method assures invariance, at least not for general types of coordinate transformations.
For a homogeneous representation of 3D coordinates, the most general transformation is a projective transformation, represented by a 4 × 4 matrix T. If the homogeneous coordinates are transformed according to

\bar{\mathbf{x}} \sim \mathbf{T} \, \mathbf{x}

then the camera matrices C_k must transform as

\bar{\mathbf{C}}_{k} \sim \mathbf{C}_{k} \, \mathbf{T}^{-1}

in order to produce the same homogeneous image coordinates y_k:

\mathbf{y}_{k} \sim \bar{\mathbf{C}}_{k} \, \bar{\mathbf{x}} = \mathbf{C}_{k} \, \mathbf{x}
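As a quick numerical sanity check, the identity C_k T⁻¹ · T x = C_k x can be verified directly. The sketch below uses arbitrary illustrative values (a random 3 × 4 camera matrix, point, and invertible 4 × 4 transformation), not data from any particular setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x4 camera matrix and homogeneous 3D point (illustrative values).
C = rng.standard_normal((3, 4))
x = rng.standard_normal(4)

# A random (almost surely invertible) 4x4 projective transformation T.
T = rng.standard_normal((4, 4))

x_bar = T @ x                  # transformed 3D point:   x-bar ~ T x
C_bar = C @ np.linalg.inv(T)   # transformed camera:     C-bar ~ C T^-1

y = C @ x            # homogeneous image point in the original coordinates
y_bar = C_bar @ x_bar  # homogeneous image point in the transformed coordinates

# The two homogeneous image points agree up to numerical error.
assert np.allclose(y, y_bar)
```

This only checks the algebraic consistency of the transformation rules; it says nothing yet about whether a particular triangulation function τ is invariant.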
If the triangulation function τ is invariant to T, then the following relation must be valid:

\bar{\mathbf{x}}_{\rm est} \sim \mathbf{T} \, \mathbf{x}_{\rm est}

from which it follows that

\tau(\mathbf{y}'_{1}, \mathbf{y}'_{2}, \mathbf{C}_{1}, \mathbf{C}_{2}) \sim \mathbf{T}^{-1} \, \tau(\mathbf{y}'_{1}, \mathbf{y}'_{2}, \mathbf{C}_{1}\mathbf{T}^{-1}, \mathbf{C}_{2}\mathbf{T}^{-1})

for all image points y'_1, y'_2.
For each triangulation method, it can be determined if this last relation is valid. If it is, it may be satisfied only for a subset of the projective transformations, for example, rigid or affine transformations.
The function τ is only an abstract representation of a computation which, in practice, may be relatively complex. Some methods result in a τ which is a closed-form continuous function, while others need to be decomposed into a series of computational steps involving, for example, singular value decomposition or finding the roots of a polynomial. Yet another class of methods results in a τ which must rely on iterative estimation of some parameters. This means that both the computation time and the complexity of the operations involved may vary between the methods.
Each of the two image points y'_1 and y'_2 has a corresponding projection line (blue in the right image above), here denoted L'_1 and L'_2, which can be determined given the camera matrices C_1, C_2. Let d be a distance function such that d(L, x) is the Euclidean distance between a 3D line L and a 3D point x. The midpoint method finds the point x_est which minimizes

d(\mathbf{L}'_{1}, \mathbf{x})^{2} + d(\mathbf{L}'_{2}, \mathbf{x})^{2}
It turns out that xest lies exactly at the middle of the shortest line segment which joins the two projection lines.
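Under this characterization the midpoint can be computed in closed form by solving a small linear system: each line contributes a projector onto the plane perpendicular to its direction. The sketch below is a minimal illustration; the camera centres and line directions are made-up values chosen so the two lines happen to intersect exactly.

```python
import numpy as np

def midpoint_triangulate(p1, d1, p2, d2):
    """Midpoint method: find the 3D point minimizing the sum of squared
    distances to the two projection lines x = p_i + s * d_i, where p_i is
    a point on line i (e.g. the camera centre) and d_i its direction."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in ((p1, d1), (p2, d2)):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)  # projector onto the plane perpendicular to d
        A += P
        b += P @ p
    # Normal equations of the least-squares problem; A is invertible
    # whenever the two lines are not parallel.
    return np.linalg.solve(A, b)

# Illustrative camera centres and directions; these two lines meet at (1, 1, 1).
O1, dir1 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0])
O2, dir2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 1.0])
x_est = midpoint_triangulate(O1, dir1, O2, dir2)  # x_est ~ [1., 1., 1.]
```

When the lines do not intersect, the same solve returns the midpoint of the shortest connecting segment, consistent with the remark above.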
See main article: Direct linear transformation.
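As a brief illustration of the linear approach, a common homogeneous DLT formulation stacks two linear equations per view, obtained from the cross product y × (C x) = 0, and recovers x as the null vector of the stacked system via SVD. This is a generic sketch, not the article's own derivation; the camera matrices below are hypothetical round-trip test values.

```python
import numpy as np

def dlt_triangulate(y1, y2, C1, C2):
    """Homogeneous DLT triangulation from two views.
    y1, y2: inhomogeneous image points (u, v); C1, C2: 3x4 camera matrices."""
    A = np.empty((4, 4))
    for i, ((u, v), C) in enumerate(((y1, C1), (y2, C2))):
        # Two independent equations per view from y x (C x) = 0.
        A[2 * i]     = u * C[2] - C[0]
        A[2 * i + 1] = v * C[2] - C[1]
    _, _, Vt = np.linalg.svd(A)
    x = Vt[-1]        # right singular vector of the smallest singular value
    return x / x[3]   # dehomogenize

# Hypothetical cameras and a known 3D point, for a noise-free round trip.
C1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # first camera [I | 0]
C2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # translated camera
X = np.array([0.2, -0.1, 3.0, 1.0])
p1, p2 = C1 @ X, C2 @ X
x_rec = dlt_triangulate(p1[:2] / p1[2], p2[:2] / p2[2], C1, C2)
# x_rec recovers X exactly in this noise-free case.
```

With noisy image points the stacked system no longer has an exact null vector, and the SVD solution minimizes an algebraic (not geometric) error.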
The problem to be solved here is how to compute the 3D coordinates (x_1, x_2, x_3) given corresponding normalized image coordinates (y_1, y_2) and (y'_1, y'_2). If the essential matrix is known and the corresponding rotation and translation transformations have been determined, the following computation provides a solution. Let r_k denote row k of the rotation matrix R:

\mathbf{R} = \begin{pmatrix} - \mathbf{r}_{1} - \\ - \mathbf{r}_{2} - \\ - \mathbf{r}_{3} - \end{pmatrix}
Combining the above relations between 3D coordinates in the two coordinate systems with the mapping between 3D and 2D points described earlier gives

y'_{1} = \frac{x'_{1}}{x'_{3}} = \frac{\mathbf{r}_{1} \cdot (\tilde{\mathbf{x}} - \mathbf{t})}{\mathbf{r}_{3} \cdot (\tilde{\mathbf{x}} - \mathbf{t})}

or

x_{3} = \frac{(\mathbf{r}_{1} - y'_{1} \, \mathbf{r}_{3}) \cdot \mathbf{t}}{(\mathbf{r}_{1} - y'_{1} \, \mathbf{r}_{3}) \cdot \tilde{\mathbf{y}}}
Once x_3 is determined, the other two coordinates can be computed as

\begin{pmatrix} x_{1} \\ x_{2} \end{pmatrix} = x_{3} \begin{pmatrix} y_{1} \\ y_{2} \end{pmatrix}
The above derivation is not unique. It is also possible to start with an expression for y'_2 and derive a corresponding expression for x_3:

x_{3} = \frac{(\mathbf{r}_{2} - y'_{2} \, \mathbf{r}_{3}) \cdot \mathbf{t}}{(\mathbf{r}_{2} - y'_{2} \, \mathbf{r}_{3}) \cdot \tilde{\mathbf{y}}}
In the ideal case, when the camera maps the 3D points according to a perfect pinhole camera and the resulting 2D points can be detected without any noise, the two expressions for x_3 are equal. In practice, however, they are not, and it may be advantageous to combine the two estimates of x_3, for example, by taking some form of average.
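A minimal sketch of this depth computation, assuming the convention x' = R(x − t) for the second camera frame and combining the two x_3 expressions by a plain average (an illustrative choice; the rotation, translation, and test point below are made-up values):

```python
import numpy as np

def depth_from_rows(R, t, y, y_prime):
    """Recover (x1, x2, x3) from normalized image points y = (y1, y2) and
    y' = (y'1, y'2), given rotation R and translation t with x' = R (x - t).
    Both closed-form expressions for x3 are evaluated and averaged."""
    y_t = np.array([y[0], y[1], 1.0])  # homogeneous direction (y1, y2, 1)
    r1, r2, r3 = R                     # rows of the rotation matrix
    x3_a = (r1 - y_prime[0] * r3) @ t / ((r1 - y_prime[0] * r3) @ y_t)
    x3_b = (r2 - y_prime[1] * r3) @ t / ((r2 - y_prime[1] * r3) @ y_t)
    x3 = 0.5 * (x3_a + x3_b)           # noise-free data: x3_a == x3_b
    return x3 * y_t                    # (x1, x2) = x3 * (y1, y2)

# Synthetic check: small rotation about the y-axis, unit baseline.
a = 0.1
R = np.array([[np.cos(a), 0.0, -np.sin(a)],
              [0.0,       1.0,  0.0      ],
              [np.sin(a), 0.0,  np.cos(a)]])
t = np.array([1.0, 0.0, 0.0])
x = np.array([0.3, -0.2, 4.0])   # ground-truth 3D point in the first frame
xp = R @ (x - t)                 # same point in the second camera frame
y, yp = x[:2] / x[2], xp[:2] / xp[2]
x_rec = depth_from_rows(R, t, y, yp)  # recovers x in this noise-free case
```

With noisy image points the two depth estimates differ, and the averaging step above is only one of several reasonable ways to combine them.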
Other extensions of the above computations are also possible. The derivation started with expressions for the primed image coordinates and derived 3D coordinates in the unprimed system. It is also possible to start with unprimed image coordinates and obtain primed 3D coordinates, which can finally be transformed into unprimed 3D coordinates. Again, in the ideal case the results should be equal to the above expressions, but in practice they may deviate.
A final remark relates to the fact that if the essential matrix is determined from corresponding image coordinates, which often is the case when 3D points are determined in this way, the translation vector t is known only up to an unknown positive scaling. As a consequence, the reconstructed 3D points are also undetermined with respect to a positive scaling.