In machine learning, a ranking SVM is a variant of the support vector machine algorithm, which is used to solve certain ranking problems (via learning to rank). The ranking SVM algorithm was published by Thorsten Joachims in 2002.[1] The original purpose of the algorithm was to improve the performance of an internet search engine. However, it was found that ranking SVM can also be used to solve other problems, such as Rank SIFT.[2]
The ranking SVM algorithm is a learning retrieval function that employs pairwise ranking methods to adaptively sort results based on how relevant they are for a specific query. The ranking SVM function uses a mapping function to describe the match between a search query and the features of each of the possible results. This mapping function projects each data pair (for example, a search query and a clicked web page) onto a feature space. These features are combined with the corresponding click-through data (which can act as a proxy for how relevant a page is for a specific query) and can then be used as the training data for the ranking SVM algorithm.
Generally, ranking SVM includes three steps in the training period:
1. It maps the similarities between queries and the clicked pages onto a certain feature space.
2. It calculates the distances between any two of the vectors obtained in step 1.
3. It forms an optimization problem that is similar to a standard SVM classification and solves this problem with the regular SVM solver.
Suppose $\mathbb{C}$ is a data set containing $N$ elements $c_i$, and $r$ is a ranking method applied to $\mathbb{C}$. Then $r$ on $\mathbb{C}$ can be represented as an $N \times N$ asymmetric binary matrix: the entry at position $(i, j)$ is 1 if $c_i$ is ranked higher than $c_j$ under $r$ (written $c_i <_r c_j$), and 0 otherwise.
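To make the matrix representation concrete, here is a minimal NumPy sketch; the function name `ranking_matrix` and the rank-position encoding are illustrative assumptions, not part of the original algorithm:

```python
import numpy as np

def ranking_matrix(ranks):
    """N x N asymmetric binary matrix for a ranking: entry (i, j) is 1
    when element c_i is ranked higher than c_j, i.e. c_i <_r c_j."""
    r = np.asarray(ranks)
    return (r[:, None] < r[None, :]).astype(int)

# Three elements with rank positions 1, 3, 2 (1 = ranked highest).
print(ranking_matrix([1, 3, 2]))
# [[0 1 1]
#  [0 0 0]
#  [0 1 0]]
```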
Kendall's tau, also known as the Kendall rank correlation coefficient, is commonly used to compare two ranking methods for the same data set.
Suppose $r_1$ and $r_2$ are two ranking methods applied to a data set $\mathbb{C}$. The Kendall's tau between $r_1$ and $r_2$ can be represented as follows:

$$\tau(r_1, r_2) = \frac{P - Q}{P + Q} = 1 - \frac{2Q}{P + Q},$$

where $P$ is the number of concordant pairs and $Q$ is the number of discordant pairs (inversions). A pair $(d_i, d_j)$ is concordant if two ranking methods $r_a$ and $r_b$ order $d_i$ and $d_j$ the same way; it is discordant if they disagree.
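As a sanity check on the definition, the following sketch counts concordant and discordant pairs directly; it assumes strict rankings without ties, so every pair is either concordant or discordant:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items;
    rank_a[i] is the position item i receives under the first ranking."""
    P = Q = 0  # concordant and discordant pair counts
    for i, j in combinations(range(len(rank_a)), 2):
        agreement = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if agreement > 0:
            P += 1  # both rankings order the pair the same way
        else:
            Q += 1  # the rankings disagree on this pair
    return (P - Q) / (P + Q)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # identical: tau = 1.0
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed: tau = -1.0
```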
Information retrieval quality is usually evaluated by the following three measurements:
1. Precision
2. Recall
3. Average precision
For a specific query to a database, let $P_{\text{relevant}}$ denote the set of relevant pages in the database and $P_{\text{retrieved}}$ denote the set of retrieved pages. The three measurements above can then be represented as follows:

$$\begin{align} \text{precision} &= \frac{\left|P_{\text{relevant}} \cap P_{\text{retrieved}}\right|}{\left|P_{\text{retrieved}}\right|};\\[6pt] \text{recall} &= \frac{\left|P_{\text{relevant}} \cap P_{\text{retrieved}}\right|}{\left|P_{\text{relevant}}\right|};\\[6pt] \text{average precision} &= \int_0^1 \operatorname{Prec}(\text{recall}) \, d(\text{recall}), \end{align}$$

where $\operatorname{Prec}(\text{recall})$ is the precision at a given level of recall.
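The set-based measurements translate directly into code. In this sketch, `average_precision` uses the standard discrete form of the integral, averaging the precision measured at the rank of each relevant result; the function names and arguments are illustrative assumptions:

```python
def precision_recall(relevant, retrieved):
    """Set-based precision and recall for a single query."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    return hits / len(retrieved), hits / len(relevant)

def average_precision(relevant, ranked_results):
    """Discrete form of the precision-over-recall integral: average the
    precision measured at the rank of each relevant result."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_results, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant)

# Relevant docs {a, b}; "a" retrieved at rank 1, "b" at rank 3.
print(average_precision({"a", "b"}, ["a", "c", "b"]))  # (1/1 + 2/3) / 2
```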
Let $r^{*}$ and $r_{f(q)}$ be the expected and proposed ranking methods of a database, respectively. The average precision of the method $r_{f(q)}$ is bounded from below as follows:

$$\operatorname{AvgPrec}(r_{f(q)}) \geq \frac{1}{R} \left[ Q + \binom{R+1}{2} \right]^{-1} \left( \sum_{i=1}^{R} \sqrt{i} \right)^{2},$$

where $Q$ is the number of discordant pairs between $r^{*}$ and $r_{f(q)}$, and $R$ is the number of relevant pages in the database.
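Evaluating the bound is a one-liner; this sketch simply transcribes the formula (the helper name is hypothetical):

```python
from math import comb, sqrt

def avgprec_lower_bound(Q, R):
    """Lower bound on AvgPrec given Q discordant pairs with the ideal
    ranking and R relevant pages, per the formula above."""
    return (sum(sqrt(i) for i in range(1, R + 1)) ** 2) / (R * (Q + comb(R + 1, 2)))

print(avgprec_lower_bound(Q=0, R=1))  # 1.0: a perfect ranking of one relevant page
```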
Suppose $(\vec{x}_i, y_i)$ is an element of the training data set, where $\vec{x}_i$ is the feature vector and $y_i$ is the label that classifies the category of $\vec{x}_i$. A typical SVM classifier for such a data set can be defined as the solution of the following optimization problem:

$$\begin{align} &\text{minimize } V(\vec{w}, \vec{\xi}) = \frac{1}{2} \vec{w} \cdot \vec{w} + C_F \sum_i \xi_i^{\sigma} \\[6pt] &\text{subject to} \\ &\quad \sigma \geq 0; \\ &\quad \forall i:\; y_i (\vec{w} \cdot \vec{x}_i + b) \geq 1 - \xi_i^{\sigma}, \end{align}$$

where $b$ is a scalar, $y_i \in \{-1, 1\}$, and $\xi_i \geq 0$.
The solution of the above optimization problem can be represented as a linear combination of the feature vectors:

$$\vec{w}^{*} = \sum_i \alpha_i y_i \vec{x}_i,$$

where $\alpha_i$ are the coefficients determined by the solution.
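This linear combination can be observed with an off-the-shelf SVM solver. For instance, in scikit-learn's SVC the attribute `dual_coef_` stores the products $\alpha_i y_i$ for the support vectors, so $\vec{w}^{*}$ can be recovered exactly as the formula states; the toy data below is an assumption for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data with labels y_i in {-1, +1}.
X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector, so
# w* = sum_i alpha_i y_i x_i is a single matrix product.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_dual, clf.coef_))  # True
```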
Let $\tau_{P(f)}$ be the Kendall's tau between the expected ranking method $r^{*}$ and the proposed method $r_{f(q)}$. It can be proved that maximizing $\tau_{P(f)}$ is equivalent to minimizing the lower bound of the average precision of $r_{f(q)}$.

The negative of $\tau_{P(f)}$ can therefore be selected as the loss function, whose minimization minimizes the lower bound of the average precision of $r_{f(q)}$:

$$L_{\text{expected}} = -\tau_{P(f)} = -\int \tau(r_{f(q)}, r^{*}) \, d\Pr(q, r^{*}),$$

where $\Pr(q, r^{*})$ is the statistical distribution of the target ranking $r^{*}$ for a query $q$.
Since the expected loss function cannot be computed directly (the distribution $\Pr(q, r^{*})$ is unknown), the following empirical loss function over the training data is used in practice:
$$L_{\text{empirical}} = -\tau_S(f) = -\frac{1}{n} \sum_{i=1}^{n} \tau\left(r_{f(q_i)}, r_i^{*}\right),$$

where $n$ is the number of queries in the training set and $r_i^{*}$ is the target ranking for query $q_i$.
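Given the `kendall_tau` sketch above, the empirical loss is just the negated mean over the training queries (an illustrative helper, not from the original paper):

```python
def empirical_loss(predicted_rankings, target_rankings):
    """-tau_S(f): negated mean Kendall's tau over the n training queries,
    reusing kendall_tau from the earlier sketch."""
    taus = [kendall_tau(r_f, r_star)
            for r_f, r_star in zip(predicted_rankings, target_rankings)]
    return -sum(taus) / len(taus)
```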
A mapping function $\Phi(q, d)$ is required to map each query and each element of the database onto the feature space. The points generated from the training data lie in this feature space and also carry the rank information (the labels). These labeled points can be used to find the boundary (classifier) that specifies their order; in the linear case, such a boundary is a vector.
Suppose $c_i$ and $c_j$ are elements of the training data set, and $(c_i, c_j) \in r$ denotes that $c_i$ is ranked higher than $c_j$ under ranking method $r$. The ranking SVM computes the weight vector $\vec{w}$ by solving the following optimization problem:

$$\begin{align} &\text{minimize } V(\vec{w}, \vec{\xi}) = \frac{1}{2} \vec{w} \cdot \vec{w} + \text{constant} \cdot \sum \xi_{i,j,k} \\[6pt] &\text{subject to} \\ &\quad \forall \xi_{i,j,k} \geq 0; \\ &\quad \forall (c_i, c_j) \in r_k^{*}: \\ &\qquad \vec{w} \cdot \left(\Phi(q_1, c_i) - \Phi(q_1, c_j)\right) \geq 1 - \xi_{i,j,1}; \\ &\qquad \vdots \\ &\qquad \vec{w} \cdot \left(\Phi(q_n, c_i) - \Phi(q_n, c_j)\right) \geq 1 - \xi_{i,j,n}, \end{align}$$

where $k \in \{1, 2, \ldots, n\}$ and $i, j \in \{1, 2, \ldots\}$.
The above optimization problem has the same form as the classical SVM classification problem, which is why this algorithm is called ranking SVM.
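That equivalence suggests a simple way to train a ranking SVM with a standard classifier: rewrite each constraint $\vec{w} \cdot (\Phi(q, c_i) - \Phi(q, c_j)) \geq 1 - \xi$ as a classification sample whose feature vector is the difference $\Phi(q, c_i) - \Phi(q, c_j)$ with label $+1$. The sketch below does this with scikit-learn's LinearSVC; the helper name and toy feature vectors are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(features, ranks):
    """One classification sample per ordered pair: the difference
    Phi(q, c_i) - Phi(q, c_j) gets label +1 when c_i outranks c_j
    (plus the mirrored pair with label -1 to balance the classes)."""
    X_pairs, y_pairs = [], []
    for i in range(len(ranks)):
        for j in range(len(ranks)):
            if ranks[i] < ranks[j]:  # c_i ranked higher than c_j
                X_pairs.append(features[i] - features[j])
                y_pairs.append(1)
                X_pairs.append(features[j] - features[i])
                y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

# Phi(q, c) vectors for one query's candidate results and their ranks.
Phi = np.array([[1.0, 0.2], [0.8, 0.4], [0.1, 0.9]])
ranks = [1, 2, 3]  # 1 = ranked highest

X_pairs, y_pairs = pairwise_transform(Phi, ranks)
svm = LinearSVC(C=1.0).fit(X_pairs, y_pairs)  # svm.coef_ approximates w*
```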
The optimal vector $\vec{w}^{*}$ satisfies

$$\vec{w}^{*} = \sum_{k,\ell} \alpha_{k,\ell}^{*} \, \Phi(q_k, c_\ell),$$

where $\alpha_{k,\ell}^{*}$ are the coefficients of the solution.
So the retrieval function can be formed from this optimal classifier. For a new query $q$, the retrieval function first projects all elements of the database onto the feature space, and then orders those feature points by the values of their inner products with $\vec{w}^{*}$.
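Continuing the sketch above, ranking a new query's candidates is a single matrix-vector product with the learned weights:

```python
# Score each candidate of a new query by its inner product with w*.
w = svm.coef_.ravel()                       # from the training sketch above
Phi_new = np.array([[0.3, 0.7], [0.9, 0.1], [0.5, 0.5]])  # Phi(q, c) rows
scores = Phi_new @ w
order = np.argsort(-scores)                 # candidate indices, best first
```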
Ranking SVM can be applied to rank pages according to a query. The algorithm can be trained using click-through data, which consists of the following three parts:
1. the query;
2. the ranking of search results as presented to the user;
3. the search results clicked on by the user.
The combination of parts 2 and 3 does not provide the full ordering of the training data that the standard formulation above requires; it provides only partial ranking information (for example, a clicked result is likely more relevant than the unclicked results presented above it). The algorithm can therefore be slightly revised as follows.
$$\begin{align} &\text{minimize } V(\vec{w}, \vec{\xi}) = \frac{1}{2} \vec{w} \cdot \vec{w} + \text{constant} \cdot \sum \xi_{i,j,k} \\[6pt] &\text{subject to} \\ &\quad \forall \xi_{i,j,k} \geq 0; \\ &\quad \forall (c_i, c_j) \in r_k': \\ &\qquad \vec{w} \cdot \left(\Phi(q_1, c_i) - \Phi(q_1, c_j)\right) \geq 1 - \xi_{i,j,1}; \\ &\qquad \vdots \\ &\qquad \vec{w} \cdot \left(\Phi(q_n, c_i) - \Phi(q_n, c_j)\right) \geq 1 - \xi_{i,j,n}, \end{align}$$

where $k \in \{1, 2, \ldots, n\}$ and $i, j \in \{1, 2, \ldots\}$.
Here $r_k'$ denotes the partial ranking information extracted from the click-through data for query $q_k$, used in place of the full target ranking $r^{*}$; the rest of the training procedure is unchanged.
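One common way to extract such partial pairs, following Joachims' click-through heuristic (a clicked result is preferred over every unclicked result presented above it), is sketched below; the function name and log format are assumptions:

```python
def clickthrough_pairs(presented, clicked):
    """Partial preference pairs (preferred, less_preferred) for one query:
    a clicked result beats each unclicked result shown above it."""
    clicked = set(clicked)
    pairs = []
    for pos, doc in enumerate(presented):
        if doc in clicked:
            pairs.extend((doc, above) for above in presented[:pos]
                         if above not in clicked)
    return pairs

# User clicked results "a" and "c" of the presented ranking.
print(clickthrough_pairs(["a", "b", "c", "d"], {"a", "c"}))
# [('c', 'b')] -- "c" was clicked while "b", ranked above it, was skipped
```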