In machine learning, a ranking SVM is a variant of the support vector machine algorithm, which is used to solve certain ranking problems (via learning to rank). The ranking SVM algorithm was published by Thorsten Joachims in 2002.[1] The original purpose of the algorithm was to improve the performance of an internet search engine. However, it was found that ranking SVM can also be used to solve other problems, such as Rank SIFT.[2]
The ranking SVM algorithm is a learning retrieval function that employs pairwise ranking methods to adaptively sort results based on how relevant they are for a specific query. The ranking SVM function uses a mapping function to describe the match between a search query and the features of each of the possible results. This mapping function projects each data pair (for example, a search query and a clicked web page) onto a feature space. These features are combined with the corresponding click-through data (which can act as a proxy for how relevant a page is for a specific query) and can then be used as the training data for the ranking SVM algorithm.
Generally, ranking SVM includes three steps in the training period:
1. It maps the similarities between queries and the clicked pages onto a certain feature space.
2. It calculates the distances between any two of the vectors obtained in step 1.
3. It forms an optimization problem that is similar to a standard SVM classification and solves this problem with the regular SVM solver.
Suppose $\mathbb{C}$ is a data set containing $N$ elements $c_i$, and $r$ is a ranking method applied to $\mathbb{C}$. Then $r$ on $\mathbb{C}$ can be represented as an $N \times N$ asymmetric binary matrix: the entry at position $(i, j)$ is 1 if $c_i$ is ranked higher than $c_j$ under $r$ (written $c_i <_r c_j$), and 0 otherwise.
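To make the matrix representation concrete, here is a minimal NumPy sketch; the function name `ranking_matrix` and the rank-position encoding are illustrative assumptions, not part of the original algorithm:

```python
import numpy as np

def ranking_matrix(ranks):
    """N x N asymmetric binary matrix for a ranking: entry (i, j) is 1
    when element c_i is ranked higher than c_j, i.e. c_i <_r c_j."""
    r = np.asarray(ranks)
    return (r[:, None] < r[None, :]).astype(int)

# Three elements with rank positions 1, 3, 2 (1 = ranked highest).
print(ranking_matrix([1, 3, 2]))
# [[0 1 1]
#  [0 0 0]
#  [0 1 0]]
```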
Kendall's tau, also known as the Kendall rank correlation coefficient, is commonly used to compare two ranking methods for the same data set.
Suppose $r_1$ and $r_2$ are two ranking methods applied to a data set $\mathbb{C}$. The Kendall's tau between $r_1$ and $r_2$ can be represented as follows:

$$\tau(r_1, r_2) = \frac{P - Q}{P + Q} = 1 - \frac{2Q}{P + Q},$$

where $P$ is the number of concordant pairs and $Q$ is the number of discordant pairs (inversions). A pair $(d_i, d_j)$ is concordant if two ranking methods $r_a$ and $r_b$ order $d_i$ and $d_j$ the same way; it is discordant if they disagree.
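As a sanity check on the definition, the following sketch counts concordant and discordant pairs directly; it assumes strict rankings without ties, so every pair is either concordant or discordant:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same items;
    rank_a[i] is the position item i receives under the first ranking."""
    P = Q = 0  # concordant and discordant pair counts
    for i, j in combinations(range(len(rank_a)), 2):
        agreement = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if agreement > 0:
            P += 1  # both rankings order the pair the same way
        else:
            Q += 1  # the rankings disagree on this pair
    return (P - Q) / (P + Q)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # identical: tau = 1.0
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed: tau = -1.0
```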
Information retrieval quality is usually evaluated by the following three measurements:
1. Precision
2. Recall
3. Average precision
For a specific query to a database, let $P_{\text{relevant}}$ denote the set of relevant pages in the database and $P_{\text{retrieved}}$ denote the set of retrieved pages. The three measurements above can then be represented as follows:

$$\begin{align} \text{precision} &= \frac{\left|P_{\text{relevant}} \cap P_{\text{retrieved}}\right|}{\left|P_{\text{retrieved}}\right|};\\[6pt] \text{recall} &= \frac{\left|P_{\text{relevant}} \cap P_{\text{retrieved}}\right|}{\left|P_{\text{relevant}}\right|};\\[6pt] \text{average precision} &= \int_0^1 \operatorname{Prec}(\text{recall}) \, d(\text{recall}), \end{align}$$

where $\operatorname{Prec}(\text{recall})$ is the precision at a given level of recall.
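The set-based measurements translate directly into code. In this sketch, `average_precision` uses the standard discrete form of the integral, averaging the precision measured at the rank of each relevant result; the function names and arguments are illustrative assumptions:

```python
def precision_recall(relevant, retrieved):
    """Set-based precision and recall for a single query."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)
    return hits / len(retrieved), hits / len(relevant)

def average_precision(relevant, ranked_results):
    """Discrete form of the precision-over-recall integral: average the
    precision measured at the rank of each relevant result."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked_results, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant)

# Relevant docs {a, b}; "a" retrieved at rank 1, "b" at rank 3.
print(average_precision({"a", "b"}, ["a", "c", "b"]))  # (1/1 + 2/3) / 2
```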
Let $r^{*}$ and $r_{f(q)}$ be the expected and proposed ranking methods of a database, respectively. The average precision of the method $r_{f(q)}$ is bounded from below as follows:

$$\operatorname{AvgPrec}(r_{f(q)}) \geq \frac{1}{R} \left[ Q + \binom{R+1}{2} \right]^{-1} \left( \sum_{i=1}^{R} \sqrt{i} \right)^{2},$$

where $Q$ is the number of discordant pairs between $r^{*}$ and $r_{f(q)}$, and $R$ is the number of relevant pages in the database.
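Evaluating the bound is a one-liner; this sketch simply transcribes the formula (the helper name is hypothetical):

```python
from math import comb, sqrt

def avgprec_lower_bound(Q, R):
    """Lower bound on AvgPrec given Q discordant pairs with the ideal
    ranking and R relevant pages, per the formula above."""
    return (sum(sqrt(i) for i in range(1, R + 1)) ** 2) / (R * (Q + comb(R + 1, 2)))

print(avgprec_lower_bound(Q=0, R=1))  # 1.0: a perfect ranking of one relevant page
```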
Suppose $(\vec{x}_i, y_i)$ is an element of the training data set, where $\vec{x}_i$ is the feature vector and $y_i$ is the label that classifies the category of $\vec{x}_i$. A typical SVM classifier for such a data set can be defined as the solution of the following optimization problem:

$$\begin{align} &\text{minimize } V(\vec{w}, \vec{\xi}) = \frac{1}{2} \vec{w} \cdot \vec{w} + C_F \sum_i \xi_i^{\sigma} \\[6pt] &\text{subject to} \\ &\quad \sigma \geq 0; \\ &\quad \forall i:\; y_i (\vec{w} \cdot \vec{x}_i + b) \geq 1 - \xi_i^{\sigma}, \end{align}$$

where $b$ is a scalar, $y_i \in \{-1, 1\}$, and $\xi_i \geq 0$.
The solution of the above optimization problem can be represented as a linear combination of the feature vectors:

$$\vec{w}^{*} = \sum_i \alpha_i y_i \vec{x}_i,$$

where $\alpha_i$ are the coefficients determined by the solution.
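This linear combination can be observed with an off-the-shelf SVM solver. For instance, in scikit-learn's SVC the attribute `dual_coef_` stores the products $\alpha_i y_i$ for the support vectors, so $\vec{w}^{*}$ can be recovered exactly as the formula states; the toy data below is an assumption for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data with labels y_i in {-1, +1}.
X = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector, so
# w* = sum_i alpha_i y_i x_i is a single matrix product.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_dual, clf.coef_))  # True
```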
Let $\tau_{P(f)}$ be the Kendall's tau between the expected ranking method $r^{*}$ and the proposed method $r_{f(q)}$. It can be proved that maximizing $\tau_{P(f)}$ is equivalent to minimizing the lower bound of the average precision of $r_{f(q)}$.

The negative of $\tau_{P(f)}$ can therefore be selected as the loss function, whose minimization minimizes the lower bound of the average precision of $r_{f(q)}$:

$$L_{\text{expected}} = -\tau_{P(f)} = -\int \tau(r_{f(q)}, r^{*}) \, d\Pr(q, r^{*}),$$

where $\Pr(q, r^{*})$ is the statistical distribution of the target ranking $r^{*}$ for a query $q$.
Since the expected loss function cannot be computed directly (the distribution $\Pr(q, r^{*})$ is unknown), the following empirical loss function over the training data is used in practice:
$$L_{\text{empirical}} = -\tau_S(f) = -\frac{1}{n} \sum_{i=1}^{n} \tau\left(r_{f(q_i)}, r_i^{*}\right),$$

where $n$ is the number of queries in the training set and $r_i^{*}$ is the target ranking for query $q_i$.
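Given the `kendall_tau` sketch above, the empirical loss is just the negated mean over the training queries (an illustrative helper, not from the original paper):

```python
def empirical_loss(predicted_rankings, target_rankings):
    """-tau_S(f): negated mean Kendall's tau over the n training queries,
    reusing kendall_tau from the earlier sketch."""
    taus = [kendall_tau(r_f, r_star)
            for r_f, r_star in zip(predicted_rankings, target_rankings)]
    return -sum(taus) / len(taus)
```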
A mapping function $\Phi(q, d)$ is required to map each query and each element of the database onto the feature space. The points generated from the training data lie in this feature space and also carry the rank information (the labels). These labeled points can be used to find the boundary (classifier) that specifies their order; in the linear case, such a boundary is a vector.
Suppose $c_i$ and $c_j$ are elements of the training data set, and $(c_i, c_j) \in r$ denotes that $c_i$ is ranked higher than $c_j$ under ranking method $r$. The ranking SVM computes the weight vector $\vec{w}$ by solving the following optimization problem:

$$\begin{align} &\text{minimize } V(\vec{w}, \vec{\xi}) = \frac{1}{2} \vec{w} \cdot \vec{w} + \text{constant} \cdot \sum \xi_{i,j,k} \\[6pt] &\text{subject to} \\ &\quad \forall \xi_{i,j,k} \geq 0; \\ &\quad \forall (c_i, c_j) \in r_k^{*}: \\ &\qquad \vec{w} \cdot \left(\Phi(q_1, c_i) - \Phi(q_1, c_j)\right) \geq 1 - \xi_{i,j,1}; \\ &\qquad \vdots \\ &\qquad \vec{w} \cdot \left(\Phi(q_n, c_i) - \Phi(q_n, c_j)\right) \geq 1 - \xi_{i,j,n}, \end{align}$$

where $k \in \{1, 2, \ldots, n\}$ and $i, j \in \{1, 2, \ldots\}$.
The above optimization problem has the same form as the classical SVM classification problem, which is why this algorithm is called ranking SVM.
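That equivalence suggests a simple way to train a ranking SVM with a standard classifier: rewrite each constraint $\vec{w} \cdot (\Phi(q, c_i) - \Phi(q, c_j)) \geq 1 - \xi$ as a classification sample whose feature vector is the difference $\Phi(q, c_i) - \Phi(q, c_j)$ with label $+1$. The sketch below does this with scikit-learn's LinearSVC; the helper name and toy feature vectors are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(features, ranks):
    """One classification sample per ordered pair: the difference
    Phi(q, c_i) - Phi(q, c_j) gets label +1 when c_i outranks c_j
    (plus the mirrored pair with label -1 to balance the classes)."""
    X_pairs, y_pairs = [], []
    for i in range(len(ranks)):
        for j in range(len(ranks)):
            if ranks[i] < ranks[j]:  # c_i ranked higher than c_j
                X_pairs.append(features[i] - features[j])
                y_pairs.append(1)
                X_pairs.append(features[j] - features[i])
                y_pairs.append(-1)
    return np.array(X_pairs), np.array(y_pairs)

# Phi(q, c) vectors for one query's candidate results and their ranks.
Phi = np.array([[1.0, 0.2], [0.8, 0.4], [0.1, 0.9]])
ranks = [1, 2, 3]  # 1 = ranked highest

X_pairs, y_pairs = pairwise_transform(Phi, ranks)
svm = LinearSVC(C=1.0).fit(X_pairs, y_pairs)  # svm.coef_ approximates w*
```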
The optimal vector $\vec{w}^{*}$ satisfies

$$\vec{w}^{*} = \sum_{k,\ell} \alpha_{k,\ell}^{*} \, \Phi(q_k, c_\ell),$$

where $\alpha_{k,\ell}^{*}$ are the coefficients of the solution.
So the retrieval function can be formed from this optimal classifier. For a new query $q$, the retrieval function first projects all elements of the database onto the feature space, and then orders those feature points by the values of their inner products with $\vec{w}^{*}$.
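Continuing the sketch above, ranking a new query's candidates is a single matrix-vector product with the learned weights:

```python
# Score each candidate of a new query by its inner product with w*.
w = svm.coef_.ravel()                       # from the training sketch above
Phi_new = np.array([[0.3, 0.7], [0.9, 0.1], [0.5, 0.5]])  # Phi(q, c) rows
scores = Phi_new @ w
order = np.argsort(-scores)                 # candidate indices, best first
```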
Ranking SVM can be applied to rank pages according to a query. The algorithm can be trained using click-through data, which consists of the following three parts:
1. the query;
2. the ranking of search results as presented to the user;
3. the search results clicked on by the user.
The combination of parts 2 and 3 does not provide the full ordering of the training data that the standard formulation above requires; it provides only partial ranking information (for example, a clicked result is likely more relevant than the unclicked results presented above it). The algorithm can therefore be slightly revised as follows.
$$\begin{align} &\text{minimize } V(\vec{w}, \vec{\xi}) = \frac{1}{2} \vec{w} \cdot \vec{w} + \text{constant} \cdot \sum \xi_{i,j,k} \\[6pt] &\text{subject to} \\ &\quad \forall \xi_{i,j,k} \geq 0; \\ &\quad \forall (c_i, c_j) \in r_k': \\ &\qquad \vec{w} \cdot \left(\Phi(q_1, c_i) - \Phi(q_1, c_j)\right) \geq 1 - \xi_{i,j,1}; \\ &\qquad \vdots \\ &\qquad \vec{w} \cdot \left(\Phi(q_n, c_i) - \Phi(q_n, c_j)\right) \geq 1 - \xi_{i,j,n}, \end{align}$$

where $k \in \{1, 2, \ldots, n\}$ and $i, j \in \{1, 2, \ldots\}$.
Here $r_k'$ denotes the partial ranking information extracted from the click-through data for query $q_k$, used in place of the full target ranking $r^{*}$; the rest of the training procedure is unchanged.
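One common way to extract such partial pairs, following Joachims' click-through heuristic (a clicked result is preferred over every unclicked result presented above it), is sketched below; the function name and log format are assumptions:

```python
def clickthrough_pairs(presented, clicked):
    """Partial preference pairs (preferred, less_preferred) for one query:
    a clicked result beats each unclicked result shown above it."""
    clicked = set(clicked)
    pairs = []
    for pos, doc in enumerate(presented):
        if doc in clicked:
            pairs.extend((doc, above) for above in presented[:pos]
                         if above not in clicked)
    return pairs

# User clicked results "a" and "c" of the presented ranking.
print(clickthrough_pairs(["a", "b", "c", "d"], {"a", "c"}))
# [('c', 'b')] -- "c" was clicked while "b", ranked above it, was skipped
```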