Discounted cumulative gain (DCG) is a measure of ranking quality in information retrieval. It is often normalized so that it is comparable across queries, giving Normalized DCG (nDCG or NDCG). NDCG is often used to measure effectiveness of search engine algorithms and related applications. Using a graded relevance scale of documents in a search-engine result set, DCG sums the usefulness, or gain, of the results discounted by their position in the result list.[1] NDCG is DCG normalized by the maximum possible DCG of the result set when ranked from highest to lowest gain, thus adjusting for the different numbers of relevant results for different queries.
Two assumptions are made in using DCG and its related measures:

1. Highly relevant documents are more useful when appearing earlier in a search engine result list (i.e., at higher ranks).
2. Highly relevant documents are more useful than marginally relevant documents, which are in turn more useful than non-relevant documents.
DCG is a refinement of a simpler measure, Cumulative Gain (CG). Cumulative Gain is the sum of the graded relevance values of all results in a search result list. CG does not take into account the rank (position) of a result in the result list. The CG at a particular rank position p is defined as:

CG_p = \sum_{i=1}^{p} rel_i

where rel_i is the graded relevance of the result at position i.
The value computed with the CG function is unaffected by changes in the ordering of search results. That is, moving a highly relevant document d_i above a higher-ranked, less relevant document d_j does not change the computed value of CG (assuming i, j \leq p).
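As a minimal sketch of CG and its indifference to ordering (function and variable names here are illustrative, not from the cited sources):

```python
# Cumulative Gain: sum of the first p graded relevance values.
# `rels` is a list of relevance grades in ranked order (index 0 = rank 1).
def cumulative_gain(rels, p):
    return sum(rels[:p])

# CG ignores ordering: swapping any two results within the top p
# leaves the value unchanged.
rels = [3, 2, 3, 0, 1, 2]
swapped = [3, 2, 0, 3, 1, 2]  # positions 3 and 4 exchanged
assert cumulative_gain(rels, 6) == cumulative_gain(swapped, 6) == 11
```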
The premise of DCG is that highly relevant documents appearing lower in a search result list should be penalized, as the graded relevance value is reduced logarithmically proportional to the position of the result.
The usual formula of DCG accumulated at a particular rank position p is defined as:

DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)} = rel_1 + \sum_{i=2}^{p} \frac{rel_i}{\log_2(i+1)}
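The formula above can be sketched directly in code (names are illustrative):

```python
import math

# DCG with the traditional formulation: each relevance value is
# discounted by log2 of its (1-based) rank position plus one.
# At rank 1 the discount is log2(2) = 1, so rel_1 is not reduced.
def dcg(rels, p):
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(rels[:p], start=1))
```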
Until 2013, there was no theoretically sound justification for using a logarithmic reduction factor[2] other than the fact that it produces a smooth reduction. Wang et al. (2013)[3] then gave a theoretical guarantee for using the logarithmic reduction factor in Normalized DCG (NDCG): the authors show that for every pair of substantially different ranking functions, NDCG can decide which one is better in a consistent manner.
An alternative formulation of DCG[4] places stronger emphasis on retrieving relevant documents:
DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)}
The latter formula is commonly used in industrial applications including major web search companies[5] and data science competition platforms such as Kaggle.[6]
These two formulations of DCG are the same when the relevance values of documents are binary,[2] i.e., rel_i \in \{0, 1\}.
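A quick sketch comparing the two formulations side by side (illustrative names); for binary relevance, 2^rel - 1 equals rel, so the two coincide:

```python
import math

# Traditional formulation: rel_i / log2(i+1).
def dcg_traditional(rels):
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

# Alternative formulation: (2^rel_i - 1) / log2(i+1),
# which emphasizes highly relevant documents more strongly.
def dcg_exponential(rels):
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels, start=1))

binary = [1, 0, 1, 1, 0]   # binary relevance: identical values
assert math.isclose(dcg_traditional(binary), dcg_exponential(binary))

graded = [3, 2, 3, 0, 1, 2]  # graded relevance: the two differ
assert not math.isclose(dcg_traditional(graded), dcg_exponential(graded))
```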
Note that Croft et al. (2010) and Burges et al. (2005) present the second DCG formulation with a logarithm of base e, while both versions of DCG above use a logarithm of base 2. When computing NDCG with the first formulation of DCG, the base of the logarithm does not matter, but the base does affect the value of NDCG for the second formulation. The base of the logarithm affects the value of DCG in both formulations.
Convex and smooth approximations to DCG have also been developed, for use as an objective function in gradient based learning methods.[7]
Search result lists vary in length depending on the query. Comparing a search engine's performance from one query to the next cannot be consistently achieved using DCG alone, so the cumulative gain at each position for a chosen value of p should be normalized across queries. This is done by sorting all relevant documents in the corpus by their relative relevance, producing the maximum possible DCG through position p, also called the Ideal DCG (IDCG) through that position. For a query, the normalized discounted cumulative gain, or nDCG, is computed as:

nDCG_p = \frac{DCG_p}{IDCG_p}
where IDCG is the ideal discounted cumulative gain,

IDCG_p = \sum_{i=1}^{|REL_p|} \frac{2^{rel_i} - 1}{\log_2(i+1)}

and REL_p represents the list of relevant documents (ordered by their relevance) in the corpus up to position p.
The nDCG values for all queries can be averaged to obtain a measure of the average performance of a search engine's ranking algorithm. Note that for a perfect ranking algorithm, DCG_p will equal IDCG_p, producing an nDCG of 1.0. All nDCG calculations are then relative values on the interval 0.0 to 1.0 and so are cross-query comparable.
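The normalization step can be sketched as follows (illustrative names; this sketch uses the traditional rel_i numerator for both DCG and IDCG so the two remain consistent):

```python
import math

# DCG with the traditional formulation.
def dcg(rels):
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

# nDCG: normalize DCG by the ideal DCG obtained from all known
# relevance judgments sorted in decreasing order and cut to length p.
# `all_judged_rels` may contain more judgments than the ranked list
# if the corpus holds further relevant documents.
def ndcg(ranked_rels, all_judged_rels, p):
    ideal = sorted(all_judged_rels, reverse=True)[:p]
    idcg = dcg(ideal)
    return dcg(ranked_rels[:p]) / idcg if idcg > 0 else 0.0
```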
The main difficulty encountered in using nDCG is the unavailability of an ideal ordering of results when only partial relevance feedback is available.
Presented with a list of documents in response to a search query, an experiment participant is asked to judge the relevance of each document to the query. Each document is to be judged on a scale of 0-3 with 0 meaning not relevant, 3 meaning highly relevant, and 1 and 2 meaning "somewhere in between". For the documents ordered by the ranking algorithm as
D_1, D_2, D_3, D_4, D_5, D_6

the user provides the following relevance scores:

3, 2, 3, 0, 1, 2
That is: document 1 has a relevance of 3, document 2 has a relevance of 2, etc. The Cumulative Gain of this search result listing is:
CG_6 = \sum_{i=1}^{6} rel_i = 3 + 2 + 3 + 0 + 1 + 2 = 11
Changing the order of any two documents does not affect the CG measure. If D_3 and D_4 are switched, CG remains the same, 11. DCG is used to emphasize highly relevant documents appearing early in the result list. Using the logarithmic scale for reduction, the DCG of each result in order is:
i | rel_i | log_2(i+1) | rel_i / log_2(i+1)
---|---|---|---
1 | 3 | 1 | 3
2 | 2 | 1.585 | 1.262
3 | 3 | 2 | 1.5
4 | 0 | 2.322 | 0
5 | 1 | 2.585 | 0.387
6 | 2 | 2.807 | 0.712
So the DCG_6 of this ranking is:

DCG_6 = \sum_{i=1}^{6} \frac{rel_i}{\log_2(i+1)} = 3 + 1.262 + 1.5 + 0 + 0.387 + 0.712 = 6.861
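The table and sum above can be reproduced with a few lines of code (variable names are illustrative):

```python
import math

# Per-position discounted gains for the example relevance scores:
# each row holds (rank, relevance, discount log2(i+1), discounted gain).
rels = [3, 2, 3, 0, 1, 2]
rows = [(i, r, math.log2(i + 1), r / math.log2(i + 1))
        for i, r in enumerate(rels, start=1)]

# DCG_6 is the sum of the discounted gains.
dcg6 = sum(gain for _, _, _, gain in rows)
print(round(dcg6, 3))  # → 6.861
```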
Now a switch of D_3 and D_4 results in a reduced DCG, because a less relevant document is placed higher in the ranking; that is, a more relevant document is discounted more by being placed at a lower rank.
The performance of this query to another is incomparable in this form since the other query may have more results, resulting in a larger overall DCG which may not necessarily be better. In order to compare, the DCG values must be normalized.
To normalize DCG values, an ideal ordering for the given query is needed. For this example, that ordering would be the monotonically decreasing sort of all known relevance judgments. In addition to the six from this experiment, suppose we also know there is a document D_7 with relevance grade 3 to the same query and a document D_8 with relevance grade 2 to that query. Then the ideal ordering of all known relevance judgments is:

3, 3, 3, 2, 2, 2, 1, 0
The ideal ranking is cut again to length 6 to match the depth of analysis of the ranking:
3,3,3,2,2,2
The DCG of this ideal ordering, or IDCG (Ideal DCG), is computed to rank 6:
IDCG_6 = 8.740
And so the nDCG for this query is given as:
nDCG_6 = \frac{DCG_6}{IDCG_6} = \frac{6.861}{8.740} = 0.785
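This worked example can be checked end to end in a short sketch (illustrative names; the traditional rel_i numerator is used throughout, matching the computation above):

```python
import math

# DCG with the traditional formulation.
def dcg(rels):
    return sum(r / math.log2(i + 1) for i, r in enumerate(rels, start=1))

ranked = [3, 2, 3, 0, 1, 2]        # scores in the order returned
judged = [3, 2, 3, 0, 1, 2, 3, 2]  # the six above plus D_7 (3) and D_8 (2)
ideal6 = sorted(judged, reverse=True)[:6]  # [3, 3, 3, 2, 2, 2]

print(round(dcg(ranked), 3))                # → 6.861  (DCG_6)
print(round(dcg(ideal6), 3))                # → 8.74   (IDCG_6)
print(round(dcg(ranked) / dcg(ideal6), 3))  # → 0.785  (nDCG_6)
```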