Probabilistic relevance model explained

The probabilistic relevance model[1][2] was devised by Stephen E. Robertson and Karen Spärck Jones as a framework for probabilistic models to come. It is a formalism of information retrieval used to derive the ranking functions by which search engines order matching documents according to their relevance to a given search query. It is a theoretical model that estimates the probability that a document d_j is relevant to a query q, and it assumes that this probability of relevance depends only on the query and document representations. It further assumes that there is a subset of all documents that the user prefers as the answer set for query q. Such an ideal answer set is called R and should maximize the overall probability of relevance to that user: documents in R are predicted to be relevant to the query, while documents outside it are predicted to be non-relevant. Documents are accordingly ranked by the odds of their relevance, that is, the ratio of the probability that a document is relevant to the probability that it is not:

sim(d_j, q) = \frac{P(R \mid \vec{d}_j)}{P(\bar{R} \mid \vec{d}_j)}
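In practice this ratio is computed in log-odds form: under the assumption that index terms occur independently of one another (the binary independence assumption discussed below), the log of the ratio decomposes into a sum of per-term relevance weights, and document-independent constants can be dropped because they do not change the ranking. The following Python sketch illustrates this with the Robertson/Spärck Jones term weight; the function names and the 0.5 smoothing constants are conventional illustrative choices, not prescribed by the model itself.

    import math

    def rsj_weight(N, n_t, R=0, r_t=0):
        # Robertson/Sparck Jones relevance weight for a single term.
        #   N   -- number of documents in the collection
        #   n_t -- number of documents containing the term
        #   R   -- number of documents judged relevant (0 if none yet)
        #   r_t -- number of judged-relevant documents containing the term
        # The 0.5 terms smooth the estimates so the weight is defined
        # even before any relevance judgements are available.
        return math.log(
            ((r_t + 0.5) / (R - r_t + 0.5)) /
            ((n_t - r_t + 0.5) / (N - n_t - (R - r_t) + 0.5))
        )

    def log_odds_score(doc_terms, query_terms, N, doc_freq):
        # Rank score for one document: the sum of the weights of the
        # query terms it contains. Under term independence this equals
        # log P(R|d) / P(not-R|d) up to a constant shared by all
        # documents, so it preserves the ranking.
        return sum(rsj_weight(N, doc_freq[t])
                   for t in query_terms if t in doc_terms)

    # Toy example: a collection of N = 10 documents with known
    # document frequencies for each term.
    doc_freq = {"probabilistic": 2, "retrieval": 3, "model": 6}
    print(log_odds_score({"retrieval", "model"},
                         ["probabilistic", "retrieval"], 10, doc_freq))

With no relevance judgements available (R = 0), the weight reduces to an inverse-document-frequency-like quantity, which is why rare terms dominate the ranking.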

Related models

There are some limitations to this framework that need to be addressed by further development:

- There is no accurate estimate for the first-run probabilities, before any relevance information is available.
- Index terms are not weighted by how often they occur in a document; only their presence or absence is modelled.
- Terms are assumed to be mutually independent.

To address these and other concerns, other models have been developed from the probabilistic relevance framework, among them the Binary Independence Model from the same authors. The best-known derivative of this framework is the Okapi BM25 weighting scheme, along with BM25F, a modification of it that handles structured documents composed of several fields.
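For concreteness, a minimal sketch of the BM25 scoring function is given below. The parameter defaults k1 = 1.2 and b = 0.75 are conventional, and the +1 inside the logarithm (which keeps the IDF non-negative for very common terms) is one common variant rather than the only formulation.

    import math
    from collections import Counter

    def bm25_score(doc_tokens, query_terms, N, doc_freq, avg_doc_len,
                   k1=1.2, b=0.75):
        # BM25 score of one document for a query.
        #   doc_tokens  -- list of the document's terms
        #   query_terms -- list of query terms
        #   N           -- number of documents in the collection
        #   doc_freq    -- dict mapping term -> document frequency
        #   avg_doc_len -- average document length in the collection
        # Relative to the binary model, BM25 dampens the raw term
        # frequency via k1 and normalises for document length via b.
        tf = Counter(doc_tokens)
        dl = len(doc_tokens)
        total = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((N - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1)
            total += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * dl / avg_doc_len))
        return total

BM25F modifies this scheme by accumulating term frequencies across weighted document fields (title, body, and so on) before the frequency-saturation step.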

Notes and References

  1. Robertson, S. E.; Spärck Jones, K. (May 1976). "Relevance weighting of search terms". Journal of the American Society for Information Science. 27 (3): 129–146. doi:10.1002/asi.4630270302.
  2. Robertson, Stephen; Zaragoza, Hugo (2009). "The Probabilistic Relevance Framework: BM25 and Beyond". Foundations and Trends in Information Retrieval. 3 (4): 333–389. doi:10.1561/1500000019. CiteSeerX 10.1.1.156.5282.