The Binary Independence Model (BIM) in computing and information science is a probabilistic information retrieval technique. The model makes simplifying assumptions so that estimating the similarity between a document and a query becomes feasible.
The Binary Independence Assumption is that documents are binary vectors. That is, only the presence or absence of terms in documents is recorded. Terms are independently distributed in the set of relevant documents, and they are also independently distributed in the set of irrelevant documents.
The representation is an ordered set of Boolean variables. That is, the representation of a document or query is a vector with one Boolean element for each term under consideration. More specifically, a document is represented by a vector $d = (x_1, \ldots, x_m)$ where $x_t = 1$ if term $t$ is present in the document $d$ and $x_t = 0$ if it is not. With this simplification, many documents can have the same vector representation. Queries are represented in a similar way.
"Independence" signifies that terms in the document are considered independently of each other and no association between terms is modeled. This assumption is very limiting, but it has been shown to give good enough results in many situations. This independence is the "naive" assumption of a Naive Bayes classifier, where properties that imply each other are nonetheless treated as independent for the sake of simplicity. The assumption allows the representation to be treated as an instance of a vector space model by considering each term as a value of 0 or 1 along a dimension orthogonal to the dimensions used for the other terms.
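The binary representation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `build_vocabulary` and `to_binary_vector`, the whitespace tokenizer, and the three sample documents are all assumptions made for the example.

```python
from typing import List, Sequence


def build_vocabulary(docs: Sequence[str]) -> List[str]:
    """Return the sorted list of distinct lowercase, whitespace-separated tokens."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def to_binary_vector(text: str, vocabulary: Sequence[str]) -> List[int]:
    """x_t = 1 if term t occurs in the text, else 0; counts and word order are discarded."""
    present = set(text.lower().split())
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical sample collection (illustrative only).
docs = [
    "the cat sat on the mat",
    "the mat sat on the cat",  # same terms as the first document, different order
    "dogs chase cats",
]
vocab = build_vocabulary(docs)
vectors = [to_binary_vector(d, vocab) for d in docs]

# Queries are represented the same way as documents.
query_vector = to_binary_vector("cat mat", vocab)
```

The first two documents map to the same vector, which illustrates the point that many documents can share one representation once term frequency and order are ignored.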
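The probability $P(R|d,q)$ that a document is relevant derives from the probability of relevance of that document's term vector, $P(R|x,q)$. By Bayes' rule:

$$P(R|x,q) = \frac{P(x|R,q)\,P(R|q)}{P(x|q)}$$

where $P(x|R=1,q)$ and $P(x|R=0,q)$ are the probabilities that a relevant or nonrelevant document, respectively, has the representation $x$, and $P(R=1|q)$ and $P(R=0|q)$ are the prior probabilities of retrieving a relevant or nonrelevant document, respectively, for the query $q$. Since a document is either relevant or nonrelevant to a query:

$$P(R=1|x,q)+P(R=0|x,q)=1$$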
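As a rough numeric illustration of Bayes' rule under the independence assumption, the sketch below computes both posteriors for one binary vector. The helper names (`likelihood`, `posteriors`) and all probability values are hypothetical; the denominator $P(x|q)$ is expanded with the law of total probability over the two relevance classes.

```python
from math import prod
from typing import Sequence, Tuple


def likelihood(x: Sequence[int], term_probs: Sequence[float]) -> float:
    """P(x | R, q) under term independence: for each term, multiply by
    p_t if the term is present (x_t == 1) and by (1 - p_t) if it is absent."""
    return prod(p if xt == 1 else 1.0 - p for xt, p in zip(x, term_probs))


def posteriors(
    x: Sequence[int],
    p_rel: Sequence[float],
    p_nonrel: Sequence[float],
    prior_rel: float,
) -> Tuple[float, float]:
    """Return (P(R=1 | x, q), P(R=0 | x, q)) by Bayes' rule."""
    joint_rel = likelihood(x, p_rel) * prior_rel
    joint_non = likelihood(x, p_nonrel) * (1.0 - prior_rel)
    evidence = joint_rel + joint_non  # P(x | q) by total probability
    return joint_rel / evidence, joint_non / evidence


# Made-up values: two terms, the first present and the second absent.
x = [1, 0]
post_rel, post_non = posteriors(
    x, p_rel=[0.8, 0.5], p_nonrel=[0.2, 0.5], prior_rel=0.1
)
```

With these assumed numbers the two joint probabilities are 0.04 and 0.09, so the posterior of relevance is 0.04 / 0.13, and the two posteriors sum to 1 as required.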
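Given a binary query and the dot product as the similarity function between a document and a query, the problem is to assign weights to the terms in the query such that retrieval effectiveness is high. Let $p_i$ and $q_i$ be the probabilities that a relevant document and an irrelevant document, respectively, contain the $i$th term. Yu and Salton proposed that the weight of the $i$th term be an increasing function of

$$Y_i = \frac{p_i\,(1-q_i)}{(1-p_i)\,q_i}$$

Thus, if $Y_i$ is higher than $Y_j$, term $i$ receives a higher weight than term $j$. Robertson and Spärck Jones later showed that assigning the $i$th term the weight $\log Y_i$ yields optimal retrieval effectiveness under the Binary Independence Assumption.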
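The weighting scheme can be sketched as follows, assuming the per-term probabilities are already available. The values in `p_rel` and `q_nonrel`, the document names, and the helper functions `term_weight` and `score` are invented for illustration; in practice the probabilities would have to be estimated from collection statistics.

```python
import math
from typing import Sequence


def term_weight(p_i: float, q_i: float) -> float:
    """log Y_i with Y_i = p_i (1 - q_i) / ((1 - p_i) q_i).
    Assumes 0 < p_i < 1 and 0 < q_i < 1 so the ratio is finite and positive."""
    return math.log(p_i * (1.0 - q_i) / ((1.0 - p_i) * q_i))


def score(doc: Sequence[int], query: Sequence[int], weights: Sequence[float]) -> float:
    """Dot product between the binary document and the weighted binary query."""
    return sum(w * d * q for d, q, w in zip(doc, query, weights))


# Made-up probabilities that a relevant / irrelevant document contains each term.
p_rel = [0.8, 0.6, 0.3]
q_nonrel = [0.2, 0.4, 0.6]
weights = [term_weight(p, q) for p, q in zip(p_rel, q_nonrel)]

query = [1, 1, 1]  # binary query containing all three terms
docs = {"d1": [1, 0, 0], "d2": [0, 1, 1], "d3": [1, 1, 0]}
ranking = sorted(docs, key=lambda name: score(docs[name], query, weights), reverse=True)
```

A term that is more likely in irrelevant documents than in relevant ones (the third term here) gets a negative weight, so its presence lowers a document's score.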
The Binary Independence Model was introduced by Yu and Salton. The name Binary Independence Model was coined by Robertson and Spärck Jones, who used the log-odds of the probabilistic relevance model to derive the weight $\log Y_i$, where the log-odds is rank-equivalent to the probability of relevance $P(R|d,q)$.
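One way to see how the log-odds leads to this weight is the following sketch of the standard derivation. It assumes, as above, that $p_i$ and $q_i$ are the probabilities that a relevant and an irrelevant document contain term $i$ (note that $q_i$ is unrelated to the query symbol $q$), and that terms are independent within each relevance class.

$$
\begin{aligned}
\log \frac{P(R=1|x,q)}{P(R=0|x,q)}
&= \log \frac{P(R=1|q)}{P(R=0|q)} + \log \frac{P(x|R=1,q)}{P(x|R=0,q)} \\
&= \log \frac{P(R=1|q)}{P(R=0|q)} + \sum_i \left[ x_i \log \frac{p_i}{q_i} + (1-x_i) \log \frac{1-p_i}{1-q_i} \right] \\
&= \log \frac{P(R=1|q)}{P(R=0|q)} + \sum_i \log \frac{1-p_i}{1-q_i} + \sum_i x_i \log \frac{p_i\,(1-q_i)}{(1-p_i)\,q_i}
\end{aligned}
$$

The first two terms do not depend on the document, so ranking documents by the log-odds is the same as ranking them by $\sum_i x_i \log Y_i$. Because the odds are a monotonically increasing function of the probability, this ordering also matches the ordering by the probability of relevance. Restricting the sum to query terms corresponds to the further assumption, commonly made, that $p_i = q_i$ for terms that do not appear in the query.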