Learned sparse retrieval explained

Learned sparse retrieval, also called sparse neural search, is an approach to text search that represents queries and documents as sparse vectors.[1] It borrows techniques from both lexical bag-of-words and vector embedding algorithms, and is claimed to perform better than either alone. The best-known sparse neural search systems are SPLADE[2] and its successor SPLADE v2.[3] Others include DeepCT,[4] uniCOIL,[5] EPIC,[6] DeepImpact,[7] TILDE and TILDEv2,[8] SPARTA,[9] SPLADE-max, and DistilSPLADE-max.
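The idea of scoring with sparse vectors can be illustrated with a minimal sketch. It assumes that each query and document has already been mapped to a small set of weighted vocabulary terms, and that relevance is the dot product over the terms they share, computed through an inverted index. The term weights, document identifiers, and function names below are hypothetical and hand-written for illustration; in a real system a trained model would produce the weights, and the scoring details vary between the systems listed above.

```python
from collections import defaultdict

# Hypothetical sparse document vectors: term -> weight.
# In a learned sparse retriever these weights would come from a trained model.
docs = {
    "d1": {"sparse": 1.2, "retrieval": 0.9, "index": 0.4},
    "d2": {"dense": 1.1, "embedding": 0.8, "retrieval": 0.3},
    "d3": {"sparse": 0.5, "search": 0.7},
}


def build_inverted_index(doc_vectors):
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query_vec, top_k=10):
    """Score documents by the dot product over terms shared with the query."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        # Only documents containing the term are touched, as in lexical search.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


index = build_inverted_index(docs)
# Hypothetical query vector; "search" stands in for a model-added expansion term.
query = {"sparse": 1.0, "retrieval": 0.5, "search": 0.2}
results = search(index, query)
for doc_id, score in results:
    print(doc_id, round(score, 2))
```

Because every vector has only a few non-zero entries, the same inverted-index machinery used for bag-of-words search can serve the learned weights, which is the sense in which the approach combines lexical and embedding techniques.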

Some implementations of SPLADE have latency similar to that of Okapi BM25 lexical search while matching the results of state-of-the-art neural rankers on in-domain data.[10]

The official SPLADE model weights and training code are released under a Creative Commons NonCommercial license.[11] However, independent implementations of SPLADE++ (a variant of the SPLADE models) are available under permissive licenses.

SPRINT is a toolkit for evaluating neural sparse retrieval systems.[12]

Notes

  1. Nguyen, Thong; MacAvaney, Sean; Yates, Andrew (2023). "A Unified Framework for Learned Sparse Retrieval". In Kamps, Jaap; Goeuriot, Lorraine; Crestani, Fabio; Maistro, Maria; Joho, Hideo; Davis, Brian; Gurrin, Cathal; Kruschwitz, Udo; Caputo, Annalina (eds.). Advances in Information Retrieval. Lecture Notes in Computer Science. Vol. 13982. Cham: Springer Nature Switzerland. pp. 101–116. arXiv:2303.13416. doi:10.1007/978-3-031-28241-6_7. ISBN 978-3-031-28241-6. S2CID 257585074. https://link.springer.com/chapter/10.1007/978-3-031-28241-6_7
  2. Formal, Thibault; Piwowarski, Benjamin; Clinchant, Stéphane (2021-07-11). "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 2288–2292. arXiv:2107.05720. doi:10.1145/3404835.3463098. ISBN 978-1-4503-8037-9. S2CID 235792467. https://doi.org/10.1145/3404835.3463098
  3. Formal, Thibault; Piwowarski, Benjamin; Lassance, Carlos; Clinchant, Stéphane (21 September 2021). "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval". arXiv:2109.10086v1 [cs.IR].
  4. Dai, Zhuyun; Callan, Jamie (2020-04-20). "Context-Aware Document Term Weighting for Ad-Hoc Search". Proceedings of the Web Conference 2020. New York, NY, USA: ACM. pp. 1897–1907. doi:10.1145/3366423.3380258. ISBN 978-1-4503-7023-3. S2CID 218521094. http://dx.doi.org/10.1145/3366423.3380258
  5. Lin, Jimmy; Ma, Xueguang (28 June 2021). "A few brief notes on DeepImpact, COIL, and a conceptual framework for information retrieval techniques". arXiv:2106.14807 [cs.IR].
  6. MacAvaney, Sean; Nardini, Franco Maria; Perego, Raffaele; Tonellotto, Nicola; Goharian, Nazli; Frieder, Ophir (2020-07-25). "Expansion via Prediction of Importance with Contextualization". Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '20. New York, NY, USA: Association for Computing Machinery. pp. 1573–1576. arXiv:2004.14245. doi:10.1145/3397271.3401262. ISBN 978-1-4503-8016-4. S2CID 216641912. https://doi.org/10.1145/3397271.3401262
  7. Mallia, Antonio; Khattab, Omar; Suel, Torsten; Tonellotto, Nicola (2021-07-11). "Learning Passage Impacts for Inverted Indexes". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 1723–1727. arXiv:2104.12016. doi:10.1145/3404835.3463030. ISBN 978-1-4503-8037-9. S2CID 233394068. https://dl.acm.org/doi/10.1145/3404835.3463030
  8. Zhuang, Shengyao; Zuccon, Guido (13 September 2021). "Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion". arXiv:2108.08513 [cs.IR].
  9. Zhao, Tiancheng; Lu, Xiaopeng; Lee, Kyusong (28 September 2020). "SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval". arXiv:2009.13013 [cs.CL].
  10. Lassance, Carlos; Clinchant, Stéphane (2022-07-07). "An Efficiency Study for SPLADE Models". Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '22. New York, NY, USA: Association for Computing Machinery. pp. 2220–2226. arXiv:2207.03834. doi:10.1145/3477495.3531833. ISBN 978-1-4503-8732-3. S2CID 250340284. https://doi.org/10.1145/3477495.3531833
  11. "splade/LICENSE at main · naver/splade". GitHub. 2023-08-25.
  12. Thakur, Nandan; Wang, Kexin; Gurevych, Iryna; Lin, Jimmy (2023-07-18). "SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval". Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '23. New York, NY, USA: Association for Computing Machinery. pp. 2964–2974. arXiv:2307.10488. doi:10.1145/3539618.3591902. ISBN 978-1-4503-9408-6. S2CID 259949923. https://doi.org/10.1145/3539618.3591902