UBY-LMF explained

UBY-LMF^[1] ^[2] is a format for standardizing lexical resources for Natural Language Processing (NLP).^[3] UBY-LMF conforms to LMF (Lexical Markup Framework), the ISO standard for lexicons designed within ISO/TC 37, and constitutes a so-called serialization of this abstract standard.^[4] In accordance with LMF, all attributes and other linguistic terms introduced in UBY-LMF refer to standardized descriptions of their meaning in ISOCat.

UBY-LMF has been implemented in Java and is actively developed as an open-source project on Google Code. Based on this Java implementation, the large-scale electronic lexicon UBY^[5] has been created automatically; it is the result of using UBY-LMF to standardize a range of diverse lexical resources frequently used in NLP applications.

As of 2013, UBY contains 10 lexicons, which are pairwise interlinked at the sense level:^[6] ^[7] ^[8]

English: WordNet, Wiktionary, Wikipedia, FrameNet, VerbNet, OmegaWiki
German: Wiktionary, Wikipedia, GermaNet, IMSLex-Subcat
Multilingual: OmegaWiki
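
What "interlinked at the sense level" means can be illustrated with a minimal Java sketch. It is not the UBY-LMF API: the types `SenseRef` and `SenseAlignment`, the method `alignedSenses`, and all sense identifiers are hypothetical names invented for illustration. The sketch only assumes that a link connects one sense in one lexicon to one sense in another, and that links can be followed in either direction.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these types are hypothetical and are NOT the UBY-LMF Java API.
// Requires Java 16 or later (records).
class SenseAlignmentSketch {

    // A sense, identified by its lexicon and a lexicon-local identifier (identifiers are made up).
    record SenseRef(String lexicon, String senseId) {}

    // A link stating that two senses from different lexicons express the same meaning.
    record SenseAlignment(SenseRef source, SenseRef target) {}

    // Returns every sense linked to the given sense, following links in both directions.
    static List<SenseRef> alignedSenses(SenseRef sense, List<SenseAlignment> alignments) {
        List<SenseRef> result = new ArrayList<>();
        for (SenseAlignment a : alignments) {
            if (a.source().equals(sense)) {
                result.add(a.target());
            } else if (a.target().equals(sense)) {
                result.add(a.source());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SenseRef wordNet = new SenseRef("WordNet", "wn-sense-1");
        SenseRef wiktionary = new SenseRef("Wiktionary", "wkt-sense-7");
        SenseRef frameNet = new SenseRef("FrameNet", "fn-sense-3");

        // Two pairwise links that share the Wiktionary sense.
        List<SenseAlignment> alignments = List.of(
                new SenseAlignment(wordNet, wiktionary),
                new SenseAlignment(frameNet, wiktionary));

        // Look up everything linked to the Wiktionary sense.
        System.out.println(alignedSenses(wiktionary, alignments));
    }
}
```

Because each link is pairwise, a sense shared by several links (here the hypothetical Wiktionary sense) can serve as a bridge between lexicons that are not linked to each other directly.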

A subset of the lexicons integrated in UBY has been converted to a Semantic Web format according to the lemon lexicon model.^[9] This conversion is based on a mapping of UBY-LMF to the lemon lexicon model.

Notes and References

1. Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek, Christian M. Meyer: UBY-LMF - exploring the boundaries of language-independent lexicon models. In: Gil Francopoulo (ed.): LMF Lexical Markup Framework, ISTE / Wiley, 2013.
2. Judith Eckle-Kohler, Iryna Gurevych, Silvana Hartmann, Michael Matuschek, Christian M. Meyer: UBY-LMF - A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF. In: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis (eds.): Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC), pp. 275–282, May 2012.
3. Gottfried Herzog, Laurent Romary, Andreas Witt: Standards for Language Resources. Poster presentation at the META-FORUM 2013 - META Exhibition, September 2013, Berlin, Germany.
4. Laurent Romary: TEI and LMF crosswalks. CoRR abs/1301.2444, 2013.
5. Iryna Gurevych, Judith Eckle-Kohler, Silvana Hartmann, Michael Matuschek, Christian M. Meyer, Christian Wirth: UBY – a large-scale unified lexical-semantic resource based on LMF. In: Proceedings of EACL, pp. 580–590, 2012, Avignon, France.
6. Christian M. Meyer, Iryna Gurevych: What Psycholinguists Know About Chemistry: Aligning Wiktionary and WordNet for Increased Domain Coverage. In: Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), pp. 883–892, November 2011, Chiang Mai, Thailand.
7. Silvana Hartmann, Iryna Gurevych: FrameNet on the Way to Babel: Creating a Bilingual FrameNet Using Wiktionary as Interlingual Connection. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), vol. 1, pp. 1363–1373, Association for Computational Linguistics, August 2013.
8. Michael Matuschek, Iryna Gurevych: Dijkstra-WSA: A Graph-Based Approach to Word Sense Alignment. In: Transactions of the Association for Computational Linguistics (TACL), vol. 1, pp. 151–164, May 2013.
9. John McCrae, Guadalupe Aguado-de-Cea, Paul Buitelaar, Philipp Cimiano, Thierry Declerck, Asunción Gómez-Pérez, Jorge Gracia, Laura Hollink, Elena Montiel-Ponsoda, Dennis Spohr, Tobias Wunner: Interchanging lexical resources on the Semantic Web. Language Resources and Evaluation 46, pp. 701–719, 2012.