Hapax legomenon explained

In corpus linguistics, a hapax legomenon (plural hapax legomena; sometimes abbreviated to hapax, plural hapaxes) is a word or an expression that occurs only once within a context: either in the written record of an entire language, in the works of an author, or in a single text. The term is sometimes incorrectly used to describe a word that occurs in just one of an author's works but more than once in that particular work. Hapax legomenon is a transliteration of the Ancient Greek ἅπαξ λεγόμενον, meaning "said once".

The related terms dis legomenon, tris legomenon, and tetrakis legomenon respectively refer to double, triple, or quadruple occurrences, but are far less commonly used.

Hapax legomena are quite common, as predicted by Zipf's law,[1] which states that the frequency of any word in a corpus is inversely proportional to its rank in the frequency table. For large corpora, about 40% to 60% of the words are hapax legomena, and another 10% to 15% are dis legomena.[2] Thus, in the Brown Corpus of American English, about half of the 50,000 distinct words are hapax legomena within that corpus.[3]
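As a rough illustration of how such proportions are measured, the following Python sketch counts the hapax legomena and dis legomena in a text. The function name `frequency_classes`, the simple regular-expression tokenizer, and the sample sentence are assumptions made for this example; real corpus studies use more careful tokenization.

```python
import re
from collections import Counter


def frequency_classes(text):
    """Return (vocabulary size, hapax legomena, dis legomena) for a text."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Hapax legomena occur exactly once; dis legomena exactly twice.
    hapaxes = sorted(w for w, c in counts.items() if c == 1)
    dis = sorted(w for w, c in counts.items() if c == 2)
    return len(counts), hapaxes, dis


sample = "the cat sat on the mat and the dog sat on the log"
vocab_size, hapaxes, dis = frequency_classes(sample)
print(vocab_size)  # 8 distinct words
print(hapaxes)     # ['and', 'cat', 'dog', 'log', 'mat']
print(dis)         # ['on', 'sat']
```

In this toy sentence five of the eight distinct words are hapax legomena. The sample is far too small to say anything about Zipf's law; it only shows the counting procedure.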

Hapax legomenon refers to the appearance of a word or an expression in a body of text, not to either its origin or its prevalence in speech. It thus differs from a nonce word, which may never be recorded, may find currency and may be widely recorded, or may appear several times in the work which coins it, and so on.

Significance

Hapax legomena in ancient texts are usually difficult to decipher, since it is easier to infer meaning from multiple contexts than from just one. For example, many of the remaining undeciphered Mayan glyphs are hapax legomena, and Biblical (particularly Hebrew; see § Hebrew) hapax legomena sometimes pose problems in translation. Hapax legomena also pose challenges in natural language processing.[4]

Some scholars consider hapax legomena useful in determining the authorship of written works. P. N. Harrison, in The Problem of the Pastoral Epistles (1921),[5] made hapax legomena popular among Bible scholars when he argued that there are considerably more of them in the three Pastoral Epistles than in the other Pauline Epistles. He argued that the number of hapax legomena in a putative author's corpus indicates his or her vocabulary and is characteristic of the author as an individual.

Harrison's theory has faded in significance due to a number of problems raised by other scholars. For example, in 1896, W. P. Workman found the following numbers of hapax legomena in each Pauline Epistle:

| Pauline Epistle | Hapax legomena |
| --- | --- |
| Epistle to the Romans | 113 |
| First Epistle to the Corinthians | 110 |
| Second Epistle to the Corinthians | 99 |
| Epistle to the Galatians | 34 |
| Epistle to the Ephesians | 43 |
| Epistle to the Philippians | 41 |
| Epistle to the Colossians | 38 |
| First Epistle to the Thessalonians | 23 |
| Second Epistle to the Thessalonians | 11 |
| First Epistle to Timothy | 82 |
| Second Epistle to Timothy | 53 |
| Epistle to Titus | 33 |
| Epistle to Philemon | 5 |

At first glance, the totals for the three Pastoral Epistles (First Timothy, Second Timothy, and Titus) are not out of line with the others.[6] To take account of the varying length of the epistles, Workman also calculated the average number of hapax legomena per page of the Greek text, which ranged from 3.6 to 13, as summarized in the diagram on the right. Although the Pastoral Epistles have more hapax legomena per page, Workman found the differences to be moderate in comparison to the variation among the other Epistles. This was reinforced when Workman looked at several plays by Shakespeare, which showed similar variations (from 3.4 to 10.4 per page of Irving's one-volume edition), as summarized in the second diagram on the right.
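The length normalization described here amounts to dividing each hapax count by the length of the text. The sketch below shows that arithmetic in Python. The helper `hapax_rate` and the two works with their counts and page lengths are hypothetical and are not Workman's data.

```python
def hapax_rate(hapax_count, length):
    """Hapax legomena per unit of length (for example, per page)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return hapax_count / length


# Hypothetical works: (hapax count, length in pages). Not Workman's figures.
works = {"work_a": (90, 25.0), "work_b": (30, 5.0)}
rates = {name: hapax_rate(h, pages) for name, (h, pages) in works.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} hapax legomena per page")
```

Here the hypothetical work_a has three times as many hapax legomena as work_b in absolute terms but a lower rate per page (3.6 against 6.0), which is why raw totals alone can mislead.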

Apart from author identity, there are several other factors that can explain the number of hapax legomena in a work:[7]

In the particular case of the Pastoral Epistles, all of these variables are quite different from those in the rest of the Pauline corpus, and hapax legomena are no longer widely accepted as strong indicators of authorship; those who reject Pauline authorship of the Pastorals rely on other arguments.[8]

There are also subjective questions over whether two forms amount to "the same word": dog vs. dogs, clue vs. clueless, sign vs. signature; many other gray cases also arise. The Jewish Encyclopedia points out that, although there are 1,500 hapaxes in the Hebrew Bible, only about 400 are not obviously related to other attested word forms.[9]
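The effect of the "same word" decision on a hapax count can be shown with a small Python sketch. The function `count_hapaxes` and the deliberately naive rule `strip_plural_s` are hypothetical names invented for this example; the rule is not a real stemmer or lemmatizer.

```python
from collections import Counter


def count_hapaxes(tokens, normalize=lambda w: w):
    """Return the sorted forms that occur exactly once after normalization."""
    counts = Counter(normalize(t) for t in tokens)
    return sorted(w for w, c in counts.items() if c == 1)


def strip_plural_s(word):
    # Naive, hypothetical rule: drop a final "s" unless the word ends in "ss".
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


tokens = ["dog", "dogs", "clue", "clueless", "sign", "signature"]

print(count_hapaxes(tokens))
# ['clue', 'clueless', 'dog', 'dogs', 'sign', 'signature']
print(count_hapaxes(tokens, strip_plural_s))
# ['clue', 'clueless', 'sign', 'signature']
```

Treating dog and dogs as one word removes both from the list, while clue and clueless or sign and signature remain separate under this rule. A different normalization would give a different count, which is the subjectivity described above.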

A final difficulty with the use of hapax legomena for authorship determination is that there is considerable variation among works known to be by a single author, and disparate authors often show similar values. In other words, hapax legomena are not a reliable indicator. Authorship studies now usually use a wide range of measures to look for patterns rather than relying upon single measurements.

Computer science

In the fields of computational linguistics and natural language processing (NLP), especially corpus linguistics and machine-learned NLP, it is common to disregard hapax legomena (and sometimes other infrequent words), as they are likely to have little value for computational techniques. Discarding them has the added benefit of significantly reducing the memory use of an application, since, by Zipf's law, many words are hapax legomena.[10]
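One common way to disregard hapax legomena is a minimum-frequency cutoff applied when the vocabulary is built. The following Python sketch assumes a threshold of two occurrences and a placeholder token `<unk>` for discarded words. The names `build_vocabulary` and `replace_rare`, the threshold, and the placeholder are illustrative choices, not the interface of any particular library.

```python
from collections import Counter


def build_vocabulary(tokens, min_count=2):
    """Keep only words seen at least min_count times (drops hapaxes by default)."""
    counts = Counter(tokens)
    return {w for w, c in counts.items() if c >= min_count}


def replace_rare(tokens, vocabulary, unknown="<unk>"):
    """Map every out-of-vocabulary token to a single placeholder symbol."""
    return [t if t in vocabulary else unknown for t in tokens]


tokens = "a rose is a rose is a flower".split()
vocab = build_vocabulary(tokens)

print(sorted(vocab))                # ['a', 'is', 'rose']
print(replace_rare(tokens, vocab))  # [..., 'a', '<unk>']: 'flower' is replaced
```

Because every discarded word shares one placeholder, the vocabulary table shrinks by the number of hapax legomena, which by the figures cited earlier can be around half of the distinct words in a large corpus.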

Examples

The following are some examples of hapax legomena in languages or corpora.

Arabic

In the Qurʾān:

Chinese and Japanese

Classical Chinese and Japanese literature contains many Chinese characters that feature only once in the corpus, and their meaning and pronunciation have often been lost. Known in Japanese by a term that literally means "lonely characters", these can be considered a type of hapax legomenon.[12] For example, the Classic of Poetry uses the character 篪 exactly once, in the verse 「伯氏吹塤, 仲氏吹篪」, and it was only through the discovery of a description by Guo Pu (276–324 AD) that the character could be associated with a specific type of ancient flute.

English

It is fairly common for authors to "coin" new words to convey a particular meaning or for the sake of entertainment, without any suggestion that they are "proper" words. For example, P.G. Wodehouse and Lewis Carroll frequently coined novel words. Indexy, below, appears to be an example of this.

German

Ancient Greek

According to classical scholar Clyde Pharr, "the Iliad has 1097 hapax legomena, while the Odyssey has 868".[15] Others have defined the term differently, however, and count as few as 303 in the Iliad and 191 in the Odyssey.[16]

Hebrew

The number of distinct hapax legomena in the Hebrew Bible is 1,480 (out of a total of 8,679 distinct words used).[22] However, due to Hebrew roots, suffixes and prefixes, only 400 are "true" hapax legomena.[9] A full list can be seen at the Jewish Encyclopedia entry for "Hapax Legomena".[9]

Some examples include:

Hungarian

Irish

Italian

Latin

Slavic

Spanish

In popular culture

Notes and References

  1. Paul Baker, Andrew Hardie, and Tony McEnery, A Glossary of Corpus Linguistics, Edinburgh University Press, 2006, page 81.
  2. András Kornai, Mathematical Linguistics, Springer, 2008, page 72.
  3. Kirsten Malmkjær, The Linguistics Encyclopedia, 2nd ed., Routledge, 2002, page 87.
  4. Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999, page 22.
  5. P.N. Harrison. The Problem of the Pastoral Epistles. Oxford University Press, 1921.
  6. Workman, "The Hapax Legomena of St. Paul", Expository Times, 7 (1896:418), noted in The Catholic Encyclopedia, s.v. "Epistles to Timothy and Titus".
  7. Steven J. DeRose. "A Statistical Analysis of Certain Linguistic Arguments Concerning the Authorship of the Pastoral Epistles." Honors thesis, Brown University, 1982; Terry L. Wilder. "A Brief Defense of the Pastoral Epistles' Authenticity". Midwestern Journal of Theology 2.1 (Fall 2003), 38–4.
  8. Mark Harding. What are they saying about the Pastoral epistles?, Paulist Press, 2001, page 12.
  9. Article on Hapax Legomena in Jewish Encyclopedia. Includes a list of all the Old Testament hapax legomena, by book.
  10. D. Jurafsky and J.H. Martin (2009). Speech and Language Processing. Prentice Hall.
  11. Orhan Elmaz. "Die Interpretationsgeschichte der koranischen Hapaxlegomena." Doctoral thesis, University of Vienna, 2008, page 29.
  12. Kerr, Alex. Lost Japan. Penguin UK, 2015. ISBN 9780141979755. Retrieved 2021-05-15. Archived 2022-06-01: https://web.archive.org/web/20220601202345/https://books.google.com/books?id=GPexBQAAQBAJ&q=alex+kerr+%22hapax%22&pg=PT82
  13. "Historical Thesaurus :: Search". historicalthesaurus.arts.gla.ac.uk. Retrieved 2017-10-28. Archived 2017-10-28: https://web.archive.org/web/20171028201222/http://historicalthesaurus.arts.gla.ac.uk/category/?type=search&qsearch=Flother%20&word=Flother%20&page=1#id=7970
  14. "The weird world of the hapax legomenon". The Spectator. Retrieved 2020-11-04. Archived 2022-06-01: https://web.archive.org/web/20220601202346/https://www.spectator.co.uk/article/the-weird-world-of-the-hapax-legomenon/amp
  15. Pharr, Clyde. Homeric Greek, a book for beginners. D. C. Heath & Co., Publishers, 1920, page xxii.
  16. Reece, Steve. "Hapax Legomena," in Margalit Finkelberg (ed.), Homeric Encyclopedia (Oxford: Blackwell, 2011), 330–331.
  17. (Il. 24.540)
  18. E.g. Richard Bauckham, The Jewish world around the New Testament: collected essays I, 2008, page 431: "a New Testament hapax, which occurs 19 times in Hermas. . ."
  19. John F. Walvoord and Roy B. Zuck, The Bible Knowledge Commentary: New Testament Edition, David C. Cook, 1983, page 860.
  20. G. Klaffenbach, Lex de astynomis Pergamenorum (1954).
  21. Cynthia Kosso and Anne Scott, The nature and function of water, baths, bathing, and hygiene from ..., 2009, page 252: "Günther Klaffenbach, "Die Astynomeninschrift von Pergamon," Abhandlungen der Deutschen Akademie der Wissenschaften zu Berlin. Klasse für Sprachen, Literatur und Kunst 6 (1953), 3–25 took charge of providing a full, yet strictly philological, commentary."
  22. Zuckermann, Ghil'ad. Revivalistics: From the Genesis of Israeli to Language Reclamation in Australia and Beyond. New York: Oxford University Press, 2020. ISBN 9780199812790. Retrieved 2020-04-30. Archived 2020-05-05: https://web.archive.org/web/20200505121004/https://global.oup.com/academic/product/revivalistics-9780199812790?lang=en&cc=us
  23. "Ark, Design and Size" Aid to Bible Understanding, Watchtower Bible and Tract Society, 1971.
  24. Blair, Judit M. De-demonising the Old Testament: An Investigation of Azazel, Lilith, Deber, Qeteb and Reshef in the Hebrew Bible. Tubingen, Germany: Mohr Siebeck, 2009. ISBN 9783161501319. Pages 92–95.
  25. Tanulmányok Szentmártoni Szabó Géza hatvanadik születésnapjára. http://plone.iti.mta.hu/rec.iti/Members/szerk/ghesaurus-1/GHESAURUS-SZINES.pdf
  26. Szőcs, Tibor. "A turul-monda szövegkapcsolatai a középkori írásos hagyományunkban." In: Középkortörténeti tanulmányok 6. Eds.: G. Tóth Péter, Szabó Pál. Szeged, 2010. 249–259.
  27. "The Triads of Ireland". www.smo.uhi.ac.uk. Retrieved 2019-01-28. Archived 2016-04-09: https://web.archive.org/web/20160409215807/http://www.smo.uhi.ac.uk/gaeilge/donncha/focal/features/triads/triad169.html
  28. "attuiare" in Enciclopedia Dantesca. www.treccani.it (in Italian). Retrieved 2019-01-28. Archived 2018-11-17: https://web.archive.org/web/20181117233710/http://www.treccani.it/enciclopedia/attuiare_(Enciclopedia-Dantesca)/
  29. Lewis, C.T. & Short, C. (1879) A Latin Dictionary, Oxford University, Clarendon Press, p.1599.
  30. "Tertullian: De Pallio". Retrieved 2015-11-28. Archived 2016-03-04: https://web.archive.org/web/20160304120956/http://www.thelatinlibrary.com/tertullian/tertullian.pallio.shtml
  31. Glare, P. G. W. (1968). Oxford Latin Dictionary. Oxford: Clarendon P., p. 611.
  32. Sblendorio Cugusi M. T. CLE 428 e lat. Eoigena. Studia philologica valentina, 2008, vol. 11, pp. 327–350 (in Italian).
  33. Andrey Zaliznyak, Новгородская Русь по берестяным грамотам: взгляд из 2012 г. (The Novgorod Rus' according to its birch bark manuscripts: a view from 2012), transcript of a lecture.
  34. А. Л. Шилов (A.L. Shilov), ЭТНОНИМЫ И НЕСЛАВЯНСКИЕ АНТРОПОНИМЫ БЕРЕСТЯНЫХ ГРАМОТ (Ethnonyms and non-Slavic anthroponyms in birch bark manuscripts)
  35. "HÁPAX" (web site).
  36. Rodríguez, Lola Pons. "Frecuencia lingüística y novedad gramatical. Propuestas sobre el hápax y las formas aisladas, con ejemplos del XV castellano." Iberoromania 2013, no. 78 (2013): 222-245.
  37. "Hollis Frampton at IMDB". Retrieved 2014-04-14. Archived 2014-06-06: https://web.archive.org/web/20140606223600/http://www.imdb.com/name/nm0289758/
  38. "University Challenge winner Ted Loveday: I learned my answers on Wikipedia". 15 April 2015. Retrieved 2020-01-27. Archived 2020-10-29: https://web.archive.org/web/20201029055438/https://www.telegraph.co.uk/news/bbc/11537719/University-Challenge-winner-Ted-Loveday-I-learned-my-answers-on-Wikipedia.html
  39. "This guy just won University Challenge with one ridiculous answer". 14 April 2015. Retrieved 26 April 2017. Archived 8 May 2017: https://web.archive.org/web/20170508121218/http://metro.co.uk/2015/04/14/this-guy-just-won-university-challenge-with-hapax-legomenon-and-an-awesome-knit-jumper-5149259/
  40. "'Best ever' University Challenge contestant praised after super-fast answers". Daily Mirror. 14 April 2015. Retrieved 27 January 2020. Archived 27 January 2020: https://web.archive.org/web/20200127065535/https://www.mirror.co.uk/tv/tv-news/best-ever-university-challenge-contestant-5518411
  41. sabotagetimes.com. http://sabotagetimes.com/funny/hapax-legonemum-a-second-by-second-deconstruction-of-that-loveday-vine
  42. Stevens, Michael. "The Zipf Mystery". Vsauce. YouTube. September 15, 2015. Retrieved August 3, 2020. Archived at Ghostarchive and the Wayback Machine.
  43. "Scroll origins - NetHack Wiki". Retrieved 2021-02-01. Archived 2021-02-08: https://web.archive.org/web/20210208130626/https://nethackwiki.com/wiki/Scroll_origins