Diacritic Explained

A diacritic (also diacritical mark, diacritical point, diacritical sign, or accent) is a glyph added to a letter or to a basic glyph. The term derives from the Ancient Greek diakritikós ("distinguishing"), from διακρίνω (diakrínō, "to distinguish"). The word diacritic is a noun, though it is sometimes used in an attributive sense, whereas diacritical is only an adjective. Some diacritics, such as the acute (ó), grave (ò), and circumflex (ô) (all shown above an 'o'), are often called accents. Diacritics may appear above or below a letter or in some other position such as within the letter or between two letters.

The main use of diacritics in Latin script is to change the sound-values of the letters to which they are added. Historically, English has used the diaeresis to indicate the correct pronunciation of ambiguous words, such as "coöperate", without which the letter sequence could be misread as /ˈkuːpəreɪt/. Other examples are the acute and grave accents, which can indicate that a vowel is to be pronounced differently than is normal in that position, for example not reduced to /ə/ or silent, as in the two uses of the letter e in the noun résumé (as opposed to the verb resume). The grave accent is also sometimes used as a pronunciation aid in words such as doggèd, learnèd, and blessèd, and especially in words pronounced differently than normal in poetry (for example movèd, breathèd).

Most other words with diacritics in English are borrowings from languages such as French, where the diacritic is kept to better preserve the original spelling: the diaeresis on naïve and Noël, the acute in café, the circumflex in crêpe, and the cedilla in façade. All these diacritics, however, are frequently omitted in writing, and English is the only major modern European language that does not have diacritics in common usage.

In Latin-script alphabets in other languages, diacritics may distinguish between homonyms, such as the French là ("there") versus la ("the"), which are both pronounced /la/. In Gaelic type, a dot over a consonant indicates lenition of the consonant in question. In other writing systems, diacritics may perform other functions. Vowel pointing systems, namely the Arabic harakat and the Hebrew niqqud systems, indicate vowels that are not conveyed by the basic alphabet. The Indic virama and the Arabic sukūn ( ـْـ ) mark the absence of vowels. Cantillation marks indicate prosody. Other uses include the Early Cyrillic titlo stroke ( ◌҃ ) and the Hebrew gershayim ( ״ ), which, respectively, mark abbreviations or acronyms, and Greek diacritical marks, which showed that letters of the alphabet were being used as numerals. In Vietnamese and the Hanyu Pinyin official romanization system for Mandarin in China, diacritics are used to mark the tones of the syllables in which the marked vowels occur.

In orthography and collation, a letter modified by a diacritic may be treated either as a new, distinct letter or as a letter–diacritic combination. This varies from language to language and may vary from case to case within a language.

In some cases, letters are used as "in-line diacritics", with the same function as ancillary glyphs, in that they modify the sound of the letter preceding them, as in the case of the "h" in the English pronunciation of "sh" and "th".[1] Such letter combinations are sometimes even collated as a single distinct letter. For example, the spelling sch was traditionally often treated as a separate letter in German. Words with that spelling were listed after all other words spelled with s in card catalogs in the Vienna public libraries, for example (before digitization).

Types

Among the types of diacritic used in alphabets based on the Latin script are:

acute (Latin: apex); for example ó
grave; for example ò
circumflex; for example ô
caron, wedge; for example ǒ
double acute; for example ő
double grave; for example ȍ
overdot, used in many orthographies and transcriptions; for example ȯ
underdot, also used in many orthographies and transcriptions; for example ọ
dot above right, used in Pe̍h-ōe-jī
two overdots, used for umlaut, diaeresis and others; for example ö
[…] used in the International Phonetic Alphabet (IPA) and the ALA-LC romanization system
breve; for example ŏ
inverted breve; for example ȏ
sicilicus, a palaeographic diacritic similar to a caron or breve
tilde; for example õ
titlo
subscript vertical stroke, used in IPA to mark syllabicity and in Rheinische Dokumenta to mark a schwa
superscript vertical stroke, used in Pe̍h-ōe-jī
macron; for example ō
underbar
vertical bar through the character
slash through the character; for example ø
crossbar through the character
overring; for example å
apostrophe
inverted apostrophe
reversed apostrophe
hook above (Vietnamese: dấu hỏi)
horn (Vietnamese: dấu móc); for example ơ
undercomma; for example ș
cedilla; for example ç
hook, left or right, sometimes superscript
ogonek; for example ǫ
double breve
tie bar or top ligature
double circumflex
longum
double tilde
double cedilla
double ogonek
double diaeresis
double ypogegrammeni

The tilde, dot, comma, titlo, apostrophe, bar, and colon are sometimes diacritical marks, but also have other uses.

Not all diacritics occur adjacent to the letter they modify. In the Wali language of Ghana, for example, an apostrophe indicates a change of vowel quality, but occurs at the beginning of the word, as in the dialects ’Bulengee and ’Dolimi. Because of vowel harmony, all vowels in a word are affected, so the scope of the diacritic is the entire word. In abugida scripts, like those used to write Hindi and Thai, diacritics indicate vowels, and may occur above, below, before, after, or around the consonant letter they modify.

The tittle (dot) on the letter i or the letter j of the Latin alphabet originated as a diacritic to clearly distinguish i from the minims (downstrokes) of adjacent letters. It first appeared in the 11th century in the sequence ii (as in ingeníí), then spread to i adjacent to m, n, u, and finally to every lowercase i. The j, originally a variant of i, inherited the tittle. The shape of the diacritic developed from initially resembling today's acute accent to a long flourish by the 15th century. With the advent of Roman type it was reduced to the round dot we have today.[2]

Several languages of eastern Europe use diacritics on both consonants and vowels, whereas in western Europe digraphs are more often used to change consonant sounds. Most languages in Europe use diacritics on vowels, aside from English where there are typically none (with some exceptions).

Diacritics specific to non-Latin alphabets

Arabic

Greek

These diacritics are used in addition to the acute, grave, and circumflex accents and the diaeresis:

iota subscript (ᾳ, εͅ, ῃ, ιͅ, οͅ, υͅ, ῳ)
rough breathing (Ancient Greek: δασὺ πνεῦμα, dasỳ pneûma; Latin: spīritus asper): aspiration
smooth (or soft) breathing (Ancient Greek: ψιλὸν πνεῦμα, psilòn pneûma; Latin: spīritus lēnis): lack of aspiration

Hebrew

Korean

The diacritics known as Bangjeom (Korean: 방점; 傍點) were used to mark pitch accents in Hangul for Middle Korean. They were written to the left of a syllable in vertical writing and above a syllable in horizontal writing.

Syriac

In addition to the above vowel marks, transliteration of Syriac sometimes includes ə, or superscript e (or often nothing at all) to represent an original Aramaic schwa that became lost later on at some point in the development of Syriac.[3] Some transliteration schemes find its inclusion necessary for showing spirantization or for historical reasons.[4] [5]

Non-alphabetic scripts

Some non-alphabetic scripts also employ symbols that function essentially as diacritics.

Alphabetization or collation

See main article: Collation. Different languages use different rules to put characters with diacritics in alphabetical order. French and Portuguese treat letters with diacritical marks the same as the underlying letter for purposes of ordering and dictionaries.

The Scandinavian languages and the Finnish language, by contrast, treat the characters with diacritics å, ä, and ö as distinct letters of the alphabet, and sort them after z. Usually ä (a-umlaut) and ö (o-umlaut) [used in Swedish and Finnish] are sorted as equivalent to æ (ash) and ø (o-slash) [used in Danish and Norwegian]. Also, aa, when used as an alternative spelling to å, is sorted as such. Other letters modified by diacritics are treated as variants of the underlying letter, with the exception that ü is frequently sorted as y.
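The rule that å, ä, and ö are distinct letters sorted after z can be sketched with a custom sort key. This is a minimal illustration only: the alphabet string and the name `swedish_key` are assumptions made for the example, and real collation libraries handle many more cases (such as aa as a spelling of å, or ü sorting as y).

```python
import unicodedata

# Hypothetical Swedish-style alphabet: å, ä, ö are separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def swedish_key(word):
    """Return a list of letter ranks; unknown characters sort last."""
    # NFC makes sure 'a' + combining ring is treated as the single letter 'å'.
    normalized = unicodedata.normalize("NFC", word.lower())
    return [SWEDISH_RANK.get(ch, len(SWEDISH_RANK)) for ch in normalized]

words = ["öl", "zebra", "ära", "apa", "år"]
print(sorted(words))                   # plain code-point order
print(sorted(words, key=swedish_key))  # å, ä, ö after z, in that order
```

With plain code-point sorting, ä happens to come before å; the custom key restores the alphabet's own order.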

Languages that treat accented letters as variants of the underlying letter usually alphabetize words with such symbols immediately after similar unmarked words. For instance, in German where two words differ only by an umlaut, the word without it is sorted first in German dictionaries (e.g. schon and then schön, or fallen and then fällen). However, when names are concerned (e.g. in phone books or in author catalogues in libraries), umlauts are often treated as combinations of the vowel with a suffixed e; Austrian phone books now treat characters with umlauts as separate letters (immediately following the underlying vowel).
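The two German conventions described above can be contrasted in a short sketch. The function names `dictionary_key` and `phonebook_key` are hypothetical, and the sketch covers only ä, ö, and ü; it is not a full implementation of any official sorting standard.

```python
import unicodedata

# Phone-book style: an umlaut counts as the vowel followed by e.
UMLAUT_EXPANSION = {"ä": "ae", "ö": "oe", "ü": "ue"}

def strip_marks(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def dictionary_key(word):
    """Dictionary style: compare base letters first; unmarked word wins ties."""
    w = unicodedata.normalize("NFC", word.lower())
    return (strip_marks(w), w)

def phonebook_key(name):
    """Name-list style: expand each umlaut to vowel + e before comparing."""
    w = unicodedata.normalize("NFC", name.lower())
    return "".join(UMLAUT_EXPANSION.get(ch, ch) for ch in w)

print(sorted(["schön", "schon", "fällen", "fallen"], key=dictionary_key))
print(sorted(["Mutter", "Muller", "Müller"], key=dictionary_key))
print(sorted(["Mutter", "Muller", "Müller"], key=phonebook_key))
```

Under the dictionary key, Müller follows Muller; under the phone-book key it is compared as "mueller" and moves ahead of Muller.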

In Spanish, the grapheme ñ is considered a new letter different from n and collated between n and o, as it denotes a different sound from that of a plain n. But the accented vowels á, é, í, ó, ú are not separated from the unaccented vowels a, e, i, o, u, as the acute accent in Spanish only modifies stress within the word or denotes a distinction between homonyms, and does not modify the sound of a letter.
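A sketch of the Spanish rule follows: ñ gets its own rank between n and o, while accented vowels are folded to their base letters before comparison. The names `SPANISH_ALPHABET` and `spanish_key` are assumptions made for this example, not part of any standard library.

```python
import unicodedata

# Hypothetical Spanish alphabet with ñ as a separate letter after n.
SPANISH_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
SPANISH_RANK = {ch: i for i, ch in enumerate(SPANISH_ALPHABET)}

def spanish_key(word):
    """Rank letters by the Spanish alphabet, ignoring accents on vowels."""
    ranks = []
    for ch in unicodedata.normalize("NFC", word.lower()):
        if ch != "ñ":
            # Fold á, é, í, ó, ú (and ü) to the base vowel; ñ is left alone.
            ch = unicodedata.normalize("NFD", ch)[0]
        ranks.append(SPANISH_RANK.get(ch, len(SPANISH_RANK)))
    return ranks

words = ["oso", "ñu", "nube", "único", "uva", "nota"]
print(sorted(words, key=spanish_key))
```

Plain code-point sorting would push both ñu and único to the end of the list, after uva.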

For a comprehensive list of the collating orders in various languages, see Collating sequence.

Generation with computers

Modern computer technology was developed mostly in English-speaking countries, so data formats, keyboard layouts, etc. were developed with a bias favoring English, a language with an alphabet without diacritical marks. Efforts have been made to create internationalized domain names that further extend the English alphabet (e.g., "pokémon.com").
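Internationalized domain names are carried over the ASCII-only domain name system in an encoded form. As a minimal sketch, Python's built-in "idna" codec can show that form for the example above. Note that this codec follows the older IDNA 2003 rules, so it is an illustration rather than a reference implementation of current practice.

```python
domain = "pok\u00e9mon.com"  # "pokémon.com" with a precomposed é

# Encode to the ASCII-compatible form used on the wire.
ascii_form = domain.encode("idna")
print(ascii_form)  # b'xn--pokmon-dva.com'

# Decoding restores the original Unicode label.
print(ascii_form.decode("idna"))
```

The "xn--" prefix marks a label that has been encoded; labels that are already plain ASCII, such as "com", pass through unchanged.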

Depending on the keyboard layout, which differs amongst countries, it is more or less easy to enter letters with diacritics on computers and typewriters. Some have their own keys; some are created by first pressing the key with the diacritic mark followed by the letter to place it on. Such a key is sometimes referred to as a dead key, as it produces no output of its own but modifies the output of the key pressed after it.

In modern Microsoft Windows and Linux operating systems, the keyboard layouts US International and UK International feature dead keys that allow one to type Latin letters with the acute, grave, circumflex, diaeresis/umlaut, tilde, and cedilla found in Western European languages (specifically, those combinations found in the ISO Latin-1 character set) directly: " + e gives ë, ~ + o gives õ, etc. On Apple Macintosh computers, there are keyboard shortcuts for the most common diacritics: Option+E followed by a vowel places an acute accent, Option+U followed by a vowel gives an umlaut, Option+C gives a cedilla, etc. Diacritics can be composed in most X Window System keyboard layouts, as well as other operating systems, such as Microsoft Windows, using additional software.
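The dead-key behaviour can be modelled as a small state machine: a dead key produces nothing by itself and modifies the next key. The sketch below is a simplification under stated assumptions; the `DEAD_KEYS` table and the function `type_keys` are hypothetical and do not reproduce any real keyboard driver.

```python
import unicodedata

# Hypothetical dead keys mapped to the combining mark they stand for.
DEAD_KEYS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis / umlaut
    "~": "\u0303",  # tilde
}

def type_keys(keys):
    """Simulate typing a sequence of keys on a layout with dead keys."""
    out = []
    pending = None  # dead key waiting for its base letter
    for key in keys:
        if pending is None:
            if key in DEAD_KEYS:
                pending = key  # no output yet
            else:
                out.append(key)
            continue
        if key == " ":
            out.append(pending)  # dead key + space gives the bare symbol
        else:
            composed = unicodedata.normalize("NFC", key + DEAD_KEYS[pending])
            # Use the combination only if a single precomposed letter exists.
            out.append(composed if len(composed) == 1 else pending + key)
        pending = None
    if pending is not None:
        out.append(pending)
    return "".join(out)

print(type_keys(['"', "e"]))  # ë
print(type_keys(["~", "o"]))  # õ
print(type_keys("caf'e"))     # café
```

A side effect of this design, also seen on real layouts with dead keys, is that an apostrophe followed by a letter that takes no acute (as in "don't") is emitted literally.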

On computers, the availability of code pages determines whether one can use certain diacritics. Unicode solves this problem by assigning every known character its own code; if this code is known, most modern computer systems provide a method to input it. With Unicode, it is also possible to combine diacritical marks with most characters. However, as of 2019, very few fonts include the necessary support to correctly render character-plus-diacritic(s) for the Latin, Cyrillic and some other alphabets (exceptions include Andika).
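Because Unicode allows both precomposed letters and base-plus-combining-mark sequences, the same visible text can have more than one encoding. The following sketch uses Python's standard `unicodedata` module to show the two forms of é and how normalization reconciles them; the helper `same_text` is a hypothetical name introduced for the example.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

def same_text(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(precomposed == combining)            # False: different code points
print(same_text(precomposed, combining))   # True: canonically equivalent
print([unicodedata.name(ch) for ch in combining])
print(len(unicodedata.normalize("NFD", precomposed)))  # 2
```

Whether the combining form renders correctly still depends on the font, which is the limitation the paragraph above describes.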

Languages with letters containing diacritics

The following languages have letters with diacritics that are orthographically distinct from those without diacritics.

Latin script

Baltic

Celtic

Finno-Ugric

Germanic

Romance

Slavic

Turkic

Other

Cyrillic letters

Diacritics that do not produce new letters

English

See main article: English terms with diacritical marks. English is one of the few European languages that does not have many words that contain diacritical marks. Instead, digraphs are the main way the Modern English alphabet adapts the Latin alphabet to its phonemes. Exceptions are unassimilated foreign loanwords, including borrowings from French (and, increasingly, Spanish, like jalapeño and piñata); however, the diacritic is also sometimes omitted from such words. Loanwords that frequently appear with the diacritic in English include café, résumé or resumé (a usage that helps distinguish it from the verb resume), soufflé, and naïveté. In older practice (and even among some orthographically conservative modern writers), one may see examples such as élite, mêlée and rôle.

English speakers and writers once used the diaeresis more often than now in words such as coöperation (from Fr. coopération), zoölogy (from Grk. zoologia), and seeër (now more commonly see-er or simply seer) as a way of indicating that adjacent vowels belonged to separate syllables, but this practice has become far less common. The New Yorker magazine is a major publication that continues to use the diaeresis in place of a hyphen for clarity and economy of space.[11]

A few English words, often when used out of context, especially in isolation, can only be distinguished from other words of the same spelling by using a diacritic or modified letter. These include exposé, lamé, maté, öre, øre, résumé and rosé. In a few words, diacritics that did not exist in the original have been added for disambiguation, as in maté (from Sp. and Port. mate), saké (the standard Romanization of the Japanese has no accent mark), and Malé (from Dhivehi މާލެ), to clearly distinguish them from the English words mate, sake, and male.

The acute and grave accents are occasionally used in poetry and lyrics: the acute to indicate stress overtly where it might be ambiguous (rébel vs. rebél) or nonstandard for metrical reasons (caléndar), the grave to indicate that an ordinarily silent or elided syllable is pronounced (warnèd, parlìament).

In certain personal names such as Renée and Zoë, two spellings often exist, and the person's own preference will be known only to those close to them. Even when the name of a person is spelled with a diacritic, like Charlotte Brontë, this may be dropped in English-language articles, and even in official documents such as passports, due to carelessness, the typist not knowing how to enter letters with diacritical marks, or technical reasons (California, for example, does not allow names with diacritics, as the computer system cannot process such characters). Diacritics also appear in some worldwide company names and trademarks, such as Nestlé and Citroën.

Other languages

The following languages have letter-diacritic combinations that are not considered independent letters.

Transliteration

Several languages that are not written with the Roman alphabet are transliterated, or romanized, using diacritics. Examples:

Limits

Orthographic

Possibly the greatest number of combining diacritics required to compose a valid character in any language encoded in Unicode is 8, for the "well-known grapheme cluster in Tibetan and Ranjana scripts", HAKṢHMALAWARAYAṀ.[13]
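Counting the combining marks in a cluster can be sketched with the Unicode general category, where every category beginning with "M" is a mark. The function name `count_combining_marks` is an assumption for this example, and the Tibetan cluster itself is not reproduced here; a Vietnamese letter with two marks serves as a small stand-in.

```python
import unicodedata

def count_combining_marks(cluster):
    """Count mark characters (categories Mn, Mc, Me) after decomposition."""
    decomposed = unicodedata.normalize("NFD", cluster)
    return sum(1 for ch in decomposed
               if unicodedata.category(ch).startswith("M"))

# U+1EC7 decomposes into e + dot below + circumflex: two marks.
print(count_combining_marks("\u1ec7"))         # 2
print(count_combining_marks("e"))              # 0
print(count_combining_marks("o\u0302\u0301"))  # 2
```

Decomposing first means a precomposed letter and its combining-sequence equivalent give the same count.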

It consists of

An example of rendering, may be broken depending on browser:

Unorthographic/ornamental

Some users have explored the limits of rendering in web browsers and other software by "decorating" words with excessive nonsensical diacritics per character to produce so-called Zalgo text.
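As a rough sketch of the idea, the functions below stack a fixed cycle of combining marks on each letter and then cap the number of marks per base character. Both `zalgo` and `limit_marks`, and the choice of marks, are hypothetical; typical generators pick marks at random, and a cap of this kind may also remove legitimate diacritics from text that is not precomposed.

```python
import unicodedata
from itertools import cycle, islice

# A small, arbitrary selection of combining marks for the demonstration.
MARKS = ["\u0300", "\u0301", "\u0302", "\u0303", "\u0308", "\u0323"]

def zalgo(text, per_char=5):
    """Append per_char combining marks to every alphabetic character."""
    marks = cycle(MARKS)
    return "".join(
        ch + "".join(islice(marks, per_char)) if ch.isalpha() else ch
        for ch in text
    )

def limit_marks(text, max_marks=1):
    """Keep at most max_marks consecutive combining marks per base character."""
    out = []
    run = 0  # marks seen since the last base character
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue  # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return "".join(out)

noisy = zalgo("hi", per_char=5)
print(len(noisy))                        # 12 code points for two letters
print(limit_marks(noisy, max_marks=0))   # hi
```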

List of diacritics in Unicode

Diacritics for Latin script in Unicode:

See also

External links

Notes and References

  1. Sweet, Henry (1877). A Handbook of Phonetics. Cambridge: Cambridge University Press. pp. 174–175. "Even letters with accents and diacritics [...] being only cast for a few founts, act practically as new letters. [...] We may consider the h in sh and th simply as a diacritic written for convenience on a line with the letter it modifies."
  2. Oxford English Dictionary.
  3. Nestle, Eberhard.
  4. Coakley, J. F. (2002). Robinson's Paradigms and Exercises in Syriac Grammar (5th ed.). Oxford University Press.
  5. Michaelis, Ioannis Davidis (1784). Grammatica Syriaca.
  6. Academia de la Llingua Asturiana (2001). Gramática de la Llingua Asturiana (3rd ed.). Section 1.2. ISBN 84-8168-310-8. Archived at https://web.archive.org/web/20110525120027/http://www.academiadelallingua.com/diccionariu/gramatica_llingua.pdf on 2011-05-25. Retrieved 2011-06-07.
  7. http://www.juls.savba.sk/ediela/psp2000/psp.pdf, page 12, section I.2.
  8. Grønlands sprognævn (1992).
  9. Petersen (1990).
  10. Brock, S. P. (1980). "An Introduction to Syriac Studies". In Eaton, J. H. (ed.), Horizons in Semitic Studies.
  11. Norris, Mary (26 April 2012). "The Curse of the Diaeresis". The New Yorker. Retrieved 18 April 2014.
  12. van Geloven, Sander (2012). Diakritische tekens in het Nederlands (in Dutch). Utrecht: Hellebaard. Archived at https://web.archive.org/web/20131029192341/http://hellebaard.nl/publicaties/poster/poster-diakritische-tekens-in-het-nederlands-4-stuks/ on 2013-10-29.
  13. Steele, Shawn (2010-01-25). "Most combining characters in a Unicode glyph/character/whatever". Microsoft. Archived at https://web.archive.org/web/20190516190627/https://blogs.msdn.microsoft.com/shawnste/2010/01/25/most-combining-characters-in-a-unicode-glyphcharacterwhatever/ on 2019-05-16. Retrieved 2019-11-25.