Unicode subscripts and superscripts explained

Unicode has subscripted and superscripted versions of a number of characters including a full set of Arabic numerals.[1] These characters allow any polynomial, chemical and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.

The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters:

When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts […] However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription.[2]

Uses

The intended use[2] when these characters were added to Unicode was to produce true superscripts and subscripts so that chemical and algebraic formulas could be written without markup. Thus "H₂O" (using a subscript 2 character) is supposed to be identical to "H2O" (with subscript markup).
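
As a minimal sketch of the markup-free approach described above, the following Python snippet replaces ASCII digits in a simple chemical formula with the subscript digits U+2080 to U+2089. The helper name `to_subscript_digits` is hypothetical, and the sketch assumes that every digit in the input should be subscripted, which holds for formulas such as H2O but not for text in general.

```python
# Translation table from ASCII digits 0-9 to SUBSCRIPT ZERO..NINE (U+2080..U+2089).
SUBSCRIPT_DIGITS = str.maketrans(
    "0123456789",
    "".join(chr(0x2080 + d) for d in range(10)),
)


def to_subscript_digits(formula: str) -> str:
    """Replace every ASCII digit in `formula` with its subscript form.

    Hypothetical helper: it assumes all digits are meant as subscripts.
    """
    return formula.translate(SUBSCRIPT_DIGITS)


print(to_subscript_digits("H2O"))
print(to_subscript_digits("C6H12O6"))
```

The result is ordinary text that can be stored or transmitted without HTML or TeX; how it looks still depends on the font, as the following paragraphs explain.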

In reality, many fonts that include these characters ignore the Unicode definition, and instead design the digits for mathematical numerator and denominator glyphs,[3] [4] which are aligned with the cap line and the baseline, respectively. When used with the solidus or the Fraction Slash, they produce an almost typographically correct diagonal fraction, such as ³/₄ for the ¾ glyph. Super and subscript markup does not produce a correct fraction (compare markup 3/4 with precomposed ¾). The change also makes the superscript letters useful for ordinal indicators, more closely matching the ª and º characters.

Unicode intended that diagonal fractions be rendered by a different mechanism: the fraction slash U+2044 is visually similar to the solidus, but when used with the ordinary digits (not the superscripts and subscripts), it instructs the layout system that a fraction such as ¾ is to be rendered using automatic glyph substitution.[5] [6] User-end support was quite poor for a number of years, but fonts,[7] browsers,[8] word processors,[9] desktop publishing software[10] and others increasingly support the intended Unicode behavior. With a supporting font, the sequence digit three, fraction slash, digit four (3⁄4) is displayed as a diagonal fraction. (See the Fractions section of Slash (punctuation) for rendering in various other fonts.)
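
The fraction-slash mechanism can be illustrated with a short Python sketch. The function name `plain_fraction` is hypothetical; the code only builds the character sequence, and whether it displays as a diagonal fraction depends on the font and layout engine. The sketch also checks that the compatibility normalization (NFKC) of the precomposed ¾ (U+00BE) is the same digit, fraction slash, digit sequence.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # FRACTION SLASH, distinct from the solidus "/"


def plain_fraction(numerator: int, denominator: int) -> str:
    """Join ordinary digits with U+2044 (hypothetical helper).

    A supporting font may substitute numerator and denominator glyphs.
    """
    return f"{numerator}{FRACTION_SLASH}{denominator}"


three_quarters = plain_fraction(3, 4)
print(three_quarters)

# The precomposed vulgar fraction decomposes to the same sequence under NFKC.
print(unicodedata.normalize("NFKC", "\u00be") == three_quarters)
```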

Superscripts and subscripts block

The most common superscript digits (1, 2, and 3) were included in ISO-8859-1 and were therefore carried over into those code points in the Latin-1 range of Unicode. The remainder were placed along with basic arithmetical symbols, and later some Latin subscripts, in a dedicated block at U+2070 to U+209F. The table below shows these characters together. Each superscript or subscript character is preceded by a baseline x to show the height of subscripting/superscripting.

Unicode characters
        0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
U+00Bx            x²   x³                            x¹
U+207x  x⁰   xⁱ             x⁴   x⁵   x⁶   x⁷   x⁸   x⁹   x⁺   x⁻   x⁼   x⁽   x⁾   xⁿ
U+208x  x₀   x₁   x₂   x₃   x₄   x₅   x₆   x₇   x₈   x₉   x₊   x₋   x₌   x₍   x₎
U+209x  xₐ   xₑ   xₒ   xₓ   xₔ   xₕ   xₖ   xₗ   xₘ   xₙ   xₚ   xₛ   xₜ
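
Because the superscript digits are split between the Latin-1 range and the dedicated block, code that builds superscript numbers cannot simply add an offset to a digit. The following Python sketch shows one way to handle this; the names `SUPERSCRIPT_DIGITS` and `to_superscript` are hypothetical, and the helper assumes a non-negative integer.

```python
import unicodedata

# Superscript 1, 2 and 3 sit in Latin-1 (U+00B9, U+00B2, U+00B3);
# 0 and 4-9 sit in the dedicated block (U+2070, U+2074..U+2079).
SUPERSCRIPT_DIGITS = {
    0: "\u2070",
    1: "\u00b9",
    2: "\u00b2",
    3: "\u00b3",
    **{d: chr(0x2070 + d) for d in range(4, 10)},
}


def to_superscript(n: int) -> str:
    """Render a non-negative integer with superscript digits (hypothetical helper)."""
    return "".join(SUPERSCRIPT_DIGITS[int(ch)] for ch in str(n))


# List each digit with its code point and official character name.
for digit, ch in SUPERSCRIPT_DIGITS.items():
    print(digit, f"U+{ord(ch):04X}", unicodedata.name(ch))

print("x" + to_superscript(23))
```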

Other superscript and subscript characters

Unicode also includes codepoints for subscript and superscript characters that are intended for semantic usage, in the following blocks:[1] [11]

Superscript
Combining superscript
Subscript
Combining subscript

Latin, Greek, Cyrillic, and IPA tables

Consolidated, the Unicode standard contains superscript and subscript versions of a subset of Latin, Greek and Cyrillic letters. Here they are arranged in alphabetical order for comparison (or for copy and paste convenience). Since these characters appear in different Unicode ranges, they may not appear to be the same size or position due to font substitution by the browser. Shaded cells mark petite capitals that are not very distinct from minuscules, and Greek letters that are indistinguishable from Latin, and so would not be expected to be supported by Unicode.

Little punctuation is encoded. Parentheses are shown in the basic superscript block above, and the exclamation mark ꜝ is shown in the IPA table below. In a supporting font, a question mark may be created with a superscript gelded question mark and a combining dot below: ˀ̣.

Latin superscript and subscript letters
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Superscript capital: ᴿ
Superscript petite cap: 𐞄 𐞒 𐞖 𐞪 𐞲
Superscript minuscule: ʰ ʲ ˡ 𐞥 ʳ ˢ ʷ ˣ ʸ
Overscript small cap: ◌ᷛ ◌ᷞ ◌ᷟ ◌ᷡ ◌ᷢ
Overscript minuscule: ◌ͣ ◌ᷨ ◌ͨ ◌ͩ ◌ͤ ◌ᷫ ◌ᷚ ◌ͪ ◌ͥ ◌ᷜ ◌ᷝ ◌ͫ ◌ᷠ ◌ͦ ◌ᷮ ◌ͬ ◌ᷤ ◌ͭ ◌ͧ ◌ͮ ◌ᷱ ◌ͯ ◌ᷦ
Subscript minuscule
Underscript minuscule: ◌᷊ ◌ᪿ

*Superscript versions of S, of petite capital A, D, E and P, and of ƀ, as well as subscript versions of w, y and z, have been proposed for a future version of the Unicode Standard.[13] [14] [15] [16]

Æ Ƀ Ǝ Ŋ
Superscript capital
Superscript minuscule
Overscript minuscule: ◌ᷔ ◌ᷪ
Subscript minuscule

Some of these superscript capitals are small caps in the source documents in the Unicode proposals.

Greek superscript and subscript letters
Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω
Superscript minuscule: ᶿ
Overscript minuscule: ◌ᷩ
Subscript minuscule: ͺ[17]
Underscript minuscule: ◌ͅ ◌̫[18]
*Superscript versions of Greek psi and omega have been proposed for a future version of the Unicode Standard.

Cyrillic superscript and subscript letters
Ҫ
Superscript ̈
Overscript ◌ⷶ ◌ⷠ ◌ⷡ ◌ⷢ ◌ⷣ ◌ⷷ ◌ꙴ ◌ⷤ ◌ⷥ ◌ꙵ ◌𞂏 ◌ꙶ ◌ⷦ ◌ⷧ ◌ⷨ ◌ⷩ ◌ⷪ ◌ⷫ ◌ⷬ ◌ⷭ
Subscript ̈
Ӏ
Superscript
Overscript ◌ⷮ ◌ꙷ ◌ⷹ ◌ꚞ ◌ⷯ ◌ꙻ ◌ⷰ ◌ⷱ ◌ⷲ ◌ⷳ ◌ꙸ ◌ꙹ ◌ꙺ ◌ⷺ ◌ⷻ ◌ⷼ ◌ꚟ ◌ⷽ ◌ⷾ ◌ⷿ ◌ⷴ
Subscript

Superscript ї, й, ў etc. are handled with diacritics. Many of the Cyrillic characters were added to the Cyrillic Extended-D block, which was added to the free Gentium Plus and Andika fonts with version 6.2 in February 2023.

See also small caps in Unicode.

The Latin Extended-F block was created for the remaining superscript IPA letters. They are supported by the free Gentium Plus and Andika fonts. Additional superscript characters for historical and para-IPA letters have been proposed for future versions of the Unicode Standard.

Consonant letters

The Unicode characters for superscript (modifier) IPA and extIPA consonant letters are as follows. The entire Latin Extended-F block is dedicated to superscript IPA. Characters for sounds with secondary articulation are set off in parentheses and placed below the base letters.

IPA and extIPA consonants, along with superscript variants and their Unicode code points
Places of articulation (columns): Bilabial, Labiodental, Dental, Alveolar, Postalveolar, Retroflex, Palatal, Velar, Uvular, Pharyngeal, Glottal

Nasal: m ᵐ 1D50; ɱ ᶬ 1DAC; n ⁿ 207F; ɳ ᶯ 1DAF; ɲ ᶮ 1DAE; ŋ ᵑ 1D51; ɴ ᶰ 1DB0
Plosive: p ᵖ 1D56; b ᵇ 1D47; t ᵗ 1D57; ƫ ᶵ 1DB5; d ᵈ 1D48; ʈ 𐞯 107AF; ɖ 𐞋 1078B; c ᶜ 1D9C; ɟ ᶡ 1DA1; k ᵏ 1D4F; ɡ ᶢ / g ᵍ 1DA2/1D4D; q 𐞥 107A5; ɢ 𐞒 10792; ʡ 𐞳 107B3; ʔ ˀ 02C0
Affricate: ʦ 𐞬 107AC; ʣ 𐞇 10787; ʧ 𐞮 107AE; ʨ 𐞫 107AB; ʤ 𐞊 1078A; ʥ 𐞉 10789; ꭧ 𐞭 107AD; ꭦ 𐞈 10788
Fricative: ɸ ᶲ 1DB2; β ᵝ 1D5D; f ᶠ 1DA0; v ᵛ 1D5B; θ ᶿ 1DBF; ð ᶞ 1D9E; s ˢ 02E2; z ᶻ 1DBB; ʃ ᶴ 1DB4; ɕ ᶝ 1D9D; ʒ ᶾ 1DBE; ʑ ᶽ 1DBD; ʂ ᶳ 1DB3; ʐ ᶼ 1DBC; ç ᶜ̧ 1D9C + 0327[19]; ʝ ᶨ 1DA8; x ˣ 02E3; (ɧ 𐞗) 10797; ɣ ˠ 02E0; χ ᵡ 1D61; ʁ ʶ 02B6; ħ 𐞕 10795; (ʩ 𐞐) 10790; ʕ ˤ 02E4[20]; h ʰ 02B0; ɦ ʱ 02B1
Approximant: ʋ ᶹ 1DB9; ɹ ʴ 02B4; ɻ ʵ 02B5; j ʲ 02B2; (ɥ ᶣ) 1DA3; (ʍ ꭩ) AB69; ɰ ᶭ 1DAD; (w ʷ) 02B7
Tap/flap: ⱱ 𐞰 107B0; ɾ 𐞩 107A9; ɽ 𐞨 107A8
Trill: ʙ 𐞄 10784; r ʳ 02B3; ʀ 𐞪 107AA; ʜ 𐞖 10796; ʢ 𐞴 107B4
Lateral fricative: ɬ 𐞛 1079B; (ʪ 𐞙) 10799; ɮ 𐞞 1079E; (ʫ 𐞚) 1079A; ꞎ 𐞝 1079D; 𝼅 𐞟 1079F; 𝼆 𐞡 107A1; 𝼄 𐞜 1079C
Lateral approximant: l ˡ 02E1; (ᶅ ᶪ) 1DAA; ɭ ᶩ 1DA9; ʎ 𐞠 107A0; ʟ ᶫ 1DAB; (ɫ ꭞ)[21] AB5E
Lateral tap/flap: ɺ 𐞦 107A6; 𝼈 𐞧 107A7
Implosive: ƥ / ɓ 𐞅 10785; ƭ / ɗ 𐞌 1078C; ᶑ 𐞍 1078D; ƈ / ʄ 𐞘 10798; ƙ / ɠ 𐞓 10793; ʠ / ʛ 𐞔 10794
Click release: ʘ 𐞵 107B5; ǀ 𐞶 107B6; ʇ / ǃ ꜝ A71D; ʗ / 𝼊 𐞹 107B9; ψ / ǂ 𐞸 107B8; (ʞ)
Lateral click release: ǁ 𐞷 107B7; ʖ
Percussive: ¡ ꜞ A71E[22]

The spacing diacritic for ejective consonants, U+02BC, works with superscript letters despite not being superscript itself: ᵖʼ ᵗʼ ᶜʼ ᵏˣʼ. If a distinction needs to be made, the combining apostrophe U+0315 may be used: ᵖ̕ ᵗ̕ ᶜ̕ ᵏˣ̕. The spacing diacritic should be used for a baseline letter with a superscript release, such as [tˢʼ] or [kˣʼ], where the scope of the apostrophe includes the non-superscript letter. The combining apostrophe U+0315 might instead be used to indicate a weakly articulated ejective consonant like [ᵗ̕] or [ᵏ̕], where the whole consonant is written as a superscript. The two can also be used together when separate apostrophes have scope over the base and modifier letters, as in pʼᵏˣ̕.[23]
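
The difference between the two apostrophes is visible in their Unicode properties: U+02BC is a spacing modifier letter, while U+0315 is a combining mark that attaches to the preceding character. The following Python sketch prints those properties and assembles two of the sequences mentioned above. The helper `describe` and the variable names are illustrative assumptions, not part of any standard API.

```python
import unicodedata

MODIFIER_APOSTROPHE = "\u02bc"          # spacing ejective mark
COMBINING_COMMA_ABOVE_RIGHT = "\u0315"  # combining apostrophe


def describe(ch: str) -> str:
    """Summarize a character's code point, name, category and combining class."""
    return (
        f"U+{ord(ch):04X} {unicodedata.name(ch)} "
        f"category={unicodedata.category(ch)} "
        f"combining={unicodedata.combining(ch)}"
    )


# Baseline t with a superscript s release, followed by the spacing apostrophe.
ejective_with_release = "t\u02e2" + MODIFIER_APOSTROPHE
# Wholly superscript t marked with the combining apostrophe.
weak_ejective = "\u1d57" + COMBINING_COMMA_ABOVE_RIGHT

print(describe(MODIFIER_APOSTROPHE))
print(describe(COMBINING_COMMA_ABOVE_RIGHT))
print(ejective_with_release, weak_ejective)
```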

Spacing diacritics, as in tʲ, cannot be secondarily superscripted in plain text: ᵗʲ. (In this instance, the old IPA letter for [tʲ], ƫ, has a superscript variant in Unicode, U+1DB5 ᶵ, but that is not generally the case.)

Among older letters, ꜧ (U+A727) was a graphic variant of ɮ. Its superscript is supported at ꭜ (U+AB5C). The most common letters with palatal hook are also supported; they are displayed in the table above. IPA once had an idiosyncratic curl on some of the palatalized letters: these are the fricative letters ʆ ʓ. Superscript forms of these letters, and of the retired letters ƞ and ɼ, have been proposed for a future version of the Unicode Standard.[16] [14]

Among para-IPA letters, superscript forms of the following have been proposed for a future version of the Unicode Standard: the Sinological letters ȡ ȴ ȵ ȶ, the Bantuist labio-dental plosives ȹ and ȸ, and the central semivowels ɉ, ɥ̶, and w̶.[15] [14]

Old-style click letters have been proposed for a future version of the Unicode Standard.[24] [14]

Vowel letters

The Unicode characters for superscript (modifier) IPA vowel letters, plus a pair of extended letters ᵻ ᵿ found in English dictionaries, are as follows. Recently retired alternative letters such as ɩ ɷ are also supported; they are set off in parentheses and placed below the standard IPA letters:

IPA vowels and superscript variants
Backness (columns): Front, Central, Back

Close: i ⁱ 2071; y ʸ 02B8; ɨ ᶤ 1DA4; ʉ ᶶ 1DB6; ɯ ᵚ 1D5A; u ᵘ 1D58
Near-close: ɪ ᶦ 1DA6; (ɩ ᶥ) 1DA5; ʏ 𐞲 107B2; (ᵻ ᶧ) 1DA7; (ᵿ); (ω); ʊ ᶷ 1DB7; (ɷ 𐞤) 107A4
Close-mid: e ᵉ 1D49; ø 𐞢 107A2; ɘ 𐞎 1078E; ɵ ᶱ 1DB1; ɤ 𐞑 10791; o ᵒ 1D52
Mid: ə ᵊ 1D4A
Open-mid: ɛ ᵋ 1D4B; œ ꟹ A7F9; ɜ ᶟ 1D9F; (ᴈ ᵌ) 1D4C; ɞ 𐞏 1078F; ʌ ᶺ 1DBA; ɔ ᵓ 1D53
Near-open: æ 𐞃 10783; ɶ 𐞣 107A3; ɐ ᵄ 1D44; ɑ ᵅ 1D45; ɒ ᶛ 1D9B
Open: a ᵃ 1D43

The precomposed Unicode rhotic vowel letters ɚ ɝ are not directly supported. The rhotic diacritic U+02DE ◌˞ should be used instead: ᵊ˞ ᶟ˞.[25]
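
The substitution described above can be sketched in Python as a small lookup table: each precomposed rhotic vowel maps to the superscript form of its plain vowel followed by the rhotic hook U+02DE. The names `SUPERSCRIPT_RHOTIC` and `superscript_rhotic` are hypothetical and cover only the two letters mentioned here.

```python
RHOTIC_HOOK = "\u02de"  # MODIFIER LETTER RHOTIC HOOK

# Precomposed rhotic vowel -> superscript plain vowel + rhotic hook.
SUPERSCRIPT_RHOTIC = {
    "\u025a": "\u1d4a" + RHOTIC_HOOK,  # schwa with hook -> superscript schwa + hook
    "\u025d": "\u1d9f" + RHOTIC_HOOK,  # reversed open e with hook -> superscript reversed open e + hook
}


def superscript_rhotic(vowel: str) -> str:
    """Return the superscript sequence for a precomposed rhotic vowel (hypothetical helper)."""
    return SUPERSCRIPT_RHOTIC[vowel]


for vowel in SUPERSCRIPT_RHOTIC:
    print(vowel, "->", superscript_rhotic(vowel))
```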

ɜ and ᶟ are reversed ɛ. The older IPA turned ɛ, ᴈ, is also supported, at U+1D4C ᵌ. However, the briefly resurrected vowel letter ʚ (U+029A) is not supported; only its reversed replacement ɞ is.

Among older letters, ᴜ (U+1D1C), a graphic variant of ʊ, is supported at ᶸ (U+1DB8).[26]

Among para-IPA letters, Sinological superscript ɿ ʅ ʮ ʯ have been proposed for a future version of the Unicode Standard.[15] [14] [27]

Length marks

The two length marks are also supported: 𐞁 (U+10781) and 𐞂 (U+10782).

These are used to add length to another superscript, such as Cʰ𐞁 or Cʰ𐞂 for long aspiration.

Wildcards

Superscript wildcards (full caps) are largely supported: e.g. ᴺC (prenasalized consonant), ꟲN (prestopped nasal), Pꟳ (fricative release), NᴾF (epenthetic plosive), CVNᵀ (tone-bearing syllable), Cᴸ (liquid or lateral release), Cᴿ (rhotic or resonant release), Vᴳ (off-glide/diphthong), Cⱽ (fleeting vowel). Superscript S for sibilant release has been proposed for a future version of the Unicode Standard;[27] [28] superscript Ʞ for fleeting/epenthetic click has not. Other basic Latin superscript wildcards for tone and weak indeterminate sounds, as described in the article on the International Phonetic Alphabet, are mostly supported. (See table in previous section.)

Combining marks and subscripts

In addition, a very few IPA letters beyond the basic Latin alphabet have combining forms or are supported as subscripts:

ä ɑ æ βç ð ə ʃ ʍ χʔ ʼ
Overscript: ◌ᷲ ◌ᷧ ◌ᷔ ◌ᷩ ◌ᷗ ◌ᷙ ◌ᷪ ◌ᷯ ◌̉[29] ◌̓
Subscript
Underscript: ◌ᫀ ◌̦

Composite characters

Primarily for compatibility with earlier character sets, Unicode contains a number of characters that compose super- and subscripts with other symbols.[1] In most fonts these render much better than attempts to construct these symbols from the above characters or by using markup.
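
One way to see how such compatibility characters are composed is to inspect their decomposition mappings. The Python sketch below uses the trade mark sign U+2122 and the service mark U+2120 purely as illustrative examples chosen for this sketch; the helper name `expand_composite` is hypothetical. Normalizing to NFKC replaces the single character with its plain-letter equivalent and discards the superscript styling.

```python
import unicodedata


def expand_composite(ch: str) -> str:
    """Return the NFKC form of a compatibility character (hypothetical helper)."""
    return unicodedata.normalize("NFKC", ch)


# Illustrative examples: TRADE MARK SIGN and SERVICE MARK.
for ch in ("\u2122", "\u2120"):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
        expand_composite(ch),
    )
```

The `<super>` tag in the decomposition records that the original character was a superscript form, which is the information lost when the text is normalized.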

Notes and References

  1. UCD: UnicodeData.txt. The Unicode Standard. May 14, 2016.
  2. Unicode in XML and other Markup Languages. Martin Dürst, Asmus Freytag. 16 May 2007. W3C. 13 September 2010.
  3. fraction Dart Package. December 27, 2021. September 21, 2022. Dart packages.
  4. MathML General layout elements Fractions. March 30, 2021. https://web.archive.org/web/20210128015254/https://www.data2type.de/xml-xslt-xslfo/math-ml/presentation-markup/layout-elements/fractions/. January 28, 2021. dead. January 13, 2022. data2type GmbH. de-DE.
  5. Fraction Slash. Martin Dürst, Asmus Freytag. May 16, 2007. W3C. September 13, 2010.
  6. For a general overview and technical information on glyph substitution (though not specifically for fractions), see GSUB — Glyph Substitution Table in the OpenType specification on the Microsoft Typography site.
  7. Such as Andika, Arno Pro, Brill, Brioso Pro, Calibri, Candara, Carlito, Cantarell, FiraGO, EB Garamond, Gentium Book, Lato, Linux Libertine, Noto Sans, Noto Serif, Open Sans and Yrsa
  8. Such as Chrome, Firefox and Falkon
  9. Such as LibreOffice Writer
  10. Such as Adobe InDesign and Scribus
  11. UCD: Scripts.txt. The Unicode Standard. September 21, 2022.
  12. L2/20-268: Revised proposal to add ten characters for Middle English to the UCS. October 5, 2020. Michael Everson, Andrew West.
  13. L2/24-081: Unicode request for modifier capital S. Kirk Miller. January 30, 2024.
  14. Proposed New Characters: Pipeline Table. September 10, 2024. September 21, 2024.
  15. L2/24-147: Modifier Sinological extensions to the IPA. Kirk Miller. June 14, 2024.
  16. L2/24-171: Miscellaneous historical and para-IPA modifier letters. Kirk Miller. June 6, 2024.
  17. ͺ is set lower than a normal subscript. It is equivalent to underscript ◌ͅ on a space.
  18. ◌̫ is traditionally typeset as an omega.
  19. Superscript ç is composed of superscript c and a combining cedilla, which should display properly in a good font. Superscript c was specifically requested for this purpose in Unicode proposal L2/03-180.
  20. ˤ (U+02E4) is the superscript variant of ʕ and is defined for IPA use. The similar character ˁ (U+02C1) is a reversed ˀ, perhaps a gelded reversed question mark. Fonts are inconsistent in whether they look different and what the difference is.
  21. In Microsoft fonts, superscript ɫ was erroneously designed as a superscript ꬸ.
  22. U+A71D ꜝ and U+A71E ꜞ were adopted as the Africanist equivalents of the IPA characters ꜜ downstep and ꜛ upstep. The correspondence of U+A71D ꜝ to the IPA click letter ǃ is thus accidental. Coincidentally, U+A71E ꜞ serves as the superscript variant of the extIPA percussive consonant ¡; the other percussive letters, ʬ and ʭ, do not have superscript support in Unicode.
  23. Kirk Miller & Michael Ashby, L2/20-253R Unicode request for IPA modifier letters (b), non-pulmonic.
  24. L2/24-052R: Unicode request for modifier pre-Kiel click letters. Kirk Miller. April 26, 2024.
  25. Kirk Miller & Michael Ashby, L2/20-252R Unicode request for IPA modifier-letters (a), pulmonic.
  26. L2/24-081: Latin Phonetic The for Middle Tilde. Kirk Miller. January 30, 2024.
  27. L2/24-081: Latin Phonetic Trill and Small Capital. Kirk Miller. January 30, 2024.
  28. Proposed New Characters: Pipeline Table. September 10, 2024. September 21, 2024.
  29. This is actually the Vietnamese diacritic dấu hỏi, not specifically IPA, but graphically both are gelded question marks.
  30. L2/17-066R: Proposal to encode the Marca Registrada sign. March 1, 2017. Eduardo Marín Silva.