Unicode has subscripted and superscripted versions of a number of characters including a full set of Arabic numerals.[1] These characters allow any polynomial, chemical and certain other equations to be represented in plain text without using any form of markup like HTML or TeX.
The World Wide Web Consortium and the Unicode Consortium have made recommendations on the choice between using markup and using superscript and subscript characters:
When used in mathematical context (MathML) it is recommended to consistently use style markup for superscripts and subscripts […] However, when super and sub-scripts are to reflect semantic distinctions, it is easier to work with these meanings encoded in text rather than markup, for example, in phonetic or phonemic transcription.[2]
The intended use[2] when these characters were added to Unicode was to produce true superscripts and subscripts so that chemical and algebraic formulas could be written without markup. Thus "H₂O" (using a subscript 2 character) is supposed to be identical to "H2O" (with subscript markup).
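As a minimal sketch of that plain-text use, the following Python snippet replaces ASCII digits with the corresponding subscript or superscript characters. The helper names `chemical_formula` and `power` are illustrative only and do not come from any standard or library.

```python
# Superscript digits 0-9: 1, 2 and 3 sit in Latin-1 (U+00B9, U+00B2, U+00B3),
# the others at U+2070 and U+2074 to U+2079.
SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
# Subscript digits 0-9 form the contiguous run U+2080 to U+2089.
SUBSCRIPT_DIGITS = "".join(chr(0x2080 + i) for i in range(10))

TO_SUPERSCRIPT = str.maketrans("0123456789", SUPERSCRIPT_DIGITS)
TO_SUBSCRIPT = str.maketrans("0123456789", SUBSCRIPT_DIGITS)


def chemical_formula(ascii_formula: str) -> str:
    """Turn every ASCII digit into its subscript character (hypothetical helper)."""
    return ascii_formula.translate(TO_SUBSCRIPT)


def power(base: str, exponent: int) -> str:
    """Append a non-negative integer exponent as superscript digits (hypothetical helper)."""
    return base + str(exponent).translate(TO_SUPERSCRIPT)


print(chemical_formula("H2O"))      # H₂O
print(chemical_formula("C6H12O6"))  # C₆H₁₂O₆
print(power("x", 2))                # x²
```

The sketch only handles digits; letters and signs would need their own mappings, and not every letter has an encoded superscript or subscript form.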
In reality, many fonts that include these characters ignore the Unicode definition, and instead design the digits for mathematical numerator and denominator glyphs,[3] [4] which are aligned with the cap line and the baseline, respectively. When used with the solidus, these glyphs are a common substitute for diagonal fractions, such as ³/₄ for the ¾ glyph. This change was made because using markup does not give a good graphic approximation of fractions (compare markup 3/4 with super/sub-script ³/₄). The change also makes the superscript letters useful for ordinal indicators, more closely matching the ª and º characters. However, it makes them incorrect for normal superscript and subscript, and so chemical and algebraic formulas are better rendered by using markup.
Unicode intended that diagonal fractions be rendered by a different mechanism: the fraction slash U+2044 is visually similar to the solidus, but when used with the ordinary digits (not the superscripts and subscripts), it instructs the layout system that a fraction such as ¾ is to be rendered using automatic glyph substitution.[5] [6] User-end support was quite poor for a number of years, but fonts, browsers,[7] word processors,[8] desktop publishing software[9] and others increasingly support the intended Unicode behavior.
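The two ways of writing a fraction in plain text can be compared with a short Python sketch. The function names are hypothetical. Whether the fraction-slash form is actually drawn as a diagonal fraction depends on the font and layout engine, which the code cannot check; it only builds the character sequences.

```python
import unicodedata

FRACTION_SLASH = "\u2044"  # U+2044 FRACTION SLASH, distinct from the solidus "/"

_SUP = str.maketrans("0123456789", "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079")
_SUB = str.maketrans("0123456789", "".join(chr(0x2080 + i) for i in range(10)))


def fraction_slash_form(numerator: int, denominator: int) -> str:
    """Ordinary digits around U+2044: the mechanism Unicode intended."""
    return f"{numerator}{FRACTION_SLASH}{denominator}"


def script_digit_form(numerator: int, denominator: int) -> str:
    """Superscript numerator, solidus, subscript denominator: the common substitute."""
    return str(numerator).translate(_SUP) + "/" + str(denominator).translate(_SUB)


print(fraction_slash_form(3, 4))
print(script_digit_form(3, 4))

# The precomposed U+00BE (three quarters) has a compatibility decomposition
# to exactly the fraction-slash sequence.
print(unicodedata.normalize("NFKC", "\u00be") == fraction_slash_form(3, 4))  # True
```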
A selection of supporting fonts is displayed in the table below. (These will not display properly if you do not have the fonts installed, or if your browser does not support this behavior.)
Font | ½ | 1⁄2 | C¹ C₂ | |
Andika | ½ | 1⁄2 | C¹ C₂ | |
Arno Pro | ½ | 1⁄2 | C¹ C₂ | |
URW Bookman | ½ | 1⁄2 | C¹ C₂ | |
Brill | ½ | 1⁄2 | C¹ C₂ | |
Brioso Pro | ½ | 1⁄2 | C¹ C₂ | |
Calibri | ½ | 1⁄2 | C¹ C₂ | |
Candara | ½ | 1⁄2 | C¹ C₂ | |
Carlito | ½ | 1⁄2 | C¹ C₂ | |
Cantarell | ½ | 1⁄2 | C¹ C₂ | |
FiraGO | ½ | 1⁄2 | C¹ C₂ | |
EB Garamond | ½ | 1⁄2 | C¹ C₂ | |
Gentium Book | ½ | 1⁄2 | C¹ C₂ | |
URW Gothic | ½ | 1⁄2 | C¹ C₂ | |
Lato | ½ | 1⁄2 | C¹ C₂ | |
Linux Libertine | ½ | 1⁄2 | C¹ C₂ | |
Nimbus Roman | ½ | 1⁄2 | C¹ C₂ | |
Nimbus Sans | ½ | 1⁄2 | C¹ C₂ | |
Noto Sans | ½ | 1⁄2 | C¹ C₂ | |
Noto Serif | ½ | 1⁄2 | C¹ C₂ | |
Open Sans | ½ | 1⁄2 | C¹ C₂ | |
Yrsa | ½ | 1⁄2 | C¹ C₂ |
See main article: Superscripts and Subscripts (Unicode block). The most common superscript digits (1, 2, and 3) were in ISO-8859-1 and were therefore carried over into those positions in the Latin-1 range of Unicode. The rest were placed in a dedicated section of Unicode at U+2070 to U+209F. The two tables below show these characters. Each superscript or subscript character is preceded by a normal x to show the subscripting/superscripting. The table on the left contains the actual Unicode characters; the one on the right contains the equivalents using HTML markup for the subscript or superscript.
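The split between the Latin-1 range and the dedicated Superscripts and Subscripts block (U+2070 to U+209F) can be inspected with Python's standard `unicodedata` module. This is a sketch: `block_of` is a hypothetical helper that hard-codes only the two ranges relevant here.

```python
import unicodedata

SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"


def block_of(ch: str) -> str:
    """Classify a character into one of the two blocks discussed (hypothetical helper)."""
    cp = ord(ch)
    if 0x0080 <= cp <= 0x00FF:
        return "Latin-1 Supplement"
    if 0x2070 <= cp <= 0x209F:
        return "Superscripts and Subscripts"
    return "other"


for ch in SUPERSCRIPT_DIGITS:
    # unicodedata.digit() returns the digit value the character stands for.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch):<20} "
          f"value={unicodedata.digit(ch)}  block={block_of(ch)}")
```

Only the superscripts for 1, 2 and 3 are reported as Latin-1 Supplement, which is one reason they can differ in size or position from the other digits when a font covers the two ranges inconsistently.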
Unicode version 15.1 also includes subscript and superscript characters that are intended for semantic usage, in the following blocks:[1] [10]
Consolidated, the Unicode standard contains superscript and subscript versions of a subset of Latin, Greek and Cyrillic letters. Here they are arranged in alphabetical order for comparison (or for copy and paste convenience). Since these characters appear in different Unicode ranges, they may not appear to be the same size or position due to font substitution in the browser. Shaded cells mark small capitals that are not very distinct from minuscules, and Greek letters that are indistinguishable from Latin, and so would not be expected to be supported by Unicode.
Little punctuation is encoded. Parentheses and the exclamation mark are shown above. A question mark may be created with a superscript gelded question mark and a combining dot: ⟨ˀ̣⟩, although some fonts do not render it properly.
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | ||
Superscript capital | ᴬ | ᴮ | ꟲ | ᴰ | ᴱ | ꟳ | ᴳ | ᴴ | ᴵ | ᴶ | ᴷ | ᴸ | ᴹ | ᴺ | ᴼ | ᴾ | ꟴ | ᴿ | ꟱[12] | ᵀ | ᵁ | ⱽ | ᵂ | ||||
Superscript small cap | 𝿩[13] | 𐞄 | 𝿨[14] | 𝿺 | 𐞒 | 𐞖 | ᶦ | ᶫ | ᶰ | 𐞪 | ᶸ | 𐞲 | |||||||||||||||
Superscript minuscule | ᵃ | ᵇ | ᶜ | ᵈ | ᵉ | ᶠ | ᵍ | ʰ | ⁱ | ʲ | ᵏ | ˡ | ᵐ | ⁿ | ᵒ | ᵖ | 𐞥 | ʳ | ˢ | ᵗ | ᵘ | ᵛ | ʷ | ˣ | ʸ | ᶻ | |
Overscript small cap | ◌ᷛ | ◌ᷞ | ◌ᷟ | ◌ᷡ | ◌ᷢ | ||||||||||||||||||||||
Overscript minuscule | ◌ͣ | ◌ᷨ | ◌ͨ | ◌ͩ | ◌ͤ | ◌ᷫ | ◌ᷚ | ◌ͪ | ◌ͥ | ◌ᷜ | ◌ᷝ | ◌ͫ | ◌ᷠ | ◌ͦ | ◌ᷮ | ◌ͬ | ◌ᷤ | ◌ͭ | ◌ͧ | ◌ͮ | ◌ᷱ | ◌ͯ | ◌ᷦ | ||||
Subscript minuscule | ₐ | ₑ | ₕ | ᵢ | ⱼ | ₖ | ₗ | ₘ | ₙ | ₒ | ₚ | ᵣ | ₛ | ₜ | ᵤ | ᵥ | ₓ | ||||||||||
Underscript minuscule | ◌᷊ | ◌ᪿ |
Α | Β | Γ | Δ | Ε | Ζ | Η | Θ | Ι | Κ | Λ | Μ | Ν | Ξ | Ο | Π | Ρ | Σ | Τ | Υ | Φ | Χ | Ψ | Ω | ||
Superscript minuscule | ᵝ | ᵞ | ᵟ | ᵋ | ᶿ | ᶥ | ᵠ | ᵡ | 𝿳[15] | 𝿴[16] | |||||||||||||||
Overscript minuscule | ◌ᷩ | ||||||||||||||||||||||||
Subscript minuscule | ᵦ | ᵧ | ͺ[17] | ᵨ | ᵩ | ᵪ | |||||||||||||||||||
Underscript minuscule | ◌ͅ | ◌̫[18] |
Ҫ | ||||||||||||||||||||||||||||
Superscript | ᵸ | |||||||||||||||||||||||||||
Overscript | ◌ⷶ | ◌ⷠ | ◌ⷡ | ◌ⷢ | ◌ⷣ | ◌ⷷ | ◌ꙴ | ◌ⷤ | ◌ⷥ | ◌ꙵ | ◌𞂏 | ◌ꙶ | ◌ⷦ | ◌ⷧ | ◌ⷨ | ◌ⷩ | ◌ⷪ | ◌ⷫ | ◌ⷬ | ◌ⷭ | ||||||||
Subscript | ||||||||||||||||||||||||||||
Ӏ | ||||||||||||||||||||||||||||
Superscript | ꚜ | ꚝ | ||||||||||||||||||||||||||
Overscript | ◌ⷮ | ◌ꙷ | ◌ⷹ | ◌ꚞ | ◌ⷯ | ◌ꙻ | ◌ⷰ | ◌ⷱ | ◌ⷲ | ◌ⷳ | ◌ꙸ | ◌ꙹ | ◌ꙺ | ◌ⷺ | ◌ⷻ | ◌ⷼ | ◌ꚟ | ◌ⷽ | ◌ⷾ | ◌ⷿ | ◌ⷴ | |||||||
Subscript |
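A rough inventory of such characters can be rebuilt from the Unicode Character Database, because many superscript and subscript letters carry a `<super>` or `<sub>` compatibility decomposition to their base letter. The sketch below uses Python's `unicodedata` module; `script_variants` is a hypothetical helper. Its results depend on the Unicode version bundled with the Python build, and combining overscript and underscript letters are not found this way because they have no such decomposition.

```python
import sys
import unicodedata


def script_variants(tag: str) -> dict:
    """Map each base character to the characters that decompose to '<tag> base'.

    `tag` is "super" or "sub". Only single-character decompositions are kept.
    """
    prefix = f"<{tag}>"
    variants = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomposition = unicodedata.decomposition(ch)
        if not decomposition.startswith(prefix):
            continue
        targets = decomposition.split()[1:]
        if len(targets) != 1:
            continue
        base = chr(int(targets[0], 16))
        variants.setdefault(base, []).append(ch)
    return variants


superscripts = script_variants("super")
subscripts = script_variants("sub")

# Print whatever the local Unicode database knows for a few base letters.
for base in "aenx2":
    print(base, superscripts.get(base, []), subscripts.get(base, []))
```

Note that the superscript entry for "a" also includes the ordinal indicator ª, which has the same kind of decomposition.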
Many of the Cyrillic characters were added to the Cyrillic Extended-D block, which was added to the free Gentium Plus and Andika fonts with version 6.2 in February 2023.
See also small caps in Unicode.
The Latin Extended-F block was created for the remaining superscript IPA letters. They were added to the free Gentium Plus and Andika fonts with version 6.2 in February 2023. Additional characters for historical and para-IPA letters are pending as of 2024.
The Unicode characters for superscript (modifier) IPA and extIPA consonant letters are as follows. Characters for sounds with secondary articulation are set off in parentheses and placed below the base letters. Pairs of click letters are the current letter on the left and a traditional or para-IPA letter on the right; the latter are pending in Unicode.
Bilabial | Labiodental | Dental | Alveolar | Postalveolar | Retroflex | Palatal | Velar | Uvular | Pharyngeal | Glottal | |||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Nasal | pronounced as /m ᵐ/ 1D50 | pronounced as /ɱ ᶬ/ 1DAC | pronounced as /n ⁿ/ 207F 1DFF7 | 1DFF1 | pronounced as /ɳ ᶯ/ 1DAF | pronounced as /ɲ ᶮ/ 1DAE | pronounced as /ŋ ᵑ/ 1D51 | pronounced as /ɴ ᶰ/ 1DB0 | |||||||||||||||
Plosive | pronounced as /p ᵖ/ 1D56 | pronounced as /b ᵇ/ 1D47 | pronounced as /t ᵗ/ 1D57 1DB5 | pronounced as /d ᵈ/ 1D48 1DFF5 | 1DFF2 | 1DFEF | pronounced as /ʈ 𐞯/ 107AF | pronounced as /ɖ 𐞋/ 1078B | pronounced as /c ᶜ/ 1D9C | pronounced as /ɟ ᶡ/ 1DA1 | pronounced as /k ᵏ/ 1D4F | pronounced as /ɡ ᶢ//pronounced as /g ᵍ/ 1DA2/1D4D | pronounced as /q 𐞥/ 107A5 | pronounced as /ɢ 𐞒/ 10792 | pronounced as /ʡ 𐞳/ 107B3 | pronounced as /ʔ ˀ/ 02C0 | |||||||
Affricate | pronounced as /ʦ 𐞬/ 107AC | pronounced as /ʣ 𐞇/ 10787 | pronounced as /ʧ 𐞮/ 107AE 107AB | pronounced as /ʤ 𐞊/ 1078A 10789 | pronounced as /ꭧ 𐞭/ 107AD 1DFE5 | pronounced as /ꭦ 𐞈/ 10788 1DFE1 | |||||||||||||||||
Fricative | pronounced as /ɸ ᶲ/ 1DB2 | pronounced as /β ᵝ/ 1D5D | pronounced as /f ᶠ/ 1DA0 | pronounced as /v ᵛ/ 1D5B | pronounced as /θ ᶿ/ 1DBF | pronounced as /ð ᶞ/ 1D9E | pronounced as /s ˢ/ 02E2 1DFF8 | pronounced as /z ᶻ/ 1DBB 1DFF9 | pronounced as /ʃ ᶴ/ 1DB4 1D9D | pronounced as /ʒ ᶾ/ 1DBE 1DBD | pronounced as /ʂ ᶳ/ 1DB3 1DFE3 | pronounced as /ʐ ᶼ/ 1DBC 1DFE7 | pronounced as /ç ᶜ̧/ 1D9C + 0327[19] | pronounced as /ʝ ᶨ/ 1DA8 | pronounced as /x ˣ/ 02E3 (pronounced as /ɧ 𐞗/) 10797 | pronounced as /ɣ ˠ/ 02E0 | pronounced as /χ ᵡ/ 1D61 | pronounced as /ʁ ʶ/ 02B6 | pronounced as /ħ 𐞕/ 10795 (pronounced as /ʩ 𐞐/) 10790 | pronounced as /ʕ ˤ/ [20] | pronounced as /h ʰ/ 02B0 1DFF6 | pronounced as /ɦ ʱ/ 02B1 | |
Approximant | pronounced as /ʋ ᶹ/ 1DB9 | pronounced as /ɹ ʴ/ 02B4 | pronounced as /ɻ ʵ/ 02B5 | pronounced as /j ʲ/ 02B2 (pronounced as /ɥ ᶣ/) 1DA3 | (pronounced as /ʍ ꭩ/) AB69 | pronounced as /ɰ ᶭ/ 1DAD (pronounced as /w ʷ/) 02B7 | |||||||||||||||||
Tap/flap | pronounced as /ⱱ 𐞰/ 107B0 | pronounced as /ɾ 𐞩/ 107A9 | pronounced as /ɽ 𐞨/ 107A8 | ||||||||||||||||||||
Trill | pronounced as /ʙ 𐞄/ 10784 | pronounced as /r ʳ/ 02B3 | pronounced as /ʀ 𐞪/ 107AA | pronounced as /ʜ 𐞖/ 10796 | pronounced as /ʢ 𐞴/ 107B4 | ||||||||||||||||||
Lateral fricative | pronounced as /ɬ 𐞛/ 1079B (pronounced as /ʪ 𐞙/) 10799 | pronounced as /ɮ 𐞞/ 1079E (pronounced as /ʫ 𐞚/) 1079A | pronounced as /ꞎ 𐞝/ 1079D | pronounced as /𝼅 𐞟/ 1079F | pronounced as /𝼆 𐞡/ 107A1 | pronounced as /𝼄 𐞜/ 1079C | |||||||||||||||||
Lateral approximant | pronounced as /l ˡ/ 02E1 (pronounced as /ᶅ ᶪ/) 1DAA | 1DFF0 | pronounced as /ɭ ᶩ/ 1DA9 | pronounced as /ʎ 𐞠/ 107A0 | pronounced as /ʟ ᶫ/ 1DAB (pronounced as /ɫ ꭞ/)[21] AB5E | ||||||||||||||||||
Lateral tap/flap | pronounced as /ɺ 𐞦/ 107A6 | pronounced as /𝼈 𐞧/ 107A7 | |||||||||||||||||||||
Implosive | pronounced as /ƥ 𝿼/ 1DFFC | pronounced as /ɓ 𐞅/ 10785 | pronounced as /ƭ 𝿾/ 1DFFE | pronounced as /ɗ 𐞌/ 1078C | pronounced as / 𝿿/ 1DFFF | pronounced as /ᶑ 𐞍/ 1078D | pronounced as /ƈ 𝿺/ 1DFFA | pronounced as /ʄ 𐞘/ 10798 | pronounced as /ƙ 𝿻/ 1DFFB | pronounced as /ɠ 𐞓/ 10793 | pronounced as /ʠ 𝿽/ 1DFFD | pronounced as /ʛ 𐞔/ 10794 | |||||||||||
Click release[22] | pronounced as /ʘ 𐞵/ 107B5 | pronounced as /ǀ 𐞶/ 107B6 | pronounced as /ʇ 𐞻/ 107BB | pronounced as /ǃ ꜝ/ A71D | pronounced as /ʗ 𐞽/ 107BD | pronounced as /𝼊 𐞹/ 107B9 | pronounced as /ψ 𝿳/ 1DFF3 | pronounced as /ǂ 𐞸/ 107B8 | pronounced as / 𐞿/ 107BF | (pronounced as /ʞ 𐞾/) 107BE | |||||||||||||
Lateral click release | pronounced as /ǁ 𐞷/ 107B7 | pronounced as /ʖ 𐞼/ 107BC | |||||||||||||||||||||
Percussive | pronounced as /¡ ꜞ/ A71E[23] |
The spacing diacritic for ejective consonants, U+02BC, works with superscript letters despite not being superscript itself: ⟨ᵖʼ ᵗʼ ᶜʼ ᵏˣʼ⟩. If a distinction needs to be made, the combining apostrophe U+0315 may be used: ⟨ᵖ̕ ᵗ̕ ᶜ̕ ᵏˣ̕⟩. The spacing diacritic should be used for a baseline letter with a superscript release, such as [tˢʼ] or [kˣʼ], where the scope of the apostrophe includes the non-superscript letter. The combining apostrophe U+0315 might instead be used to indicate a weakly articulated ejective consonant like [ᵗ̕] or [ᵏ̕], where the whole consonant is written as a superscript. The two may also be used together when separate apostrophes have scope over the base and modifier letters, as in ⟨pʼᵏˣ̕⟩.[24]
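The difference between the two apostrophes is visible in their Unicode properties, as this Python sketch shows. The variable names are illustrative, and the strings are assembled from code points so that the sequence is unambiguous.

```python
import unicodedata

MODIFIER_APOSTROPHE = "\u02bc"    # spacing ejective mark
COMBINING_APOSTROPHE = "\u0315"   # combining comma above right

for ch in (MODIFIER_APOSTROPHE, COMBINING_APOSTROPHE):
    # Category "Lm" is a spacing modifier letter; "Mn" is a non-spacing combining mark.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

# Baseline t with superscript s release; the spacing apostrophe covers the whole cluster.
ejective_affricate = "t" + "\u02e2" + MODIFIER_APOSTROPHE

# Superscript t with the combining apostrophe attached to it (weakly articulated ejective).
weak_ejective = "\u1d57" + COMBINING_APOSTROPHE

# Both marks together: baseline ejective p, then a superscript k-x release with its own mark.
double_scope = "p" + MODIFIER_APOSTROPHE + "\u1d4f\u02e3" + COMBINING_APOSTROPHE

print(ejective_affricate, weak_ejective, double_scope)
```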
Spacing diacritics, as in ⟨tʲ⟩, cannot be secondarily superscripted in plain text: ⟨ᵗʲ⟩. (In this instance, the old IPA letter for [tʲ], ⟨ƫ⟩, has a superscript variant in Unicode, U+1DB5 ⟨ᶵ⟩, but that is not generally the case.)
Among older letters, ⟨ꜧ⟩ (U+A727) was a graphic variant of ⟨ɮ⟩. Its superscript is supported at ⟨ꭜ⟩ (U+AB5C). The most common letters with palatal hook are also supported; they are displayed in the table above. IPA once had an idiosyncratic curl on some of the palatalized letters: these are the fricative letters ⟨ʆ ʓ⟩ and the affricate ligatures. As of 2024 their superscript forms are pending at ⟨𝿦 𝿢 𝿤 𝿠⟩ (U+1DFE6, U+1DFE2, U+1DFE4, U+1DFE0). The retired letters ⟨ƞ⟩ and ⟨ɼ⟩ have pending support at ⟨𝿜⟩ (U+1DFDC) and ⟨𝿝⟩ (U+1DFDD).
Among para-IPA letters, Sinological superscript ⟨ȡ ȴ ȵ ȶ⟩ are pending at U+1DFEF to U+1DFF2.[13] Superscripts of the Bantuist labio-dental plosives ⟨ȹ⟩ and ⟨ȸ⟩ are pending at ⟨𝿟⟩ and ⟨𝿞⟩. The central semivowels ⟨ɉ⟩, ⟨𝼾⟩ (ɥ̶) and ⟨𝼿⟩ (w̶) are pending at U+1DFD9 ⟨𝿙⟩, U+1DFD8 ⟨𝿘⟩ and U+1DFDB ⟨𝿛⟩.
The Unicode characters for superscript (modifier) IPA vowel letters, plus a pair of extended letters ⟨ᵻ ᵿ⟩ found in English dictionaries, are as follows. Recently retired alternative letters such as ⟨ɩ ɷ⟩ are also supported; they are set off in parentheses and placed below the standard IPA letters:
Front | Central | Back | |||||
---|---|---|---|---|---|---|---|
Close | pronounced as /i ⁱ/ 2071 | pronounced as /y ʸ/ 02B8 | pronounced as /ɨ ᶤ/ 1DA4 | pronounced as /ʉ ᶶ/ 1DB6 | pronounced as /ɯ ᵚ/ 1D5A | pronounced as /u ᵘ/ 1D58 | |
Near-close | pronounced as /ɪ ᶦ/ 1DA6 (pronounced as /ɩ ᶥ/) 1DA5 | pronounced as /ʏ 𐞲/ 107B2 | (pronounced as /ᵻ ᶧ/) 1DA7 | (pronounced as /ᵿ 𝿚/) 1DFDA | pronounced as /ʊ ᶷ/ 1DB7 (pronounced as /ɷ /) 107A4 | ||
Close-mid | pronounced as /e ᵉ/ 1D49 | pronounced as /ø 𐞢/ 107A2 | pronounced as /ɘ 𐞎/ 1078E | pronounced as /ɵ ᶱ/ 1DB1 | pronounced as /ɤ 𐞑/ 10791 | pronounced as /o ᵒ/ 1D52 | |
Mid | pronounced as /ə ᵊ/ 1D4A | ||||||
Open-mid | pronounced as /ɛ ᵋ/ 1D4B | pronounced as /œ ꟹ/ A7F9 | pronounced as /ɜ ᶟ/ 1D9F (pronounced as /ᴈ ᵌ/) 1D4C | pronounced as /ɞ 𐞏/ 1078F | pronounced as /ʌ ᶺ/ 1DBA | pronounced as /ɔ ᵓ/ 1D53 | |
Near-open | pronounced as /æ 𐞃/ 10783 | pronounced as /ɶ 𐞣/ 107A3 | pronounced as /ɐ ᵄ/ 1D44 | pronounced as /ɑ ᵅ/ 1D45 | pronounced as /ɒ ᶛ/ 1D9B | ||
Open | pronounced as /a ᵃ/ 1D43 |
The precomposed Unicode rhotic vowel letters ⟨ɚ ɝ⟩ are not directly supported. The rhotic diacritic U+02DE ◌˞ should be used instead: ⟨ᵊ˞ ᶟ˞⟩.[25]
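A small Python sketch of this composition follows. The helper name `superscript_rhotic` is hypothetical. The decomposition checks show that the superscript schwa and superscript reversed open e map back to their base letters, while the precomposed rhotic letter U+025A has no decomposition from which a superscript could be derived.

```python
import unicodedata

RHOTIC_HOOK = "\u02de"          # U+02DE MODIFIER LETTER RHOTIC HOOK
SUPERSCRIPT_SCHWA = "\u1d4a"    # superscript form of U+0259
SUPERSCRIPT_REVERSED_OPEN_E = "\u1d9f"  # superscript form of U+025C


def superscript_rhotic(superscript_vowel: str) -> str:
    """Follow a superscript vowel letter with the spacing rhotic hook (hypothetical helper)."""
    return superscript_vowel + RHOTIC_HOOK


print(superscript_rhotic(SUPERSCRIPT_SCHWA))
print(superscript_rhotic(SUPERSCRIPT_REVERSED_OPEN_E))

# Compatibility decompositions recorded in the Unicode Character Database.
print(unicodedata.decomposition(SUPERSCRIPT_SCHWA))            # <super> 0259
print(unicodedata.decomposition(SUPERSCRIPT_REVERSED_OPEN_E))  # <super> 025C
print(repr(unicodedata.decomposition("\u025a")))               # '' (no decomposition)
```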
⟨ɜ⟩ and ⟨ᶟ⟩ are reversed ɛ. The older IPA turned ɛ, ⟨ᴈ⟩, is also supported, at U+1D4C ⟨ᵌ⟩. However, the briefly resurrected vowel letter ⟨ʚ⟩ (U+029A) is not supported; only its reversed replacement ⟨ɞ⟩ is.
Among older letters, ⟨ᴜ⟩ (U+1D1C), a graphic variant of ⟨ʊ⟩, is supported at ⟨ᶸ⟩ (U+1DB8).
Among para-IPA letters, Sinological superscript ⟨ɿ ʅ ʮ ʯ⟩ are pending at U+1DFEB to U+1DFEE.[13]
The two length marks are also supported:
These are used to add length to another superscript, such as long aspiration.
Superscript wildcards (full caps) are largely supported: e.g. ᴺC (prenasalized consonant), ꟲN (prestopped nasal), Pꟳ (fricative release), NᴾF (epenthetic plosive), CVNᵀ (tone-bearing syllable), Cᴸ (liquid or lateral release), Cᴿ (rhotic or resonant release), Vᴳ (off-glide/diphthong), Cⱽ (fleeting vowel). Superscript S for sibilant release has preliminary approval for Unicode 17 (as ꟱); superscript Ʞ for fleeting/epenthetic click does not. Other basic Latin superscript wildcards for tone and weak indeterminate sounds, as described in the article on the International Phonetic Alphabet, are mostly supported. (See table in previous section.)
In addition, a very few IPA letters beyond the basic Latin alphabet have combining forms or are supported as subscripts:
ɑ | æ | ç | ð | ə | ʃ | ʍ | ʔ | ʼ | ||
Overscript | ◌ᷧ | ◌ᷔ | ◌ᷗ | ◌ᷙ | ◌ᷪ | ◌ᷯ | ◌̉[26] | ◌̓ | ||
Subscript | ₔ | |||||||||
Underscript | ◌ᫀ | ◌̦ |
Primarily for compatibility with earlier character sets, Unicode contains a number of characters that compose super- and subscripts with other symbols.[1] In most fonts these render much better than attempts to construct these symbols from the above characters or by using markup.