Variant Chinese characters explained

Chinese characters may have several variant forms—visually distinct glyphs that represent the same underlying meaning and pronunciation. Variants of a given character are allographs of one another, and many are directly analogous to allographs present in the English alphabet, such as the double-storey (a) and single-storey (ɑ) variants of the letter A, with the latter more commonly appearing in handwriting. Some contexts require usage of specific variants.

Variant character
Pic:	File:Source Han Sans Version Difference.svg
Piccap:	Regional variants of the character as rendered by the Source Han Sans font family
T:	異體字
S:	异体字
Kanji:	異体字
Romaji:	itaiji
Hangul:	이체자
Rr:	icheja
L:	variant character form
P:	yìtǐzì
Y:	yihtáijih
J:	ji6-tai2-zi6
T2:	又體
S2:	又体
L2:	alternative form
P2:	yòutǐ
Y2:	yauhtái
J2:	jau6-tai2
T3:	或體
S3:	或体
L3:	or form
P3:	huòtǐ
Y3:	waahktái
J3:	waak6-tai2
C4:	重文
L4:	alternative writing
P4:	chóngwén
Y4:	chùngmàn
J4:	cung4-man4
Qn:	chữ dị thể
Hn:	異體

Nature of variants

Before the 20th century, variation in the shape of characters was ubiquitous, a dynamic which continued after the invention of woodblock printing. For example, prior to the Qin dynasty (221–206 BC) the character meaning 'bright' was written as either 明 or 朙, with either 日 or 囧 on the left and the 月 component on the right. Li Si, the Chancellor of Qin, attempted to universalize the Qin small seal script across China following the wars that had politically unified the country for the first time. Li prescribed the 朙 form of the word for 'bright', but some scribes ignored this and continued to write the character as 明. However, the increased usage of 朙 was followed by the proliferation of a third variant: 眀, with 目 on the left, likely derived as a contraction of 朙. Ultimately, 明 became the character's standard form.

New variants also result from larger shifts in the writing system as a whole, such as the processes of libian and liding that resulted in the clerical script. According to the palaeographer Qiu Xigui, the broadest trend in the evolution of Chinese characters over their history has been simplification, both in graphical shape, the "external appearances of individual graphs", and in graphical form, "overall changes in the distinguishing features of graphic[al] shape and calligraphic style, [...] in most cases refer[ring] to rather obvious and rather substantial changes". Libian often involved significant omissions, additions, or transmutations of the forms used by Qin small seal script, while liding is the direct regularization and linearization of shapes to convert them into clerical forms while preserving their original structure. For example, the character for 'year' underwent liding to yield the clerical script form 秊, while the same character after undergoing libian resulted in the orthodox form 年. Similarly, libian and liding created the two distinct characters 虎 and 乕 for 'tiger'.

Some variants arise through the use of different radicals to distinguish specific senses of a polysemous character. For instance, the character 雕 could mean either 'a type of hawk' or 'carve'. Variants using different radicals to specify each sense thus developed: 鵰 with the 'bird' radical (鳥) and 琱 with the 'jade' radical (玉).

In rare cases, two characters with similar meanings in ancient Chinese were confused and conflated once their modern Chinese readings merged. For example, 飢 and 饑 are both read as jī and mean 'famine', and are used interchangeably in the modern language, even though in Old Chinese 飢 meant 'insufficient food to satiate' and 饑 meant 'famine'. The two characters formerly belonged to two different Old Chinese rime groups (the 脂 and 微 groups, respectively), which indicates that they were pronounced differently at the time. A similar situation is responsible for the existence of variants of the particle 於 'in', which had the ancient form 于, now used as its simplified form. In each case above, the variants were merged into a single simplified form.

Orthodoxy

The character forms regarded as most orthodox are known as orthodox variants, which is sometimes taken to mean the forms present in the Kangxi Dictionary, which usually represent the orthodox forms used in late imperial China. Non-orthodox forms are known as folk variants. Some folk variants are longstanding abbreviations or calligraphic forms, and later became the basis for the simplified forms adopted on the mainland. For example, 痴 is a folk variant corresponding to the orthodox form 癡 'foolish'. These forms differ in their phonetic component: the folk variant uses a component with a "close enough" pronunciation that has fewer strokes and is thus quicker to write. In mainland China, simplified forms are called xin zixing, typically contrasting with jiu zixing, which usually follow the Kangxi forms.

Orthodox and folk forms may differ only in the length or location of individual strokes, in whether certain strokes intersect, or in the presence or absence of minor strokes (dots). Such differences are often not considered to amount to discrete variants. For instance, 述 (shown with a Chinese language tag) is the new form of the character whose traditional orthography is 述 (shown with a Korean language tag) 'recount', 'describe'; the two share a code point and may render identically in plain text. As another example, the surname 吴, also the name of an ancient state, is the 'new character shape' form of the character traditionally written 吳.

Regional standards

Character variants exist throughout every writing system that uses Chinese characters, including written Chinese, Japanese, and Korean. Several governments of countries where these languages are spoken have standardized their writing systems by specifying certain variants as the standard form. The choice of which variants to use has resulted in some bifurcation of written Chinese between simplified and traditional forms. The standardization of simplified forms in Japan was distinct from the process in mainland China.

The standard character forms prescribed by the government of each region are described in:

The List of Commonly Used Standard Chinese Characters for mainland China
The List of Graphemes of Commonly-Used Chinese Characters for Hong Kong (educational usage only)
The Standard Form of National Characters for Taiwan (educational usage only)
The list of jōyō kanji for Japan
The Kangxi Dictionary in Korea

However, the traditional printing orthography (commonly known as jiu zixing) is the de facto standard used by Traditional Chinese communities outside of educational usage.^[1] ^[2]

Use in computing

Unicode deals with variant characters in a complex manner, as a result of the process of Han unification. In Han unification, some variants that are nearly identical across Chinese-, Japanese-, and Korean-speaking regions are encoded at the same code point and can be distinguished only by using different typefaces. Other, more divergent variants are encoded at different code points. On webpages, displaying the correct variants for the intended language depends on the typefaces installed on the computer, the configuration of the web browser, and the language tags of the web pages. Systems that are ready to display the correct variants are rare because many computer users do not have standard typefaces installed and the most popular web browsers are not configured to display the correct variants by default. The following are some examples of variant forms of Chinese characters with different code points and language tags.

Different code points
Chinese			Japanese	Korean
Mainland	Taiwan	Hong Kong	Japanese	Korean
Chinese: 戶戸户	Chinese: 戶戸户	Chinese: 戶戸户	Japanese: 戶戸户	Korean: 戶戸户
Chinese: 爲為为	Chinese: 爲為为	Chinese: 爲為为	Japanese: 爲為为	Korean: 爲為为
Chinese: 強强	Chinese: 強强	Chinese: 強强	Japanese: 強强	Korean: 強强
Chinese: 畫畵画	Chinese: 畫畵画	Chinese: 畫畵画	Japanese: 畫畵画	Korean: 畫畵画
Chinese: 線綫线	Chinese: 線綫线	Chinese: 線綫线	Japanese: 線綫线	Korean: 線綫线
Chinese: 匯滙	Chinese: 匯滙	Chinese: 匯滙	Japanese: 匯滙	Korean: 匯滙
Chinese: 裏裡	Chinese: 裏裡	Chinese: 裏裡	Japanese: 裏裡	Korean: 裏裡
Chinese: 夜亱	Chinese: 夜亱	Chinese: 夜亱	Japanese: 夜亱	Korean: 夜亱
Chinese: 龜亀龟	Chinese: 龜亀龟	Chinese: 龜亀龟	Japanese: 龜亀龟	Korean: 龜亀龟
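
The rows above can be checked programmatically. The following is a minimal Python sketch, not part of any standard tooling, that prints the code point of each variant in the first three rows and confirms that no two variants in a group share one. The helper names `describe` and `is_unified_ideograph` are made up for this illustration.

```python
import unicodedata

# Variant groups copied from the first three rows of the table above.
VARIANT_GROUPS = ["戶戸户", "爲為为", "強强"]


def describe(group: str) -> list[str]:
    """Return one 'U+XXXX char' label per character in a variant group."""
    return [f"U+{ord(ch):04X} {ch}" for ch in group]


def is_unified_ideograph(ch: str) -> bool:
    """Check the Unicode character name instead of hard-coding block ranges."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH-")


for group in VARIANT_GROUPS:
    # Each variant in a group occupies its own code point.
    assert len({ord(ch) for ch in group}) == len(group)
    print(", ".join(describe(group)))
```

Because these variants are separate code points, a plain string comparison treats them as different characters, so searching or sorting across variants needs an explicit mapping between them.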

The following examples have the same code points but different language tags. However, language tags rarely work correctly to get the expected forms from text renderers (for example, all rendered glyphs in the table below may look the same).

Same code point, different language tags
Chinese			Japanese	Korean
Mainland	Taiwan	Hong Kong	Japanese	Korean
Chinese: 刃	Chinese: 刃	Chinese: 刃	Japanese: 刃	Korean: 刃
Chinese: 令	Chinese: 令	Chinese: 令	Japanese: 令	Korean: 令
Chinese: 毒	Chinese: 毒	Chinese: 毒	Japanese: 毒	Korean: 毒
Chinese: 骨	Chinese: 骨	Chinese: 骨	Japanese: 骨	Korean: 骨
Chinese: 縣	Chinese: 縣	Chinese: 縣	Japanese: 縣	Korean: 縣
Chinese: 誤	Chinese: 誤	Chinese: 誤	Japanese: 誤	Korean: 誤
Chinese: 船	Chinese: 船	Chinese: 船	Japanese: 船	Korean: 船
Chinese: 述	Chinese: 述	Chinese: 述	Japanese: 述	Korean: 述
Chinese: 煙	Chinese: 煙	Chinese: 煙	Japanese: 煙	Korean: 煙
Chinese: 贈	Chinese: 贈	Chinese: 贈	Japanese: 贈	Korean: 贈
Chinese: 雪	Chinese: 雪	Chinese: 雪	Japanese: 雪	Korean: 雪
Chinese: 及	Chinese: 及	Chinese: 及	Japanese: 及	Korean: 及
Chinese: 角	Chinese: 角	Chinese: 角	Japanese: 角	Korean: 角
Chinese: 條	Chinese: 條	Chinese: 條	Japanese: 條	Korean: 條
Chinese: 扁	Chinese: 扁	Chinese: 扁	Japanese: 扁	Korean: 扁
Chinese: 低	Chinese: 低	Chinese: 低	Japanese: 低	Korean: 低
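
The language tagging behind the table above can be sketched in Python as follows. The sketch wraps one code point in HTML `span` elements that differ only in their `lang` attribute. The specific tags chosen for each column are an assumption (pages also commonly use `zh-CN`, `zh-TW`, and `zh-HK`), and the helper names `tag_character` and `table_row` are hypothetical.

```python
from html import escape

# BCP 47 language tags assumed for each column of the table above.
# The exact choice is an assumption; zh-CN, zh-TW and zh-HK are also common.
REGION_TAGS = {
    "Mainland": "zh-Hans",
    "Taiwan": "zh-Hant-TW",
    "Hong Kong": "zh-Hant-HK",
    "Japanese": "ja",
    "Korean": "ko",
}


def tag_character(ch: str, lang: str) -> str:
    """Wrap one character in an HTML span carrying a lang attribute."""
    return f'<span lang="{escape(lang, quote=True)}">{escape(ch)}</span>'


def table_row(ch: str) -> list[str]:
    """Build one row: the same code point repeated under five language tags."""
    return [tag_character(ch, lang) for lang in REGION_TAGS.values()]


for cell in table_row("骨"):
    print(cell)
```

Every cell carries an identical character; whether the rendered glyphs actually differ is left entirely to the browser and the installed typefaces.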

Instead, the Unicode standard allows these variants to be encoded as variation sequences,^[3] formed by appending a variation selector (a glyph-less non-spacing mark) to the standard CJK unified ideograph. This works directly inside plain text, without needing any rich text format to select the appropriate language or script, and allows easier and more selective control when the same language and script combination needs several variants. The list of valid variation sequences is standardized by Unicode and defined in the Ideographic Variation Database (IVD),^[4] ^[5] part of the Unicode Character Database (UCD).^[6] It can be extended without encoding new code points in the UCS. Since the Unicode versions in which variation selectors were encoded and the IVD established, it has no longer been necessary to encode new compatibility ideographs to render such variants; the two blocks CJK Compatibility Ideographs in the BMP and CJK Compatibility Ideographs Supplement in the SIP have been frozen since Unicode 4.1, except to fix a few past mistakes that were overlooked during the review of normative sources in the Han unification process.^[7]
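
As a rough illustration of appending a variation selector to a base ideograph, the Python sketch below builds such a sequence and strips it again. It assumes the selectors VS17 to VS256 (U+E0100 to U+E01EF), the range used for ideographic variation sequences. The helper names `make_ivs` and `strip_selectors` are hypothetical, and the sketch does not check whether a given sequence is actually registered in the IVD.

```python
# Variation Selectors Supplement: VS17..VS256, the range used by
# ideographic variation sequences.
VS17 = 0xE0100
VS256 = 0xE01EF


def make_ivs(base: str, selector_number: int) -> str:
    """Append variation selector VS<selector_number> (17..256) to one base character."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= selector_number <= 256:
        raise ValueError("ideographic variation sequences use VS17..VS256")
    return base + chr(VS17 + selector_number - 17)


def strip_selectors(text: str) -> str:
    """Drop VS17..VS256 so variants compare equal to their base characters."""
    return "".join(ch for ch in text if not VS17 <= ord(ch) <= VS256)


# Hypothetical example: whether <骨, VS17> is registered in the IVD is not checked.
seq = make_ivs("骨", 17)
print([f"U+{ord(ch):04X}" for ch in seq])
print(strip_selectors(seq) == "骨")
```

The sequence is ordinary plain text made of two code points, so it survives copy and paste without any markup. A renderer that does not support the sequence is expected to fall back to the glyph of the base character.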

Notes and References

[1] "Orthography used for Plex Sans TC · Issue #346 · IBM/plex". GitHub.
[2] "請還原Traditional Chinese的眞正Tradition寫法 · Issue #6 · adobe-fonts/source-han-sans". GitHub.
[3] "Variation Sequences; FAQ". Unicode Consortium.
[4] "Ideographic Variation Database". Unicode Consortium.
[5] "UTS #37, Unicode Ideographic Variation Database". Unicode Consortium.
[6] "Unicode Character Database, Standard Annex #44". Unicode Consortium. Explains the different character properties.
[7] "Unicode® Standard Annex #45, U-Source Ideograph". Unicode Consortium.