CJK Compatibility Ideographs explained

Range: U+F900..U+FAFF (512 code points)
Script: Han
Characters added per Unicode version:
  1.0.1: 302
  3.2: 59
  4.1: 106
  5.2: 3
  6.1: 2
  Total: 472
Sources: KS X 1001, Big5, IBM 32, JIS X 0213, ARIB STD-B24, KPS 10721-2000
Notes: [1] [2]
The range was initially part of the Private Use Area in Unicode 1.0.0[3] and was removed from it in Unicode 1.0.1.

CJK Compatibility Ideographs is a Unicode block created mainly to hold Han characters that some established legacy character encodings encode at more than one location. Each such character already has an assignment in CJK Unified Ideographs; the additional code points in this block exist so that text can round-trip between Unicode and those encodings without losing the distinction. However, the block also contains 12 unified ideographs sourced from IBM's Japanese character sets.[6] [7]
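The split between compatibility and unified ideographs in this block can be observed with Python's standard `unicodedata` module. The following is a minimal sketch, assuming that an empty canonical decomposition is a usable stand-in for the `Unified_Ideograph` property listed in PropList.txt (the standard library does not expose that property directly). It walks U+F900..U+FAFF, skips unassigned code points (general category `Cn`), and collects the assigned code points that have no decomposition. The function name `classify_block` is hypothetical.

```python
import unicodedata

# Bounds of the CJK Compatibility Ideographs block (U+F900..U+FAFF).
BLOCK_START = 0xF900
BLOCK_END = 0xFAFF


def classify_block(start=BLOCK_START, end=BLOCK_END):
    """Return (assigned, no_decomposition) for the code points in [start, end].

    Assumption: "no canonical decomposition" is used as a proxy for the
    Unified_Ideograph property, which unicodedata does not provide.
    """
    assigned = []
    no_decomposition = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        # General category "Cn" marks unassigned code points; skip them.
        if unicodedata.category(ch) == "Cn":
            continue
        assigned.append(cp)
        # Compatibility ideographs carry a singleton canonical decomposition
        # to a unified ideograph; an empty string means there is none.
        if unicodedata.decomposition(ch) == "":
            no_decomposition.append(cp)
    return assigned, no_decomposition


if __name__ == "__main__":
    assigned, unified = classify_block()
    print("Unicode database:", unicodedata.unidata_version)
    print("assigned:", len(assigned), "without decomposition:", len(unified))
    print(" ".join(f"U+{cp:04X}" for cp in unified))
```

With a Unicode database of version 6.1 or later, the sketch should find 472 assigned code points, 12 of which have no decomposition: U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28 and U+FA29. Every other assigned character in the block decomposes to a single unified ideograph, for example U+F900 to U+8C48.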

The block has dozens of ideographic variation sequences registered in the Unicode Ideographic Variation Database (IVD).[4] [5] Each sequence pairs a base character with a variation selector to specify the desired glyph variant for that character.
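An ideographic variation sequence is a base ideograph followed by one of the variation selectors VS17 to VS256 (U+E0100..U+E01EF). The Python sketch below parses entries in the semicolon-separated layout used by the IVD's `IVD_Sequences.txt` file ("base selector; collection; identifier") and counts, per collection, the sequences whose base falls in U+F900..U+FAFF. The file layout is an assumption to check against the current IVD release, the function names are hypothetical, and the sample entries are invented for illustration; they are not actual registrations.

```python
from collections import Counter

# Block bounds for CJK Compatibility Ideographs.
COMPAT_START, COMPAT_END = 0xF900, 0xFAFF
# Variation selectors VS17..VS256, the ones used by ideographic variation sequences.
VS_START, VS_END = 0xE0100, 0xE01EF


def parse_ivd_line(line):
    """Parse 'BASE SELECTOR; Collection; Identifier' into a tuple.

    Returns None for blank lines and comments. The layout is assumed from
    IVD_Sequences.txt; verify it against the release you download.
    """
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    sequence, collection, identifier = (f.strip() for f in line.split(";"))
    base_hex, selector_hex = sequence.split()
    base, selector = int(base_hex, 16), int(selector_hex, 16)
    if not VS_START <= selector <= VS_END:
        raise ValueError(f"U+{selector:04X} is not an ideographic variation selector")
    return base, selector, collection, identifier


def ivs_string(base, selector):
    """Build the two-code-point string that spells one sequence."""
    return chr(base) + chr(selector)


def count_block_sequences(lines, start=COMPAT_START, end=COMPAT_END):
    """Count parsed sequences per collection whose base lies in [start, end]."""
    counts = Counter()
    for line in lines:
        entry = parse_ivd_line(line)
        if entry is not None and start <= entry[0] <= end:
            counts[entry[2]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical entries in the assumed format; NOT real IVD registrations.
    sample = [
        "# comment line",
        "FA10 E0100; Example_Collection; EX-0001",
        "FA10 E0101; Example_Collection; EX-0002",
        "3402 E0100; Example_Collection; EX-0003",
    ]
    print(count_block_sequences(sample))
```

Keeping the sequence as a plain two-code-point string means a renderer that does not support the selector can simply ignore it and fall back to the base character's default glyph.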

Character sources

Sources for the original collection of CJK Compatibility Ideographs include:

In ensuing versions of the standard, more characters have been added to the block from:

The "IBM 32" characters

IBM Japanese double-byte EBCDIC includes several kanji that either do not exist in JIS X 0208 or do not round-trip through it. These were included as gaiji in extensions to Shift JIS and EUC-JP from IBM (e.g. code page 942), NEC, the Open Software Foundation, and Microsoft (e.g. Windows code page 932). However, they were not used as a source for the original Unified Repertoire and Ordering (URO). Instead, the 32 IBM extension kanji that had not entered the URO from other sources were placed in the CJK Compatibility Ideographs block, in the range U+FA0E..U+FA2D.
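The "IBM 32" range can be checked with Python's `unicodedata` module as well. In this minimal sketch (the helper name `split_ibm_range` is hypothetical), 0xFA2D - 0xFA0E + 1 gives the 32 code points, and each character's decomposition field separates those that map to a unified ideograph from those that have no mapping. It assumes every decomposition in this range is a singleton canonical mapping, so the field is a single hex number.

```python
import unicodedata

# The "IBM 32" range: U+FA0E..U+FA2D inclusive.
IBM_START, IBM_END = 0xFA0E, 0xFA2D


def split_ibm_range(start=IBM_START, end=IBM_END):
    """Split a code point range by whether each character has a decomposition.

    Returns (mapped, unmapped): mapped is a dict from code point to the
    single code point of its canonical decomposition; unmapped is a list of
    code points with no decomposition at all.
    """
    mapped = {}
    unmapped = []
    for cp in range(start, end + 1):
        decomposition = unicodedata.decomposition(chr(cp))
        if decomposition:
            # Assumption: a singleton canonical mapping, so the field is a
            # single hex code point with no <tag> prefix.
            mapped[cp] = int(decomposition, 16)
        else:
            unmapped.append(cp)
    return mapped, unmapped


if __name__ == "__main__":
    mapped, unmapped = split_ibm_range()
    print(IBM_END - IBM_START + 1, len(mapped), len(unmapped))
    for cp, target in sorted(mapped.items()):
        print(f"U+{cp:04X} -> U+{target:04X}")
```

On a current Python build the split should come out as 20 code points with a decomposition and 12 without; the 12 without one are the unified ideographs the block description mentions, which is why the range is not made up entirely of compatibility characters.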

Of these 32 characters:

Block

History

The following Unicode-related documents record the purpose and process of defining specific characters in the CJK Compatibility Ideographs block:

See also

Notes and References

  1. Web site: Unicode character database. The Unicode Standard. 2023-07-26.
  2. Web site: Enumerated Versions of The Unicode Standard. The Unicode Standard. 2023-07-26.
  3. Book: The Unicode Standard, Version 1.0, Volume 1. 1991. Section 3.5: Private Use Area, pp. 118–119. ISBN 0-201-56788-1. https://www.unicode.org/versions/Unicode1.0.0/ch03_5.pdf
  4. Web site: Ideographic Variation Database. Unicode Consortium.
  5. Web site: UTS #37, Unicode Ideographic Variation Database. Unicode Consortium.
  6. Web site: PropList.txt. Unicode Consortium.
  7. Web site: Known Anomalies in Unicode Character Names. Unicode Technical Note #27. Asmus Freytag, Rick McGowan, Ken Whistler. 2021-06-14. Quote: "These 12 characters are unified CJK ideographs, not compatibility ideographs, despite their names."