ARIB STD-B24 character set

ARIB STD-B24 encoding
Standard: ARIB STD-B24 Volume 1
Encodes: ARIB STD-B24 Kanji, Kana and mosaic sets, JIS X 0201
Classification: ISO 2022 profile/extension

ARIB STD-B24 Kanji set
Standard: ARIB STD-B24 Volume 1
Languages: Japanese, English, Russian
Partial support: Greek, Chinese
Encodings:
  • ARIB STD-B24 encoding (ISO 2022 based)
  • Shift JIS (ARIB variant)
Extends: JIS X 0208
Classification: ISO-2022-structured CJK DBCS

Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. It was introduced on . The latest revision is version 6.3 as of .

It includes a number of characters not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters which were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks.[1] Its contributions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2.[2]

Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji. It also includes a mapping of utilised characters outside the Basic Multilingual Plane to the BMP's private use area.

Sets and codes

See also: ISO 2022.

The ARIB STD-B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, Katakana sets of two distinct layouts, and four mosaic sets. The sets are selected using ISO 2022 mechanisms for 94-sets, using the following codes (proportional sets use the same layout as the corresponding non-proportional ones):

Set | Type | Code (column/line) | Code (hexadecimal) | Code (ASCII character) | Comments
Kanji | 2-byte | 4/2 | 42 | B | The escape code B used for the ARIB Kanji set is used for the 1983 version of JIS C 6226 (JIS X 0208, of which the ARIB Kanji set is an extension) in ISO-2022-JP.[3][4]
Alphanumeric | 1-byte | 4/10 | 4A | J | JIS_C6220-ro (ISO646-JP, JIS X 0201 Roman set). Similar to ASCII, with two assignments differing. Escape code J matches usage in ISO-2022-JP.
Proportional alphanumeric | 1-byte | 3/6 | 36 | 6 |
Hiragana | 1-byte | 3/0 | 30 | 0 | Hiragana themselves follow the same layout as row 4 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation.
Proportional Hiragana | 1-byte | 3/7 | 37 | 7 |
Katakana | 1-byte | 3/1 | 31 | 1 | Katakana themselves follow the same layout as row 5 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation.
Proportional Katakana | 1-byte | 3/8 | 38 | 8 |
JIS X 0201 Katakana | 1-byte | 4/9 | 49 | I | JIS_C6220-jp (JIS X 0201 Kana set). Escape code matches usage in ISO-2022-JP-3.
Mosaic A | 1-byte | 3/2 | 32 | 2 | Pseudographics (ISO-IR-71)
Mosaic B | 1-byte | 3/3 | 33 | 3 | Pseudographics (ISO-IR-137)
Mosaic C | 1-byte | 3/4 | 34 | 4 | Non-spacing pseudographics (ISO-IR-71 subset with separated mosaic blocks)
Mosaic D | 1-byte | 3/5 | 35 | 5 | Non-spacing pseudographics
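
The relationship between the three code columns of the table above can be sketched in Python. The column/line notation is simply the high and low nibble of the final byte, so 4/10 is 0x4A, which is ASCII "J". The function designate_g0 is a hypothetical helper that builds a designation sequence following the generic ISO 2022 forms (ESC ( F for a single-byte 94-set, ESC $ F for the multi-byte finals @, A and B); whether B24 permits exactly these sequences in every context is an assumption here, not something stated in this article.

```python
ESC = 0x1B

# Final bytes copied from the table above: name -> (final byte, bytes per character).
FINAL_BYTES = {
    "Kanji": (0x42, 2),
    "Alphanumeric": (0x4A, 1),
    "Proportional alphanumeric": (0x36, 1),
    "Hiragana": (0x30, 1),
    "Proportional Hiragana": (0x37, 1),
    "Katakana": (0x31, 1),
    "Proportional Katakana": (0x38, 1),
    "JIS X 0201 Katakana": (0x49, 1),
    "Mosaic A": (0x32, 1),
    "Mosaic B": (0x33, 1),
    "Mosaic C": (0x34, 1),
    "Mosaic D": (0x35, 1),
}


def column_line_to_byte(notation: str) -> int:
    """Convert ISO 2022 column/line notation such as '4/10' to a byte value."""
    column, line = (int(part) for part in notation.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError(f"column and line must be 0..15: {notation!r}")
    return column * 16 + line


def designate_g0(set_name: str) -> bytes:
    """Build a generic ISO 2022 G0 designation (assumed form, not taken from B24)."""
    final, width = FINAL_BYTES[set_name]
    if width == 1:
        return bytes([ESC, 0x28, final])        # ESC ( F
    if final in (0x40, 0x41, 0x42):
        return bytes([ESC, 0x24, final])        # legacy short form ESC $ F
    return bytes([ESC, 0x24, 0x28, final])      # ESC $ ( F


print(hex(column_line_to_byte("4/10")))  # 0x4a
print(designate_g0("Kanji"))             # b'\x1b$B'
print(designate_g0("Alphanumeric"))      # b'\x1b(J'
```

The Kanji and Alphanumeric results coincide with the escape sequences ISO-2022-JP uses for JIS X 0208-1983 and JIS X 0201 Roman, which is the correspondence the table's comments point out.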

Code charts

Kanji (double-byte) set

This is a double-byte character set extending JIS X 0208.

Lead byte

Each encoding byte is the row or cell number plus 0x20, or 32 in decimal (see below). Hence, the lead byte 0x21 denotes row number 1, and cell 1 of that row has a continuation byte of 0x21 (or 33), and so forth. Most of the code space corresponds to JIS X 0208.
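
As a minimal sketch of this arithmetic, the following Python functions convert between a row/cell pair and the two encoding bytes. The function names are illustrative only, and the 1 to 94 range check assumes the usual 94 by 94 layout of an ISO 2022 double-byte set.

```python
def row_cell_to_bytes(row: int, cell: int) -> bytes:
    """Return the lead and continuation byte for a row/cell pair (each 1..94)."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    return bytes([row + 0x20, cell + 0x20])


def bytes_to_row_cell(pair: bytes) -> tuple[int, int]:
    """Inverse of row_cell_to_bytes: recover (row, cell) from two bytes."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    lead, trail = pair
    if not (0x21 <= lead <= 0x7E and 0x21 <= trail <= 0x7E):
        raise ValueError("bytes must each be in 0x21..0x7E")
    return lead - 0x20, trail - 0x20


print(row_cell_to_bytes(1, 1).hex())      # 2121
print(row_cell_to_bytes(90, 45).hex())    # 7a4d
print(bytes_to_row_cell(b"\x7e\x21"))     # (94, 1)
```

Row 90 therefore has lead byte 0x7A and row 94 has lead byte 0x7E, matching the headings of the extension rows.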

Character set 0x7A (row number 90, traffic symbols)

Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below shaded) are listed in the B24 standard only in table 7-10 (the list of extension characters), and are also the only characters in rows 90 through 91 which are not transport-related symbols; this is noted in the B24 standard in an endnote to table 7-10. The remainder of the extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.

Character set 0x7B (row number 91, map symbols)

See also: List of Japanese map symbols.

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

Character set 0x7C (row number 92, units, enclosed forms, list markers, arrows)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

Character set 0x7D (row number 93, game and weather symbols, fractions, units, enclosed forms)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

Character set 0x7E (row number 94, list markers)

Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.

Single-byte sets

Alphanumeric set

See main article: Code page 895 and JIS X 0201.

Mosaic sets

Most of ARIB STD-B24 Mosaic Set D does not exist in Unicode.

Shift_JIS variant

In addition to the modified ISO 2022 encoding, the B24 standard also specifies a Shift JIS encoding following JIS X 0208:1997, but with the addition of the extended characters in the kanji set.
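
For illustration, the conventional Shift JIS arithmetic for a JIS X 0208 row/cell pair can be sketched in Python as follows. This is the standard transformation used by Shift JIS in general; that the ARIB variant places the extension rows 90 through 94 at the positions this formula yields is an assumption of the sketch, and the function name is hypothetical.

```python
def row_cell_to_shift_jis(row: int, cell: int) -> bytes:
    """Map a JIS X 0208 style row/cell pair (each 1..94) to two Shift JIS bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    # Two rows share one lead byte; lead bytes skip the range 0xA0..0xDF.
    lead = ((row - 1) >> 1) + (0x81 if row <= 62 else 0xC1)
    if row % 2 == 1:
        # Odd rows use trail bytes 0x40..0x9E, skipping 0x7F.
        trail = cell + 0x3F
        if trail >= 0x7F:
            trail += 1
    else:
        # Even rows use trail bytes 0x9F..0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])


print(row_cell_to_shift_jis(16, 1).hex())    # 889f
print(row_cell_to_shift_jis(90, 45).hex())   # edcb
```

The first result can be cross-checked against Python's built-in codec: the character at row 16, cell 1 of JIS X 0208 is 亜, and "亜".encode("shift_jis") gives the same two bytes.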

Notes and References

  1. Suignard, Michel (2008-03-11). ISO/IEC JTC1/SC2/WG2 N 3397: Japanese TV Symbols.
  2. Unicode 5.2 Emoji List. Emojipedia.
  3. Japanese National Committee on ISO/TC97/SC2 (1984-07-01). ISO-IR 87: Japanese Graphic Character Set for Information Interchange.
  4. (IETF)