ARIB STD-B24 encoding | |
Standard: | ARIB STD-B24 Volume 1 |
Encodes: | ARIB STD-B24 Kanji, Kana and mosaic sets, JIS X 0201 |
Classification: | ISO 2022 profile/extension |
ARIB STD-B24 Kanji set | |
Standard: | ARIB STD-B24 Volume 1 |
Lang: | Japanese, English, Russian. Partial support: Greek, Chinese |
Encodings: |
|
Extends: | JIS X 0208 |
Classification: | ISO-2022-structured CJK DBCS |
Volume 1 of the Association of Radio Industries and Businesses (ARIB) STD-B24 standard for Broadcast Markup Language specifies, amongst other details, a character encoding for use in Japanese-language broadcasting. The latest revision is version 6.3.
It includes a number of characters not found in the base standards (JIS X 0208 and JIS X 0201). It was the source standard for many symbol characters which were added to Unicode, including portions of the Miscellaneous Symbols, Enclosed Alphanumeric Supplement and Enclosed Ideographic Supplement blocks.[1] Its contributions partially overlap the Unicode emoji, but were added a year earlier, in Unicode 5.2.[2]
Fascicle 1 of the ARIB STD-B62 standard, published in 2014, defines Unicode mappings for a selection of the B24 extended characters (excluding, for example, those duplicated by JIS X 0213), as well as a few extended Kanji. It also includes a mapping of utilised characters outside the Basic Multilingual Plane to the BMP's private use area.
See also: ISO 2022.
The ARIB STD-B24 standard defines multiple character sets and a method of switching between them. These include a Kanji set (an extension of JIS X 0208), an Alphanumeric set, a Hiragana set, Katakana sets of two distinct layouts and four mosaic sets. The sets are selected using ISO 2022 mechanisms for 94-sets, with the following codes (proportional sets use the same layout as the corresponding non-proportional ones):
Set | Type | Code (column/line) | Code (hexadecimal) | Code (ASCII character) | Comments |
---|---|---|---|---|---|
Kanji | 2-byte | 4/2 | 42 | B | The escape code B used for the ARIB Kanji set is used for the 1983 version of JIS C 6226 (JIS X 0208, of which the ARIB Kanji set is an extension) in ISO-2022-JP.[3][4] |
Alphanumeric | 1-byte | 4/10 | 4A | J | JIS_C6220-ro (ISO646-JP, JIS X 0201 Roman set). Similar to ASCII, with two assignments differing. Escape code J matches usage in ISO-2022-JP. |
Proportional alphanumeric | 1-byte | 3/6 | 36 | 6 | |
Hiragana | 1-byte | 3/0 | 30 | 0 | Hiragana themselves follow the same layout as row 4 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation. |
Proportional Hiragana | 1-byte | 3/7 | 37 | 7 | |
Katakana | 1-byte | 3/1 | 31 | 1 | Katakana themselves follow the same layout as row 5 of JIS X 0208, but without a lead byte. Also adds several additional assignments for punctuation. |
Proportional Katakana | 1-byte | 3/8 | 38 | 8 | |
JIS X 0201 Katakana | 1-byte | 4/9 | 49 | I | JIS_C6220-jp (JIS X 0201 Kana set). Escape code matches usage in ISO-2022-JP-3. |
Mosaic A | 1-byte | 3/2 | 32 | 2 | Pseudographics (ISO-IR-71) |
Mosaic B | 1-byte | 3/3 | 33 | 3 | Pseudographics (ISO-IR-137) |
Mosaic C | 1-byte | 3/4 | 34 | 4 | Non-spacing pseudographics (ISO-IR-71 subset with separated mosaic blocks) |
Mosaic D | 1-byte | 3/5 | 35 | 5 | Non-spacing pseudographics |
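The three code columns in the table give the same value in different notations: in ISO 2022 column/line notation, the column is the high nibble of the byte and the line is the low nibble, so 4/10 is 0x4A, the ASCII letter J. A minimal Python sketch of that conversion follows. The names `FINAL_BYTES` and `final_byte` are hypothetical, and the dictionary only transcribes the table above; it does not build complete designation escape sequences.

```python
# Final bytes transcribed from the table above, in column/line notation.
# FINAL_BYTES and final_byte are illustrative names, not taken from the standard.
FINAL_BYTES = {
    "Kanji": "4/2",
    "Alphanumeric": "4/10",
    "Proportional alphanumeric": "3/6",
    "Hiragana": "3/0",
    "Proportional Hiragana": "3/7",
    "Katakana": "3/1",
    "Proportional Katakana": "3/8",
    "JIS X 0201 Katakana": "4/9",
    "Mosaic A": "3/2",
    "Mosaic B": "3/3",
    "Mosaic C": "3/4",
    "Mosaic D": "3/5",
}


def final_byte(column_line: str) -> int:
    """Convert column/line notation such as '4/10' to the byte value 0x4A."""
    column, line = (int(part) for part in column_line.split("/"))
    if not (0 <= column <= 15 and 0 <= line <= 15):
        raise ValueError(f"column and line must be 0..15: {column_line!r}")
    # The column is the high nibble, the line is the low nibble.
    return (column << 4) | line


if __name__ == "__main__":
    for name, code in FINAL_BYTES.items():
        byte = final_byte(code)
        print(f"{name}: {code} = 0x{byte:02X} = {chr(byte)!r}")
```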
The Kanji set is a double-byte character set extending JIS X 0208.
Each encoded byte is the row or cell number plus 0x20, or 32 in decimal (see below). Hence row 1 is encoded with a lead byte of 0x21 (or 33), cell 1 of that row has a continuation byte of 0x21, and so forth. Most of the code assignments correspond to JIS X 0208.
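The row/cell arithmetic described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the usual 94 by 94 grid of an ISO 2022 double-byte set (rows and cells numbered 1 through 94); it covers only the two code bytes, not any set-switching sequences.

```python
from typing import Tuple


def kuten_to_bytes(row: int, cell: int) -> bytes:
    """Encode a row/cell pair as two 7-bit bytes by adding 0x20 to each."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    return bytes([row + 0x20, cell + 0x20])


def bytes_to_kuten(data: bytes) -> Tuple[int, int]:
    """Recover the row/cell pair from a two-byte code."""
    if len(data) != 2 or not all(0x21 <= b <= 0x7E for b in data):
        raise ValueError("expected two bytes in the range 0x21..0x7E")
    return data[0] - 0x20, data[1] - 0x20
```

For example, `kuten_to_bytes(1, 1)` gives the bytes 0x21 0x21, and the extension character at 90-45 is encoded as 0x7A 0x4D.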
Characters 90-45 through 90-63 and 90-66 through 90-84 (shown below shaded) are listed in the B24 standard only in table 7-10 (the list of extension characters), and are also the only characters in rows 90 through 91 which are not transport-related symbols; this is noted in the B24 standard in an endnote to table 7-10. The remainder of the extensions are listed in both table 7-4 (the double-byte code chart) and table 7-10.
See also: List of Japanese map symbols. Characters from ARIB STD-B24 which were not retained in ARIB STD-B62 are shown shaded.
See main article: Code page 895 and JIS X 0201.
Most of ARIB STD-B24 Mosaic Set D does not exist in Unicode.
In addition to the modified ISO 2022 encoding, the B24 standard also specifies a Shift JIS encoding following JIS X 0208:1997, but with the addition of the extended characters in the kanji set.
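As a rough illustration, the conventional Shift JIS transform for JIS X 0208 row/cell pairs can be written in Python as below. This is a sketch under a stated assumption: that the B24 extension rows are placed by the same arithmetic as the base JIS X 0208 rows, which the text above does not spell out. The function name `kuten_to_sjis` is hypothetical.

```python
def kuten_to_sjis(row: int, cell: int) -> bytes:
    """Map a row/cell pair to Shift JIS bytes using the standard transform.

    Assumption: extension rows follow the same arithmetic as JIS X 0208 rows.
    """
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    # Two rows share one lead byte; rows 1-62 map to 0x81-0x9F,
    # rows 63-94 map to 0xE0-0xEF.
    lead = (row + 1) // 2 + (0x80 if row <= 62 else 0xC0)
    if row % 2 == 1:
        # Odd rows use trail bytes 0x40-0x9E, skipping 0x7F.
        trail = cell + 0x3F + (1 if cell >= 64 else 0)
    else:
        # Even rows use trail bytes 0x9F-0xFC.
        trail = cell + 0x9E
    return bytes([lead, trail])
```

Under that assumption, the extension character at 90-45 would be encoded as 0xED 0xCB. For characters in the base set, the result can be checked against Python's built-in `shift_jis` codec; for instance, 16-01 gives 0x88 0x9F, which matches `"亜".encode("shift_jis")`.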