ISO/IEC 10367 explained

ISO/IEC 10367:1991 is a standard developed by ISO/IEC JTC 1/SC 2,[1] defining graphical character sets for use in character encodings implementing levels 2 and 3 of ISO/IEC 4873[2] (as opposed to ISO/IEC 8859, which defines character encodings at level 1 of ISO/IEC 4873).

Relationship to ISO/IEC 8859

The parts of ISO/IEC 8859 define complete encodings at level 1 of ISO/IEC 4873 (i.e., as stateless extended ASCII single-byte encodings, reserving the C1 area), and do not allow multiple parts to be used together. For use at levels 2 and 3 of ISO/IEC 4873 (i.e., with shift codes for additional graphical character sets), ISO/IEC 8859 stipulates that the equivalent sets from ISO/IEC 10367 be used instead.

ISO/IEC 10367:1991 includes ASCII; sets matching the G1 sets used for the right-hand sides (non-ASCII parts) of ISO/IEC 6937 (ITU T.51) and of ISO/IEC 8859 parts 1 through 9 (i.e., the parts that existed when it was published in 1991); a set of additional Latin characters supplementing some of those parts; and a set of box drawing characters.[2][3]

Supplementary G3 Latin set

ISO/IEC 10367 includes the ISO-IR-154 graphical set, which is intended to supplement Latin alphabets No. 1, 2 and 5 (i.e., ISO-8859-1, ISO-8859-2 and ISO-8859-9).[3] Specifically, it is intended for use as the G3 set in a profile of ISO/IEC 4873 in which the G1 and G2 sets comprise the right-hand side of ISO-8859-2 and that of either ISO-8859-1 or ISO-8859-9.[4] Such configurations cover the entire ISO/IEC 6937 repertoire (ITU T.51 Annex A) without requiring non-spacing diacritic codes.

For instance, if this set is designated as G3, the letter Ĉ is encoded under ISO/IEC 4873 level 2 as 0x8F 0x23: the single-shift control SS3 (0x8F) followed by the byte for Ĉ's position in the set.
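Decoding a single-shifted character can be sketched in Python as follows. This is a minimal illustration, not a conforming implementation: it assumes G0 = ASCII in GL and G1 = the right-hand side of ISO-8859-2 in GR, and uses a hypothetical G3 table (`G3_DECODE`) holding only the Ĉ mapping given above. SS2 and the locking shifts of level 3 are not handled.

```python
SS3 = 0x8F  # Single Shift Three (C1 control)

# Hypothetical partial G3 table: only the position cited in the text.
G3_DECODE = {0x23: "Ĉ"}


def decode_level2(data: bytes) -> str:
    """Decode ASCII (G0), ISO-8859-2 right-hand side (G1) and SS3-shifted G3."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == SS3:
            # SS3 invokes G3 for exactly one following byte (assumed to be a GL byte).
            if i + 1 >= len(data):
                raise ValueError("SS3 at end of input")
            chars.append(G3_DECODE[data[i + 1]])  # KeyError if position unmapped
            i += 2
        elif b < 0x80:
            chars.append(chr(b))  # G0: ASCII
            i += 1
        elif b >= 0xA0:
            chars.append(bytes([b]).decode("iso8859_2"))  # G1 invoked into GR
            i += 1
        else:
            raise ValueError(f"unsupported C1 byte 0x{b:02X}")
    return "".join(chars)
```

For example, `decode_level2(b"A\x8f\x23\xe9")` yields the string `"AĈé"`: the SS3 applies only to the byte immediately after it, so 0xE9 is read from G1 again.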

Some characters of this set also appear in ISO-8859-1 or ISO-8859-9. Under the current edition of ISO/IEC 4873 / ECMA-43 (though not earlier editions), a character must be taken from the lowest-numbered working set that contains it; hence, when the corresponding ISO-8859 right-hand side is designated as the G1 or G2 set, those characters are not used from this G3 set.
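The lowest-numbered-set rule can be sketched as an encoder for one possible level 2 profile. This is a hypothetical illustration, not a conforming implementation. It assumes G0 = ASCII, G1 = the right-hand side of ISO-8859-2 (invoked into GR), and G2 = the right-hand side of ISO-8859-1 (reached via SS2, 0x8E, followed by a GL byte). It also uses a hypothetical partial G3 table (`G3_ENCODE`) containing only the Ĉ mapping from the example above. Each character is looked up in G0, G1, G2 and G3, in that order.

```python
SS2, SS3 = 0x8E, 0x8F  # Single Shift Two / Three (C1 controls)

# Hypothetical partial G3 table: only the mapping cited in the text.
G3_ENCODE = {"Ĉ": 0x23}


def rhs_byte(ch: str, codec: str):
    """Return the GR byte (0xA0-0xFF) for ch in an ISO 8859 part, or None."""
    try:
        b = ch.encode(codec)[0]
    except UnicodeEncodeError:
        return None
    return b if b >= 0xA0 else None  # 0x80-0x9F are C1 controls, not graphics


def encode_level2(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:  # G0: ASCII in GL
            out.append(ord(ch))
        elif (b := rhs_byte(ch, "iso8859_2")) is not None:  # G1 in GR
            out.append(b)
        elif (b := rhs_byte(ch, "latin_1")) is not None:  # G2 via SS2
            out += bytes([SS2, b - 0x80])  # shifted byte sent as its GL equivalent
        elif ch in G3_ENCODE:  # G3 via SS3
            out += bytes([SS3, G3_ENCODE[ch]])
        else:
            raise ValueError(f"{ch!r} is not in any designated set")
    return bytes(out)
```

Because "é" exists in both ISO-8859-2 and ISO-8859-1, `encode_level2("é")` returns the single G1 byte `b"\xe9"` rather than an SS2 sequence, while "ñ" (absent from ISO-8859-2) becomes `b"\x8e\x71"`. The walrus operator requires Python 3.8 or later.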

Box drawing set

The box drawing set from ISO/IEC 10367 is registered for ISO/IEC 2022 use as ISO-IR-155. It does not use the 0x20/0xA0 or 0x7F/0xFF positions, but is nonetheless registered as a 96-character set.[5]

The Perl library libintl-perl includes an "ISO_10367-BOX" codec, which encodes and decodes ASCII over GL and the ISO-IR-155 box drawing set over GR, with a few deviations: it uses double-lined box-drawing characters in place of the heavy-lined ones, and it replaces the upper half block (▀) at 0xCB with the private-use character U+E019, documented as "Unit space B".[6]
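The effect of such a deviation on decoded text can be sketched with a GL/GR split decoder. This is a hypothetical Python illustration, not the libintl-perl code. Both GR tables are deliberately partial: they contain only byte 0xCB, the one position whose mapping is stated above (U+2580 UPPER HALF BLOCK in ISO-IR-155, U+E019 in libintl-perl).

```python
# Hypothetical partial GR tables: only byte 0xCB, the position cited in the text.
STANDARD_GR = {0xCB: "\u2580"}  # UPPER HALF BLOCK, as in ISO-IR-155
LIBINTL_GR = {0xCB: "\ue019"}   # private-use "Unit space B", per libintl-perl


def decode_box(data: bytes, gr_table: dict) -> str:
    """Decode ASCII in GL (0x00-0x7F) and a box drawing set in GR via gr_table."""
    chars = []
    for b in data:
        if b < 0x80:
            chars.append(chr(b))  # GL: ASCII
        elif b in gr_table:
            chars.append(gr_table[b])  # GR: box drawing set
        else:
            raise ValueError(f"byte 0x{b:02X} not in this (partial) table")
    return "".join(chars)
```

The same byte string therefore decodes to different code points depending on the table: `decode_box(b"A\xcb", STANDARD_GR)` gives `"A\u2580"`, whereas `decode_box(b"A\xcb", LIBINTL_GR)` gives `"A\ue019"`. Text exchanged between the two mappings would not round-trip at that position.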

Notes and References

  1. "Information technology — Standardized coded graphic character sets for use in 8-bit codes". ISO/IEC 10367:1991. ISO/IEC JTC 1/SC 2. 1991.
  2. van Wingen, Johan W. (1999). "8. Code Extension, ISO 2022 and 2375, ISO 4873 and 10367". Character sets. Letters, tokens and codes. Terena. Archived from the original on 2020-08-01: https://web.archive.org/web/20200801214714/https://www.terena.org/activities/multiling/euroml/section08.html
  3. "8-Bit Character Sets - ISO/IEC 10367". Guide to the use of Character Sets in Europe. DKUUG.
  4. ISO-IR-154: "Supplementary Set for Latin Alphabets 1, 2 and 5". ECMA (Ecma International). 1990-03-01.
  5. ISO-IR-155: "Basic Box-Drawings Set". ISO/IEC JTC 1/SC 2/WG 3. 1990-04-16.
  6. Flohr, Guido. "Conversion routines for ISO_10367_BOX". libintl-perl. Locale::RecodeData::ISO_10367_BOX.