ISO/IEC 646 explained

ISO/IEC 646 encoding family
Standard: ISO/IEC 646, ITU T.50
Prev: US-ASCII
Next: ISO/IEC 8859, ISO/IEC 10646
Other related: DEC NRCS, World System Teletext
Adaptations to other alphabets: ELOT 927, Symbol, KOI-7, SRPSCII and MAKSCII, ASMO 449, SI 960
Classification: 7-bit Basic Latin encoding

ISO/IEC 646 is a set of ISO/IEC standards, described as Information technology — ISO 7-bit coded character set for information interchange and developed in cooperation with ASCII at least since 1964. Since its first edition in 1967 it has specified a 7-bit character code from which several national standards are derived.

ISO/IEC 646 was also ratified by ECMA as ECMA-6. The first version of ECMA-6 had been published in 1965, based on work the ECMA's Technical Committee TC1 had carried out since December 1960.

Characters in the ISO/IEC 646 Basic Character Set are invariant characters. Because this invariant set, which is shared by all countries, contains only the letters of the ISO basic Latin alphabet, countries whose languages use additional letters had to create national variants of ISO/IEC 646 in order to write them. Since transmission and storage of 8-bit codes were not standard at the time, the national characters had to fit within 7 bits, so some characters that appear in ASCII do not appear in the national variants of ISO/IEC 646.

History

ISO/IEC 646 and its predecessor ASCII (ASA X3.4) largely endorsed existing practice regarding character encodings in the telecommunications industry.

As ASCII did not provide a number of characters needed for languages other than English, a number of national variants were made that substituted some less-used characters with needed ones. Because the various national variants were incompatible with one another, an International Reference Version (IRV) of ISO/IEC 646 was introduced, in an attempt to at least restrict the replaced set to the same characters in all variants. The original version (ISO 646 IRV) differed from ASCII only in that, at code point 0x24, ASCII's dollar sign ($) was replaced by the international currency symbol (¤). The final 1991 version of the code, ISO/IEC 646:1991, is also known as ITU T.50, International Reference Alphabet or IRA, formerly International Alphabet No. 5 (IA5). This standard lets users choose the assignments of the 12 variable characters (i.e., two alternative graphic characters and 10 nationally defined characters). Among these choices, the ISO 646:1991 IRV (International Reference Version) is explicitly defined and is identical to ASCII.
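The substitution scheme described above can be sketched as a small decoder. This is only an illustration: the function name `decode_646` and the table layout are hypothetical, the "irv-old" entry encodes the single 0x24 difference mentioned in the text, and the "de" entry reflects the commonly cited DIN 66003 assignments, which should be checked against the standard before being relied upon.

```python
# Replacement tables keyed by 7-bit code point. Only the differences from
# ASCII are listed; every other code point decodes as its ASCII character.
VARIANTS = {
    "us": {},
    # Original IRV as described in the text: 0x24 is the currency sign.
    "irv-old": {0x24: "\u00a4"},
    # Assumed DIN 66003 (German) assignments, included for illustration.
    "de": {
        0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
        0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
    },
}


def decode_646(data: bytes, variant: str = "us") -> str:
    """Decode 7-bit data using the replacement table of one variant."""
    table = VARIANTS[variant]
    out = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"not a 7-bit code: 0x{b:02X}")
        # Use the national replacement if there is one, else plain ASCII.
        out.append(table.get(b, chr(b)))
    return "".join(out)
```

The same byte sequence therefore reads differently depending on the variant assumed by the receiver: `decode_646(b"Gr}~e", "de")` yields "Grüße", while under "us" the bytes stay as "Gr}~e".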

The ISO/IEC 8859 series of standards governing 8-bit character encodings supersedes the ISO/IEC 646 international standard and its national variants, by providing 96 additional characters with the additional bit and thus avoiding any substitution of ASCII codes. The ISO/IEC 10646 standard, directly related to Unicode, supersedes all of the ISO 646 and ISO/IEC 8859 sets with one unified set of character encodings using a larger 21-bit value.

A legacy of ISO/IEC 646 is visible on Windows, where in many East Asian locales the backslash character used in filenames is rendered as ¥ or as other characters such as ₩. A different code for ¥ was available even on the original IBM PC's code page 437, and a separate double-byte code for ¥ is available in Shift JIS (although this often uses an alternative mapping). Nevertheless, so much text was created with the backslash code used for ¥ (Shift JIS is officially based on ISO 646:JP, although Microsoft maps it as ASCII) that even modern Windows fonts have found it necessary to render the code that way. A similar situation exists with ₩ and EUC-KR. Another legacy is the existence of trigraphs in the C programming language.
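As a rough illustration of the trigraph mechanism, the sketch below replaces the nine C trigraphs with the characters they stand for. All nine targets (# [ \ ] ^ { | } ~) sit at code points that national variants were free to reassign. The function name `replace_trigraphs` is hypothetical, and the sketch only mimics the left-to-right replacement step, not a full C translation phase.

```python
# The nine C trigraphs and the characters they represent.
TRIGRAPHS = {
    "??=": "#", "??(": "[", "??/": "\\", "??)": "]", "??'": "^",
    "??<": "{", "??!": "|", "??>": "}", "??-": "~",
}


def replace_trigraphs(source: str) -> str:
    """Scan left to right, replacing each trigraph with its character."""
    out = []
    i = 0
    while i < len(source):
        candidate = source[i:i + 3]
        if candidate in TRIGRAPHS:
            out.append(TRIGRAPHS[candidate])
            i += 3
        else:
            out.append(source[i])
            i += 1
    return "".join(out)
```

For example, `replace_trigraphs("??=define A ??( 0 ??)")` gives `#define A [ 0 ]`, which lets source code be typed on a terminal whose national variant lacks those characters.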

Published standards

Code page layout

The following table shows the ISO/IEC 646 Invariant character set. Each character is shown with its Unicode equivalent. National code points are gray with the ASCII character that is replaced. Yellow indicates a character that, in some regions, could be combined with a previous character as a diacritic using the backspace character, which may affect glyph choice.

In addition to the invariant set restrictions, 0x23 is restricted to be either # or £ and 0x24 is restricted to be either $ or ¤ in ECMA-6:1991, equivalent to ISO/IEC 646:1991. However, these restrictions are not followed by all national variants.
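The two rules above can be expressed as simple checks. The sketch below assumes that the 12 variable code points are 0x23, 0x24, 0x40, 0x5B to 0x5E, 0x60 and 0x7B to 0x7E (the ASCII positions of # $ @ [ \ ] ^ ` { | } ~). All function and constant names are hypothetical.

```python
# Code points that ISO/IEC 646 leaves open to national or application use
# (assumed list; the ASCII characters at these positions are # $ @ [ \ ] ^ ` { | } ~).
VARIABLE_CODE_POINTS = frozenset(
    {0x23, 0x24, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x60, 0x7B, 0x7C, 0x7D, 0x7E}
)

# Choices permitted by ECMA-6:1991 for the two alternative graphic characters.
ALLOWED_1991 = {0x23: {"#", "£"}, 0x24: {"$", "¤"}}


def is_invariant(ch: str) -> bool:
    """True if ch is a 7-bit character outside the variable positions.

    Controls, space and DEL are not graphic characters; this sketch simply
    treats them as safe.
    """
    code = ord(ch)
    return code <= 0x7F and code not in VARIABLE_CODE_POINTS


def variable_characters(text: str) -> list:
    """Return (index, character) pairs that may change between variants."""
    return [(i, ch) for i, ch in enumerate(text) if not is_invariant(ch)]


def follows_1991_restrictions(replacements: dict) -> bool:
    """Check a variant's replacement table against the 0x23 and 0x24 rules."""
    for code, allowed in ALLOWED_1991.items():
        # A variant that does not list the code point keeps the ASCII character.
        if replacements.get(code, chr(code)) not in allowed:
            return False
    return True
```

A text for which `variable_characters` returns an empty list reads the same under every variant that respects the invariant set.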

Variant codes and descriptions

ISO/IEC 646 national variants

Some national variants of ISO/IEC 646 are as follows:

Code | ISO-IR | ISO/IEC ESC | Approved | National Standard | Description
CA | 121 | ESC 2/8 7/7 | ISO 646 | CSA Z243.4-1985-1 | Canada (No. 1 alternative, with "î") (French, classical) (Code page 1020)
CA2 | 122 | ESC 2/8 7/8 | ISO 646 | CSA Z243.4-1985-2 | Canada (No. 2 alternative, with "É") (French, reformed orthography)
CN | 57 | ESC 2/8 5/4 | ? | GB/T 1988-80 | People's Republic of China (Basic Latin)
CU | 151 | ESC 2/8 2/1 4/1 | ISO 646 | NC 99-10:81 / NC NC00-10:81 | Cuba (Spanish)
DANO | 9-1 | ESC 2/8 4/5 | SIS? | NATS-DANO | Norway and Denmark (journalistic texts). Invariant code point 0x22 is displayed as « (compare " in the IRV). It is, however, still considered a double quotation mark. Accompanies SEFI (NATS-SEFI).
DE | 21 | ESC 2/8 4/11 | ISO 646 | DIN 66003 | Germany (German) (Code page 1011, 20106)
DK |  |  | ? | DS 2089 | Denmark (Danish) (Code page 1017)
ES | 17 | ESC 2/8 5/10 | ECMA | Olivetti | Spanish (international) (Code page 1023)
ES2 | 85 | ESC 2/8 6/8 | ECMA | IBM | Spain (Basque, Castilian, Catalan, Galician) (Code page 1014)
FI | 10 |  | ISO 646 | SFS 4017 | Finland (basic version) (Code page 1018)
FR | 69 | ESC 2/8 6/6 | ISO 646 | AFNOR NF Z 62010-1982 | France (French) (Code page 1010)
FR1 | 25 | ESC 2/8 5/2 | ISO 646 | AFNOR NF Z 62010-1973 | France (obsolete since April 1985) (Code page 1104)
GB | 4 | ESC 2/8 4/1 | ISO 646 | BS 4730 | United Kingdom (English) (Code page 1013)
HU | 86 | ESC 2/8 6/9 | ISO 646 | MSZ 7795/3 | Hungary (Hungarian)
IE | 207 |  | ? | NSAI 433:1996 | Ireland (Irish)
INV | 170 | ESC 2/8 2/1 4/2 | ISO 646 | ISO 646:1983 | Invariant subset
(IRV) | 2 | ESC 2/8 4/0 | ISO 646 | ISO 646:1973 | International Reference Version. 0x7E as an overline (ISO-IR-002).
(IRV) | ? | ? | ISO 646 | ISO 646:1983 | International Reference Version. 0x7E as a tilde (Code page 1009, 20105).
(IRV) |  |  |  |  | ISO 646:1991 International Reference Version matches the US variant (see below).
IS |  | ? | ? | ? | Iceland (Icelandic)
IT | 15 | ESC 2/8 5/9 | ECMA | UNI 0204-70 / Olivetti? | Italian (Code page 1012)
JP | 14 | ESC 2/8 4/10 | ISO 646 | JIS C 6220:1969-ro | Japan (Romaji) (Code page 895). Also used as an 8-bit code with the corresponding Katakana supplementary set.
JP-OCR-B | 92 | ESC 2/8 6/14 | ISO 646 | JIS C 6229-1984-b | Japan (OCR-B)
KR |  |  | ? | KS C 5636-1989 | South Korea
MT |  |  | ? | ? | Malta (Maltese, English)
NL |  |  | ECMA | IBM | Netherlands (Dutch) (Code page 1019)
NO | 60 | ESC 2/8 6/0 | ISO 646 | NS 4551 version 1 | Norway (Code page 1016)
NO2 | 61 | ESC 2/8 6/1 | ISO 646 | NS 4551 version 2 | Norway (obsolete since June 1987) (Code page 20108)
pl |  |  |  | BN-74/3101-01 | Poland (Polish has 18 letters with diacritical marks, but only 9 lowercase letters are normalized due to code space reasons.)
PT | 16 | ESC 2/8 4/12 | ECMA | Olivetti | Portuguese (international)
PT2 | 84 | ESC 2/8 6/7 | ECMA | IBM | Portugal (Portuguese, Spanish) (Code page 1015)
SE | 10 | ESC 2/8 4/7 | ISO 646 | SEN 850200 Annex B, SIS 63 61 27 | Sweden (basic Swedish) (Code page 1018, D47)
SE2 | 11 | ESC 2/8 4/8 | ISO 646 | SEN 850200 Annex C, SIS 63 61 27 | Sweden (extended Swedish for names) (Code page 20107, E47)
SEFI | 8-1 | ESC 2/8 4/3 | SIS | NATS-SEFI | Sweden and Finland (journalistic texts). Accompanies DANO (NATS-DANO).
T.61-7bit | 102 | ESC 2/8 7/5 | ? | ITU/CCITT T.61 Recommendation | International (Teletex). Also used with the corresponding supplementary set as an 8-bit code.
TW |  |  | ? | CNS 5205-1996 | Republic of China (Taiwan)
US / (IRV) | 6 | ESC 2/8 4/2 | ISO 646 | ANSI X3.4-1968 and ISO 646:1983 (also IRV in ISO/IEC 646:1991) | United States (ASCII, Code page 367, 20127)
YU | 141 | ESC 2/8 7/10 | ISO 646 | JUS I.B1.002 (YUSCII) | former Yugoslavia (Croatian, Slovene, Serbian, Bosnian)
INIS | 49 | ESC 2/8 5/7 | IAEA | INIS | ISO 646 IRV subset
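The ESC entries in the table use column/row notation, in which x/y denotes the byte whose high nibble is x and whose low nibble is y (so 2/8 is 0x28). A minimal sketch of converting such an entry into the bytes actually transmitted follows. The function names are hypothetical.

```python
ESC = 0x1B  # the ESCAPE control character


def column_row_to_byte(token: str) -> int:
    """Convert a column/row token such as '4/11' to its byte value."""
    column_text, row_text = token.split("/")
    column, row = int(column_text), int(row_text)
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError(f"column/row out of range: {token}")
    return (column << 4) | row


def escape_sequence_bytes(notation: str) -> bytes:
    """Convert e.g. 'ESC 2/8 4/2' into the corresponding byte string."""
    parts = notation.split()
    if not parts or parts[0] != "ESC":
        raise ValueError("expected the notation to start with ESC")
    return bytes([ESC] + [column_row_to_byte(p) for p in parts[1:]])
```

For instance, the US entry "ESC 2/8 4/2" becomes the bytes 1B 28 42, that is, ESC followed by "(" and "B".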

National derivatives

Some national character sets also exist which are based on ISO/IEC 646 but do not strictly follow its invariant set (see also § Derivatives for other alphabets):

Character set | ISO-IR | ISO ESC | Approved | National Standard | Description
BS_viewdata | 47 | ESC 2/8 5/6 | British Post Office |  | Viewdata and Teletext. Viewdata square (⌗) substituted for normally invariant underscore (_) which cannot be displayed on the target hardware. This is actually the encoding of Microsoft's WST_Engl.
GR / greek7 | 88 | ESC 2/8 6/10 | ? | HOS ELOT 927 | Greece (withdrawn in November 1986). Uses Greek letters in place of Roman ones and hence is not strictly speaking an ISO 646 variant.
greek7-old | 18 | ESC 2/8 5/11 | ECMA | ? | Greek graphic set. Similar in concept to greek7, but uses a different mapping of letters. Also, the upper case follows the lower case.
Latin-Greek | 19 | ESC 2/8 5/12 | ECMA | ? | Latin-Greek combined graphics (capitals only). Follows greek7-old, but includes Latin capitals without modification, and Greek capitals over the Latin lower case.
Latin-Greek-1 | 27 | ESC 2/8 5/5 | ECMA | Honeywell-Bull | Latin-Greek mixed graphics (Greek capitals only). Visually unifies Greek capitals with Latin capitals where possible, and adds the remaining Greek capitals. Unlike the other Greek versions, all Basic Latin letters remain intact. Replaces invariant punctuation as well as national characters, however, and hence is still not strictly speaking an ISO 646 variant.
swi |  |  | ECMA | Olivetti | Switzerland (French, German) (Code page 1021). Invariant code point 0x5F is changed from _ to è. Is a DEC NRCS variant, closely related to ISO 646, but lacks a fully ISO 646 compliant equivalent.

Control characters

All the variants listed above are solely graphical character sets, and are to be used with a C0 control character set such as listed in the following table:

ISO-IR | ISO ESC | Approved | Description
1 | ESC 2/1 4/0 | ISO 646 | ISO 646 controls ("ASCII controls")
7 | ESC 2/1 4/1 | ISO 646 | Scandinavian newspaper (NATS) controls
26 | ESC 2/1 4/3 | ISO 646 | IPTC controls

Associated supplementary character sets

The following table lists supplementary graphical character sets defined by the same standard as specific ISO/IEC 646 variants. These would be selected by using a mechanism such as shift out or the NATS super shift (single shift), or by setting the eighth bit in environments where one was available:

ISO-IR | ISO/IEC ESC | National Standard | Description
8-2 | ESC 2/8 4/4 | NATS-SEFI-ADD | Supplementary code used with NATS-SEFI.
9-2 | ESC 2/8 4/6 | NATS-DANO-ADD | Supplementary code used with NATS-DANO.
13 | ESC 2/8 4/9 | JIS C 6220:1969-jp | Katakana, used as a supplementary code with ISO-646-JP.
103 | ESC 2/8 7/6 | ITU/CCITT T.61 Recommendation, Supplementary Set | Supplementary code used with T.61.

Variant comparison chart

The specifics of the changes for some of these variants are given in the following table. Character assignments unchanged across all listed variants (i.e. which remain the same as ASCII) are not shown.

For ease of comparison, variants detailed include national variants of ISO/IEC 646, DEC's closely related National Replacement Character Set (NRCS) series used on VT200 terminals, the related European World System Teletext encoding series defined in ETS 300 706, and a few other closely related encodings based on ISO/IEC 646. Individual code charts are linked from the second column. The cells with non-white background emphasize the differences from US-ASCII (also the Basic Latin subset of ISO/IEC 10646 and Unicode).

Several characters could be used as combining characters when preceded or followed by a backspace C0 control. This is attested in the code charts for IRV, GB, FR1, CA and CA2, which note that the quotation mark ("), apostrophe ('), comma (,) and upward arrowhead (^) would behave as the diaeresis, acute accent, cedilla and circumflex respectively when preceded or followed by a backspace. The tilde character (~) was similarly introduced as a diacritic (˜). This encoding method originated in the typewriter/teletype era, when a backspace caused the next glyph to be overstruck on the previous one, and may be considered deprecated.
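A reader of such legacy text might convert overstruck sequences into precomposed Unicode characters. The sketch below is one possible approach, not a standardized algorithm. It pairs a letter and one of the five marks named above when they are separated by a backspace, in either order, and then applies NFC normalization. The function name and the choice of Unicode combining marks are assumptions.

```python
import unicodedata

BACKSPACE = "\b"

# Spacing characters from the text mapped to assumed Unicode combining marks.
COMBINING = {
    '"': "\u0308",  # diaeresis
    "'": "\u0301",  # acute accent
    ",": "\u0327",  # cedilla
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
}


def compose_overstrikes(text: str) -> str:
    """Turn 'letter BS mark' or 'mark BS letter' into an accented letter."""
    out = []
    i = 0
    while i < len(text):
        if i + 2 < len(text) and text[i + 1] == BACKSPACE:
            first, second = text[i], text[i + 2]
            if first.isalpha() and second in COMBINING:
                out.append(first + COMBINING[second])
                i += 3
                continue
            if first in COMBINING and second.isalpha():
                out.append(second + COMBINING[first])
                i += 3
                continue
        # Anything else, including a stray backspace, is copied unchanged.
        out.append(text[i])
        i += 1
    # NFC merges each base letter with its combining mark where possible.
    return unicodedata.normalize("NFC", "".join(out))
```

Under these assumptions, the sequence "c", backspace, "," becomes "ç", while an ordinary comma that is not adjacent to a backspace is left as punctuation.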

Later, when wider character sets gained more acceptance, ISO/IEC 8859, vendor-specific character sets and eventually Unicode became the preferred methods of coding most of these variants.
