ISO-IR-111 explained

KOI8-E (1986)
Alias: ISO-IR-111
Standard: ECMA-113:1986
Classification: Extended ASCII, KOI
Extends: KOI8-B
Languages: Russian, Belarusian, Macedonian, Serbian, Ukrainian (partial)
Succeeded by: ECMA-113:1988 (ISO-8859-5)
Other related encodings: KOI8-F

ISO-IR-111[1] or KOI8-E is an 8-bit character set. It is a multinational extension of KOI-8 for Belarusian, Macedonian, Serbian, and Ukrainian (except for Ґґ, which KOI8-F adds). The name "ISO-IR-111" refers to its registration number in the ISO-IR registry and denotes it as a set usable with ISO/IEC 2022.

It was defined by the first (1986) edition of ECMA-113,[2] the Ecma International standard corresponding to ISO-8859-5, and as such it also corresponds to a 1987 draft version of ISO-8859-5.[3] The published editions of ISO-8859-5 instead correspond to subsequent editions of ECMA-113, which define a different encoding.

Naming confusion

ISO-IR-111, the first edition of ECMA-113 (registered in 1985 and published in 1986; also called "ECMA-Cyrillic" or "KOI8-E"), was based on the 1974 edition of GOST 19768 (i.e. KOI-8). In 1987 ECMA-113 was redesigned.[4] The newer editions of ECMA-113 are equivalent to ISO-8859-5[4][5] and do not follow the KOI layout. This reuse of the ECMA-113 designation for two different layouts has led to a common misconception that ISO-8859-5 was defined in or based on GOST 19768-74.[5]

Possibly as another consequence of this, RFC 1345 erroneously lists a different code page under the names "ISO-IR-111" and "ECMA-Cyrillic", resembling ISO-8859-5 with re-ordered rows and partially compatible with Windows-1251.[6] Due to concerns that existing implementations might use the RFC 1345 definition for those two labels, it was proposed that the IANA additionally recognise "KOI8-E" as a label for ECMA-113:1985 content, and the IANA presently lists that label as an alias.[7]

Character set

The following table shows the ISO-IR-111 encoding. Each character is shown with its equivalent Unicode code point.

Extended and modified versions

A modified version named KOI8 Unified or KOI8-F was used in software produced by Fingertip Software. It adds Ґ in its KOI8-U location (replacing the soft hyphen and displacing the universal currency sign) and adds some graphical characters in the C1 control codes area, mainly from KOI8-R and Windows-1251.[8][9]
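The "KOI8-U location" mentioned above can be looked up with Python's built-in `koi8_u` codec. The following is only a small sketch: the helper name `koi8u_positions` is hypothetical, and the snippet inspects KOI8-U alone, since the Python standard library does not appear to ship a KOI8-E or KOI8-F codec.

```python
# Look up the single-byte KOI8-U codes for the Ukrainian letter ghe
# with upturn, using only the standard-library "koi8_u" codec.
# koi8u_positions is a hypothetical helper name used for illustration.

def koi8u_positions(chars):
    """Map each character to its one-byte KOI8-U code value."""
    return {ch: ch.encode("koi8_u")[0] for ch in chars}


ghe_positions = koi8u_positions("Ґґ")

for ch, code in ghe_positions.items():
    print(f"{ch} -> 0x{code:02X}")
```

The printed byte values are the positions at which, according to the paragraph above, KOI8-F places the letter.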

Incorrect RFC 1345 code page

RFC 1345's "ECMA-Cyrillic"
Classification: Extended ASCII
Encodes: ISO-IR-111
Languages: Russian, Belarusian, Macedonian, Serbian
Other related encodings: ISO-8859-5, Windows-1251

RFC 1345 erroneously lists a different code page under the name ISO-IR-111, encoding the same Cyrillic characters but with a different layout. It resembles a mixture of Windows-1251 and ISO-8859-5.[6] Specifically, line A_ corresponds to ISO-8859-5, lines C_ through F_ correspond to Windows-1251 (equivalent to lines B_ through E_ of ISO-8859-5), and line B_ nearly corresponds to line F_ of ISO-8859-5, except that § is replaced with ¤.
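The row correspondences described above can be sketched with Python's built-in `iso8859_5` and `cp1251` codecs. This is a minimal illustration that takes the prose description literally; it has not been checked against the table in RFC 1345 itself, and the function name `build_described_layout` is hypothetical. Bytes below 0xA0 are left out.

```python
# Build the upper half (0xA0..0xFF) of the layout as described in prose:
#   row A_      -> ISO-8859-5 row A_
#   row B_      -> ISO-8859-5 row F_, with § replaced by ¤
#   rows C_..F_ -> ISO-8859-5 rows B_..E_ (same as Windows-1251 C_..F_)
# Assumption: the description is exact; RFC 1345 itself was not consulted.

def build_described_layout():
    """Return a dict mapping byte value -> character for 0xA0..0xFF."""
    table = {}
    for byte in range(0xA0, 0x100):
        if byte <= 0xAF:
            source = byte          # row A_ is taken unchanged
        elif byte <= 0xBF:
            source = byte + 0x40   # row B_ comes from ISO-8859-5 row F_
        else:
            source = byte - 0x10   # rows C_..F_ come from rows B_..E_
        ch = bytes([source]).decode("iso8859_5")
        if 0xB0 <= byte <= 0xBF and ch == "\u00a7":
            ch = "\u00a4"          # the one stated exception: § becomes ¤
        table[byte] = ch
    return table


layout = build_described_layout()

# Rows C_..F_ should agree with Windows-1251 byte for byte.
matches_cp1251 = all(
    layout[b] == bytes([b]).decode("cp1251") for b in range(0xC0, 0x100)
)
print(matches_cp1251)
```

Deriving every row from the ISO-8859-5 codec and then comparing rows C_ through F_ against `cp1251` checks the stated equivalence between the two descriptions of those rows.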

Certain codes resemble ISO-IR-111 with flipped letter case, which may have contributed to the confusion. The majority differ and are shown below.

Notes and References

  1. ECMA (1 August 1985). Right-hand Part of the Cyrillic Alphabet. ISO-IR registration 111. Ecma International.
  2. ECMA-113. 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Cyrillic Alphabet (1st ed., June 1986). https://www.ecma-international.org/publications/files/ECMA-ST/Ecma-113,%201st%20edition%20June%201986.pdf
  3. Czyborra, Roman (1998-11-30) [1998-05-25]. "The Cyrillic Charset Soup". Archived from the original on 2016-12-03. Retrieved 2016-12-03. https://web.archive.org/web/20161203230933/http://czyborra.com/charsets/cyrillic.html
  4. ECMA-113. 8-Bit Single-Byte Coded Graphic Character Sets - Latin/Cyrillic Alphabet (2nd ed., June 1988). https://www.ecma-international.org/publications/files/ECMA-ST/Ecma-113,%202nd%20edition%20June%201988.pdf
  5. Nechayev, Valentin (2013) [2001]. "Review of 8-bit Cyrillic encodings universe". Archived on 2016-12-05. Retrieved 2016-12-05. https://web.archive.org/web/20161205134629/http://segfault.kiev.ua/cyrillic-encodings/
  6. Sokolov, Michael (2003-04-05). "ECMA-cyrillic alias iso-ir-111 sore". IETF Charsets Mailing List.
  7. "Character Sets". IANA.
  8. "KOI8 Unified". Fingertip Software. Archived from the original on 1998-01-09. Retrieved 2020-02-11. https://web.archive.org/web/19980109123404/http://fingertipsoft.com/ref/cyrillic/koi8-uni.html
  9. Leisher, Mark (2008) [1998-03-05]. "KOI8 Unified Cyrillic to Unicode 2.1 mapping table". Retrieved 2020-05-02.