KOI8-RU explained

KOI8-RU
Languages: Belarusian, Ukrainian, Russian, Bulgarian
Classification: 8-bit KOI, extended ASCII
Based on: KOI8-U, KOI8-R
Extends: KOI8-B
Other related: KOI8-E, KOI8-F

KOI8-RU is an 8-bit character encoding designed to cover Russian, Ukrainian, and Belarusian, which are written in the Cyrillic script. It is closely related to KOI8-R, which covers Russian and Bulgarian, but it replaces ten box-drawing characters with five Ukrainian and Belarusian letters, Ґ, Є, І, Ї, and Ў, in both upper and lower case. It is even more closely related to KOI8-U, which makes the same replacements except that it does not include Ў. KOI8-E places these additional letters at the same positions, except for Ґ, which is added in KOI8-F.
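Because KOI8-RU differs from KOI8-R mainly in the ten letter positions described above, one way to approximate it is to overlay those positions on a KOI8-R decoder. The sketch below assumes Python's standard `koi8_r` codec for every other byte (the standard library does not ship a `koi8_ru` codec) and uses the commonly published positions for the added letters; the names `KOI8_RU_OVERRIDES`, `decode_koi8_ru`, and `encode_koi8_ru` are hypothetical, and the result is not a verified KOI8-RU table.

```python
# Hypothetical overlay: byte -> character for the ten positions where KOI8-RU
# places Ukrainian/Belarusian letters instead of KOI8-R box-drawing characters.
# Assumption: every other byte decodes exactly as Python's koi8_r codec does.
KOI8_RU_OVERRIDES = {
    0xA4: "\u0454",  # є
    0xA6: "\u0456",  # і
    0xA7: "\u0457",  # ї
    0xAD: "\u0491",  # ґ
    0xAE: "\u045E",  # ў (the position KOI8-U leaves as box drawing)
    0xB4: "\u0404",  # Є
    0xB6: "\u0406",  # І
    0xB7: "\u0407",  # Ї
    0xBD: "\u0490",  # Ґ
    0xBE: "\u040E",  # Ў (the position KOI8-U leaves as box drawing)
}
# Reverse map used when encoding.
_REVERSE = {ch: b for b, ch in KOI8_RU_OVERRIDES.items()}


def decode_koi8_ru(data: bytes) -> str:
    """Decode bytes, using the overlay first and koi8_r for everything else."""
    out = []
    for b in data:
        if b in KOI8_RU_OVERRIDES:
            out.append(KOI8_RU_OVERRIDES[b])
        else:
            out.append(bytes([b]).decode("koi8_r"))
    return "".join(out)


def encode_koi8_ru(text: str) -> bytes:
    """Encode text, using the overlay first and koi8_r for everything else."""
    out = bytearray()
    for ch in text:
        if ch in _REVERSE:
            out.append(_REVERSE[ch])
        else:
            out += ch.encode("koi8_r")  # raises UnicodeEncodeError if unmappable
    return bytes(out)


encoded = encode_koi8_ru("Україна")
print(encoded)                  # b'\xf5\xcb\xd2\xc1\xa7\xce\xc1'
print(decode_koi8_ru(encoded))  # Україна
```

Overlaying a small dictionary keeps the difference from KOI8-R explicit; a production codec would more likely use a full 256-entry table registered through `codecs`.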

IBM assigns KOI8-RU the code page/CCSID 1167.[1][2]

KOI8 has historically been much more widely used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. Both have since been largely superseded by Unicode encodings such as UTF-8.

KOI8 stands for Kod obmena informatsiey, 8 bit (Russian: Код обмена информацией, 8 бит), which means "Code for Information Exchange, 8 bit".

The KOI8 character sets place the Russian Cyrillic letters in a pseudo-Roman order rather than in the Cyrillic alphabetical order used by ISO 8859-5. Although this may seem unnatural, it means that if the eighth bit is stripped, the text can still be read (or at least deciphered) as a case-reversed transliteration on an ordinary ASCII terminal. For instance, "Код Обмена Информацией" in KOI8-RU becomes "kOD oBMENA iNFORMACIEJ" (the Russian meaning of the "KOI" acronym) when the eighth bit is stripped.
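The eighth-bit stripping described above can be reproduced in a few lines. This is a minimal sketch that assumes Python's standard `koi8_r` codec, which is adequate here because the Russian letters occupy the same positions in KOI8-R and KOI8-RU; `strip_eighth_bit` is a hypothetical helper name.

```python
def strip_eighth_bit(text: str, codec: str = "koi8_r") -> str:
    """Encode text in a KOI8 codec, clear bit 7 of every byte, read as ASCII."""
    raw = text.encode(codec)                  # Russian letters (except ё/Ё) land in 0xC0..0xFF
    seven_bit = bytes(b & 0x7F for b in raw)  # drop the eighth (high) bit
    return seven_bit.decode("ascii")          # every byte is now < 0x80


print(strip_eighth_bit("Код Обмена Информацией"))  # kOD oBMENA iNFORMACIEJ
```

Lowercase Cyrillic letters sit in 0xC0 to 0xDF and uppercase in 0xE0 to 0xFF, so clearing bit 7 maps them onto ASCII uppercase and lowercase respectively, which is why the case comes out reversed.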

Character set

The following table shows the KOI8-RU encoding. Each character is shown with its equivalent Unicode code point.

Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be U+2022 (•) to match the bullet character in Windows-1251.
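A decoder that needs to interoperate with Windows-1251 text might expose the 0x95 choice as an option. The sketch below assumes that Python's `koi8_r` codec decodes 0x95 as U+2219 (as in the KOI8-R table) and that this byte has the same meaning in KOI8-RU; `decode_koi8` and its `windows_bullet` parameter are hypothetical.

```python
def decode_koi8(data: bytes, windows_bullet: bool = False) -> str:
    """Decode KOI8 bytes, optionally mapping 0x95 to U+2022 instead of U+2219."""
    text = data.decode("koi8_r")  # 0x95 -> U+2219 BULLET OPERATOR
    if windows_bullet:
        # koi8_r yields U+2219 only for byte 0x95, so this remaps just that byte.
        text = text.replace("\u2219", "\u2022")
    return text


print(decode_koi8(b"\x95 item"))                       # ∙ item
print(decode_koi8(b"\x95 item", windows_bullet=True))  # • item
```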

Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
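When importing a mapping table from a third-party reference, a quick check can catch the 0xB4 typo described above. This is a minimal sketch: `check_b4` is a hypothetical helper, and the reference table is built from Python's `koi8_u` codec, assuming it carries the correct U+0404 mapping at 0xB4 (KOI8-U and KOI8-RU agree at that position).

```python
import unicodedata


def check_b4(table: dict) -> str:
    """Report whether a byte -> character table maps 0xB4 to U+0404."""
    ch = table.get(0xB4)
    if ch == "\u0404":
        return "ok"
    if ch == "\u0403":
        # U+0403 is CYRILLIC CAPITAL LETTER GJE, the typo found in some tables.
        return "typo: 0xB4 mapped to U+0403 " + unicodedata.name(ch)
    return "unexpected"


reference = {b: bytes([b]).decode("koi8_u") for b in range(0x80, 0x100)}
print(check_b4(reference))              # ok
print(check_b4({0xB4: "\u0403"}))      # typo: 0xB4 mapped to U+0403 CYRILLIC CAPITAL LETTER GJE
```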


Notes and References

  1. "Code page 1167 information document". IBM. Archived 2017-01-16. https://web.archive.org/web/20170116144609/https://www-01.ibm.com/software/globalization/cp/cp01167.html
  2. "CCSID 1167 information document". IBM. Archived 2016-03-27. https://web.archive.org/web/20160327100212/http://www-01.ibm.com/software/globalization/ccsid/ccsid1167.html