Indian Script Code for Information Interchange

Indian Script Code for Information Interchange (ISCII) is a coding scheme for representing various writing systems of India. It encodes the main Indic scripts and a Roman transliteration. The supported scripts are: Bengali–Assamese, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. ISCII does not encode the writing systems of India that are based on Persian, but its writing system switching codes nonetheless provide for Kashmiri, Sindhi, Urdu, Persian, Pashto and Arabic. The Persian-based writing systems were subsequently encoded in the PASCII encoding.

ISCII has not been widely used outside certain government institutions, although a variant without the ATR mechanism, Mac OS Devanagari, was used on classic Mac OS. It has now been rendered largely obsolete by Unicode, which uses a separate block for each Indic writing system and largely preserves the ISCII layout within each block.[1]

Background

The Brahmi-derived writing systems have a similar structure.[1] ISCII therefore encodes letters with the same phonetic value at the same code point, overlaying the various scripts. For example, the ISCII codes 0xB3 0xDB represent [ki]. This is rendered as കി in Malayalam, कि in Devanagari, ਕਿ in Gurmukhi, and கி in Tamil. The writing system can be selected in rich text by markup, or in plain text by means of the ATR code described below.
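
The overlay can be illustrated with a minimal Python sketch. It is not a full converter: it covers only the two bytes of the [ki] example, and it relies on the observation that Unicode largely preserves the ISCII layout within each block. The names `ISCII_TO_BLOCK_OFFSET`, `BLOCK_BASE` and `render` are hypothetical, and the block start values are taken from the Unicode charts rather than from the text above.

```python
# Position of each ISCII byte inside a Unicode Indic block
# (for example U+0915 minus the Devanagari block start U+0900).
# Only the two bytes from the [ki] example are covered.
ISCII_TO_BLOCK_OFFSET = {
    0xB3: 0x15,  # consonant ka
    0xDB: 0x3F,  # vowel sign i
}

# Start of the Unicode block for the four scripts named in the example.
BLOCK_BASE = {
    "Malayalam": 0x0D00,
    "Devanagari": 0x0900,
    "Gurmukhi": 0x0A00,
    "Tamil": 0x0B80,
}


def render(iscii_bytes: bytes, script: str) -> str:
    """Map the same ISCII bytes to the Unicode text of one script."""
    base = BLOCK_BASE[script]
    return "".join(chr(base + ISCII_TO_BLOCK_OFFSET[b]) for b in iscii_bytes)


ki = bytes([0xB3, 0xDB])
for script in BLOCK_BASE:
    print(script, render(ki, script))
```

The same two bytes yield a different string for each script, which is why the script has to be known from markup or from an ATR code before the text can be displayed.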

One motivation for the use of a single encoding is the idea that it will allow easy transliteration from one writing system to another.[1] However, there are enough incompatibilities between the scripts that this is not really practical.

ISCII is an 8-bit encoding. The lower 128 code points are plain ASCII; the upper 128 code points are ISCII-specific. In addition to the code points representing characters, ISCII makes use of a code point with the mnemonic ATR, which indicates that the following byte contains one of two kinds of information. One set of values changes the writing system until the next writing system indicator or end-of-line. Another set of values selects display modes such as bold and italic. ISCII does not provide a means of indicating the default writing system.
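
As a rough illustration of the switching mechanism, the following Python sketch splits an ISCII byte string into runs tagged with the active script. It assumes the ATR code point 0xEF and a subset of the script bytes listed under "Special code points". The function name `split_runs` and its `default` parameter are hypothetical; the default has to be supplied by the caller because ISCII itself does not indicate it. ATR pairs that select display modes are passed through unchanged in this sketch.

```python
ATR = 0xEF  # attribute code point, as listed under "Special code points"

# Subset of the ATR + byte values that switch the script.
SCRIPT_SHIFTS = {0x42: "DEV", 0x43: "BNG", 0x44: "TML", 0x45: "TLG", 0x49: "MLM"}


def split_runs(data: bytes, default: str = "DEV"):
    """Return a list of (script, bytes) runs.

    A script shift lasts until the next shift or the end of the line,
    after which the caller-supplied default script applies again.
    """
    runs = []
    script = default
    current = bytearray()

    def flush():
        # Close the current run, if it holds any bytes.
        if current:
            runs.append((script, bytes(current)))
            current.clear()

    i = 0
    while i < len(data):
        b = data[i]
        if b == ATR and i + 1 < len(data) and data[i + 1] in SCRIPT_SHIFTS:
            flush()
            script = SCRIPT_SHIFTS[data[i + 1]]
            i += 2
            continue
        current.append(b)
        if b == 0x0A:  # line feed: the shift ends with the line
            flush()
            script = default
        i += 1
    flush()
    return runs


sample = bytes([0xB3, 0xDB, ATR, 0x44, 0xB3, 0xDB, 0x0A, 0xB3])
print(split_runs(sample))
```

Each run can then be converted with the table for its script. Keeping the segmentation separate from the character mapping keeps both steps small.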

Codepage layout

The following table shows the character set for Devanagari. The code sets for Assamese, Bengali, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu are similar, with each Devanagari form replaced by the equivalent form in each writing system. Each character is shown with its decimal code and its Unicode equivalent.

Special code points

  • INV character (code point D9, decimal 217): The INV (invisible consonant) character is used as a pseudo-consonant to display combining elements in isolation. For example, क (ka) + ् (halant) + INV = क्‍ (half ka). The Unicode equivalent is the zero-width joiner (ZWJ, U+200D). However, as noted below, the ISCII halant character can be doubled or combined with the ISCII nukta to achieve the effects created by the zero-width non-joiner (ZWNJ, U+200C) or ZWJ in Unicode. For this reason, Apple maps the ISCII INV character to a different Unicode character, so as to guarantee round-tripping.[2]
  • ATR character (code point EF, decimal 239): The ATR (attribute) character followed by a byte code is used to switch to a different font attribute (such as bold) or to a different ISCII or PASCII language (such as Bengali), up to the next ATR sequence or the end of the line. This has no direct Unicode equivalent, as font attributes are not part of Unicode, and each script has a distinct set of code points.

    Presentational attributes

    ATR + byte  Mnemonic  Formatting option
    0x30        BLD       Bold
    0x31        ITA       Italics
    0x32        UL        Underlining
    0x33        EXP       Expanded
    0x34        HLT       Highlight
    0x35        OTL       Outline
    0x36        SHD       Shadow
    0x37        TOP       Top half of character (used with LOW to create double-height characters)
    0x38        LOW       Bottom half of character (used with TOP to create double-height characters)
    0x39        DBL       Entire row double-width and double-height

    Shifts to ISCII scripts

    ATR + byte  Mnemonic  ISCII script
    0x40        DEF       Default script (i.e. the script which will be switched back to after a line break)
    0x41        RMN       Romanised transliteration
    0x42        DEV       Devanagari
    0x43        BNG       Bengali script
    0x44        TML       Tamil script
    0x45        TLG       Telugu script
    0x46        ASM       Assamese script
    0x47        ORI       Odia script
    0x48        KND       Kannada script
    0x49        MLM       Malayalam script
    0x4A        GJR       Gujarati script
    0x4B        PNJ       Gurmukhī

    Shifts to PASCII

    ATR + byte  Mnemonic  PASCII locale
    0x71        ARB       Arabic alphabet
    0x72        PES       Persian alphabet
    0x73        URD       Urdu alphabet
    0x74        SND       Sindhi alphabet
    0x75        KSM       Kashmiri alphabet
    0x76        PST       Pashto alphabet
  • EXT character (code point F0, decimal 240): The EXT (extensions for Vedic) character followed by a byte code indicates a Vedic accent. This has no direct Unicode equivalent, as Vedic accents are assigned to distinct code points.
  • Halant character ् (code point E8, decimal 232): The halant character removes the implicit vowel from a consonant and is used between consonants to represent conjunct consonants. For example, क (ka) + ् (halant) + त (ta) = क्त (kta). The sequence ् (halant) + ् (halant) displays a conjunct with an explicit halant, for example क (ka) + ् (halant) + ् (halant) + त (ta) = क्‌त. The sequence ् (halant) + ़ (nukta) displays a conjunct with half consonants, if available, for example क (ka) + ् (halant) + ़ (nukta) + त (ta) = क्‍त.

    ISCII sequence   ISCII bytes  Unicode sequence  Unicode code points
    single halant    E8           halant            094D
    halant + halant  E8 E8        halant + ZWNJ     094D 200C
    halant + nukta   E8 E9        halant + ZWJ      094D 200D
  • Nukta character ़ (code point E9, decimal 233): The nukta character after another ISCII character is used for a number of rarer characters which don't exist in the main ISCII set. For example, क (ka) + ़ (nukta) = क़ (qa). These characters have precomposed forms in Unicode, as shown in the following table.

    Single Unicode characters corresponding to ISCII nukta sequences

    ISCII code point  Original character  Character with nukta  Unicode code point
    A1 (161)          ँ                   ॐ                     0950
    A6 (166)          इ                   ऌ                     090C
    A7 (167)          ई                   ॡ                     0961
    AA (170)          ऋ                   ॠ                     0960
    B3 (179)          क                   क़                     0958
    B4 (180)          ख                   ख़                     0959
    B5 (181)          ग                   ग़                     095A
    BA (186)          ज                   ज़                     095B
    BF (191)          ड                   ड़                     095C
    C0 (192)          ढ                   ढ़                     095D
    C9 (201)          फ                   फ़                     095E
    DB (219)          ि                   ॢ                     0962
    DC (220)          ी                   ॣ                     0963
    DF (223)          ृ                   ॄ                     0944
    EA (234)          ।                   ऽ                     093D
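
The halant and nukta rules in the list above can be sketched as a small lookup-based converter in Python. This is an illustration under stated assumptions, not a complete ISCII decoder: it handles only the Devanagari characters used in the examples, and the names `BASE`, `PAIRS` and `to_unicode` are hypothetical. The ISCII byte 0xC2 for त (ta) is assumed from the Devanagari code chart, as it does not appear in the tables here.

```python
HALANT, NUKTA = 0xE8, 0xE9

# Single ISCII bytes mapped to Devanagari Unicode characters.
# 0xC2 (ta) is an assumption taken from the Devanagari code chart.
BASE = {
    0xB3: "\u0915",    # ka
    0xC2: "\u0924",    # ta
    HALANT: "\u094D",  # halant (virama)
    NUKTA: "\u093C",   # nukta
}

# Two-byte ISCII sequences that have their own Unicode mapping.
PAIRS = {
    (HALANT, HALANT): "\u094D\u200C",  # explicit halant: halant + ZWNJ
    (HALANT, NUKTA): "\u094D\u200D",   # half consonant: halant + ZWJ
    (0xB3, NUKTA): "\u0958",           # ka + nukta = precomposed qa
}


def to_unicode(data: bytes) -> str:
    """Convert ISCII bytes to Unicode, trying two-byte sequences first."""
    out = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2
        else:
            out.append(BASE[data[i]])
            i += 1
    return "".join(out)


for seq in ([0xB3, 0xE8, 0xC2], [0xB3, 0xE8, 0xE8, 0xC2], [0xB3, 0xE8, 0xE9, 0xC2]):
    text = to_unicode(bytes(seq))
    print(" ".join(f"{ord(ch):04X}" for ch in text))
```

Checking two-byte sequences before single bytes matters because halant + nukta must not be decoded as a halant followed by an ordinary nukta.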

Code pages for ISCII conversion

To convert from Unicode (UTF-8) to an ISCII / ANSI coding, the following code pages may be used:

Notes and References

1. The Unicode Standard v15.0, Chapter 12. The Unicode Consortium. 13 August 2024.
2. Map (external version) from Mac OS Devanagari encoding to Unicode 2.1 and later. Apple Inc. 2005-04-05. 1998-02-05.