Byte addressing explained

Byte addressing in hardware architectures supports accessing individual bytes. Computers with byte addressing are sometimes called byte machines, in contrast to word-addressable architectures, or word machines, which access data a word at a time.

Background

The basic unit of digital storage is a bit, storing a single 0 or 1. Many common instruction set architectures can handle more than 8 bits of data at a time, but data in memory may be of various lengths. Instruction sets that support byte addressing can access data in units narrower than the word length. An eight-bit processor like the Intel 8008 addresses eight bits, but as this is the full width of the accumulator and other registers, it could be considered either byte-addressable or word-addressable. By contrast, 32-bit x86 processors, which address memory in 8-bit units but have 32-bit general-purpose registers and can operate on 32-bit (4-byte) items with a single instruction, are byte-addressable.

The advantage of word addressing is that more memory can be addressed with the same number of address bits. The IBM 7094 has 15-bit addresses, so it could address 32,768 words of 36 bits each. The machines were often built with a full complement of addressable memory; addressing only 32,768 six-bit bytes would have been much less useful to scientific and engineering users. Or consider 32-bit x86 processors: their 32-bit linear addresses can address about 4 billion (2^32) different items. Using word addressing, a 32-bit processor could address 4 gigawords, or 16 gigabytes in terms of the modern 8-bit byte. If the 386 and its successors had used word addressing, scientists, engineers, and gamers could all have run programs four times larger on 32-bit machines. However, word processing, HTML rendering, and other text applications would have run more slowly.
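The arithmetic above can be checked with a short calculation. This is a minimal Python sketch; the helper name `addressable_bits` is hypothetical, and it simply multiplies the number of distinct addresses by the width of the unit each address selects.

```python
def addressable_bits(address_bits: int, unit_bits: int) -> int:
    """Total storage reachable when each of 2**address_bits addresses selects unit_bits bits."""
    return (2 ** address_bits) * unit_bits

# IBM 7094: 15-bit addresses selecting 36-bit words vs. (hypothetical) 6-bit bytes.
words_7094 = addressable_bits(15, 36)
bytes_7094 = addressable_bits(15, 6)
print(2 ** 15, words_7094, bytes_7094)  # → 32768 1179648 196608

# 32-bit addresses: word addressing (32-bit words) vs. byte addressing (8-bit bytes),
# both expressed as a count of 8-bit bytes.
word_addressed = addressable_bits(32, 32) // 8
byte_addressed = addressable_bits(32, 8) // 8
print(word_addressed // 2 ** 30, "GiB vs", byte_addressed // 2 ** 30, "GiB")  # → 16 GiB vs 4 GiB
```

The factor of six between the two 7094 figures is the ratio of word width to character width; the factor of four on 32-bit machines is the number of 8-bit bytes in a 32-bit word.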

When computers were so costly that they were only or mainly used for science and engineering, word addressing was the obvious mode. As it became cost-effective to use computers for handling text, hardware designers moved to byte addressing.

To illustrate why byte addressing is useful, consider the IBM 7094, which is word-addressable and has no concept of a byte. It has 36-bit words and stores its six-bit character codes six to a word. To change the 16th character in a string, the program has to determine that this is the fourth character of the third word in the string, fetch the third word, mask out the old value of the fourth character from the value held in the register, OR in the new one, and then store the amended word back. This takes at least six machine instructions. Usually these are relegated to a subroutine, so every store or fetch of a single character also incurs the overhead of a subroutine call and return. With byte addressing, the same change takes one instruction: store this character code at that byte address. Text-processing programs become easier to write, smaller, and faster.
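The read-modify-write sequence above can be modeled in Python. This is a sketch, not 7094 code: memory is a hypothetical list of integers holding 36-bit words, characters are assumed to be packed left to right (first character in the most significant six bits), and the function names are illustrative. The byte-addressed case is shown with a `bytearray` for comparison.

```python
WORD_BITS = 36
CHAR_BITS = 6
CHARS_PER_WORD = WORD_BITS // CHAR_BITS  # 6 characters per word
CHAR_MASK = (1 << CHAR_BITS) - 1         # 0o77

def store_char_word_machine(memory: list[int], string_base: int, index: int, code: int) -> None:
    """Replace character `index` (0-based) of a string starting at word `string_base`."""
    word_offset, slot = divmod(index, CHARS_PER_WORD)  # which word, which character in it
    addr = string_base + word_offset
    # Shift that moves a 6-bit value into `slot`, counting slots from the left (assumption).
    shift = WORD_BITS - CHAR_BITS * (slot + 1)
    word = memory[addr]                    # fetch the word
    word &= ~(CHAR_MASK << shift)          # mask out the old character
    word |= (code & CHAR_MASK) << shift    # OR in the new one
    memory[addr] = word                    # store the amended word back

def store_char_byte_machine(memory: bytearray, string_base: int, index: int, code: int) -> None:
    """With byte addressing, the character has its own address: a single store."""
    memory[string_base + index] = code

mem = [0] * 4
store_char_word_machine(mem, 0, 15, 0o25)  # the 16th character has 0-based index 15
print(divmod(15, CHARS_PER_WORD))          # → (2, 3): third word, fourth character
print(oct(mem[2]))                         # → 0o250000
```

The word-machine version needs a division, a fetch, a mask, an OR, and a store; the byte-machine version is one indexed store.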

Hybrid systems

Some systems with word addressing, such as the PDP-6/10 and the GE-600/Honeywell 6000 series, have special mechanisms for accessing bytes efficiently.

On the PDP-6/10, special instructions operated on a byte pointer that included a word address, a bit offset, and a bit width. One pair of instructions loaded or stored one byte, another instruction incremented the byte pointer, and a further pair incremented the byte pointer and then loaded or stored the next byte. These instructions could operate on bit fields of arbitrary width.[1] Programs took advantage of this flexibility: those not needing lowercase letters used the limited character set of 6-bit bytes, packed six to a word, for efficiency; most used 7-bit ASCII, packed five to a word with one unused bit; and the C implementation used 9-bit bytes, because C requires all memory to be byte-addressable and four 9-bit bytes exactly fill a 36-bit word.
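The byte-pointer mechanism can be sketched as a small Python model. It is a simplified illustration rather than an emulator: the `BytePointer` class and the function names are hypothetical, memory is a list of 36-bit integers, and the bit offset follows the PDP-10 convention of counting the bits to the right of the byte. Incrementing subtracts the byte width from the offset and, when the next byte would not fit in the current word, moves to the leftmost byte of the following word.

```python
from dataclasses import dataclass

WORD_BITS = 36

@dataclass(frozen=True)
class BytePointer:
    word: int  # word address
    pos: int   # bit offset: number of bits to the right of the byte
    size: int  # byte width in bits

    def increment(self) -> "BytePointer":
        """Advance to the next byte, moving to the next word when this one is used up."""
        if self.pos - self.size < 0:
            return BytePointer(self.word + 1, WORD_BITS - self.size, self.size)
        return BytePointer(self.word, self.pos - self.size, self.size)

def load_byte(memory: list[int], bp: BytePointer) -> int:
    """Extract the byte that bp points to."""
    return (memory[bp.word] >> bp.pos) & ((1 << bp.size) - 1)

def deposit_byte(memory: list[int], bp: BytePointer, value: int) -> None:
    """Store value into the byte that bp points to, leaving the other bits unchanged."""
    mask = ((1 << bp.size) - 1) << bp.pos
    memory[bp.word] = (memory[bp.word] & ~mask) | ((value << bp.pos) & mask)

def increment_and_load(memory: list[int], bp: BytePointer) -> tuple[BytePointer, int]:
    """Increment the pointer, then load the byte it now points to."""
    bp = bp.increment()
    return bp, load_byte(memory, bp)

# Pack ten 7-bit ASCII characters, five per word, then read them back.
text = "HELLOWORLD"
memory = [0, 0]
bp = BytePointer(0, WORD_BITS, 7)  # pos == 36 means "just before the first byte of word 0"
for ch in text:
    bp = bp.increment()
    deposit_byte(memory, bp, ord(ch))

bp = BytePointer(0, WORD_BITS, 7)
out = []
for _ in text:
    bp, code = increment_and_load(memory, bp)
    out.append(chr(code))
print("".join(out))  # → HELLOWORLD
```

Changing `size` to 6 or 9 packs six or four characters per word with no code changes, which is the flexibility the paragraph above describes. With 7-bit bytes, the lowest bit of each word is never touched.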

On the GE/Honeywell machines, special indirect addressing modes, usable with most instruction types, operated on a byte pointer that could select either 6-bit or 9-bit bytes.[2]

Neither of these machines originally had direct support for random access to bytes; adjusting a byte pointer to point N bytes before or after the byte it currently pointed to required a sequence of several instructions. The KL10 model of the PDP-10 extended the byte-pointer increment instruction into an "adjust byte pointer" instruction that could adjust a byte pointer by an arbitrary number of bytes.
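Adjusting a pointer by an arbitrary number of bytes amounts to converting it to a linear byte number, adding N, and splitting the result back into a word address and a bit offset with integer division. The Python sketch below is a simplified model, not the KL10's exact definition: the pointer is a hypothetical `(word, pos, size)` triple whose bit offset counts the bits to the right of the byte (as in the PDP-10 byte pointer), it is assumed to sit on a byte boundary, and leftover bits at the low end of a word (such as the unused bit with 7-bit bytes) are skipped.

```python
WORD_BITS = 36

def adjust_byte_pointer(word: int, pos: int, size: int, n: int) -> tuple[int, int]:
    """Move a byte pointer n bytes forward (n > 0) or backward (n < 0); return (word, pos).

    Slot 0 means "just before the first byte" (pos == 36); slots 1..per_word are the
    bytes of the word from left to right. Slot 0 of word w is treated as equivalent
    to the last slot of word w - 1.
    """
    per_word = WORD_BITS // size
    slot = (WORD_BITS - pos) // size
    target = word * per_word + slot + n          # linear byte number of the target
    new_word, new_slot = divmod(target - 1, per_word)
    return new_word, WORD_BITS - size * (new_slot + 1)

def increment(word: int, pos: int, size: int) -> tuple[int, int]:
    """One-byte increment, for comparison with adjusting by n."""
    if pos - size < 0:
        return word + 1, WORD_BITS - size
    return word, pos - size

# Adjusting by n agrees with n single increments.
w, p = 0, WORD_BITS
for _ in range(12):
    w, p = increment(w, p, 7)
print((w, p), adjust_byte_pointer(0, WORD_BITS, 7, 12))  # → (2, 22) (2, 22)
```

The single division replaces a loop of N increments, which is what makes random access to bytes cheap.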

Notes and References

  1. Book: DECsystem-10/DECSYSTEM-20 Processor Reference Manual. AD-H391A-T1. June 1982. Digital Equipment Corporation.
  2. Book: GE-625/635 Programming Reference Manual. July 1969. General Electric. pp. 169, 171–172.