Piece table explained

In computing, a piece table is a data structure typically used to represent a text document while it is edited in a text editor. Initially a reference (or 'span') to the whole of the original file is created, which represents the as yet unchanged file. Subsequent inserts and deletes replace a span by combinations of one, two, or three references to sections of either the original document or to a buffer holding inserted text.[1]

Typically the text of the original document is held in one immutable block, and the text of each subsequent insert is stored in new immutable blocks. Because even deleted text is still included in the piece table, this makes multi-level or unlimited undo easier to implement with a piece table than with alternative data structures such as a gap buffer.

This data structure was invented by J Strother Moore.[2]

Description

For this description, we use buffer as the immutable block to hold the contents.

A piece table consists of three columns:

In addition to the table, two buffers are used to handle edits:

Operations

Index

Definition: Index(i): return the character at position i
To retrieve the i-th character, the appropriate entry in a piece table is read.

Example

Given the following buffers and piece table:

BufferContent
Original fileipsum sit amet
Add fileLorem deletedtext dolor
Piece table!Which!Start Index!Length
Add06
Original05
Add176
Original59

To access the i-th character, the appropriate entry in the piece table is looked up.

For instance, to get the value of Index(15), the 3rd entry of piece table is retrieved. This is because the 3rd entry describes the characters from index 11 to 16 (the first entry describes characters in index 0 to 5, the next one is 6 to 10). The piece table entry instructs the program to look for the characters in the "add file" buffer, starting at index 17 in that buffer. The relative index in that entry is 15-11 = 4, which is added to the start position of the entry in the buffer to obtain index of the letter: 4+17 = 21. The value of Index(15) is the 21st character of the "add file" buffer, which is the character "o".

For the buffers and piece table given above, the following text is shown: Lorem ipsum dolor sit amet

Insert

Inserting characters to the text consists of:

Delete

Single character deletion can be one of two possible conditions:

Usage

Several text editors use an in-RAM piece table internally, including Bravo, Abiword,[3] [4] [5] Atom[6] and Visual Studio Code.[7]

The "fast save" feature in some versions of Microsoft Word uses a piece table for the on-disk file format.

The on-disk representation of text files in the Oberon System uses a piece chain technique that allows pieces of one document to point to text stored in some other document, similar to transclusion.[8]

See also

Notes and References

  1. Web site: Crowley. Charles. 10 June 1998. Data Structures for Text Sequences - 6.4 The piece table method. live. https://web.archive.org/web/20180223071931/https://www.cs.unm.edu/~crowley/papers/sds.pdf. 23 February 2018. 26 July 2021. www.cs.unm.edu.
  2. David Lu."What's been wrought using the Piece Table?".(discussion)
  3. http://www.abisource.com/wiki/AbiWordDevelopment#PieceTableBackground "AbiWord Development: Piece Table Background"
  4. James Brown. "Piece Chains: Design & Implementation of a Win32 Text Editor".
  5. Joaquin Cuenca Abela."Improving the AbiWord's Piece Table".
  6. Web site: nathansobo. 2017-10-12. Atom’s new concurrency-friendly buffer implementation. 2021-07-26. Atom Blog.
  7. https://code.visualstudio.com/updates/v1_21#_text-buffer-improvements "VS Code 1.21 Release Notes
  8. Niklaus Wirth, Jürg Gutknecht."Project Oberon: The Design of an Operating System and Compiler" .2005.p. 90.