Superimposed code explained

A superimposed code such as Zatocoding is a kind of hash code that was popular in marginal punched-card systems.

Marginal punched-card systems

See main article: Edge-notched card.

Many names, some of them trademarked, have been used for marginal punched-card systems: edge-notched cards, slotted cards, E-Z Sort, Zatocards, McBee, McBee Keysort, Flexisort, Velom, Rocket, etc.

The center of each card held the relevant information: typically the name and author of a book, research paper, or journal article on a nearby shelf, together with a list of subjects and keywords. Some sets of cards contained all the information required by the user on the card itself, handwritten, typewritten, or on microfilm (aperture card).

Every card in a stack had the same set of pre-punched holes. The user would find the cards relevant to a search by aligning the holes in the set of cards (using a card holder or card tray) and inserting one or more knitting-needle-like rods all the way through the stack. The desired cards, which had been notched (cut open at the relevant holes), fell out, while the irrelevant cards, left un-notched, remained on the needle(s). A user could repeat this selection many times to form a complex Boolean search query. A card that was relevant to two or more subjects would have the slot for each of those subjects cut out, so that card would drop out when any one of those subjects was selected.

The "superimposed code" systems, such as Zatocoding, saved space by entering several or all subjects in the same field; such a code stores much more information in less space, but at the cost of occasional "false" selections.[1]

Suppose you have a collection of index cards, one per book, research paper, or journal article in a library, with the keywords (subjects) discussed in each book written on that book's card. The "obvious way" to code those subjects is to count the total number of subjects used in the entire collection, R, make a row of R holes near the top of every card, and, for each subject actually discussed in a particular book, cut a slot from the hole corresponding to that subject in the card corresponding to that book.[2] Naturally, this also requires a separate list of every subject used in the collection that indicates which hole is punched for each subject.

Unfortunately, there may be thousands of distinct subjects in the collection, and it is impractical to punch thousands of holes in every card. While it may not seem possible to use fewer than one hole per subject, superimposed code systems can solve this problem.

Superimposed codes

The Zatocoding system of information retrieval was developed by Calvin Mooers in 1947.[3]

Calvin Mooers invented Zatocoding, a mechanical information retrieval system based on superimposed codes, at M.I.T., and formed the Zator Company in 1947 to commercialize its applications.[4] The particular superimposed code used in that system is called Zatocoding, while the marginal punched-card information retrieval system as a whole is called "Zator".[5]

Setting up a superimposed code for a particular library goes something like this:

$$n = N\left(1 - 2^{-\frac{1}{r}}\right)$$ [2]
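As a rough numerical illustration of the relation n = N(1 - 2^(-1/r)), the sketch below evaluates it for a sample card. It assumes that N is the number of holes on each card, n the number of slots cut per subject, and r the number of subjects coded on one card; it also assumes that each subject's slots fall independently at random, which is what makes the expected fraction of cut holes come out near one half. The function names and the rounding to a whole number of slots are choices made for this example.

```python
def slots_per_subject(N, r):
    """Evaluate n = N * (1 - 2 ** (-1 / r)), rounded to a whole number of slots.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(N * (1 - 2 ** (-1 / r)))

def expected_fraction_cut(N, n, r):
    """Expected fraction of holes cut after superimposing r random patterns.

    Assumes each hole is missed by one subject with probability (1 - n / N),
    independently for each of the r subjects.
    """
    return 1 - (1 - n / N) ** r

N, r = 40, 5                      # hypothetical card: 40 holes, 5 subjects per card
n = slots_per_subject(N, r)
print(n)                          # 5
print(expected_fraction_cut(N, n, r))  # a little under 0.5
```

With these sample values the card ends up slightly less than half cut, which matches the half-punched condition used in the false-selection estimate.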

Later, when we need to find books on some particular subject, we look up that subject in our list of all R subjects, find the corresponding pattern of n slots, and put n needles through the whole stack in that pattern. All of the cards that have been cut with that pattern will fall out. It is possible that a few other, undesired cards may also fall out: cards that have several subjects whose hole patterns overlap in such a way as to mimic the desired pattern. The probability F of some undesired card with v slots cut in it falling through when we select some pattern of n needles is approximately

$$F = \left(\frac{v}{N}\right)^{n}.$$

Most systems have N large enough and r small enough that v < N/2 (i.e., the card is less than half-punched), so the probability of an undesired card falling through satisfies

$$F < \left(\frac{1}{2}\right)^{n}.$$ [2]
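The selection step and a false selection can be reproduced with a toy example. The code book below is invented for illustration (three subjects, n = 3 slots each, N = 12 holes), and the patterns were picked by hand so that two of them overlap to mimic the third; a real code book would be far larger.

```python
N = 12  # holes per card (hypothetical)

# Hypothetical code book: subject -> pattern of n = 3 slot positions.
code_book = {
    "acids": {0, 1, 2},
    "bases": {3, 4, 5},
    "salts": {1, 2, 3},
}

def cut_slots(card_subjects):
    """Superimpose the patterns of every subject listed on one card."""
    slots = set()
    for subject in card_subjects:
        slots |= code_book[subject]
    return slots

def falls_out(card_slots, subject):
    """A card drops when every needle of the query pattern meets a cut slot."""
    return code_book[subject] <= card_slots

def false_drop_estimate(v, N, n):
    """Approximate probability F = (v / N) ** n of an unwanted card dropping."""
    return (v / N) ** n

card = cut_slots(["acids", "bases"])   # slots 0..5 are cut, so v = 6
print(falls_out(card, "acids"))        # True: a wanted card
print(falls_out(card, "salts"))        # True: a false selection, "salts" was never coded
print(false_drop_estimate(len(card), N, 3))  # 0.125
```

The sample card is exactly half cut (v = N/2), so the estimate equals the bound (1/2)^n; a card with fewer cut slots would fall below it.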

There are several different ways to choose which holes will be slotted for each subject.

(Several variations of Zatocoding were developed. Bourne describes a variant "for newer retrieval systems that require high performance of the superimposed coding system",[6] using an approach Mooers published in 1959.[7])

Zatocoding

Setting up a Zatocode for a particular list of R subjects goes something like this:[2]

Other superimposed codes

A Zatocode requires a code book that lists every subject and a randomly generated notch code associated with each one. Other "direct" superimposed codes have a fixed hash function for transforming the letters in (one spelling of) a subject into a notch code. Such codes require a much shorter code book that describes the translation of letters in a word to the corresponding notch code, and can in principle easily add new subjects without changing the code book.[5]
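The idea of a "direct" code, where the notch pattern is computed from the spelling of the subject instead of being looked up, can be sketched as follows. This is a modern stand-in and not a reconstruction of any historical letter-to-notch table: SHA-256 is used here only as a convenient fixed hash function, and the values N = 40 and n = 5 and the function names are assumptions of the example.

```python
import hashlib

N = 40  # holes per card (hypothetical)
n = 5   # slots per subject (hypothetical)

def notch_code(subject, holes=N, slots=n):
    """Derive `slots` distinct hole positions from the spelling of a subject.

    The subject is lower-cased so that one spelling always yields one pattern.
    Requires slots <= holes, otherwise the loop could not finish.
    """
    positions = []
    counter = 0
    while len(positions) < slots:
        data = f"{subject.lower()}:{counter}".encode("utf-8")
        digest = hashlib.sha256(data).digest()
        hole = int.from_bytes(digest[:4], "big") % holes
        if hole not in positions:     # skip repeats so the pattern has distinct holes
            positions.append(hole)
        counter += 1
    return frozenset(positions)

def card_pattern(subjects):
    """Superimpose the notch codes of all subjects on one card."""
    pattern = set()
    for subject in subjects:
        pattern |= notch_code(subject)
    return pattern

def drops_out(card, query_subject):
    """True when every needle of the query pattern meets a cut slot."""
    return notch_code(query_subject) <= card

card = card_pattern(["physics", "chemistry"])
print(drops_out(card, "physics"))   # True
```

Because the pattern depends only on the spelling, a new subject can be coded without extending any subject list; the cost, as with any superimposed code, is that unrelated subjects may occasionally match.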

A Bloom filter can be considered a kind of superimposed code.[8]
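The correspondence can be made concrete with a minimal Bloom filter sketch. Here the m bits play the role of the holes on a card, the k hashed positions per item play the role of the slots per subject, and a membership test is the needle selection. The class below is a bare-bones illustration with made-up parameter values, using SHA-256 as an assumed hash function; it is not a production implementation.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter backed by a Python integer used as a bit array."""

    def __init__(self, m=64, k=3):
        self.m = m      # number of bits, analogous to holes per card
        self.k = k      # positions set per item, analogous to slots per subject
        self.bits = 0   # a set bit corresponds to a cut slot

    def _positions(self, item):
        # Derive k positions from k salted hashes; positions may coincide.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True may be a false positive.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add("physics")
print(bf.might_contain("physics"))  # True
```

As with a notched card, an added item is never missed, while an item that was never added can still test positive when other items happen to cover all of its positions.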

Notes and References

  1. Robert V. Williams. "Punched Cards: A Brief Tutorial". computing now 2002.
  2. W. Ross Ashby. W. Ross Ashby's Journal: Zato-coding. 1960 Sep. 22. p. 6208-6222.
  3. "About the Cover". College and Research Libraries News, April 2008. http://crln.acrl.org/content/69/4.cover-expansion https://www.flickr.com/photos/acrl/2387064291/
  4. [Eugene Garfield]
  5. [Herbert Marvin Ohlman]
  6. Bourne, Charles P. Methods of Information Handling. John Wiley & Sons, Inc. 1963. p. 67.
  7. Mooers, Calvin N. The Application of Simple Pattern Inclusion Selection to Large-Scale Information Retrieval Systems. Zator Company. April 1959.
  8. James Blustein and Amal El-Maazawi. "Bloom Filters - A Tutorial, Analysis, and Survey". p. 11.