Operator grammar explained

Operator grammar is a mathematical theory of human language that explains how language carries information. This theory is the culmination of the life work of Zellig Harris, with major publications toward the end of the last century. Operator grammar proposes that each human language is a self-organizing system in which both the syntactic and semantic properties of a word are established purely in relation to other words. Thus, no external system (metalanguage) is required to define the rules of a language. Instead, these rules are learned through exposure to usage and through participation, as is the case with most social behavior. The theory is consistent with the idea that language evolved gradually, with each successive generation introducing new complexity and variation.

Operator grammar posits three universal constraints: dependency (certain words depend on the presence of other words to form an utterance), likelihood (some combinations of words and their dependents are more likely than others) and reduction (words in high likelihood combinations can be reduced to shorter forms, and sometimes omitted completely). Together these provide a theory of language information: dependency builds a predicate–argument structure; likelihood creates distinct meanings; reduction allows compact forms for communication.

Dependency

The fundamental mechanism of operator grammar is the dependency constraint: certain words (operators) require that one or more words (arguments) be present in an utterance. In the sentence John wears boots, the operator wears requires the presence of two arguments, such as John and boots. (This definition of dependency differs from other dependency grammars in which the arguments are said to depend on the operators.)

In each language the dependency relation among words gives rise to syntactic categories in which the allowable arguments of an operator are defined in terms of their dependency requirements. Class N contains the words that do not require the presence of other words (e.g. John). Class O_N contains the words that require exactly one word of type N (e.g. stumble). Class O_O contains the words that require exactly one word of type O (e.g. handsome). Class O_NN contains the words that require two words of type N (e.g. wear). Class O_OO contains the words that require two words of type O (e.g. because), as in John stumbles because John wears boots. Other classes include O_ON (e.g. with), O_NO (e.g. say), O_NNN (e.g. put), and O_NNO (e.g. ask).

The categories in operator grammar are universal and are defined purely in terms of how words relate to other words, and do not rely on an external set of categories such as noun, verb, adjective, adverb, preposition, conjunction, etc. The dependency properties of each word are observable through usage and therefore learnable.

Likelihood

The dependency constraint creates a structure (syntax) in which any word of the appropriate class can be an argument for a given operator. The likelihood constraint places additional restrictions on this structure by making some operator/argument combinations more likely than others. Thus, John wears hats is more likely than John wears snow which in turn is more likely than John wears vacation. The likelihood constraint creates meaning (semantics) by defining each word in terms of the words it can take as arguments, or of which it can be an argument.

Each word has a unique set of words with which it has been observed to occur called its selection. The coherent selection of a word is the set of words for which the dependency relation has above average likelihood. Words that are similar in meaning have similar coherent selection. This approach to meaning is self-organizing in that no external system is necessary to define what words mean. Instead, the meaning of the word is determined by its usage within a population of speakers. Patterns of frequent use are observable and therefore learnable. New words can be introduced at any time and defined through usage.

In this sense, link grammar could be viewed as a kind of operator grammar, in that the linkage of words is determined entirely by their context, and that each selection is assigned a log-likelihood.

Reduction

The reduction constraint acts on high likelihood combinations of operators and arguments and makes more compact forms. Certain reductions allow words to be omitted completely from an utterance. For example, I expect John to come is reducible to I expect John, because to come is highly likely under expect. The sentence John wears boots and John wears hats can be reduced to John wears boots and hats because repetition of the first argument John under the operator and is highly likely. John reads things can be reduced to John reads, because the argument things has high likelihood of occurring under any operator.

Certain reductions reduce words to shorter forms, creating pronouns, suffixes and prefixes (morphology). John wears boots and John wears hats can be reduced to John wears boots and he wears hats, where the pronoun he is a reduced form of John. Suffixes and prefixes can be obtained by appending other freely occurring words, or variants of these. John is able to be liked can be reduced to John is likeable. John is thoughtful is reduced from John is full of thought, and John is anti-war from John is against war.

Modifiers are the result of several of these kinds of reductions, which give rise to adjectives, adverbs, prepositional phrases, subordinate clauses, etc.

John wears boots; the boots are of leather (two sentences joined by semicolon operator) →
John wears boots which are of leather (reduction of repeated noun to relative pronoun) →
John wears boots of leather (omission of high likelihood phrase which are) →
John wears leather boots (omission of high likelihood operator of, transposition of short modifier to left of noun)

Each language has a unique set of reductions. For example, some languages have morphology and some don’t; some transpose short modifiers and some do not. Each word in a language participates only in certain kinds of reductions. However, in each case, the reduced material can be reconstructed from knowledge of what is likely in the given operator/argument combination. The reductions in which each word participates are observable and therefore learnable, just as one learns a word’s dependency and likelihood properties.

Information

The importance of reductions in operator grammar is that they separate sentences that contain reduced forms from those that don’t (base sentences). All reductions are paraphrases, since they do not remove any information, just make sentences more compact. Thus, the base sentences contain all the information of the language and the reduced sentences are variants of these. Base sentences are made up of simple words without modifiers and largely without affixes, e.g. snow falls, sheep eat grass, John knows sheep eat grass, that sheep eat snow surprises John.

Each operator in a sentence makes a contribution in information according to its likelihood of occurrence with its arguments. Highly expected combinations have low information; rare combinations have high information. The precise contribution of an operator is determined by its selection, the set of words with which it occurs with high frequency. The arguments boots, hats, sheep, grass and snow differ in meaning according to the operators for which they can appear with high likelihood in first or second argument position. For example, snow is expected as first argument of fall but not of eat, while the reverse is true of sheep. Similarly, the operators eat, devour, chew and swallow differ in meaning to the extent that the arguments they select and the operators that select them differ.

Operator grammar predicts that the information carried by a sentence is the accumulation of contributions of each argument and operator. The increment of information that a given word adds to a new sentence is determined by how it was used before. In turn, new usages stretch or even alter the information content associated with a word. Because this process is based on high frequency usage, the meanings of words are relatively stable over time, but can change in accordance with the needs of a linguistic community.