Ray Solomonoff (July 25, 1926 – December 7, 2009)[1] [2] was an American mathematician who invented algorithmic probability[3] and his General Theory of Inductive Inference (also known as Universal Inductive Inference),[4] and was a founder of algorithmic information theory.[5] He was an originator of the branch of artificial intelligence based on machine learning, prediction and probability. He circulated the first report on non-semantic machine learning in 1956.[6]
Solomonoff first described algorithmic probability in 1960, publishing the theorem that launched Kolmogorov complexity and algorithmic information theory. He presented these results at a conference at Caltech in 1960[7] and in a February 1960 report, "A Preliminary Report on a General Theory of Inductive Inference."[8] He clarified these ideas more fully in his 1964 publications, "A Formal Theory of Inductive Inference," Part I[9] and Part II.[10]
Algorithmic probability is a mathematically formalized combination of Occam's razor[11] [12] [13] [14] and the Principle of Multiple Explanations.[15] It is a machine-independent method of assigning a probability value to each hypothesis (algorithm/program) that explains a given observation, with the simplest hypothesis (the shortest program) having the highest probability and increasingly complex hypotheses receiving increasingly small probabilities.
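The weighting described above can be illustrated with a minimal sketch. It assumes a small, finite set of hypotheses whose description lengths in bits are already known; the hypothesis names and lengths below are hypothetical, and the normalization over a finite set is a simplification for illustration rather than part of Solomonoff's definition.

```python
from fractions import Fraction

def length_weights(program_lengths):
    """Map each hypothesis to a normalized weight proportional to 2**(-length in bits)."""
    raw = {name: Fraction(1, 2 ** bits) for name, bits in program_lengths.items()}
    total = sum(raw.values())
    # Normalizing over a finite set is an illustrative assumption.
    return {name: weight / total for name, weight in raw.items()}

# Hypothetical description lengths (in bits) of three programs that all
# reproduce the same observation.
lengths = {
    "repeat a short pattern": 5,
    "copy a lookup table": 12,
    "seeded pseudo-random generator": 20,
}
weights = length_weights(lengths)
```

No hypothesis receives zero weight, but each additional bit of description length halves a hypothesis's share.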
Solomonoff founded the theory of universal inductive inference, which is based on solid philosophical foundations[4] and has its roots in Kolmogorov complexity and algorithmic information theory. The theory uses algorithmic probability in a Bayesian framework. The universal prior is taken over the class of all computable measures; no hypothesis will have a zero probability. This enables Bayes' rule (of causation) to be used to predict the most likely next event in a series of events, and how likely it will be.[10]
Although he is best known for algorithmic probability and his general theory of inductive inference, he made many other important discoveries throughout his life, most of them directed toward his goal in artificial intelligence: to develop a machine that could solve hard problems using probabilistic methods.
Ray Solomonoff was born on July 25, 1926, in Cleveland, Ohio, the son of Jewish Russian immigrants Phillip Julius and Sarah Mashman Solomonoff. He attended Glenville High School, graduating in 1944. In 1944 he joined the United States Navy as an Instructor in Electronics. From 1947 to 1951 he attended the University of Chicago, studying under professors such as Rudolf Carnap and Enrico Fermi, and graduated with an M.S. in Physics in 1951.
From his earliest years he was motivated by the pure joy of mathematical discovery and by the desire to explore where no one had gone before. At the age of 16, in 1942, he began to search for a general method to solve mathematical problems.
In 1952 he met Marvin Minsky, John McCarthy and others interested in machine intelligence. In 1956 Minsky, McCarthy and others organized the Dartmouth Summer Research Conference on Artificial Intelligence, where Solomonoff was one of the original 10 invitees; he, McCarthy, and Minsky were the only ones to stay all summer. It was for this group that Artificial Intelligence was first named as a science. Computers at the time could solve very specific mathematical problems, but not much else. Solomonoff wanted to pursue a bigger question: how to make machines more generally intelligent, and how computers could use probability for this purpose.
He wrote three papers, two with Anatol Rapoport, in 1950–52,[16] that are regarded as the earliest statistical analysis of networks.
He was one of the 10 attendees at the 1956 Dartmouth Summer Research Project on Artificial Intelligence. He wrote and circulated a report among the attendees: "An Inductive Inference Machine".[6] It viewed machine learning as probabilistic, with an emphasis on the importance of training sequences, and on the use of parts of previous solutions to problems in constructing trial solutions for new problems. He published a version of his findings in 1957.[17] These were the first papers to be written on probabilistic machine learning.
In the late 1950s, he invented probabilistic languages and their associated grammars.[18] A probabilistic language assigns a probability value to every possible string. Generalizing the concept of probabilistic grammars led him to his discovery in 1960 of Algorithmic Probability and General Theory of Inductive Inference.
Prior to the 1960s, the usual method of calculating probability was based on frequency: taking the ratio of favorable results to the total number of trials. In his 1960 publication, and, more completely, in his 1964 publications, Solomonoff seriously revised this definition of probability. He called this new form of probability "Algorithmic Probability" and showed how to use it for prediction in his theory of inductive inference. As part of this work, he produced the philosophical foundation for the use of Bayes' rule of causation for prediction.
The basic theorem of what was later called Kolmogorov Complexity was part of his General Theory. Writing in 1960, he begins: "Consider a very long sequence of symbols ... We shall consider such a sequence of symbols to be 'simple' and have a high a priori probability, if there exists a very brief description of this sequence – using, of course, some sort of stipulated description method. More exactly, if we use only the symbols 0 and 1 to express our description, we will assign the probability $2^{-N}$ to a sequence of symbols if its shortest possible binary description contains N digits."[19]
The probability is defined with reference to a particular universal Turing machine. Solomonoff showed, and in 1964 proved, that the choice of machine, while it could add a constant factor, would not change the probability ratios very much. In this sense the probabilities are machine independent.
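The constant-factor statement above is usually written as an invariance bound. The following is a sketch in assumed notation that does not appear in the text: $P_U(x)$ is the algorithmic probability of a sequence $x$ relative to universal machine $U$, and $c_{UV}$ is a constant (roughly, the length of a program that lets $U$ simulate $V$) that depends on the two machines but not on $x$.

```latex
\[
  2^{-c_{UV}} \, P_V(x) \;\le\; P_U(x) \;\le\; 2^{\,c_{VU}} \, P_V(x)
  \qquad \text{for all sequences } x .
\]
```

Because the factors $2^{-c_{UV}}$ and $2^{c_{VU}}$ do not grow with $x$, changing the reference machine rescales probabilities by at most a fixed amount.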
In 1965, the Russian mathematician Kolmogorov independently published similar ideas. When he became aware of Solomonoff's work, he acknowledged Solomonoff, and for several years, Solomonoff's work was better known in the Soviet Union than in the Western World. The general consensus in the scientific community, however, was to associate this type of complexity with Kolmogorov, who was more concerned with randomness of a sequence. Algorithmic Probability and Universal (Solomonoff) Induction became associated with Solomonoff, who was focused on prediction — the extrapolation of a sequence.
Later in the same 1960 publication Solomonoff describes his extension of the single-shortest-code theory. This is Algorithmic Probability. He states: "It would seem that if there are several different methods of describing a sequence, each of these methods should be given some weight in determining the probability of that sequence."[20] He then shows how this idea can be used to generate the universal a priori probability distribution and how it enables the use of Bayes' rule in inductive inference. Inductive inference, by adding up the predictions of all models describing a particular sequence, using suitable weights based on the lengths of those models, gets the probability distribution for the extension of that sequence. This method of prediction has since become known as Solomonoff induction.
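The idea of adding up the predictions of all models, weighted by their lengths, can be sketched on a toy scale. The example below is not Solomonoff induction itself, which ranges over all programs and is incomputable; it assumes a deliberately tiny, computable model class (binary patterns repeated forever) and an assumed code length of 2 bits per pattern symbol. The function names and the `max_period` parameter are hypothetical.

```python
from fractions import Fraction
from itertools import product

def periodic_hypotheses(max_period):
    """Yield (pattern, weight) for every binary pattern of length 1..max_period."""
    for period in range(1, max_period + 1):
        for bits in product("01", repeat=period):
            # Assumed code length: 2 bits per pattern symbol, so weight = 2**(-2 * period).
            yield "".join(bits), Fraction(1, 4 ** period)

def generates(pattern, observed):
    """True if repeating `pattern` forever reproduces `observed` as a prefix."""
    return all(observed[i] == pattern[i % len(pattern)] for i in range(len(observed)))

def predict_next(observed, max_period=8):
    """Return P(next symbol is '1') as a length-weighted vote of consistent models."""
    total = Fraction(0)
    ones = Fraction(0)
    for pattern, weight in periodic_hypotheses(max_period):
        if generates(pattern, observed):
            total += weight
            if pattern[len(observed) % len(pattern)] == "1":
                ones += weight
    if total == 0:
        raise ValueError("no model in the toy class explains the data")
    return ones / total

p_after_even = predict_next("010101")   # short model "01" predicts '0' next
p_after_odd = predict_next("0101010")   # short model "01" predicts '1' next
```

Longer patterns that disagree with the short model "01" still contribute to the sum, but their small weights mean the prediction is dominated by the shortest consistent description.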
He enlarged his theory, publishing a number of reports leading up to the publications in 1964. The 1964 papers give a more detailed description of Algorithmic Probability and Solomonoff Induction, presenting five different models, including the model popularly called the Universal Distribution.
Other scientists who had been at the 1956 Dartmouth Summer Conference (such as Newell and Simon) were developing the branch of Artificial Intelligence that used fact-based machines governed by if-then rules. Solomonoff was developing the branch of Artificial Intelligence that focussed on probability and prediction; his specific view of A.I. described machines that were governed by the Algorithmic Probability distribution. The machine generates theories together with their associated probabilities to solve problems and, as new problems and theories develop, updates the probability distribution on the theories.
In 1968 he found a proof for the efficacy of Algorithmic Probability,[21] but, mainly because of a lack of general interest at that time, did not publish it until 10 years later. In his report, he published the proof for the convergence theorem.
In the years following his discovery of Algorithmic Probability he focused on how to use this probability and Solomonoff Induction in actual prediction and problem solving for A.I. He also wanted to understand the deeper implications of this probability system.
One important aspect of Algorithmic Probability is that it is complete and incomputable.
In the 1968 report he shows that Algorithmic Probability is complete; that is, if there is any describable regularity in a body of data, Algorithmic Probability will eventually discover that regularity, requiring a relatively small sample of that data. Algorithmic Probability is the only probability system known to be complete in this way. As a necessary consequence of its completeness it is incomputable. The incomputability is because some algorithms—a subset of those that are partially recursive—can never be evaluated fully because it would take too long. But these programs will at least be recognized as possible solutions. On the other hand, any computable system is incomplete. There will always be descriptions outside that system's search space, which will never be acknowledged or considered, even in an infinite amount of time. Computable prediction models hide this fact by ignoring such algorithms.
In many of his papers he described how to search for solutions to problems and in the 1970s and early 1980s developed what he felt was the best way to update the machine.
The use of probability in A.I., however, did not have a completely smooth path. In the early years of A.I., the relevance of probability was problematic. Many in the A.I. community felt probability was not usable in their work. The area of pattern recognition did use a form of probability, but because there was no broadly based theory of how to incorporate probability in any A.I. field, most fields did not use it at all.
There were, however, researchers such as Pearl and Peter Cheeseman who argued that probability could be used in artificial intelligence.
About 1984, at an annual meeting of the American Association for Artificial Intelligence (AAAI), it was decided that probability was in no way relevant to A.I.
A protest group formed, and the next year there was a workshop at the AAAI meeting devoted to "Probability and Uncertainty in AI." This yearly workshop has continued to the present day.[22]
As part of the protest at the first workshop, Solomonoff gave a paper on how to apply the universal distribution to problems in A.I.[23] This was an early version of the system he has been developing since that time.
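In that report, he described the search technique he had developed. In search problems, the best order of search is by increasing $T_i/P_i$, where $T_i$ is the time needed to test the trial and $P_i$ is the probability of success of that trial.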
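The ordering rule, trying candidates in increasing order of test time divided by success probability, can be sketched as follows. This is an illustration under stated assumptions, not Solomonoff's implementation: the trial names, times and probabilities are hypothetical, and the expected-time calculation assumes the trials succeed independently of one another.

```python
def order_trials(trials):
    """Sort (name, test_time, success_probability) tuples by increasing time / probability."""
    return sorted(trials, key=lambda trial: trial[1] / trial[2])

def expected_search_time(ordered):
    """Expected time spent until the first success, assuming independent trials."""
    expected = 0.0
    p_no_success_yet = 1.0
    for _, test_time, prob in ordered:
        # A trial's time is only spent if every earlier trial failed.
        expected += p_no_success_yet * test_time
        p_no_success_yet *= 1.0 - prob
    return expected

# Hypothetical candidate solutions: (name, time to test, probability of success).
trials = [("a", 10.0, 0.1), ("b", 1.0, 0.05), ("c", 4.0, 0.5)]
ordered = order_trials(trials)
```

Under the independence assumption, swapping any two adjacent trials that are out of ratio order can only increase the expected time, which is why the ratio works as a sort key.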
In other papers he explored how to limit the time needed to search for solutions, writing on resource bounded search. The search space is limited by available time or computation cost rather than by cutting out search space as is done in some other prediction methods, such as Minimum Description Length.
Throughout his career Solomonoff was concerned with the potential benefits and dangers of A.I., discussing it in many of his published reports. In 1985 he analyzed a likely evolution of A.I., giving a formula predicting when it would reach the "Infinity Point".[25] This work is part of the history of thought about a possible technological singularity.
Originally algorithmic induction methods extrapolated ordered sequences of strings. Methods were needed for dealing with other kinds of data.
A 1999 report[26] generalizes the Universal Distribution and associated convergence theorems to unordered sets of strings, and a 2008 report[27] to unordered pairs of strings.
In 1997,[28] 2003 and 2006 he showed that incomputability and subjectivity are both necessary and desirable characteristics of any high performance induction system.
In 1970 he formed his own one-man company, Oxbridge Research, and continued his research there except for periods at other institutions such as MIT, University of Saarland in Germany and the Dalle Molle Institute for Artificial Intelligence in Lugano, Switzerland. In 2003 he was the first recipient of the Kolmogorov Award by The Computer Learning Research Center at the Royal Holloway, University of London, where he gave the inaugural Kolmogorov Lecture. Solomonoff was most recently a visiting professor at the CLRC.
In 2006 he spoke at AI@50, "Dartmouth Artificial Intelligence Conference: the Next Fifty Years", commemorating the fiftieth anniversary of the original Dartmouth summer study group. Solomonoff was one of five original participants to attend.
In Feb. 2008, he gave the keynote address at the Conference "Current Trends in the Theory and Application of Computer Science" (CTTACS), held at Notre Dame University in Lebanon. He followed this with a short series of lectures, and began research on new applications of Algorithmic Probability.
Algorithmic Probability and Solomonoff Induction have many advantages for Artificial Intelligence. Algorithmic Probability gives extremely accurate probability estimates. These estimates can be revised by a reliable method so that they continue to be acceptable. It utilizes search time in a very efficient way. In addition to probability estimates, Algorithmic Probability "has for AI another important value: its multiplicity of models gives us many different ways to understand our data;
A description of Solomonoff's life and work prior to 1997 is in "The Discovery of Algorithmic Probability", Journal of Computer and System Sciences, Vol 55, No. 1, pp 73–88, August 1997. The paper, as well as most of the others mentioned here, is available on the publications page of his website.
A journal article published in the year of his death said of Solomonoff: "A very conventional scientist understands his science using a single 'current paradigm'—the way of understanding that is most in vogue at the present time. A more creative scientist understands his science in very many ways, and can more easily create new theories, new ways of understanding, when the 'current paradigm' no longer fits the current data".[29]