Cyc Explained

Cyc
Author: Douglas Lenat
Developer: Cycorp, Inc.
Latest Release Version: 6.1
Programming Language: Lisp, CycL, SubL
Genre: Ontology, knowledge base, knowledge representation language, and inference engine

Cyc (pronounced /saɪk/, like "syke") is a long-term artificial intelligence project that aims to assemble a comprehensive ontology and knowledge base that spans the basic concepts and rules about how the world works.

Hoping to capture common sense knowledge, Cyc focuses on implicit knowledge that other AI platforms may take for granted.

This is contrasted with facts one might find somewhere on the internet or retrieve via a search engine or Wikipedia.

Cyc enables semantic reasoners to perform human-like reasoning and be less "brittle" when confronted with novel situations.

Douglas Lenat began the project in July 1984 at MCC, where he was Principal Scientist from 1984 to 1994. Since January 1995, Cyc has been under active development by Cycorp, where he was the CEO.

Overview

The need for a large-scale symbolic artificial intelligence project emerged in the early 1980s. AI researchers had observed over the preceding 25 years that AI programs often produced promising initial results but struggled to "scale up"—failing to extend beyond their initial training sets to a wider range of cases. This issue was highlighted by Douglas Lenat and Alan Kay,[1] [2] who organized a meeting at Stanford University in 1983 to discuss the problem.

Back-of-the-envelope calculations by Lenat, Kay, and their colleagues (including Marvin Minsky, Allen Newell, Edward Feigenbaum, and John McCarthy) indicated that such an effort would require between 1,000 and 3,000 person-years. Events within a year of that meeting enabled an effort of that scale to get underway.

The project began in July 1984 as the flagship project of the 400-person Microelectronics and Computer Technology Corporation (MCC), a research consortium started by two dozen large United States-based corporations "to counter a then ominous Japanese effort in AI, the so-called 'fifth-generation' project."[3] The US Government reacted to the Fifth Generation threat by passing the National Cooperative Research Act of 1984, which for the first time allowed US companies to "collude" on long-term high-risk high-payoff research, and MCC and Sematech sprang up to take advantage of that ten-year opportunity. MCC's first President and CEO was Bobby Ray Inman, former NSA Director and Central Intelligence Agency deputy director.

The objective of the Cyc project was to codify, in machine-usable form, the millions of pieces of knowledge that compose human common sense.[4] This entailed, along the way, (1) developing an adequately expressive representation language, CycL,[5] (2) developing an ontology spanning all human concepts down to some appropriate level of detail,[6] (3) developing a knowledge base on that ontological framework, comprising all human knowledge about those concepts down to some appropriate level of detail, and (4) developing an inference engine exponentially faster than those used in then-conventional expert systems,[7] [8] to be able to infer the same types and depth of conclusions that humans are capable of, given their knowledge of the world.

In slightly more detail:

CycL has a publicly released specification, and dozens of HL modules were described in Lenat and Guha's textbook, but the actual Cyc inference engine code and the full list of 1000+ HL modules are Cycorp-proprietary.[12]

The name "Cyc" (from "encyclopedia", pronounced /saɪk/, like "syke") is a registered trademark owned by Cycorp. Access to Cyc is through paid licenses, but bona fide AI research groups are given research-only no-cost licenses (cf. ResearchCyc); as of 2017, over 600 such groups worldwide had these licenses.

Typical pieces of knowledge represented in the Cyc knowledge base are "Every tree is a plant" and "Plants die eventually". When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly.

Most of Cyc's knowledge, outside maths, is only true by default. For example, Cyc knows that as a default parents love their children, people smile when happy, a child's first step is a big accomplishment, people are happy when a loved one has a big accomplishment, and only adults have children. When asked whether a picture captioned "Someone watching his daughter take her first step" contains a smiling adult person, Cyc can logically infer that the answer is Yes, and "show its work" by presenting the step-by-step logical argument using those five pieces of knowledge from its knowledge base. These are formulated in the language CycL, which is based on predicate calculus and has a syntax similar to that of the Lisp programming language.
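The first-step example can be illustrated with a small forward-chaining sketch in Python. This is not Cyc's inference engine and not CycL: the predicate names (parentOf, takesFirstStep, and so on), the rule labels, and the tuple encoding are all hypothetical choices made for this illustration. The sketch also treats the five statements as plain rules, so it does not model the fact that Cyc holds them only by default.

```python
def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact; return None on failure."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # A variable must be bound to the same value everywhere.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match_all(premises, facts, bindings):
    """Yield every set of bindings that satisfies all premises."""
    if not premises:
        yield bindings
        return
    for fact in facts:
        extended = match(premises[0], fact, bindings)
        if extended is not None:
            yield from match_all(premises[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply rules until nothing new appears; record which rule gave each fact."""
    known = set(facts)
    proof = {}
    changed = True
    while changed:
        changed = False
        for label, premises, conclusion in rules:
            # Collect the matches first so the fact set is not modified mid-iteration.
            for b in list(match_all(premises, known, {})):
                derived = tuple(b.get(term, term) for term in conclusion)
                if derived not in known:
                    known.add(derived)
                    proof[derived] = label
                    changed = True
    return known, proof


# Hypothetical encoding of the caption "Someone watching his daughter take her first step".
FACTS = [("parentOf", "Someone", "Daughter"), ("takesFirstStep", "Daughter")]

# Hypothetical encoding of the five pieces of knowledge named in the text.
RULES = [
    ("parents love their children",
     [("parentOf", "?p", "?c")], ("loves", "?p", "?c")),
    ("a first step is a big accomplishment",
     [("takesFirstStep", "?c")], ("bigAccomplishment", "?c")),
    ("happy when a loved one has a big accomplishment",
     [("loves", "?p", "?c"), ("bigAccomplishment", "?c")], ("happy", "?p")),
    ("people smile when happy",
     [("happy", "?p")], ("smiles", "?p")),
    ("only adults have children",
     [("parentOf", "?p", "?c")], ("adult", "?p")),
]

known, proof = forward_chain(FACTS, RULES)
for fact, label in proof.items():
    print(fact, "because:", label)
```

The proof dictionary plays the role of "showing its work": each derived fact is paired with the rule that produced it.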

In 2008, Cyc resources were mapped to many Wikipedia articles.[13] Cyc is presently connected to Wikidata. Future plans may connect Cyc to both DBpedia and Freebase.

Much of the current work on Cyc continues to be knowledge engineering, representing facts about the world by hand, and implementing efficient inference mechanisms on that knowledge. Increasingly, however, work at Cycorp involves giving the Cyc system the ability to communicate with end users in natural language, and to assist with the ongoing knowledge formation process via machine learning and natural-language understanding. Another large effort at Cycorp is building a suite of Cyc-powered ontological engineering tools to lower the bar to entry for individuals to contribute to, edit, browse, and query Cyc.

Like many companies, Cycorp has ambitions to use Cyc's natural-language processing to parse the entire internet to extract structured data; unlike all others, it is able to call on the Cyc system itself to act as an inductive bias and as an adjudicator of ambiguity, metaphor, and ellipsis. There are few, if any, systematic benchmark studies of Cyc's performance.

Knowledge base

The concept names in Cyc are CycL terms or constants. Constants start with an optional #$ and are case-sensitive. There are constants for:

Two important binary predicates are #$isa and #$genls. The first states that an item is an instance of some collection; the second states that one collection is a subcollection of another. Facts about concepts are asserted using CycL sentences. Predicates are written before their arguments, in parentheses:

(#$isa #$BillClinton #$UnitedStatesPresident)
"Bill Clinton belongs to the collection of U.S. presidents."

(#$genls #$Tree-ThePlant #$Plant)
"All trees are plants."

(#$capitalCity #$France #$Paris)
"Paris is the capital of France."

Sentences can also contain variables, which are strings starting with ?. Such sentences are called "rules". One important rule asserted about the #$isa predicate reads:

(#$implies (#$and (#$isa ?OBJ ?SUBSET) (#$genls ?SUBSET ?SUPERSET)) (#$isa ?OBJ ?SUPERSET))
"If OBJ is an instance of the collection SUBSET and SUBSET is a subcollection of SUPERSET, then OBJ is an instance of the collection SUPERSET".

Another typical example is

(#$relationAllExists #$biologicalMother #$ChordataPhylum #$FemaleAnimal)

which means that for every instance of the collection #$ChordataPhylum (i.e. for every chordate), there exists a female animal (an instance of #$FemaleAnimal) that is its mother (described by the predicate #$biologicalMother).
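The effect of the #$isa rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the set-of-pairs encoding and the function names all_genls and holds_isa are hypothetical, and the facts marked as illustrative below do not come from the Cyc knowledge base.

```python
# Toy stand-ins for CycL assertions, stored as (argument1, argument2) pairs.
ISA = {
    ("BillClinton", "UnitedStatesPresident"),
    ("OakInMyYard", "Tree-ThePlant"),          # illustrative individual
}
GENLS = {
    ("Tree-ThePlant", "Plant"),
    ("UnitedStatesPresident", "Person"),       # illustrative assertion
}


def all_genls(collection, genls_facts):
    """Return the collection plus every supercollection reachable via genls."""
    seen = {collection}
    stack = [collection]
    while stack:
        current = stack.pop()
        for sub, sup in genls_facts:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def holds_isa(obj, collection, isa_facts, genls_facts):
    """True if obj is an instance of collection, directly or via the isa rule."""
    return any(
        collection in all_genls(direct, genls_facts)
        for item, direct in isa_facts
        if item == obj
    )


print(holds_isa("OakInMyYard", "Plant", ISA, GENLS))
print(holds_isa("BillClinton", "Plant", ISA, GENLS))
```

Following genls links transitively is what lets a single #$isa assertion place an individual in every more general collection without each membership being stated by hand.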

The knowledge base is divided into microtheories (Mt), collections of concepts and facts typically pertaining to one particular realm of knowledge. Unlike the knowledge base as a whole, each microtheory must be free from monotonic contradictions. Each microtheory is a first-class object in the Cyc ontology; it has a name that is a regular constant; microtheory constants contain the string Mt by convention. An example is #$MathMt, the microtheory containing mathematical knowledge. The microtheories can inherit from each other and are organized in a hierarchy: one specialization of #$MathMt is #$GeometryGMt, the microtheory about geometry.
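Microtheory inheritance can be pictured with a short Python sketch. The Microtheory class and the example assertions are hypothetical; only the names MathMt and GeometryGMt come from the text, and Cyc's actual mechanism is not shown here.

```python
class Microtheory:
    """A named bundle of assertions that inherits from more general microtheories."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.assertions = set()

    def visible_assertions(self):
        """Assertions stated here plus everything inherited from the parents."""
        visible = set(self.assertions)
        for parent in self.parents:
            visible |= parent.visible_assertions()
        return visible


math_mt = Microtheory("MathMt")
math_mt.assertions.add(("commutative", "Plus"))                    # illustrative fact

geometry_mt = Microtheory("GeometryGMt", parents=[math_mt])
geometry_mt.assertions.add(("angleSumDegrees", "Triangle", "180"))  # illustrative fact

# The specialization sees the general facts; the general microtheory
# does not see the specialized ones.
print(("commutative", "Plus") in geometry_mt.visible_assertions())
print(("angleSumDegrees", "Triangle", "180") in math_mt.visible_assertions())
```

Keeping assertions local to a microtheory is what allows two contexts to disagree with each other while each remains internally consistent.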

Inference engine

An inference engine is a computer program that tries to derive answers from a knowledge base. The Cyc inference engine performs general logical deduction (including modus ponens, modus tollens, universal quantification and existential quantification).[14] It also performs inductive reasoning, statistical machine learning and symbolic machine learning, and abductive reasoning (but of course sparingly and using the existing knowledge base as a filter and guide).

Releases

OpenCyc

The first version of OpenCyc was released in spring 2002 and contained only 6,000 concepts and 60,000 facts. The knowledge base was released under the Apache License. Cycorp stated its intention to release OpenCyc under parallel, unrestricted licences to meet the needs of its users. The CycL and SubL interpreter (the program that allows users to browse and edit the database as well as to draw inferences) was released free of charge, but only as a binary, without source code. It was made available for Linux and Microsoft Windows. The open source Texai[15] project released the RDF-compatible content extracted from OpenCyc.[16] A version of OpenCyc, 4.0, was released in June 2012. OpenCyc 4.0 included much of the Cyc ontology at that time, containing hundreds of thousands of terms, along with millions of assertions relating the terms to each other; however, these are mainly taxonomic assertions, not the complex rules available in Cyc. The OpenCyc 4.0 knowledge base contained 239,000 concepts and 2,093,000 facts.

The main point of releasing OpenCyc was to help AI researchers understand what was missing from what they now call ontologies and knowledge graphs. It is useful and important to have properly taxonomized concepts like person, night, sleep, lying down, waking, happy, etc., but what is missing from the OpenCyc content about those terms, but present in the Cyc KB content, are the various rules of thumb that most of us share about those terms: that (as a default, in the ModernWesternHumanCultureMt) each person sleeps at night, sleeps lying down, can be woken up, is not happy about being woken up, and so on. That point does not require continually-updated releases of OpenCyc, so, as of 2017, OpenCyc is no longer available.

ResearchCyc

In July 2006, Cycorp released the executable of ResearchCyc 1.0, a version of Cyc aimed at the research community, at no charge. (ResearchCyc was in beta stage of development during all of 2004; a beta version was released in February 2005.) In addition to the taxonomic information contained in OpenCyc, ResearchCyc includes significantly more semantic knowledge (i.e., additional facts and rules of thumb) involving the concepts in its knowledge base; it also includes a large lexicon, English parsing and generation tools, and Java-based interfaces for knowledge editing and querying. In addition it contains a system for ontology-based data integration. As of 2017, regular releases of ResearchCyc continued to appear, with 600 research groups utilizing licenses around the world at no cost for noncommercial research purposes. As of December 2019, ResearchCyc is no longer supported. Cycorp expects to improve and overhaul tools for external developers over the coming years.

Applications

There have been over a hundred successful applications of Cyc;[17] listed here are a few mutually dissimilar instances:

Pharmaceutical Term Thesaurus Manager/Integrator

For over a decade, Glaxo has used Cyc to semi-automatically integrate all the large (hundreds of thousands of terms) thesauri of pharmaceutical-industry terms that reflect differing usage across companies, countries, years, and sub-industries.[18] This ontology integration task requires domain knowledge, shallow semantic knowledge, but also arbitrarily deep common sense knowledge and reasoning. Pharma vocabulary varies across countries, (sub-) industries, companies, departments, and decades of time. E.g., what is a gel pak? What is the "street name" for ranitidine hydrochloride? Each of these n controlled vocabularies is an ontology with approximately 300k terms. Glaxo researchers need to issue a query in their current vocabulary, have it translated into a neutral “true meaning”, and then have that transformed in the opposite direction to find potential matches against documents each of which was written to comply with a particular known vocabulary.

They had been using a large staff to do that manually. Cyc is used as the universal interlingua capable of representing the union of all the terms' "true meanings", and capable of representing the 300k transformations between each of those controlled vocabularies and Cyc, thereby converting an n² problem into a linear one without introducing the usual sort of "telephone game" attenuation of meaning. Furthermore, creating each of those 300k mappings for each thesaurus is done in a largely automated fashion, by Cyc.
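The saving from routing every vocabulary through one interlingua can be illustrated with a rough Python sketch. The vocabulary names, the term spellings, and the hub concept name are hypothetical, and the mapping counts are simple arithmetic rather than figures from the Glaxo deployment.

```python
def pairwise_mappings(n):
    """Directed vocabulary-to-vocabulary mappings needed without a hub."""
    return n * (n - 1)


def hub_mappings(n):
    """Mappings needed when each vocabulary maps only to and from the hub."""
    return 2 * n


# Hypothetical vocabularies, each mapping its own terms to a shared hub concept.
TO_HUB = {
    "vocabA": {"ranitidine HCl": "RanitidineHydrochloride"},
    "vocabB": {"ranitidine hydrochloride": "RanitidineHydrochloride"},
}


def translate(term, source, target):
    """Translate a term by going source -> hub concept -> target."""
    concept = TO_HUB[source][term]
    from_hub = {hub: local for local, hub in TO_HUB[target].items()}
    return from_hub[concept]


print(pairwise_mappings(10), hub_mappings(10))
print(translate("ranitidine HCl", "vocabA", "vocabB"))
```

Because every translation takes exactly two hops through the hub, meaning is not passed along a chain of intermediate vocabularies, which is the "telephone game" effect the text refers to.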

Terrorism Knowledge Base

See also: MIPT Terrorism Knowledge Base. The comprehensive Terrorism Knowledge Base was an application of Cyc in development that tried to ultimately contain all relevant knowledge about "terrorist" groups, their members, leaders, ideology, founders, sponsors, affiliations, facilities, locations, finances, capabilities, intentions, behaviors, tactics, and full descriptions of specific terrorist events. The knowledge is stored as statements in mathematical logic, suitable for computer understanding and reasoning.[19] [20]

Cleveland Clinic Foundation

The Cleveland Clinic has used Cyc to develop a natural-language query interface for biomedical information, spanning decades of information on cardiothoracic surgeries.[21] A query is parsed into a set of CycL (higher-order logic) fragments with open variables (e.g., "this question is talking about a person who developed an endocarditis infection", "this question is talking about a subset of Cleveland Clinic patients who underwent surgery there in 2009", etc.). Various constraints are then applied (medical domain knowledge, common sense, discourse pragmatics, syntax) to see how those fragments could fit together into one semantically meaningful formal query; significantly, in most cases there is exactly one way of integrating those fragments.[22] Integrating the fragments involves (i) deciding which open variables in which fragments actually represent the same variable, and (ii) deciding, for each of the final variables, what order and scope of quantification it should have and what type (universal or existential). That logical (CycL) query is then converted into a SPARQL query that is passed to the CCF SemanticDB, which serves as its data lake.

MathCraft

One Cyc application aims to help students doing math at a 6th grade level, helping them much more deeply understand that subject matter.[23] It is based on the experience that we often have thought we understood something, but only really understood it after we had to explain or teach it to someone else. Unlike almost all other educational software, where the computer plays the role of the teacher, this application of Cyc, called MathCraft,[24] has Cyc play the role of a fellow student who is always slightly more confused than you, the user, are about the subject. The user's role is to observe the Cyc avatar and give it advice, correct its errors, mentor it, get it to see what it's doing wrong, etc. As the user gives good advice, Cyc allows the avatar to make fewer mistakes of that type, hence, from the user's point of view, it seems as though the user has just successfully taught it something. This is a variation of learning by teaching.

Criticisms

The Cyc project has been described as "one of the most controversial endeavors of the artificial intelligence history". Catherine Havasi, CEO of Luminoso, says that Cyc is the predecessor project to IBM's Watson.[25] Machine-learning scientist Pedro Domingos refers to the project as a "catastrophic failure" for several reasons, including the unending amount of data required to produce any viable results and the inability of Cyc to evolve on its own.[26]

Robin Hanson, a professor of economics at George Mason University, gives a more balanced analysis:

A similar sentiment was expressed by Marvin Minsky: "Unfortunately, the strategies most popular among AI researchers in the 1980s have come to a dead end," said Minsky. So-called "expert systems", which emulated human expertise within tightly defined subject areas like law and medicine, could match users' queries to relevant diagnoses, papers and abstracts, yet they could not learn concepts that most children know by the time they are 3 years old. "For each different kind of problem," said Minsky, "the construction of expert systems had to start all over again, because they didn't accumulate common-sense knowledge." Only one researcher has committed himself to the colossal task of building a comprehensive common-sense reasoning system, according to Minsky. Douglas Lenat, through his Cyc project, has directed the line-by-line entry of more than 1 million rules into a commonsense knowledge base.[27]

Gary Marcus, a professor of psychology and neural science at New York University and the cofounder of an AI company called Geometric Intelligence, says "it represents an approach that is very different from all the deep-learning stuff that has been in the news."[28] This is consistent with Doug Lenat's position that "Sometimes the veneer of intelligence is not enough".[29]

Stephen Wolfram writes:

Marcus writes:

Every few years since it began publishing in 1993, Wired magazine has run a new article about Cyc,[30] [27] [31] some positive and some negative (including one issue[32] which contained one of each).

Notable employees

This is a list of some of the notable people who work or have worked on Cyc either while it was a project at MCC (where Cyc was first started) or Cycorp.

Notes and References

  1. Lenat, Douglas B.; Brown, John Seely (1984-08-01). "Why AM and EURISKO appear to work". Artificial Intelligence. 23 (3): 269–294. CiteSeerX 10.1.1.565.8830. doi:10.1016/0004-3702(84)90016-X.
  2. Lenat, Douglas B.; Borning, Alan; McDonald, David; Taylor, Craig; Weyer, Steven (1983). "Knoesphere: Building Expert Systems with Encyclopedic Knowledge". Proceedings of the Eighth International Joint Conference on Artificial Intelligence - Volume 1. IJCAI'83. pp. 167–169.
  3. Wood, Lamont (2002). "The World in a Box". Scientific American. 286 (1): 18–19. Bibcode 2002SciAm.286a..18W. doi:10.1038/scientificamerican0102-18.
  4. Lenat, Doug; Prakash, Mayank; Shepherd, Mary (January 1986). "CYC: Using Common Sense Knowledge to Overcome Brittleness and Knowledge Acquisition Bottlenecks". AI Magazine. 6 (4): 65–85. ISSN 0738-4602.
  5. Lenat, Douglas B.; Guha, R. V. (June 1991). "The Evolution of CycL, the Cyc Representation Language". ACM SIGART Bulletin. 2 (3): 84–87. doi:10.1145/122296.122308. S2CID 10306053. ISSN 0163-5719.
  6. Lenat, Douglas B.; Guha, R. V.; Pittman, Karen; Pratt, Dexter; Shepherd, Mary (August 1990). "Cyc: Toward Programs with Common Sense". Commun. ACM. 33 (8): 30–49. doi:10.1145/79173.79176. S2CID 7296269. ISSN 0001-0782.
  7. Lenat, Douglas B.; Guha, R. V. (1989). Building Large Knowledge-Based Systems; Representation and Inference in the Cyc Project (1st ed.). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc. ISBN 978-0201517521.
  8. Elkan, Charles; Greiner, Russell (1993-05-01). "Building large knowledge-based systems: Representation and inference in the cyc project: D.B. Lenat and R.V. Guha". Artificial Intelligence. 61 (1): 41–52. doi:10.1016/0004-3702(93)90092-P.
  9. "A Representation Language Language". www.aaai.org. Retrieved 2017-11-27.
  10. Greiner, Russell (October 1980). "RLL-1: A Representation Language Language". Archived at https://web.archive.org/web/20150208123546/http://www.dtic.mil/docs/citations/ADA096510 on February 8, 2015.
  11. "Schedule - Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches". sites.google.com. Retrieved 2017-11-28.
  12. Lenat, Douglas. "From 2001 to 2001: Common Sense and the Mind of HAL". Hal's Legacy: 2001's Computer as Dream and Reality. Cycorp, Inc. Archived at https://web.archive.org/web/20191209190521/https://www.cyc.com/wp-content/uploads/2019/07/img-615153705-0001-1.pdf on 2019-12-09. Retrieved 2006-09-26.
  13. "Integrating Cyc and Wikipedia: Folksonomy meets rigorously defined common-sense". Retrieved 2013-05-10.
  14. "cyc Inference engine". Retrieved 2015-06-04. Archived from the original (now dead) at https://web.archive.org/web/20191209201932/https://www.cyc.com/archives/service/cyc-inference-engines on 2019-12-09.
  15. "The open source Texai project". Archived at https://web.archive.org/web/20090216201139/http://texai.org/blog on 2009-02-16.
  16. "Texai SourceForge project files".
  17. "Cycorp Products". www.cyc.com. Retrieved 2017-11-29.
  18. Hiltzik, Michael A. (2001-06-21). "Birth of a Thinking Machine". Los Angeles Times. ISSN 0458-3035. Retrieved 2017-11-29.
  19. Deaton, Chris; Shepard, Blake; Klein, Charles; Mayans, Corrinne; Summers, Brett; Brusseau, Antoine; Witbrock, Michael; Lenat, Doug (2005). "The Comprehensive Terrorism Knowledge Base in Cyc". Proceedings of the 2005 International Conference on Intelligence Analysis. CiteSeerX 10.1.1.70.9247.
  20. Lenat, Douglas B.; Deaton, Chris (April 2008). TERRORISM KNOWLEDGE BASE (TKB) Final Technical Report. Air Force Research Laboratory Information Directorate. AFRL-RI-RS-TR-2008-125.
  21. "Case Study: A Semantic Web Content Repository for Clinical Research". www.w3.org. Retrieved 2018-02-28.
  22. Lenat, Douglas; Witbrock, Michael; Baxter, David; Blackstone, Eugene; Deaton, Chris; Schneider, Dave; Scott, Jerry; Shepard, Blake (2010-07-28). "Harnessing Cyc to Answer Clinical Researchers' Ad Hoc Queries". AI Magazine. 31 (3): 13. doi:10.1609/aimag.v31i3.2299. ISSN 0738-4602.
  23. Lenat, Douglas B.; Durlach, Paula J. (2014-09-01). "Reinforcing Math Knowledge by Immersing Students in a Simulated Learning-By-Teaching Experience". International Journal of Artificial Intelligence in Education. 24 (3): 216–250. doi:10.1007/s40593-014-0016-x. ISSN 1560-4292.
  24. "Mathcraft by Cycorp". www.mathcraft.ai. Retrieved 2017-11-29.
  25. Havasi, Catherine (Aug 9, 2014). "Who's Doing Common-Sense Reasoning And Why It Matters". TechCrunch. Retrieved 2017-11-29.
  26. Domingos, Pedro (2015). The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. Basic Books. ISBN 978-0465065707.
  27. Baard, Mark (May 13, 2003). "AI Founder Blasts Modern Research". WIRED. Retrieved 2017-11-29.
  28. Knight, Will (Mar 14, 2016). "An AI that spent 30 years learning some common sense is ready for work". MIT Technology Review. Retrieved 2017-11-29.
  29. Lenat, Doug (May 15, 2017). "Sometimes the Veneer of Intelligence is Not Enough CogWorld". cognitiveworld.com. Retrieved 2017-11-29.
  30. Goldsmith, Jeffrey (Apr 1, 1994). "CYC-O". WIRED. Retrieved 2017-11-29.
  31. Metz, Cade (March 25, 2016). "One Genius' Lonely Crusade to Teach a Computer Common Sense". WIRED. Retrieved 2017-11-29.
  32. Wired Staff (Nov 1, 1998). "The Wired 25". WIRED. Retrieved 2017-11-29.