Stochastic parrot

In machine learning, the term stochastic parrot is a metaphor to describe the theory that large language models, though able to generate plausible language, do not understand the meaning of the language they process. The term was coined by Emily M. Bender[1][2] in the 2021 artificial intelligence research paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" by Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell.[3]

Origin and definition

The term was first used in the paper "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" by Bender, Timnit Gebru, Angelina McMillan-Major, and Margaret Mitchell (using the pseudonym "Shmargaret Shmitchell"). They argued that large language models (LLMs) present dangers such as environmental and financial costs, inscrutability leading to unknown dangerous biases, and potential for deception, and that they cannot understand the concepts underlying what they learn.[4] Gebru was asked to retract the paper or remove the names of Google employees from it. According to Jeff Dean, the paper "didn't meet our bar for publication". In response, Gebru listed conditions to be met, stating that otherwise they could "work on a last date". Dean wrote that one of these conditions was for Google to disclose the reviewers of the paper and their specific feedback, which Google declined. Shortly after, she received an email saying that Google was "accepting her resignation". Her firing sparked a protest by Google employees, who believed the intent was to censor Gebru's criticism.[5]

The word "stochastic" derives from the ancient Greek word "stokhastikos" meaning "based on guesswork", or "randomly determined".[6] The word "parrot" refers to the idea that LLMs merely repeat words without understanding their meaning.

In their paper, Bender et al. argue that LLMs probabilistically link words and sentences together without considering meaning, and are therefore mere "stochastic parrots".
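
The mechanism at issue can be illustrated with a toy example. The sketch below is a minimal illustration of stochastic next-word generation, not anything from the paper itself: it fits a bigram model to a tiny corpus and generates text purely from word-to-word co-occurrence counts, with no representation of meaning.

```python
# A toy "stochastic parrot" (illustrative only): a bigram model that
# generates text purely from word-to-word co-occurrence counts.
import random
from collections import defaultdict

corpus = ("the parrot repeats the words it hears "
          "the model repeats the patterns it sees").split()

# Record which words follow each word in the corpus.
follows = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev].append(nxt)

def generate(start="the", length=8, seed=0):
    random.seed(seed)
    words = [start]
    for _ in range(length - 1):
        options = follows.get(words[-1])
        if not options:  # no observed successor: stop
            break
        words.append(random.choice(options))  # the "stochastic" step
    return " ".join(words)

# Output is locally plausible but reflects no grasp of meaning.
print(generate())
```

Modern LLMs replace the co-occurrence table with a learned neural distribution conditioned on a long context, but the sampling step is the part the metaphor targets.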

According to the machine learning professionals Lindholm, Wahlström, Lindsten, and Schön, the analogy highlights two vital limitations:[7]

  1. LLMs are limited by the data they are trained on and are simply stochastically repeating the contents of their datasets.
  2. Because they are only making up outputs based on training data, LLMs do not understand if they are saying something incorrect or inappropriate.

Lindholm et al. noted that, with poor quality datasets and other limitations, a learning machine might produce results that are "dangerously wrong".

Subsequent usage

In July 2021, the Alan Turing Institute hosted a keynote and panel discussion on the paper. The paper has been cited in 1,529 publications.[8] The term has been used in publications in the fields of law,[9] grammar,[10] narrative,[11] and humanities.[12] The authors continue to maintain their concerns about the dangers of chatbots based on large language models, such as GPT-4.[13]

Stochastic parrot is now a neologism used by AI skeptics to refer to machines' lack of understanding of the meaning of their outputs, and is sometimes interpreted as a "slur against AI". Its use expanded further when Sam Altman, CEO of OpenAI, used the term ironically in a tweet: "i am a stochastic parrot and so r u". The American Dialect Society then named it the 2023 AI-related Word of the Year, selecting it over "ChatGPT" and "LLM".[14]

The phrase is often used by researchers who characterize LLMs as pattern matchers: systems that generate plausible human-like text by parroting, in stochastic fashion, the vast training data they were exposed to. However, other researchers argue that LLMs are, in fact, able to understand language.[15]

Debate

Some LLMs, such as ChatGPT, have become capable of interacting with users in convincingly human-like conversations. The development of these new systems has deepened the discussion of the extent to which LLMs understand or are simply "parroting".

Subjective experience

In the mind of a human being, words and language correspond to things one has experienced.[16] For LLMs, words may correspond only to other words and to patterns of usage in their training data.[17][18] Proponents of the idea of stochastic parrots thus conclude that LLMs are incapable of actually understanding language.

Hallucinations and mistakes

The tendency of LLMs to pass off fake information as fact is held up as support. In what are called hallucinations, LLMs occasionally synthesize information that matches some pattern, but not reality. That LLMs cannot distinguish fact from fiction leads to the claim that they cannot connect words to a comprehension of the world, as language should. Further, LLMs often fail to decipher complex or ambiguous grammar that relies on understanding the meaning of language. As an example, borrowing from Saba et al., consider the prompt:

    The wet newspaper that fell down off the table is my favorite newspaper. But now that my favorite newspaper fired the editor I might not like reading it anymore. Can I replace "my favorite newspaper" with "the wet newspaper that fell down off the table" in the second sentence?

LLMs respond to this in the affirmative, not understanding that the meaning of "newspaper" is different in these two contexts: first an object, and second an institution. Based on such failures, some AI professionals conclude that LLMs are no more than stochastic parrots.

Benchmarks and experiments

One argument against the hypothesis that LLMs are stochastic parrots is their results on benchmarks for reasoning, common sense, and language understanding. In 2023, some LLMs showed good results on many language understanding tests, such as the Super General Language Understanding Evaluation (SuperGLUE).[19] Such tests, and the smoothness of many LLM responses, may explain why, in a 2022 survey, as many as 51% of AI professionals said they believed LLMs could truly understand language given enough data.
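
As a rough illustration of how such benchmarks score a model, the sketch below computes accuracy over a handful of BoolQ-style yes/no items. The items and the trivial baseline "model" are invented for illustration; a real evaluation would use the published SuperGLUE data and an actual LLM.

```python
# Illustrative scoring loop in the style of a SuperGLUE yes/no task
# (BoolQ). Items and the baseline "model" are invented; a real run
# would load the published dataset and query an LLM.
examples = [
    {"question": "Is the sky blue on a clear day?", "label": True},
    {"question": "Do parrots photosynthesize?", "label": False},
    {"question": "Is water wet?", "label": True},
]

def baseline_model(question: str) -> bool:
    """A trivial always-'yes' guesser, standing in for a real model."""
    return True

correct = sum(baseline_model(ex["question"]) == ex["label"] for ex in examples)
print(f"accuracy: {correct / len(examples):.2f}")  # 0.67 on this toy set
```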

When experimenting on ChatGPT-3, one scientist argued that the model was not a stochastic parrot, but had serious reasoning limitations. He found the model coherent and informative when attempting to predict future events based on information in the prompt, and frequently able to parse subtextual information from text prompts as well. However, the model often failed when tasked with logic and reasoning, especially when these prompts involved spatial awareness. The model's varying quality of responses indicates that LLMs may have a form of "understanding" in certain categories of tasks while acting as a stochastic parrot in others.

Interpretability

Another technique for investigating whether LLMs can understand is termed "mechanistic interpretability". The idea is to reverse-engineer a large language model to analyze how it internally processes information.

One example is Othello-GPT, where a small transformer was trained to predict legal Othello moves. It has been found that this model has an internal representation of the Othello board, and that modifying this representation changes the predicted legal Othello moves in the correct way. This supports the idea that LLMs have a "world model", and are not just doing superficial statistics.[20]
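
Work of this kind commonly uses a "probe": a small classifier trained to read a property, such as the state of a board square, out of the model's hidden activations. The sketch below is a generic linear-probe setup on synthetic activations; it is not the Othello-GPT code, and the dimensions and data are placeholders.

```python
# Generic linear-probe sketch (not the actual Othello-GPT code):
# train a classifier to decode a board-square state from hidden
# activations. Synthetic activations stand in for a real model's.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, d_model = 2000, 64

# Pretend the square's state (0=empty, 1=ours, 2=theirs) is linearly
# encoded in the activations, plus noise.
state = rng.integers(0, 3, size=n_samples)
directions = rng.normal(size=(3, d_model))  # one direction per class
hidden = directions[state] + 0.5 * rng.normal(size=(n_samples, d_model))

# Fit the probe on a train split and score it on held-out samples.
split = 1500
probe = LogisticRegression(max_iter=1000).fit(hidden[:split], state[:split])
print(f"probe accuracy: {probe.score(hidden[split:], state[split:]):.2f}")
# High held-out accuracy is evidence the property is represented;
# intervention experiments then edit the representation and check
# that the model's predictions change accordingly.
```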

In another example, a small transformer was trained on computer programs written in the programming language Karel. Similar to the Othello-GPT example, this model developed an internal representation of Karel program semantics. Modifying this representation results in appropriate changes to the output. Additionally, the model generates correct programs that are, on average, shorter than those in the training set.

Researchers also studied "grokking", a phenomenon where an AI model initially memorizes its training data and then, after further training, suddenly finds a solution that generalizes to unseen data.[21]
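
Grokking is typically studied in small, controlled settings such as modular arithmetic. The sketch below is a toy experiment in the spirit of published grokking studies, not code from the cited work: a small network is trained with strong weight decay on modular addition, and train and validation accuracy are logged so that a late jump in validation accuracy can be spotted.

```python
# Toy modular-addition setup in which grokking-like behavior has been
# reported (illustrative; not code from the cited work).
import torch
import torch.nn as nn

torch.manual_seed(0)
p = 23  # modulus: learn (a + b) % p for all pairs (a, b)

pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
x = torch.cat([nn.functional.one_hot(pairs[:, 0], p),
               nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()

perm = torch.randperm(len(x))
train, val = perm[: len(x) // 2], perm[len(x) // 2:]

model = nn.Sequential(nn.Linear(2 * p, 128), nn.ReLU(), nn.Linear(128, p))
# Strong weight decay is one ingredient associated with grokking.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20001):
    opt.zero_grad()
    loss = loss_fn(model(x[train]), labels[train])
    loss.backward()
    opt.step()
    if step % 2000 == 0:
        with torch.no_grad():
            tr = (model(x[train]).argmax(1) == labels[train]).float().mean()
            va = (model(x[val]).argmax(1) == labels[val]).float().mean()
        # Grokking shows up as val accuracy jumping to ~1.0 long after
        # train accuracy has saturated.
        print(f"step {step:5d}  train {tr:.2f}  val {va:.2f}")
```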

Shortcuts to reasoning

However, when tests created to assess human language comprehension are used to test LLMs, they sometimes yield false positives caused by spurious correlations within the text data. Models have shown examples of shortcut learning, in which a system makes unrelated correlations within data instead of using human-like understanding.[22] One such experiment, conducted in 2019, tested Google's BERT LLM using the argument reasoning comprehension task. BERT was prompted to choose between two statements and find the one most consistent with an argument. Below is an example of one of these prompts:

    Argument: Felons should be allowed to vote. A person who stole a car at 17 should not be barred from being a full citizen for life.
    Statement A: Grand theft auto is a felony.
    Statement B: Grand theft auto is not a felony.

Researchers found that specific words such as "not" hinted the model toward the correct answer, allowing near-perfect scores when such words were included but random selection when they were removed. This problem, together with the known difficulty of defining intelligence, leads some to argue that all benchmarks that find understanding in LLMs are flawed, in that they all permit shortcuts that fake understanding.
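
The effect is easy to reproduce in miniature: a "classifier" that simply picks the candidate containing a cue word such as "not" looks competent on cue-laden items and falls to chance once the cue is removed. The items below are invented for illustration; the 2019 experiment used the real argument reasoning comprehension data and BERT.

```python
# Miniature demonstration of a cue-word shortcut (items invented for
# illustration; the 2019 study used the real task data and BERT).
import random

random.seed(0)

def cue_classifier(candidates):
    """Pick the candidate containing ' not '; otherwise guess."""
    for i, cand in enumerate(candidates):
        if " not " in f" {cand} ":
            return i
    return random.randrange(len(candidates))

# The correct answer (index 1) contains the cue word "not".
with_cue = [(["The claim is true.", "The claim is not true."], 1)] * 20
# Same items with the cue word paraphrased away.
without_cue = [(["The claim is true.", "The claim is untrue."], 1)] * 20

for name, items in [("with cue", with_cue), ("cue removed", without_cue)]:
    acc = sum(cue_classifier(c) == y for c, y in items) / len(items)
    print(f"{name}: accuracy {acc:.2f}")  # ~1.00 with cue, ~0.50 without
```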

Notes and References

  1. Uddin, Muhammad Saad (April 20, 2023). "Stochastic Parrots: A Novel Look at Large Language Models and Their Limitations". Towards AI. Retrieved 2023-05-12.
  2. Weil, Elizabeth (March 1, 2023). "You Are Not a Parrot". New York. Retrieved 2023-05-12.
  3. Bender, Emily M.; Gebru, Timnit; McMillan-Major, Angelina; Shmitchell, Shmargaret (March 1, 2021). "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?". Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21). New York, NY, USA: Association for Computing Machinery. pp. 610–623. doi:10.1145/3442188.3445922. ISBN 978-1-4503-8309-7.
  4. Hao, Karen (December 4, 2020). "We read the paper that forced Timnit Gebru out of Google. Here's what it says". MIT Technology Review. Archived from the original on October 6, 2021. Retrieved January 19, 2022.
  5. Lyons, Kim (December 5, 2020). "Timnit Gebru's actual paper may explain why Google ejected her". The Verge.
  6. Zimmer, Ben. "'Stochastic Parrot': A Name for AI That Sounds a Bit Less Intelligent". WSJ. Retrieved 2024-04-01.
  7. Uddin, Muhammad Saad (April 20, 2023). "Stochastic Parrots: A Novel Look at Large Language Models and Their Limitations". Towards AI. Retrieved 2023-05-12.
  8. "Bender: On the Dangers of Stochastic Parrots". Retrieved 2023-05-12.
  9. Arnaudo, Luca (April 20, 2023). "Artificial Intelligence, Capabilities, Liabilities: Interactions in the Shadows of Regulation, Antitrust – And Family Law". SSRN. doi:10.2139/ssrn.4424363.
  10. Bleackley, Pete; BLOOM (2023). "In the Cage with the Stochastic Parrot". Speculative Grammarian. CXCII (3). Retrieved 2023-05-13.
  11. Gáti, Daniella (2023). "Theorizing Mathematical Narrative through Machine Learning". Journal of Narrative Theory. 53 (1): 139–165. Project MUSE. doi:10.1353/jnt.2023.0003.
  12. Rees, Tobias (2022). "Non-Human Words: On GPT-3 as a Philosophical Laboratory". Daedalus. 151 (2): 168–182. doi:10.1162/daed_a_01908.
  13. Goldman, Sharon (March 20, 2023). "With GPT-4, dangers of 'Stochastic Parrots' remain, say researchers. No wonder OpenAI CEO is a 'bit scared'". VentureBeat. Retrieved 2023-05-09.
  14. Corbin, Sam (January 15, 2024). "Among Linguists, the Word of the Year Is More of a Vibe". The New York Times. ISSN 0362-4331. Retrieved 2024-04-01.
  15. Arkoudas, Konstantine (August 21, 2023). "ChatGPT is no Stochastic Parrot. But it also Claims that 1 is Greater than 1". Philosophy & Technology. 36 (3): 54. doi:10.1007/s13347-023-00619-6. ISSN 2210-5441.
  16. Fayyad, Usama M. (May 26, 2023). "From Stochastic Parrots to Intelligent Assistants—The Secrets of Data and Human Interventions". IEEE Intelligent Systems. 38 (3): 63–67. doi:10.1109/MIS.2023.3268723. ISSN 1541-1672.
  17. Saba, Walid S. (2023). "Stochastic LLMs do not Understand Language: Towards Symbolic, Explainable and Ontologically Based LLMs". In Almeida, João Paulo A.; Borbinha, José; Guizzardi, Giancarlo; Link, Sebastian; Zdravkovic, Jelena (eds.). Conceptual Modeling. Lecture Notes in Computer Science. Vol. 14320. Cham: Springer Nature Switzerland. pp. 3–19. arXiv:2309.05918. doi:10.1007/978-3-031-47262-6_1. ISBN 978-3-031-47262-6.
  18. Mitchell, Melanie; Krakauer, David C. (March 28, 2023). "The debate over understanding in AI's large language models". Proceedings of the National Academy of Sciences. 120 (13): e2215907120. arXiv:2210.13966. doi:10.1073/pnas.2215907120. PMID 36943882.
  19. Wang, Alex; Pruksachatkun, Yada; Nangia, Nikita; Singh, Amanpreet; Michael, Julian; Hill, Felix; Levy, Omer; Bowman, Samuel R. (May 2, 2019). "SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems". arXiv:1905.00537 [cs.CL].
  20. Li, Kenneth (January 21, 2023). "Large Language Model: world models or surface statistics?". The Gradient. Retrieved 2024-04-04.
  21. Schreiner, Maximilian (August 11, 2023). "Grokking in machine learning: When Stochastic Parrots build models". The Decoder. Retrieved 2024-05-25.
  22. Geirhos, Robert; Jacobsen, Jörn-Henrik; Michaelis, Claudio; Zemel, Richard; Brendel, Wieland; Bethge, Matthias; Wichmann, Felix A. (November 10, 2020). "Shortcut learning in deep neural networks". Nature Machine Intelligence. 2 (11): 665–673. arXiv:2004.07780. doi:10.1038/s42256-020-00257-z. ISSN 2522-5839.