Foundation model explained

A foundation model, also known as a large AI model, is a machine learning or deep learning model that is trained on broad data such that it can be applied across a wide range of use cases.[1] Foundation models have transformed artificial intelligence (AI), powering prominent generative AI applications like ChatGPT. The Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) created and popularized the term.[2]

Foundation models are general-purpose technologies that can support a diverse range of use cases. Building foundation models is often highly resource-intensive, with the most expensive models costing hundreds of millions of dollars to pay for the underlying data and compute required.[3] In contrast, adapting an existing foundation model for a specific use case or using it directly is much less expensive.

Early examples of foundation models are language models (LMs) like OpenAI's "GPT-n" series and Google's BERT.[4] Beyond text, foundation models have been developed across a range of modalities, including DALL-E and Flamingo for images, MusicGen[5] for music, and RT-2[6] for robotic control. Foundation models constitute a broad shift in AI development: foundation models are being built for astronomy,[7] radiology,[8] genomics,[9] music,[10] coding,[11] time-series forecasting,[12] and mathematics.[13]

Definitions

The Stanford Institute for Human-Centered Artificial Intelligence's (HAI) Center for Research on Foundation Models (CRFM) coined the term "foundation model" in August 2021 to mean "any model that is trained on broad data (generally using self-supervision at scale) that can be adapted (e.g., fine-tuned) to a wide range of downstream tasks".[14] This was based on their observation that preexisting terms, while overlapping, were not adequate, stating that "'(large) language model' was too narrow given [the] focus is not only language; 'self-supervised model' was too specific to the training objective; and 'pretrained model' suggested that the noteworthy action all happened after 'pretraining'."[15] The term "foundation model" was chosen over "foundational model"[16] because "foundational" implies that these models provide fundamental principles in a way that "foundation" does not.[17] After considering many terms, they settled on "foundation model" to emphasize the intended function (i.e., amenability to subsequent further development) rather than modality, architecture, or implementation.

As governments regulate foundation models, new legal definitions have emerged.

Overall, while many of these definitions stick close to the original Stanford definition, they do introduce some subtle distinctions. For example, the U.S. definitions are the only ones to reference the size of a foundation model, though they differ on the exact magnitude. Beyer and Eshoo's definition also specifies that foundation models must achieve a level of performance high enough to pose a potential danger. In contrast, the E.U. definition mentions whether the model is designed for generality of output. Nonetheless, all definitions agree that foundation models must be trained on a broad range of data with potential applications in many domains.

History

Technologically, foundation models are built using established machine learning techniques like deep neural networks, transfer learning, and self-supervised learning. Foundation models are noteworthy given the unprecedented resource investment, model and data size, and ultimately their scope of application when compared to previous forms of AI. The rise of foundation models constitutes a new paradigm in AI, where general-purpose models function as a reusable infrastructure, instead of bespoke and one-off task-specific models.

Foundation models draw upon a series of advances in the history of AI. These models can be situated against the backdrop of the broader rise of machine learning since the 1990s. Prior AI models depended on specific instructions to solve a given task, but machine learning-powered models were able to decipher what task to solve given sufficient data. Such a shift from so-called expert systems to data-driven machine learning was the first step towards the modern foundation model.

The next major step was the advent of deep learning circa 2010. With larger datasets and more advanced neural networks, AI models were able to achieve higher levels of performance. The first major instance of deep learning was exhibited by the model architecture AlexNet, which won the 2012 ImageNet Large Scale Visual Recognition Challenge. AlexNet exhibited strong performance on a large-scale general dataset and was an early demonstration that deep learning was practical at scale. Alongside the methodological shift to end-to-end optimization of deep neural networks, the 2010s were also marked by a software shift. In the mid-2010s, the rise of deep learning frameworks like PyTorch and TensorFlow provided crucial infrastructure for simplifying and scaling deep learning pipelines.

Foundation models began to materialize as the latest wave of deep learning models in the late 2010s with models like ELMo, GPT, BERT and GPT-2. Relative to most prior work on deep learning, these language models demonstrated the potential of training on much larger web-sourced datasets using self-supervised objectives (e.g. predicting the next word in a large corpus of text). These approaches, which draw upon earlier works like word2vec and GloVe, deviated from prior supervised approaches that required annotated data (e.g. crowd-sourced labels).

Overall, the computational advances in specialized hardware and parallelism (e.g., large clusters of NVIDIA GPUs), new developments in neural network architecture (e.g., the Transformer), and the increased use of training data with minimal supervision all contributed to the rise of foundation models. Some noteworthy foundation models include: GPT, BERT, GPT-2, T5, GPT-3, CLIP, DALL-E, Stable Diffusion, GPT-4, LLaMA, LLaMA 2, and Mistral. Each of these models came with its own unique abilities, particularly in their strong generative capabilities.

The year 2022 was particularly influential in the history of foundation models. The releases of Stable Diffusion and ChatGPT (initially powered by the GPT-3.5 model) led to foundation models and generative AI entering widespread public discourse. Further, releases of LLaMA, Llama 2, and Mistral in 2023 contributed to a greater emphasis on how foundation models are released, with open foundation models garnering a lot of support[20] and scrutiny.[21]

Related concepts

Frontier models

Certain highly advanced foundation models are termed "frontier models," which have the potential to "possess dangerous capabilities sufficient to pose severe risks to public safety." These "dangerous capabilities" stem from the accidental or intentional misuse of such models, which in conjunction with their powerful nature can lead to severe harms. As foundation models continue to improve, some AI researchers speculate that almost all next-generation foundation models will be considered frontier models.

Since the concept of dangerous capabilities is inherently subjective, there is no strict designation for what foundation models qualify as frontier models. However, some generally held ideas for sufficiently dangerous capabilities include:

Due to frontier models' unique capabilities, it is difficult to effectively regulate their development and deployment. Because of their emergent nature, new dangerous capabilities can appear on their own in frontier models, both in the development stage and after being deployed. Additionally, since frontier models continue to adapt after deployment, it remains difficult to mitigate all harms that arise from already-deployed models. If a frontier model happens to be open-source or is released online, the model can also disseminate rapidly, further hampering regulators by creating a lack of accountability.

General-purpose AI

Due to their adaptability to a wide range of use-cases, foundation models are sometimes considered to be examples of general-purpose AI. In designing the EU AI Act, the European Parliament has stated that a new wave of general-purpose AI technologies shapes the overall AI ecosystem.[25] The fuller structure of the ecosystem, in addition to the properties of specific general-purpose AI systems, influences the design of AI policy and research. General-purpose AI systems also often appear in people's everyday lives through applications and tools like ChatGPT or DALL-E.

Government bodies like the EU Parliament have identified the regulation of general-purpose AI, such as foundation models, as a high priority. General-purpose AI systems are often characterized by large size, opacity, and potential for emergence, all of which can create unintended harms. Such systems also heavily influence downstream applications, which further exacerbates the need for regulation. With regard to prominent legislation, a number of stakeholders have pushed for the EU AI Act to include restrictions on general-purpose AI systems, all of which would also apply to foundation models.

Technical details

Modeling

For a foundation model to effectively generalize, it must acquire rich representations of the training data. As a result, expressive model architectures that efficiently process large-scale data are often preferred in building foundation models. Currently, the Transformer architecture is the de facto choice for building foundation models across a range of modalities.

Training

Foundation models are built by optimizing one or more training objectives, each of which is a mathematical function that determines how model parameters are updated based on model predictions on training data.[26] Language models are often trained with a next-token prediction objective, which measures how well the model predicts the next token in a sequence. Image models are commonly trained with contrastive learning or diffusion training objectives. For contrastive learning, images are randomly augmented before being evaluated on the resulting similarity of the model's representations. For diffusion models, images are noised and the model learns to gradually de-noise via the objective. Multimodal training objectives also exist, with some separating images and text during training, while others examine them concurrently. In general, the training objectives for foundation models promote the learning of broadly useful representations of data.
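As an illustration of the next-token prediction objective described above, the following is a minimal sketch in plain Python, assuming the objective is implemented as the average cross-entropy between the model's predicted distribution and the token that actually follows. The function names and the toy logits are hypothetical; a real model would produce the logits from its parameters and would use a numerical library rather than lists.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy of predicting token t+1 from position t.

    logits_per_position[t] holds the (hypothetical) model scores over the
    vocabulary after reading token_ids[0..t].
    """
    targets = token_ids[1:]  # each position is scored against the next token
    losses = []
    for logits, target in zip(logits_per_position, targets):
        probs = softmax(logits)
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy setup (assumed values): vocabulary of 3 tokens, sequence of 3 tokens.
token_ids = [0, 2, 1]
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]    # model has no preference
confident_logits = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]  # model favors the true next token

uniform_loss = next_token_loss(uniform_logits, token_ids)
confident_loss = next_token_loss(confident_logits, token_ids)
print(uniform_loss, confident_loss)
```

A model with no preference among three tokens incurs a loss of ln 3 per position, and the loss falls as more probability is placed on the token that actually comes next. Training adjusts the parameters to reduce this quantity over the corpus.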

With the rise of foundation models and the larger datasets that power them, a training objective must be able to parse through internet-scale data for meaningful data points. Additionally, since foundation models are designed to solve a general range of tasks, training objectives ought to be domain complete, or able to solve a broad set of downstream capabilities within the given domain. Lastly, foundation model training objectives should seek to scale well and be computationally efficient. With model size and compute power both being relevant constraints, a training objective must be able to overcome such bottlenecks.

Data

Foundation models are trained on a large quantity of data, working under the maxim "the more data, the better." Performance evaluation does show that more data generally leads to better performance, but other issues arise as data quantity grows. Tasks like managing the dataset, integrating data across new applications, ensuring adherence to data licenses, and maintaining data quality all become more difficult as data size grows. The specific demands of foundation models have only exacerbated such issues, as it remains the norm for large foundation models to use public web-scraped data. Training data for foundation models also includes search engine data and SEO meta tag data. Public web data remains a plentiful resource, but it also demands stringent moderation and data processing from foundation model developers before it can be successfully integrated into the training pipeline.[27]

Training foundation models often runs the risk of violating user privacy, as private data can be disclosed, collected, or used in ways beyond the stated scope. Even if no private data is leaked, models can still inadvertently compromise security through learned behavior in the resulting foundation model.[28] Data quality is another key point, as web-scraped data frequently contains biased, duplicate, and toxic material. Once foundation models are deployed, ensuring high-quality data is still an issue, as undesirable behavior can still emerge from small subsets of data.

Systems

The size of foundation models also brings about issues with the computer systems they run on. The average foundation model is too large to be run within a single accelerator's memory, and the initial training process requires substantial resources. Such issues are predicted to worsen in the future as foundation models grow larger. Due to this constraint, researchers have begun looking into compressing model size through tight model inference.
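The memory constraint can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a hypothetical model with 70 billion parameters stored at 2 bytes each and a hypothetical accelerator with 80 GB of memory; none of these figures come from the text above. It counts only the weights, so it is a lower bound: training also needs memory for gradients, optimizer state, and activations.

```python
import math

def min_accelerators_for_weights(num_params, bytes_per_param, accelerator_mem_gb):
    """Return (weight size in GB, minimum accelerators needed to hold the weights)."""
    weight_gb = num_params * bytes_per_param / 1e9
    return weight_gb, math.ceil(weight_gb / accelerator_mem_gb)

# Assumed, illustrative values only.
weight_gb, n_devices = min_accelerators_for_weights(
    num_params=70e9,        # hypothetical parameter count
    bytes_per_param=2,      # 16-bit weights
    accelerator_mem_gb=80,  # hypothetical per-device memory
)
print(weight_gb, n_devices)
```

Under these assumptions the weights alone occupy 140 GB and must be split across at least two devices, which is why large models are typically sharded across several accelerators.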

GPUs are the most common choice of compute hardware for machine learning, due to their large memory capacity and high computational throughput. Typical foundation model training requires many GPUs, all connected in parallel with fast interconnects. Acquiring a sufficient number of GPUs of requisite compute efficiency is a challenge for many foundation model developers, one that has led to an increasing dilemma in the field. Larger models require greater compute power, but often at the cost of improved compute efficiency. Since training remains time-consuming and expensive, the tradeoff between compute power and compute efficiency has meant that only a few select companies can afford the production costs for large, state-of-the-art foundation models. Some techniques like compression and distillation can make inference more affordable, but they fail to completely shore up this weakness.

Scaling

The accuracy and capabilities of foundation models often scale predictably with the size of the model and the amount of the training data. Specifically, scaling laws have been discovered, which are data-based empirical trends that relate resources (data, model size, compute usage) to model capabilities. Particularly, a model's scale is defined by compute, dataset size, and the number of parameters, all of which exhibit a power-law relationship with end performance.
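To illustrate what a power-law relationship means in practice, the sketch below fits a curve of the form loss = a * N^(-alpha) to a handful of points and extrapolates to a larger scale. It relies on the fact that a power law is a straight line in log-log space, so ordinary least squares on the logarithms recovers the exponent. The data points are synthetic, generated from assumed values of a and alpha, and are not measurements of any real model.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**(-alpha) via a straight line in log-log space."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    count = len(xs)
    mean_x = sum(log_x) / count
    mean_y = sum(log_y) / count
    slope = (
        sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
        / sum((u - mean_x) ** 2 for u in log_x)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, alpha)

# Synthetic data from assumed parameters (illustrative only).
true_a, true_alpha = 10.0, 0.3
sizes = [1e6, 1e7, 1e8, 1e9]  # e.g. parameter counts
losses = [true_a * size ** (-true_alpha) for size in sizes]

a, alpha = fit_power_law(sizes, losses)
predicted_loss = a * 1e10 ** (-alpha)  # extrapolate one order of magnitude further
print(a, alpha, predicted_loss)
```

Because the synthetic points lie exactly on a power law, the fit recovers the assumed parameters. Real measurements are noisy, and the extrapolation is only as good as the assumption that a single exponent holds over the whole range.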

However, broken scaling laws[29] have been discovered in which this relationship smoothly transitions (at points referred to as break(s)) from a power law with one exponent to a power law with another (different) exponent. When one does not collect any points near (or after) the break(s), it can be difficult to obtain an accurate extrapolation.
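The difficulty of extrapolating across a break can be sketched with a smoothly broken power law that has a single break. The functional form below is one common way to write such a curve, and the parameter names (b, c0, c1, d, f) and their values are assumptions chosen for illustration, not fitted to any real model.

```python
import math

def broken_power_law(x, b, c0, c1, d, f):
    """Smoothly broken power law with a single break near x = d.

    For x much smaller than d the log-log slope is about -c0; for x much
    larger than d it is about -(c0 + c1). f controls how sharp the break is.
    """
    return b * x ** (-c0) * (1.0 + (x / d) ** (1.0 / f)) ** (-c1 * f)

def local_slope(fn, x, eps=1e-3):
    """Numerical slope of log(fn(x)) with respect to log(x)."""
    lo, hi = x * (1 - eps), x * (1 + eps)
    return (math.log(fn(hi)) - math.log(fn(lo))) / (math.log(hi) - math.log(lo))

# Assumed, illustrative parameters.
params = dict(b=5.0, c0=0.1, c1=0.4, d=1e6, f=0.5)

def curve(x):
    return broken_power_law(x, **params)

slope_before = local_slope(curve, 1e2)   # well before the break
slope_after = local_slope(curve, 1e10)   # well after the break

# Extrapolating with only the pre-break exponent misses the change in slope.
naive_prediction = curve(1e2) * (1e10 / 1e2) ** slope_before
actual_value = curve(1e10)
print(slope_before, slope_after, naive_prediction, actual_value)
```

With these parameters the exponent changes from roughly 0.1 to roughly 0.5 across the break, so a fit that only sees points before the break predicts a value far from the one the curve actually reaches.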

Adaptation

Foundation models are inherently multi-purpose: using these models for a specific use case requires some form of adaptation. At a minimum, models need to be adapted to perform the task of interest (task specification), but often better performance can be achieved by more extensive adaptation to the domain of interest (domain specialization).

A variety of methods (e.g. prompting, in-context learning, fine-tuning, LoRA) provide different tradeoffs between the costs of adaptation and the extent to which models are specialized. Some major facets to consider when adapting a foundation model are compute budget and data availability. Foundation models can be very large, up to trillions of parameters in size, so adapting the entirety of a foundation model can be computationally expensive. Therefore, developers sometimes adapt only the last neural layer or only the bias vectors to save time and space. For particularly niche applications, specific data may also not be available to adapt the foundation model sufficiently. In such circumstances, data must be manually labeled, which is costly and can demand expert knowledge.
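The cost tradeoff can be illustrated with LoRA, one of the methods listed above. The sketch below is a minimal plain-Python version, assuming the usual low-rank formulation in which a frozen weight matrix W is augmented by a trainable product B A of rank r, scaled by alpha / r. The toy matrices, the helper names, and the 4096-wide layer used for counting are hypothetical values chosen for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha, r):
    """Compute h = W x + (alpha / r) * B (A x).

    W (d_out x d_in) stays frozen; only A (r x d_in) and B (d_out x r)
    would be updated during adaptation.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_out, d_in, r):
    """Parameters updated by full fine-tuning vs. by a rank-r LoRA update."""
    return d_out * d_in, r * (d_in + d_out)

# Toy 2x3 layer with a rank-1 update (assumed values).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[0.5, -0.5, 1.0]]   # r x d_in
B = [[0.0], [0.0]]       # d_out x r, zero-initialized so the update starts at zero
x = [1.0, 2.0, 3.0]

h = lora_forward(W, A, B, x, alpha=8, r=1)
full, lora = trainable_params(d_out=4096, d_in=4096, r=8)
print(h, full, lora)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer's output exactly. For the hypothetical 4096 by 4096 layer, a rank-8 update trains 65,536 values instead of 16,777,216, which is the kind of saving that makes adaptation far cheaper than updating every parameter.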

Evaluation

Evaluation is a key part of developing foundation models. Not only does evaluation allow for tracking progress of high-performance models, it also creates benchmarks for future model development. Stakeholders rely on evaluations to understand model behaviors and gain insight into their various attributes. Traditionally, foundation models are evaluated relative to each other through standardized task benchmarks like MMLU,[30] MMMU, HumanEval,[31] and GSM8K.[32] Given that foundation models are multi-purpose, meta-benchmarks that aggregate different underlying benchmarks are increasingly being developed. Examples include LM-Harness, BIG-Bench, HELM,[33] OpenLLM Leaderboard,[34] DecodingTrust,[35] and HEIM.[36]

Since foundation models' utility depends on their own general capabilities and the performance of fine-tuned applications, evaluation must cover both metrics. Proper evaluation examines both a foundation model's downstream applications in aggregate and the direct properties the foundation model holds. To ensure further equity in evaluation, certain existing evaluation frameworks account for all adaptation resources, which leads to more informed analyses for the benefit of all stakeholders.[37]

Supply chain

Foundation models' general capabilities allow them to fulfill a unique role in the AI ecosystem,[38] fueled by many upstream and downstream technologies. Training a foundation model requires several resources (e.g. data, compute, labor, hardware, code), with foundation models often involving immense amounts of data and compute (also referred to as computational power). Due to foundation models' large development costs and inexpensive adaptation requirements, the AI landscape has shifted to a small subset of AI companies making foundation models for downstream adaptation. Thus, most foundation model companies outsource the supply of data and compute to specialized data providers (e.g. Scale AI,[39] Surge[40]) and compute providers (e.g. Amazon Web Services, Google Cloud, Microsoft Azure). The foundation model developer itself will then take the data and use the supplied compute to actually train the foundation model. After the foundation model is completely built, much of the data and labor requirements abate. In this development process, hardware and compute are the most necessary, and also the most exclusive resources. To train larger and more complex AI, a sufficient amount of compute is key. However, compute is consolidated in the hands of a few, select entities, which most foundation model developers depend on. As such, the foundation model pipeline is concentrated heavily around these providers. Compute is also costly; in 2023, AI companies spent more than 80% of total capital on compute resources.[41]

Foundation models require a large amount of general data to power their capabilities. Early foundation models scraped subsets of the internet to obtain this data. As the size and scope of foundation models grow, larger quantities of internet scraping become necessary, resulting in higher likelihoods of biased or toxic data. This toxic or biased data can disproportionately harm marginalized groups and exacerbate existing prejudices.[42]

To address this issue of low-quality data that arose with unsupervised training, some foundation model developers have turned to manual filtering. This practice, known as data labor, comes with its own host of issues.[43] Such manual data detoxification is often outsourced to reduce labor costs, with some workers making less than $2 per hour.[44]

The foundation model will then be hosted online either via the developer or via an external organization. Once released, other parties can create applications based on the foundation model, whether through fine-tuning or for wholly new purposes. People can then access these applications to serve their various ends, allowing one foundation model to power and reach a wide audience.

Release strategies

After a foundation model is built, it can be released in one of many ways. There are many facets to a release: the asset itself, who has access, how access changes over time, and the conditions on use.[45] All these factors contribute to how a foundation model will affect downstream applications. In particular, the two most common forms of foundation model release are through APIs and direct model downloads.

When a model is released via an API, users can query the model and receive responses, but cannot directly access the model itself. Comparatively, the model could be directly downloadable for users to access and modify. Both release strategies are often classified as an open release. The exact definition of an open release is disputed, but widely accepted requirements are provided by the Open Source Initiative.

Some open foundation models are: PaLM 2, Llama 2, Granite, and Mistral. While open foundation models can further research and development more easily, they are also more susceptible to misuse. Open foundation models can be downloaded by anyone, and particularly powerful models can be fine-tuned to intentionally or unintentionally cause harm.

During a closed release, the foundation model cannot be accessed by the public, but is used internally by an organization. Such releases are considered safer, but offer no additional value to the research community or the public at large.

Some foundation models like Google DeepMind's Flamingo are fully closed, meaning they are available only to the model developer; others, such as OpenAI's GPT-4, are limited access, available to the public but only as a black box; and still others, such as Meta's Llama 2, are open, with broadly available model weights enabling downstream modification and scrutiny.

Notes and References

  1. Competition and Markets Authority (2023). AI Foundation Models: Initial Report. Available at: https://assets.publishing.service.gov.uk/media/65081d3aa41cc300145612c0/Full_report_.pdf
  2. "Introducing the Center for Research on Foundation Models (CRFM)". Stanford HAI. 18 August 2021. Retrieved 11 June 2022.
  3. Nestor Maslej, Loredana Fattorini, Erik Brynjolfsson, John Etchemendy, Katrina Ligett, Terah Lyons, James Manyika, Helen Ngo, Juan Carlos Niebles, Vanessa Parli, Yoav Shoham, Russell Wald, Jack Clark, and Raymond Perrault, "The AI Index 2023 Annual Report," AI Index Steering Committee, Institute for Human-Centered AI, Stanford University, Stanford, CA, April 2023.
  4. Rogers, Anna; Kovaleva, Olga; Rumshisky, Anna (2020). "A Primer in BERTology: What we know about how BERT works". arXiv:2002.12327 [cs.CL].
  5. Copet, Jade; Kreuk, Felix; Gat, Itai; Remez, Tal; Kant, David; Synnaeve, Gabriel; Adi, Yossi; Défossez, Alexandre (2023-11-07). "Simple and Controllable Music Generation". arXiv:2306.05284 [cs.SD].
  6. "Speaking robot: Our new AI model translates vision and language into robotic actions". Google. 2023-07-28. Retrieved 2023-12-11.
  7. Nguyen, Tuan Dung; Ting, Yuan-Sen; Ciucă, Ioana; O'Neill, Charlie; Sun, Ze-Chang; Jabłońska, Maja; Kruk, Sandor; Perkowski, Ernest; Miller, Jack (2023-09-12). "AstroLLaMA: Towards Specialized Foundation Models in Astronomy". arXiv:2309.06126 [astro-ph.IM].
  8. Tu, Tao; Azizi, Shekoofeh; Driess, Danny; Schaekermann, Mike; Amin, Mohamed; Chang, Pi-Chuan; Carroll, Andrew; Lau, Chuck; Tanno, Ryutaro (2023-07-26). "Towards Generalist Biomedical AI". arXiv:2307.14334 [cs.CL].
  9. Zvyagin, Maxim; Brace, Alexander; Hippe, Kyle; Deng, Yuntian; Zhang, Bin; Bohorquez, Cindy Orozco; Clyde, Austin; Kale, Bharat; Perez-Rivera, Danilo (2022-10-11). "GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics". doi:10.1101/2022.10.10.511571.
  10. Spotify Engineering (2023-10-13). "LLark: A Multimodal Foundation Model for Music". Spotify Research. Retrieved 2023-12-11.
  11. Li, Raymond; Allal, Loubna Ben; Zi, Yangtian; Muennighoff, Niklas; Kocetkov, Denis; Mou, Chenghao; Marone, Marc; Akiki, Christopher; Li, Jia (2023-05-09). "StarCoder: may the source be with you!". arXiv:2305.06161 [cs.CL].
  12. Se, Ksenia; Spektor, Ian (April 5, 2024). "Revolutionizing Time Series Forecasting: Interview with TimeGPT's creators". Turing Post. Retrieved 2024-04-11.
  13. Azerbayev, Zhangir; Schoelkopf, Hailey; Paster, Keiran; Santos, Marco Dos; McAleer, Stephen; Jiang, Albert Q.; Deng, Jia; Biderman, Stella; Welleck, Sean (2023-11-30). "Llemma: An Open Language Model For Mathematics". arXiv:2310.10631 [cs.CL].
  14. Bommasani, Rishi; Hudson, Drew A.; Adeli, Ehsan; Altman, Russ; Arora, Simran; et al. (18 August 2021). "On the Opportunities and Risks of Foundation Models". arXiv:2108.07258.
  15. "Reflections on Foundation Models". Stanford HAI. 18 October 2021. Retrieved 22 May 2023.
  16. Bommasani, Rishi; Liang, Percy (2021-10-18). "Reflections on Foundation Models". Stanford CRFM. Retrieved 2023-12-11.
  17. Marcus, Gary (2021-09-11). "Has AI found a new Foundation?". The Gradient. Retrieved 2023-12-11.
  18. The White House (2023-10-30). "Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence". Retrieved 2024-02-12.
  19. "AI Foundation Model Transparency Act".
  20. "Joint Statement on AI Safety and Openness". Mozilla. 2023-10-31. Retrieved 2024-02-12.
  21. "Hawley and Blumenthal Demand Answers from Meta, Warn of Misuse After 'Leak' of Meta's AI Model". Senator Josh Hawley. 2023-06-06. Retrieved 2024-02-12.
  22. Singhal, Karan; Azizi, Shekoofeh; Tu, Tao; Mahdavi, S. Sara; Wei, Jason; Chung, Hyung Won; Scales, Nathan; Tanwani, Ajay; Cole-Lewis, Heather; Pfohl, Stephen; Payne, Perry; Seneviratne, Martin; Gamble, Paul; Kelly, Chris; Babiker, Abubakr (August 2023). "Large language models encode clinical knowledge". Nature. 620 (7972): 172–180. arXiv:2212.13138. Bibcode:2023Natur.620..172S. doi:10.1038/s41586-023-06291-2. ISSN 1476-4687. PMC 10396962. PMID 37438534.
  23. Simshaw, Drew (April 22, 2022). "Access to A.I. Justice: Avoiding an Inequitable Two-Tiered System of Legal Services". SSRN Electronic Journal.
  24. Arbel, Yonathan A.; Becher, Shmuel I. (2020). "Contracts in the Age of Smart Readers". Geo. Wash. L. Rev. 90: 83. doi:10.2139/ssrn.3740356. S2CID 229386991.
  25. "General-purpose artificial intelligence Think Tank European Parliament". www.europarl.europa.eu. Retrieved 2024-02-12.
  26. Shannon, Claude Elwood (July 1948). "A Mathematical Theory of Communication". Bell System Technical Journal.
  27. Jo, Eun Seo; Gebru, Timnit (2020-01-27). "Lessons from archives: Strategies for collecting sociocultural data in machine learning". Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. pp. 306–316. arXiv:1912.10389. doi:10.1145/3351095.3372829. ISBN 978-1-4503-6936-7.
  28. Bender, Emily M.; Gebru, Timnit; McMillan-Major, Angelina; Shmitchell, Shmargaret (2021-03-01). "On the Dangers of Stochastic Parrots: Can Language Models be Too Big?". Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT '21. New York, NY, USA: Association for Computing Machinery. pp. 610–623. doi:10.1145/3442188.3445922. ISBN 978-1-4503-8309-7. Available at: https://dl.acm.org/doi/10.1145/3442188.3445922
  29. Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.
  30. "Papers with Code - MMLU Benchmark (Multi-task Language Understanding)". paperswithcode.com. Retrieved 2024-04-21.
  31. "Papers with Code - HumanEval Benchmark (Code Generation)". paperswithcode.com. Retrieved 2024-04-21.
  32. "Papers with Code - GSM8K Benchmark (Arithmetic Reasoning)". paperswithcode.com. Retrieved 2024-04-21.
  33. "Holistic Evaluation of Language Models (HELM)". crfm.stanford.edu. Retrieved 2024-04-21.
  34. "open-llm-leaderboard (Open LLM Leaderboard)". huggingface.co. 2023-11-09. Retrieved 2024-04-21.
  35. "DecodingTrust Benchmark". decodingtrust.github.io. Retrieved 2024-04-21.
  36. "Holistic Evaluation of Image Models (HEIM)". crfm.stanford.edu. Retrieved 2024-04-21.
  37. Linzen, Tal (July 2020). Jurafsky, Dan; Chai, Joyce; Schluter, Natalie; Tetreault, Joel (eds.). "How Can We Accelerate Progress Towards Human-like Linguistic Generalization?". Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics. pp. 5210–5217. arXiv:2005.00955. doi:10.18653/v1/2020.acl-main.465.
  38. "Ecosystem Graphs for Foundation Models". crfm.stanford.edu. Retrieved 2024-02-13.
  39. "Accelerate the Development of AI Applications Scale AI". scale.com. Retrieved 2024-04-21.
  40. "Surge AI World's Most Powerful Data Labeling Platform". www.surgehq.ai. Retrieved 2024-04-21.
  41. pnp (2023-09-27). "Computational Power and AI". AI Now Institute. Retrieved 2024-02-13.
  42. Tiku, Nitasha; Schaul, Kevin; Chen, Szu Yu. "These fake images reveal how AI amplifies our worst stereotypes". Washington Post. Retrieved 2024-02-13.
  43. "How the AI industry profits from catastrophe". MIT Technology Review. Retrieved 2024-02-13.
  44. "Exclusive: The $2 Per Hour Workers Who Made ChatGPT Safer". TIME. 2023-01-18. Retrieved 2024-02-13.
  45. Liang, Percy; Bommasani, Rishi; Creel, Kathleen (May 17, 2022). "The Time is Now to Develop Community Norms for the Release of Foundation Models". Stanford CRFM.