Attention Is All You Need Explained

"Attention Is All You Need" is a landmark[1][2] 2017 research paper in machine learning authored by eight scientists working at Google. The paper introduced a new deep learning architecture known as the transformer, based on the attention mechanism proposed in 2014 by Bahdanau et al. It is considered a foundational[3] paper in modern artificial intelligence, as the transformer approach has become the main architecture of large language models such as those based on GPT. At the time, the research focused on improving Seq2seq techniques for machine translation, but the authors went further in the paper, foreseeing the technique's potential for other tasks like question answering and multimodal generative AI.[4]
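The attention operation at the core of the transformer is the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)·V. A minimal plain-Python sketch (the toy matrices below are illustrative, not taken from the paper):

```python
import math

def softmax(xs):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for lists of row vectors.

    For each query, score every key by a scaled dot product, turn the
    scores into weights with softmax, and return the weighted average
    of the value vectors.
    """
    d_k = len(K[0])  # key dimension used for the 1/sqrt(d_k) scaling
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: one query attends over two keys; it is more similar to the
# first key, so the output leans toward the first value vector.
result = attention([[1.0, 0.0]],
                   [[1.0, 0.0], [0.0, 1.0]],
                   [[10.0], [0.0]])
```

In practice Q, K, and V are learned linear projections of the input, and the paper runs several such attention functions in parallel ("multi-head attention"); the sketch shows only the core operation.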

The paper's title is a reference to the song "All You Need Is Love" by the Beatles.[5]

An early design document was titled "Transformers: Iterative Self-Attention and Processing for Various Tasks", and included an illustration of six characters from the Transformers animated show. The team was named Team Transformer.

The paper has been cited more than 100,000 times.[6]

For their 100M-parameter Transformer model, the authors suggested linearly increasing the learning rate from 0 to its maximal value over the first part of training (i.e., the first 2% of the total training steps), and using dropout to stabilize training.
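The linear warmup described above can be sketched as a simple schedule function; the peak learning rate and step count below are illustrative placeholders, not the paper's exact values:

```python
def learning_rate(step, max_lr=1e-3, total_steps=100_000, warmup_frac=0.02):
    """Linear learning-rate warmup.

    Ramps the learning rate from 0 to max_lr over the first warmup_frac
    (here 2%) of the total training steps, then holds it at max_lr.
    max_lr and total_steps are illustrative placeholders.
    """
    warmup_steps = int(total_steps * warmup_frac)  # 2,000 steps here
    if step < warmup_steps:
        return max_lr * step / warmup_steps  # linear ramp from 0
    return max_lr

# Midway through warmup the rate is at half its peak.
lr_mid = learning_rate(1_000)   # 0.0005
lr_peak = learning_rate(2_000)  # 0.001
```

In the paper itself the rate is additionally decayed after warmup (proportionally to the inverse square root of the step number); the sketch covers only the linear ramp described in the text above.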

Authors

The authors of the paper are: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, and Illia Polosukhin. All eight authors were "equal contributors" to the paper; the listed order was randomized. The Wired article highlights the group's diversity:[5]

Six of the eight authors were born outside the United States; of the other two, one is the child of two green-card-carrying Germans who were temporarily in California, and the other is a first-generation American whose family had fled persecution.

By 2023, all eight authors had left Google; seven founded their own AI start-ups, while Łukasz Kaiser joined OpenAI.[5][6]

Notes and References

  1. Love, Julia (2023-07-10). "AI Researcher Who Helped Write Landmark Paper Is Leaving Google". Retrieved 2024-04-01.
  2. Goldman, Sharon (2024-03-20). "'Attention is All You Need' creators look beyond Transformers for AI at Nvidia GTC: 'The world needs something better'". Retrieved 2024-04-01.
  3. Shinde, Gitanjali; Wasatkar, Namrata; Mahalle, Parikshit (2024-06-06). Data-Centric Artificial Intelligence for Multidisciplinary Applications. p. 75. ISBN 9781040031131.
  4. Vaswani, Ashish; Shazeer, Noam; Parmar, Niki; Uszkoreit, Jakob; Jones, Llion; Gomez, Aidan N.; Kaiser, Łukasz; Polosukhin, Illia (2017). "Attention Is All You Need". Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc.
  5. Levy, Steven (2024-03-20). "8 Google Employees Invented Modern AI. Here's the Inside Story". Wired. ISSN 1059-1028.
  6. "Meet the $4 Billion AI Superstars That Google Lost" (2023-07-13). Bloomberg.