Wu Dao

悟道 (Wu Dao)
Author:	Beijing Academy of Artificial Intelligence
Released:	January 11, 2021

Wu Dao is a multimodal artificial intelligence model developed by the Beijing Academy of Artificial Intelligence (BAAI). Wu Dao 1.0 was first announced on January 11, 2021; an improved version, Wu Dao 2.0, was announced on May 31, 2021. It has been compared to GPT-3 and is built on a similar architecture. For comparison, GPT-3 has 175 billion parameters (the learned weights of the machine learning model), while Wu Dao 2.0 has 1.75 trillion parameters, ten times as many. Wu Dao was trained on 4.9 terabytes of images and text (including 1.2 terabytes of Chinese text and 1.2 terabytes of English text), while GPT-3 was trained on 45 terabytes of text data. However, a growing body of work highlights the importance of scaling training data along with parameter count, so a larger parameter count alone does not guarantee a more capable model.

The chairman of BAAI said that Wu Dao was an attempt to "create the biggest, most powerful AI model possible". Wu Dao 2.0 was called "the biggest language A.I. system yet", and commenters interpreted it as an attempt to "compete with the United States".

Notably, Wu Dao 2.0 uses a mixture-of-experts (MoE) architecture, unlike GPT-3, which is a "dense" model. MoE models require much less computation to train than dense models with the same number of parameters, but trillion-parameter MoE models have shown performance comparable to models that are hundreds of times smaller.
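The compute claim about mixture-of-experts models can be illustrated with a toy layer. The sketch below is a generic top-k gated MoE layer in plain Python, not Wu Dao's or FastMoE's implementation; the class name ToyMoELayer and the sizes (8-dimensional inputs, 16 experts, k = 2) are hypothetical choices for illustration. A router scores every expert for each input, but only the k best-scoring experts are evaluated, so the weights touched per input are a small fraction of the weights stored. Real MoE layers typically use larger feed-forward experts inside transformer blocks and add extra training machinery such as load balancing.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyMoELayer:
    """Hypothetical toy mixture-of-experts layer with top-k routing.

    Each expert is a single dim x dim linear map; a router (gate) scores
    every expert for each input, and only the k best-scoring experts run.
    """

    def __init__(self, dim, num_experts, k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.k = k
        self.gate = rand_matrix(num_experts, dim, rng)  # router weights
        self.experts = [rand_matrix(dim, dim, rng) for _ in range(num_experts)]

    def total_params(self):
        # Every weight stored in the layer: router plus all experts.
        return len(self.gate) * self.dim + len(self.experts) * self.dim * self.dim

    def active_params(self):
        # Weights actually used for one input: router plus only k experts.
        return len(self.gate) * self.dim + self.k * self.dim * self.dim

    def forward(self, x):
        scores = matvec(self.gate, x)
        # Indices of the k highest-scoring experts.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.k]
        weights = softmax([scores[i] for i in top])
        out = [0.0] * self.dim
        for w, i in zip(weights, top):
            y = matvec(self.experts[i], x)  # only selected experts are computed
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, top


layer = ToyMoELayer(dim=8, num_experts=16, k=2)
output, used_experts = layer.forward([1.0] * 8)
print(layer.total_params(), layer.active_params())  # 1152 256
print(len(used_experts), "of", len(layer.experts), "experts ran")
```

A dense layer storing the same 1,152 weights would use all of them for every input, whereas this toy MoE layer uses 256. This gap between stored and active parameters is why an MoE model's parameter count is not directly comparable to a dense model's.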

Wu Dao's creators demonstrated its ability to perform natural language processing and image recognition, in addition to generating text and images. The model can write essays, poems and couplets in traditional Chinese, generate alt text describing a static image, and generate nearly photorealistic images from natural-language descriptions. Wu Dao also demonstrated its ability to power virtual idols (with some help from the Microsoft spin-off Xiaoice) and to predict the 3D structures of proteins, as AlphaFold does.

History

Wu Dao's development began in October 2020, several months after the May 2020 release of GPT-3. The first iteration of the model, Wu Dao 1.0, "initiated large-scale research projects" via four related models.

Wu Dao – Wen Yuan, a 2.6-billion-parameter pretrained language model, was designed for tasks like open-domain question answering, sentiment analysis, and grammar correction.
Wu Dao – Wen Lan, a 1-billion-parameter multimodal image-text model, was trained on 50 million image-text pairs to perform image captioning.
Wu Dao – Wen Hui, an 11.3-billion-parameter generative language model, was designed for "essential problems in general artificial intelligence from a cognitive perspective"; Synced says that it can "generate poetry, make videos, draw pictures, retrieve text, perform complex reasoning, etc.".
Wu Dao – Wen Su, based on Google's BERT language model and trained on the 100-gigabyte UniParc database (as well as thousands of gene sequences), was designed for biomolecular structure prediction and protein folding tasks.

WuDao Corpora

WuDao Corpora (also written as WuDaoCorpora) is a large dataset constructed for training Wu Dao 2.0. As of version 2.0, it contains 3 terabytes of text scraped from the web, 90 terabytes of graphical data (incorporating 630 million text/image pairs), and 181 gigabytes of Chinese dialogue (incorporating 1.4 billion dialogue rounds).

Wu Dao 2.0 was trained using FastMoE, an open-source, PyTorch-based system for training mixture-of-experts models, modeled on the approach Google published for its own large mixture-of-experts models. The Next Web said in June 2021 that "details as to exactly how Wu Dao was trained, what was in its various datasets, and what practical applications it can be used for remain scarce". OpenAI's policy director called Wu Dao an example of "model diffusion", a neologism describing a situation in which multiple entities develop models similar to OpenAI's.