Wu Dao Explained

悟道 (Wu Dao)
Author: Beijing Academy of Artificial Intelligence
Released: January 11, 2021

Wu Dao is a multimodal artificial intelligence developed by the Beijing Academy of Artificial Intelligence (BAAI). Wu Dao 1.0 was first announced on January 11, 2021; an improved version, Wu Dao 2.0, was announced on May 31, 2021. It has been compared to GPT-3 and is built on a similar architecture. GPT-3 has 175 billion parameters (variables and inputs within the machine learning model), while Wu Dao has 1.75 trillion, ten times as many. Wu Dao was trained on 4.9 terabytes of images and text (including 1.2 terabytes of Chinese text and 1.2 terabytes of English text), while GPT-3 was trained on 45 terabytes of text data. A growing body of work, however, highlights the importance of increasing both data and parameters.

The chairman of BAAI said that Wu Dao was an attempt to "create the biggest, most powerful AI model possible". Wu Dao 2.0 was called "the biggest language A.I. system yet", and commenters interpreted it as an attempt to "compete with the United States".

Notably, Wu Dao 2.0 is a mixture-of-experts (MoE) model, in which only a subset of the parameters is used for any given input, unlike GPT-3, which is a "dense" model that uses all of its parameters for every input. MoE models require much less computational power to train than dense models with the same number of parameters. The larger parameter count does not translate directly into capability, however: trillion-parameter MoE models have shown performance comparable to that of models hundreds of times smaller.
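The difference between the parameters an MoE model stores and the parameters it uses per input can be illustrated with a minimal sketch of top-k expert routing. Everything in it is an assumption made for illustration: the layer sizes, the number of experts, the toy experts, and the helper names (`top_k_route`, `moe_layer`, `ffn_params`) are hypothetical and are not taken from Wu Dao, FastMoE, or any published configuration.

```python
import math
import random


def softmax(scores):
    """Turn raw gate scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(gate_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the gate probabilities renormalised over the chosen experts.
    """
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_layer(x, experts, gate_scores, k):
    """Mix the outputs of only the k selected experts; the rest are never run."""
    routes = top_k_route(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in routes)


def ffn_params(d_model, d_hidden):
    """Weights in one two-matrix feed-forward block (biases ignored)."""
    return 2 * d_model * d_hidden


# Hypothetical sizes, chosen only for illustration (not Wu Dao's configuration).
D_MODEL, D_HIDDEN, NUM_EXPERTS, TOP_K = 1024, 4096, 64, 2

stored_params = NUM_EXPERTS * ffn_params(D_MODEL, D_HIDDEN)
active_params = TOP_K * ffn_params(D_MODEL, D_HIDDEN)

# Toy experts: each one just scales its input by a different factor.
experts = [lambda x, scale=i: scale * x for i in range(NUM_EXPERTS)]
random.seed(0)
gate_scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routes = top_k_route(gate_scores, TOP_K)
output = moe_layer(1.0, experts, gate_scores, TOP_K)

print(f"stored expert parameters: {stored_params:,}")
print(f"active per input:         {active_params:,}")
print(f"stored/active ratio:      {stored_params // active_params}x")
print("selected experts:", [i for i, _ in routes])
```

With these made-up sizes the layer stores 32 times more expert weights than it applies to any single input, which is why parameter counts of MoE and dense models are not directly comparable as a measure of training cost.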

Wu Dao's creators demonstrated its ability to perform natural language processing and image recognition, in addition to generating text and images. The model can not only write essays, poems and couplets in traditional Chinese, but also generate alt text from a static image and nearly photorealistic images from natural language descriptions. Wu Dao also showed off its ability to power virtual idols (with a little help from Microsoft spinoff Xiaoice) and to predict the 3D structures of proteins, as AlphaFold does.

History

Wu Dao's development began in October 2020, several months after the May 2020 release of GPT-3. The first iteration of the model, Wu Dao 1.0, "initiated large-scale research projects" via four related models.

WuDao Corpora

WuDao Corpora (also written as WuDaoCorpora), as of version 2.0, was a large dataset constructed for training Wu Dao 2.0. It contains 3 terabytes of text scraped from web data, 90 terabytes of graphical data (incorporating 630 million text/image pairs), and 181 gigabytes of Chinese dialogue (incorporating 1.4 billion dialogue rounds). Wu Dao 2.0 was trained using FastMoE, a variant of the mixture-of-experts architecture published by Google. The Next Web said in June 2021 that "details as to exactly how Wu Dao was trained, what was in its various datasets, and what practical applications it can be used for remain scarce". OpenAI's policy director called Wu Dao an example of "model diffusion", a neologism describing a situation in which multiple entities develop models similar to OpenAI's.