Text-to-video model
A text-to-video model is a machine learning model that takes a natural language description as input and produces a video relevant to that text.[1] During the 2020s, advances in generating high-quality, text-conditioned video have largely been driven by the development of video diffusion models.[2]
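Video diffusion models generally borrow the forward "noising" process of image diffusion models and apply it to an entire clip at once. The sketch below is a minimal illustration, assuming a DDPM-style formulation, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with a linear noise schedule (an assumed choice, not taken from any model named in this article) and a toy random clip in place of real video. The helper names `linear_alpha_bar` and `noise_video` are hypothetical. A trained denoising network, conditioned on the timestep and an embedding of the text prompt, would learn to reverse this corruption; that network is not shown.

```python
import numpy as np


def linear_alpha_bar(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule (assumed choice)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_video(x0: np.ndarray, t: int, alpha_bar: np.ndarray, rng: np.random.Generator):
    """Sample the forward process q(x_t | x_0) for a clip shaped (frames, height, width, channels).

    All frames are corrupted at the same timestep t, so the clip is treated as one tensor.
    Returns the noisy clip and the Gaussian noise a denoiser would be trained to predict.
    """
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps


# Toy "video": 16 random 32x32 RGB frames with values in [-1, 1] (illustrative, not real data).
rng = np.random.default_rng(0)
video = rng.uniform(-1.0, 1.0, size=(16, 32, 32, 3))
alpha_bar = linear_alpha_bar(1000)

xt, eps = noise_video(video, t=500, alpha_bar=alpha_bar, rng=rng)
print(xt.shape)  # (16, 32, 32, 3)
```

Under such a schedule ᾱ_t shrinks toward zero as t grows, so late timesteps are close to pure noise; generation then runs a learned reverse process from noise back to a coherent clip.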
Models
Various text-to-video models exist, including open-source ones. CogVideo, which takes Chinese-language input,[3] is described as the earliest text-to-video model "of 9.4 billion parameters" to be developed; a demo version of its open-source code was first published on GitHub in 2022. That year, Meta Platforms released a partial text-to-video model called "Make-A-Video",[4] [5] [6] and Google Brain (later part of Google DeepMind) introduced Imagen Video, a text-to-video model built on a 3D U-Net.[7] [8] [9] [10] [11]
In March 2023, Alibaba researchers published a paper that applied many of the principles of latent image diffusion models to video generation.[12] [13] The following year, Alibaba released ModelScope.[14] Services such as Kaiber and Reemix subsequently adopted similar approaches to video generation in their products.
At the AI company Synthesia, Matthias Niessner and Lourdes Agapito work on 3D neural rendering techniques that use 2D and 3D neural representations of shape, appearance, and motion to synthesise realistic, controllable video of avatars.[15] In June 2024, Luma Labs launched its Dream Machine video tool.[16] [17] That same month, Kuaishou extended its Kling AI text-to-video model to international users.[18] In July 2024, TikTok owner ByteDance released Jimeng AI in China through its subsidiary Faceu Technology.[19]
On August 13, 2024, the AI chatbot Grok added a feature that generates images from text prompts and posts them on the X platform. Social media sites were quickly flooded with fabricated images of Donald Trump, Kamala Harris, X owner Elon Musk, and others, including false and disturbing depictions such as their participation in the 9/11 attacks.[20]
Other text-to-video models and services include Google's Phenaki, Hour One, Colossyan,[21] Runway's Gen-3 Alpha,[22] [23] and OpenAI's Sora,[24] which, as of August 2024, had not been publicly released and was available only to alpha testers.[25]
Notes and References
- Artificial Intelligence Index Report 2023. Stanford Institute for Human-Centered Artificial Intelligence. p. 98: "Multiple high quality text-to-video models, AI systems that can generate video clips from prompted text, were released in 2022."
- Melnik, Andrew; Ljubljanac, Michal; Lu, Cong; Yan, Qi; Ren, Weiming; Ritter, Helge (2024-05-06). Video Diffusion Models: A Survey. arXiv:2405.03150 [cs.CV].
- Text-to-Video Generative AI Models: The Definitive List. AI Business. https://aibusiness.com/nlp/ai-video-generation-the-supreme-list
- Davies, Teli (2022-09-29). Make-A-Video: Meta AI's New Model For Text-To-Video Generation. Weights & Biases. Accessed 2022-10-12.
- Monge, Jim Clyde (2022-08-03). This AI Can Create Video From Text Prompt. Medium. Accessed 2022-10-12.
- Meta's Make-A-Video AI creates videos from text. www.fonearena.com. Accessed 2022-10-12.
- google: Google takes on Meta, introduces own video-generating AI. The Economic Times (2022-10-06). Accessed 2022-10-12.
- Monge, Jim Clyde (2022-08-03). This AI Can Create Video From Text Prompt. Medium. Accessed 2022-10-12.
- Nuh-uh, Meta, we can do text-to-video AI, too, says Google. www.theregister.com. Accessed 2022-10-12.
- Papers with Code - See, Plan, Predict: Language-guided Cognitive Planning with Video Prediction. paperswithcode.com. Accessed 2022-10-12.
- Papers with Code - Text-driven Video Prediction. paperswithcode.com. Accessed 2022-10-12.
- Home - DAMO Academy. damo.alibaba.com. Accessed 2023-08-12.
- Luo, Zhengxiong; Chen, Dayou; Zhang, Yingya; Huang, Yan; Wang, Liang; Shen, Yujun; Zhao, Deli; Zhou, Jingren; Tan, Tieniu (2023). VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation. arXiv:2303.08320 [cs.CV].
- Alibaba Cloud unleashes thousands of Chinese AI models to the world. The Register. https://www.theregister.com/2024/06/25/alibaba_modelscope_english_translation/
- Text to Speech for Videos. Accessed 2023-10-17.
- Luma AI debuts 'Dream Machine' for realistic video generation, heating up AI media race. VentureBeat. https://venturebeat.com/ai/luma-ai-debuts-dream-machine-for-realistic-video-generation-heating-up-ai-media-race/
- Apple Debuts Intelligence, Mistral Raises $600 Million, New AI Text-To-Video. Forbes. https://www.forbes.com/sites/charliefink/2024/06/13/apple-debuts-intelligence-mistral-raises-600-million-new-ai-text-to-video/
- What you need to know about Kling, the AI video generator rival to Sora that’s wowing creators. VentureBeat. https://venturebeat.com/ai/what-you-need-to-know-about-kling-the-ai-video-generator-rival-to-sora-thats-wowing-creators/
- ByteDance joins OpenAI's Sora rivals with AI video app launch. Reuters. https://www.reuters.com/technology/artificial-intelligence/bytedance-joins-openais-sora-rivals-with-ai-video-app-launch-2024-08-06/
- Elon Musk's AI photo tool is generating realistic, fake images of Trump, Harris and Biden. CTV News. https://www.ctvnews.ca/sci-tech/elon-musk-s-ai-photo-tool-is-generating-realistic-fake-images-of-trump-harris-and-biden-1.7004074
- Text-to-Video Generative AI Models: The Definitive List. AI Business. https://aibusiness.com/nlp/ai-video-generation-the-supreme-list
- Runway's Sora competitor Gen-3 Alpha now available. The Decoder. https://the-decoder.com/runways-sora-competitor-gen-3-alpha-now-available/
- Generative AI's Next Frontier Is Video. Bloomberg. https://www.bloomberg.com/news/articles/2023-03-20/generative-ai-s-next-frontier-is-video
- OpenAI teases 'Sora,' its new text-to-video AI model. NBC News. https://www.nbcnews.com/tech/tech-news/openai-sora-video-artificial-intelligence-unveiled-rcna139065
- Toys R Us creates first brand film to use OpenAI’s text-to-video tool. Marketing Dive. https://www.marketingdive.com/news/toys-r-us-openai-sora-gen-ai-first-text-video/719797/