Generative audio explained

Generative audio is the creation of audio files from databases of audio clips. It differs from synthesized voices such as Apple's Siri or Amazon's Alexa, which stitch together a collection of fragments on demand. Instead, generative audio uses neural networks to learn the statistical properties of an audio source and then reproduces those properties.^[1]
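
The idea of learning the statistical properties of a source and then reproducing them can be sketched in miniature. The example below is only an illustration under stated assumptions: it swaps the neural network for a much simpler second-order linear autoregressive model, and the "audio source" is a synthetic 440 Hz sine tone sampled at 8 kHz. The names `fit_ar2` and `generate` are hypothetical helpers, not part of any library.

```python
import math


def fit_ar2(samples):
    """Least-squares fit of x[t] ~ p*x[t-1] + q*x[t-2] (normal equations)."""
    r11 = r12 = r22 = b1 = b2 = 0.0
    for t in range(2, len(samples)):
        x0, x1, x2 = samples[t], samples[t - 1], samples[t - 2]
        r11 += x1 * x1
        r12 += x1 * x2
        r22 += x2 * x2
        b1 += x0 * x1
        b2 += x0 * x2
    det = r11 * r22 - r12 * r12
    # Cramer's rule for the 2x2 system.
    p = (b1 * r22 - b2 * r12) / det
    q = (r11 * b2 - r12 * b1) / det
    return p, q


def generate(p, q, seed, n):
    """Produce n samples from the learned recurrence, starting from two seed values."""
    out = list(seed)
    while len(out) < n:
        out.append(p * out[-1] + q * out[-2])
    return out


# Assumed toy source: a pure tone, not real recorded audio.
SAMPLE_RATE = 8000
FREQ = 440.0
source = [math.sin(2 * math.pi * FREQ * t / SAMPLE_RATE) for t in range(400)]

p, q = fit_ar2(source)                   # "learn" the source's structure
clip = generate(p, q, source[:2], 400)   # reproduce it sample by sample
```

Unlike a system that stitches stored fragments together, `generate` keeps no copy of the source beyond two seed samples; everything else comes from the two learned coefficients. A neural network plays the same role for far richer signals such as speech.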

Implications

With this technology, a person's voice can be replicated to speak phrases that they may never have spoken. A synthetic version of a public figure's voice could then be used against them.^[2]

Technology

Generative audio uses a generative adversarial network (GAN), a deep learning technique in which two machine learning models, a generator and a discriminator, compete against each other so that the generator learns to create realistic audio.^[3]
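
The adversarial setup can be illustrated with a deliberately tiny sketch. It is not an audio model: as an assumption made for brevity, each "audio sample" is a single number drawn from a Gaussian, the generator is an affine map of random noise, and the discriminator is a logistic classifier. The class `ToyGAN` and the function `train_toy_gan` are hypothetical names, and the gradients are written out by hand so that the example needs only the standard library.

```python
import math
import random


def sigmoid(t):
    # Clamp the input so math.exp cannot overflow.
    t = max(-30.0, min(30.0, t))
    return 1.0 / (1.0 + math.exp(-t))


class ToyGAN:
    """Hypothetical 1-D GAN: each 'audio sample' is a single number."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.a, self.b = 1.0, 0.0  # generator G(z) = a*z + b
        self.w, self.c = 0.0, 0.0  # discriminator D(x) = sigmoid(w*x + c)

    def generate(self, z):
        return self.a * z + self.b

    def discriminate(self, x):
        # Estimated probability that x is real rather than generated.
        return sigmoid(self.w * x + self.c)

    def train_step(self, real_batch, lr=0.1):
        n = len(real_batch)
        noise = [self.rng.gauss(0.0, 1.0) for _ in range(n)]
        fake_batch = [self.generate(z) for z in noise]

        # Discriminator: gradient ascent on log D(real) + log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real_batch:
            d = self.discriminate(x)
            grad_w += (1.0 - d) * x
            grad_c += 1.0 - d
        for x in fake_batch:
            d = self.discriminate(x)
            grad_w -= d * x
            grad_c -= d
        self.w += lr * grad_w / n
        self.c += lr * grad_c / n

        # Generator: gradient ascent on log D(G(z)), i.e. try to fool D.
        grad_a = grad_b = 0.0
        for z in noise:
            d = self.discriminate(self.generate(z))
            grad_a += (1.0 - d) * self.w * z
            grad_b += (1.0 - d) * self.w
        self.a += lr * grad_a / n
        self.b += lr * grad_b / n


def train_toy_gan(real_mean=1.0, real_std=0.5, steps=2000, batch=32, seed=0):
    """Alternate discriminator and generator updates on assumed Gaussian 'real' data."""
    gan = ToyGAN(seed)
    data_rng = random.Random(seed + 1)
    for _ in range(steps):
        real = [data_rng.gauss(real_mean, real_std) for _ in range(batch)]
        gan.train_step(real)
    return gan


gan = train_toy_gan()
```

In this sketch the generator's offset `gan.b` is the mean of its output, and training should pull it from 0 toward `real_mean` as the discriminator keeps penalizing samples that look unlike the real data. Because the discriminator here is linear, it can only compare averages; practical audio GANs use deep networks for both models so that much finer structure is matched.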

Notes and References

"Fake news: you ain't seen nothing yet". The Economist. 1 July 2017.
Zotkin, D. N.; Shamma, S. A.; Ru, P.; Duraiswami, R.; Davis, L. S. (April 2003). "Pitch and timbre manipulations using cortical representation of sound". 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). Vol. 5. pp. V–517–20. doi:10.1109/ICASSP.2003.1200020. ISBN 978-0-7803-7663-2. S2CID 10372569.
Mobin, Shariq (October 2016). "Voice Conversion using Convolutional Neural Networks". arXiv:1610.08927 [stat.ML].
