Audio signal processing explained

Audio signal processing is a subfield of signal processing concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves, which are longitudinal waves that travel through air as alternating compressions and rarefactions. The energy contained in an audio signal, or its sound power level, is typically measured in decibels. Because audio signals may be represented in either digital or analog format, processing may occur in either domain. Analog processors operate directly on the electrical signal, while digital processors operate mathematically on its digital representation.
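As an illustration of how a level in decibels can be computed in the digital domain, the following is a minimal sketch. It assumes samples are floating-point values and that a root-mean-square (RMS) amplitude of 1.0 is the reference level; the function names `rms` and `level_db` are hypothetical and chosen only for this example.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(samples, reference=1.0):
    """Level in decibels relative to an assumed reference RMS amplitude."""
    value = rms(samples)
    if value == 0.0:
        return float("-inf")  # silence has no finite level in decibels
    # Factor 20 because amplitude, not power, is being compared.
    return 20.0 * math.log10(value / reference)


# Ten full cycles of a sine wave with peak amplitude 1.0.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
print(round(level_db(sine), 2))  # about -3.01 dB
```

Halving the amplitude of every sample lowers the reported level by roughly 6 dB, which is why decibels are convenient for describing gain changes.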

History

The motivation for audio signal processing arose in the late 19th and early 20th centuries with inventions such as the telephone, phonograph, and radio, which allowed for the transmission and storage of audio signals. Audio processing was necessary for early radio broadcasting, as there were many problems with studio-to-transmitter links.^[1] The theory of signal processing and its application to audio was largely developed at Bell Labs in the mid-20th century. Claude Shannon and Harry Nyquist's early work on communication theory, sampling theory and pulse-code modulation (PCM) laid the foundations for the field. In 1957, Max Mathews became the first person to synthesize audio on a computer, giving birth to computer music.

Major developments in digital audio coding and audio data compression include differential pulse-code modulation (DPCM) by C. Chapin Cutler at Bell Labs in 1950, linear predictive coding (LPC) by Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966,^[2] adaptive DPCM (ADPCM) by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973,^[3] ^[4] discrete cosine transform (DCT) coding by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974,^[5] and modified discrete cosine transform (MDCT) coding by J. P. Princen, A. W. Johnson and A. B. Bradley at the University of Surrey in 1987.^[6] LPC is the basis for perceptual coding and is widely used in speech coding,^[7] while MDCT coding is widely used in modern audio coding formats such as MP3^[8] and Advanced Audio Coding (AAC).^[9]

Types

Analog

An analog audio signal is a continuous signal represented by an electrical voltage or current that is analogous to the sound waves in the air. Analog signal processing then involves physically altering the continuous signal by changing the voltage or current or charge via electrical circuits.

Before the advent of widespread digital technology, analog processing was the only way to manipulate a signal. Since then, as computers and software have become more capable and affordable, digital signal processing has become the method of choice. In music applications, however, analog technology is often still preferred because it produces nonlinear responses that are difficult to replicate with digital filters.

Digital

A digital representation expresses the audio waveform as a sequence of symbols, usually binary numbers. This permits signal processing using digital circuits such as digital signal processors, microprocessors and general-purpose computers. Most modern audio systems use a digital approach as the techniques of digital signal processing are much more powerful and efficient than analog domain signal processing.^[10]
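To make the idea of a sequence of binary numbers concrete, the sketch below samples a sine wave and quantizes each sample to a signed 16-bit integer, as in linear PCM. The sample rate of 48000 Hz, the 1000 Hz tone, and the helper names `sample_sine` and `quantize_pcm16` are assumptions made for this example only.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, count):
    """Sample a unit-amplitude sine wave at regular intervals."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(count)]


def quantize_pcm16(x):
    """Map a float in [-1.0, 1.0] to a signed 16-bit integer."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return int(round(x * 32767))


# One cycle of a 1000 Hz tone at a 48000 Hz sample rate is 48 samples.
pcm = [quantize_pcm16(s) for s in sample_sine(1000.0, 48000, 48)]
```

Once the waveform is held as integers like these, operations such as gain, mixing, or filtering become ordinary arithmetic on the sample values.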

Applications

Processing methods and application areas include storage, data compression, music information retrieval, speech processing, localization, acoustic detection, transmission, noise cancellation, acoustic fingerprinting, sound recognition, synthesis, and enhancement (e.g. equalization, filtering, level compression, echo and reverb removal or addition, etc.).

Audio broadcasting

Audio signal processing is used when broadcasting audio signals in order to enhance their fidelity or to optimize for bandwidth or latency. In this domain, the most important audio processing takes place just before the transmitter. The audio processor here must prevent or minimize overmodulation, compensate for non-linear transmitters (a potential issue with medium wave and shortwave broadcasting), and adjust overall loudness to the desired level.
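Two of the tasks named above, adjusting level and keeping peaks below a ceiling, can be sketched in a very simplified form. This is not how a broadcast processor is built; real units typically rely on more elaborate techniques such as multiband compression. The functions `peak_normalize` and `hard_limit` and their default values are hypothetical.

```python
def peak_normalize(samples, target_peak=0.9):
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def hard_limit(samples, ceiling=1.0):
    """Clamp every sample to the range [-ceiling, ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]


# A quiet signal is raised to the target level, then peaks are capped.
processed = hard_limit(peak_normalize([0.1, -0.2, 0.05], target_peak=0.8))
```

Hard clamping introduces audible distortion when it acts often, which is one reason practical limiters reduce gain gradually instead of clipping the waveform.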

Active noise control

Active noise control is a technique designed to reduce unwanted sound. By creating a signal that is identical to the unwanted noise but with the opposite polarity, the two signals cancel out due to destructive interference.
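The cancellation principle can be shown with an idealized sketch in which the noise is known exactly and the inverted copy arrives perfectly aligned. Practical systems have to estimate the noise and compensate for delay, which this example does not attempt; the names `anti_noise` and `mix` and the 100 Hz test tone are assumptions.

```python
import math


def anti_noise(noise):
    """Return the same signal with opposite polarity."""
    return [-s for s in noise]


def mix(a, b):
    """Sum two signals sample by sample."""
    return [x + y for x, y in zip(a, b)]


# One period of a 100 Hz tone sampled at 8000 Hz stands in for the noise.
noise = [math.sin(2 * math.pi * 100 * k / 8000) for k in range(80)]

# Perfect alignment: every sample cancels exactly.
residual = mix(noise, anti_noise(noise))

# The same anti-noise arriving one sample late leaves a residual.
late = [0.0] + anti_noise(noise)[:-1]
residual_late = mix(noise, late)
```

Comparing `residual` with `residual_late` shows why timing matters: even a one-sample misalignment prevents complete cancellation.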

Audio synthesis

Audio effects

Audio effects alter the sound of a musical instrument or other audio source. Common effects include distortion, often used with electric guitar in electric blues and rock music; dynamic effects such as volume pedals and compressors, which affect loudness; filters such as wah-wah pedals and graphic equalizers, which modify frequency ranges; modulation effects, such as chorus, flangers and phasers; pitch effects such as pitch shifters; and time effects, such as reverb and delay, which create echoing sounds and emulate the sound of different spaces.

Musicians, audio engineers and record producers use effects units during live performances or in the studio, typically with electric guitar, bass guitar, electronic keyboard or electric piano. While effects are most frequently used with electric or electronic instruments, they can be used with any audio source, such as acoustic instruments, drums, and vocals.^[11] ^[12]
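As an example of a time effect, a single-tap delay (echo) can be sketched by adding an attenuated, delayed copy of the input to itself. This is a minimal digital illustration, not a model of any particular effects unit; the function name `echo` and its parameters are hypothetical.

```python
def echo(samples, delay_samples, gain=0.5):
    """Add one delayed, attenuated copy of the input to the input."""
    out = []
    for n, s in enumerate(samples):
        # Before the delay has elapsed there is nothing to add yet.
        delayed = samples[n - delay_samples] if n >= delay_samples else 0.0
        out.append(s + gain * delayed)
    return out


# A single impulse produces a quieter repeat two samples later.
print(echo([1.0, 0.0, 0.0, 0.0, 0.0], delay_samples=2, gain=0.5))
# prints [1.0, 0.0, 0.5, 0.0, 0.0]
```

Feeding the output back into the delay line, instead of only the input, would produce repeating echoes that decay over time.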

Computer audition

Notes and References

Spanias, Andreas; Painter, Ted; Atti, Venkatraman (2006). Audio Signal Processing and Coding (online ed.). Hoboken, NJ: John Wiley & Sons. p. 464. ISBN 0-471-79147-4.
Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol". Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036. ISSN 1932-8346. Archived at https://ghostarchive.org/archive/20221009/https://ee.stanford.edu/~gray/lpcip.pdf on 2022-10-09.
Cummiskey, P.; Jayant, Nikil S.; Flanagan, J. L. (September 1973). "Adaptive quantization in differential PCM coding of speech". Bell Syst. Tech. J. 52: 1105–1118.
Cummiskey, P.; Jayant, Nikil S.; Flanagan, J. L. (1973). "Adaptive quantization in differential PCM coding of speech". The Bell System Technical Journal. 52 (7): 1105–1118. doi:10.1002/j.1538-7305.1973.tb02007.x. ISSN 0005-8580.
Ahmed, Nasir; Natarajan, T.; Rao, Kamisetty Ramamohan (January 1974). "Discrete Cosine Transform". IEEE Transactions on Computers. C-23 (1): 90–93. doi:10.1109/T-C.1974.223784. S2CID 149806273. Archived at https://ghostarchive.org/archive/20221009/https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/Ahmed_et_al.__1974.pdf on 2022-10-09.
Princen, J. P.; Johnson, A. W.; Bradley, A. B. (1987). "Subband/transform coding using filter bank designs based on time domain aliasing cancellation". IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP): 2161–2164.
Schroeder, Manfred R. (2014). "Bell Laboratories". Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. Springer. p. 388. ISBN 9783319056609. https://books.google.com/books?id=d9IkBAAAQBAJ&pg=PA388.
Guckert, John (Spring 2012). "The Use of FFT and MDCT in MP3 Audio Compression". Archived at https://ghostarchive.org/archive/20221009/http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf on 2022-10-09. Retrieved 14 July 2019.
Brandenburg, Karlheinz (1999). "MP3 and AAC Explained". Archived at https://web.archive.org/web/20170213191747/https://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf on 2017-02-13.
Zölzer, Udo (1997). Digital Audio Signal Processing. John Wiley and Sons. ISBN 0-471-97226-6.
Horne, Greg (2000). Complete Acoustic Guitar Method: Mastering Acoustic Guitar c. Alfred Music. p. 92. ISBN 9781457415043.
Yakabuski, Jim (2001). Professional Sound Reinforcement Techniques: Tips and Tricks of a Concert Sound Engineer. Hal Leonard. p. 139. ISBN 9781931140065.