Audio coding format explained

An audio coding format[1] (sometimes called an audio compression format) is a content representation format for storing or transmitting digital audio (for example in digital television, digital radio, and audio and video files). Examples of audio coding formats include MP3, AAC, Vorbis, FLAC, and Opus. A specific software or hardware implementation that compresses and decompresses audio to or from a specific audio coding format is called an audio codec; an example is LAME, one of several codecs that implement encoding and decoding of audio in the MP3 audio coding format in software.

Some audio coding formats are documented by a detailed technical specification document known as an audio coding specification. Some such specifications are written and approved by standardization organizations as technical standards, and are thus known as audio coding standards. The term "standard" is sometimes used for de facto standards as well as formal standards.

Audio content encoded in a particular audio coding format is normally encapsulated within a container format. As such, the user normally doesn't have a raw AAC file, but instead has a .m4a audio file, which is an MPEG-4 Part 14 container holding AAC-encoded audio. The container also holds metadata such as the title and other tags, and perhaps an index for fast seeking.[2] A notable exception is MP3 files, which are raw audio coding without a container format. De facto standards for adding metadata tags such as title and artist to MP3s, such as ID3, are hacks which work by attaching the tags to the MP3 data (ID3v1 at the end of the file, ID3v2 usually at the beginning) and then relying on the MP3 player to recognize the chunk as malformed audio coding and therefore skip it. In video files with audio, the encoded audio content is bundled with video (in a video coding format) inside a multimedia container format.
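
As a rough illustration of how a tag-aware reader can separate ID3 tags from the MP3 data, the sketch below checks for the two fixed markers the ID3 conventions use: a 10-byte ID3v2 header starting with "ID3" at the beginning of the file, and a 128-byte ID3v1 block starting with "TAG" at the end. The function names are hypothetical, and the sketch deliberately ignores the optional ID3v2 footer and any other tagging schemes.

```python
def synchsafe_to_int(b: bytes) -> int:
    """Decode a 4-byte ID3v2 'synchsafe' integer (only 7 bits used per byte)."""
    value = 0
    for byte in b:
        value = (value << 7) | (byte & 0x7F)
    return value


def locate_audio(data: bytes) -> tuple:
    """Return (start, end) byte offsets of what remains after skipping ID3 tags.

    Hypothetical helper for illustration; it does not validate the MP3 frames.
    """
    start, end = 0, len(data)
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4-byte synchsafe size.
    if len(data) >= 10 and data[:3] == b"ID3":
        start = 10 + synchsafe_to_int(data[6:10])
    # ID3v1 tag: fixed 128-byte block at the very end, beginning with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128
    return start, end


# Usage: a fake file made of an ID3v2 tag (5-byte body), dummy audio, and an ID3v1 tag.
fake = (b"ID3\x04\x00\x00\x00\x00\x00\x05" + b"x" * 5
        + b"\xff\xfb" + b"a" * 100
        + b"TAG" + b"\x00" * 125)
audio_start, audio_end = locate_audio(fake)
```

A player that knows nothing about ID3 would instead have to treat those byte ranges as unparseable audio and resynchronize on the next valid frame, which is the behavior the tagging scheme relies on.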

An audio coding format does not dictate all the algorithms used by a codec implementing the format. Lossy audio compression works in large part by removing data that humans cannot hear, according to a psychoacoustic model; the implementer of an encoder has some freedom in choosing which data to remove (according to their own psychoacoustic model).

Lossless, lossy, and uncompressed audio coding formats

A lossless audio coding format reduces the total data needed to represent a sound but can be decoded to its original, uncompressed form. A lossy audio coding format additionally discards part of the information in the sound (for example by reducing its bit resolution), which results in far less data at the cost of irretrievably lost information.
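
The difference can be made concrete with a small sketch. It is an illustration under loose assumptions, not how real codecs work: general-purpose DEFLATE (Python's `zlib`) stands in for a lossless audio coder, and simply dropping the low 8 bits of each 16-bit sample stands in for a lossy one. The helper names and the 440 Hz test tone are invented for the example.

```python
import math
import zlib
from array import array


def make_tone(freq_hz=440, rate_hz=8000, seconds=1, amplitude=12000):
    """Generate a sine tone as signed 16-bit samples (illustrative values)."""
    count = rate_hz * seconds
    return array("h", (int(amplitude * math.sin(2 * math.pi * freq_hz * n / rate_hz))
                       for n in range(count)))


def lossless_round_trip(raw: bytes) -> bytes:
    """Compress and decompress; the output is bit-for-bit identical to the input."""
    return zlib.decompress(zlib.compress(raw))


def drop_low_bits(samples, bits=8):
    """Crude lossy step: zero the lowest `bits` bits of every sample."""
    return array("h", ((s >> bits) << bits for s in samples))


tone = make_tone()
raw = tone.tobytes()
restored = lossless_round_trip(raw)   # equals raw exactly
coarse = drop_low_bits(tone)          # close to tone, but the dropped bits are gone
worst_error = max(abs(a - b) for a, b in zip(tone, coarse))
```

After the lossless round trip the original bytes come back unchanged, whereas no decoder can recover the bits removed by `drop_low_bits`; the reconstruction is only an approximation whose error is bounded by the size of the discarded step.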

Transmitted (streamed) audio is most often compressed using lossy audio codecs, as the smaller size is far more convenient for distribution. The most widely used audio coding formats are MP3 and Advanced Audio Coding (AAC), both of which are lossy formats based on the modified discrete cosine transform (MDCT) and perceptual coding algorithms.
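
To give a feel for the MDCT itself, here is a minimal, unoptimized sketch in pure Python. It assumes a sine window, 50% overlapping blocks of 2N samples, and a signal length that is a multiple of N; the function names are chosen for this example, and real codecs use fast algorithms plus quantization between the forward and inverse transforms.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block):
    """Map 2N (already windowed) samples to N coefficients, O(N^2) direct form."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs):
    """Map N coefficients back to 2N time-aliased samples (2/N scaling for windowed use)."""
    N = len(coeffs)
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]


def analyze_synthesize(signal, N):
    """Window, transform, inverse-transform and overlap-add; no quantization step."""
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N  # let the edge samples overlap too
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]  # aliasing cancels between neighboring blocks
    return out[N:N + len(signal)]


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
rebuilt = analyze_synthesize(signal, 8)
```

Each block of 2N samples yields only N coefficients, yet overlap-adding adjacent blocks reconstructs the signal to within rounding error. A lossy encoder would quantize the coefficients between `mdct` and `imdct`, guided by its psychoacoustic model.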

Lossless audio coding formats such as FLAC and Apple Lossless are sometimes offered as an alternative, though at the cost of larger files.

Uncompressed audio formats, such as pulse-code modulation (PCM, commonly stored in .wav files), are also sometimes used. PCM is the standard format for Compact Disc Digital Audio (CDDA).
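
The size of uncompressed PCM follows directly from its parameters. As a small worked example, the sketch below computes the data rate for CD-quality audio (44.1 kHz, 16 bits per sample, two channels); the function names are made up for the illustration.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def pcm_bytes(sample_rate_hz: int, bits_per_sample: int, channels: int, seconds: int) -> int:
    """Bytes needed to store `seconds` of uncompressed PCM audio."""
    return pcm_bit_rate(sample_rate_hz, bits_per_sample, channels) * seconds // 8


cd_bit_rate = pcm_bit_rate(44100, 16, 2)      # 1,411,200 bit/s
cd_minute = pcm_bytes(44100, 16, 2, 60)       # 10,584,000 bytes per minute
```

At roughly 10.6 MB per minute, the figure shows why lossy formats are preferred for streaming and why even lossless compression is worthwhile for storage.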

History

In 1950, Bell Labs filed a patent on differential pulse-code modulation (DPCM). Adaptive DPCM (ADPCM) was introduced by P. Cummiskey, Nikil S. Jayant and James L. Flanagan at Bell Labs in 1973.[3][4]

Perceptual coding was first used for speech coding, with linear predictive coding (LPC).[5] Initial concepts for LPC date back to the work of Fumitada Itakura (Nagoya University) and Shuzo Saito (Nippon Telegraph and Telephone) in 1966.[6] During the 1970s, Bishnu S. Atal and Manfred R. Schroeder at Bell Labs developed a form of LPC called adaptive predictive coding (APC), a perceptual coding algorithm that exploited the masking properties of the human ear. It was followed in the early 1980s by the code-excited linear prediction (CELP) algorithm, which achieved a significant compression ratio for its time.[5] Perceptual coding is used by modern audio compression formats such as MP3[5] and AAC.

The discrete cosine transform (DCT), developed by Nasir Ahmed, T. Natarajan and K. R. Rao in 1974,[7] provided the basis for the modified discrete cosine transform (MDCT). The MDCT was proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987,[9] following earlier work by Princen and Bradley in 1986.[10] It is used by modern audio compression formats such as Dolby Digital,[11][12] MP3,[8] and Advanced Audio Coding (AAC).[13]

List of lossy formats

General

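| Basic compression algorithm | Audio coding standard | Abbreviation | Introduction | Market share [14] | Ref. |
|---|---|---|---|---|---|
| Modified discrete cosine transform (MDCT) | Dolby Digital (AC-3) | AC3 | 1991 | 58% | [15] |
| Modified discrete cosine transform (MDCT) | Adaptive Transform Acoustic Coding | ATRAC | 1992 | | |
| Modified discrete cosine transform (MDCT) | MPEG Layer III | MP3 | 1993 | 49% | [16] |
| Modified discrete cosine transform (MDCT) | Advanced Audio Coding (MPEG-2 / MPEG-4) | AAC | 1997 | 88% | |
| Modified discrete cosine transform (MDCT) | Windows Media Audio | WMA | 1999 | | |
| Modified discrete cosine transform (MDCT) | Ogg Vorbis | Ogg | 2000 | 7% | [17] |
| Modified discrete cosine transform (MDCT) | Constrained Energy Lapped Transform | CELT | 2011 | | [18] |
| Modified discrete cosine transform (MDCT) | Opus | Opus | 2012 | 8% | [19] |
| Modified discrete cosine transform (MDCT) | LDAC | LDAC | 2015 | | [20][21] |
| Adaptive differential pulse-code modulation (ADPCM) | aptX / aptX-HD | aptX | 1989 | | [22] |
| Adaptive differential pulse-code modulation (ADPCM) | Digital Theater Systems | DTS | 1990 | 14% | [23][24] |
| Adaptive differential pulse-code modulation (ADPCM) | Master Quality Authenticated | MQA | 2014 | | |
| Sub-band coding (SBC) | MPEG-1 Audio Layer II | MP2 | 1993 | | |
| Sub-band coding (SBC) | Musepack | MPC | 1997 | | |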

Speech

List of lossless formats

References

  1. The term "audio coding" can be seen in e.g. the name Advanced Audio Coding, and is analogous to the term video coding.
  2. "Video – Where is synchronization information stored in container formats?" (web site).
  3. Cummiskey, P.; Jayant, N. S.; Flanagan, J. L. (1973). "Adaptive Quantization in Differential PCM Coding of Speech". Bell System Technical Journal. 52 (7): 1105–1118. doi:10.1002/j.1538-7305.1973.tb02007.x.
  4. Cummiskey, P.; Jayant, Nikil S.; Flanagan, J. L. (1973). "Adaptive quantization in differential PCM coding of speech". The Bell System Technical Journal. 52 (7): 1105–1118. doi:10.1002/j.1538-7305.1973.tb02007.x. ISSN 0005-8580.
  5. Schroeder, Manfred R. (2014). "Bell Laboratories". Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder. Springer. p. 388. ISBN 9783319056609. https://books.google.com/books?id=d9IkBAAAQBAJ&pg=PA388.
  6. Gray, Robert M. (2010). "A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol". Found. Trends Signal Process. 3 (4): 203–303. doi:10.1561/2000000036 (free access). ISSN 1932-8346.
  7. Ahmed, N.; Natarajan, T.; Rao, K. R. (January 1974). "Discrete Cosine Transform". IEEE Transactions on Computers. C-23 (1): 90–93. doi:10.1109/T-C.1974.223784. S2CID 149806273. Archived from the original (dead link) on 2016-12-08: https://web.archive.org/web/20161208075733/https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/Ahmed_et_al.__1974.pdf. Retrieved 2019-10-20.
  8. Guckert, John (Spring 2012). "The Use of FFT and MDCT in MP3 Audio Compression" (web site). Retrieved 14 July 2019.
  9. Princen, J.; Johnson, A.; Bradley, A. (1987). "Subband/Transform coding using filter bank designs based on time domain aliasing cancellation". ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing. Vol. 12. pp. 2161–2164. doi:10.1109/ICASSP.1987.1169405. S2CID 58446992. https://ieeexplore.ieee.org/document/1169405.
  10. Princen, J.; Bradley, A. (1986). "Analysis/Synthesis filter bank design based on time domain aliasing cancellation". IEEE Transactions on Acoustics, Speech, and Signal Processing. 34 (5): 1153–1161. doi:10.1109/TASSP.1986.1164954.
  11. Luo, Fa-Long (2008). Mobile Multimedia Broadcasting Standards: Technology and Practice. p. 590. ISBN 9780387782638.
  12. Britanak, V. (2011). "On Properties, Relations, and Simplified Implementation of Filter Banks in the Dolby Digital (Plus) AC-3 Audio Coding Standards". IEEE Transactions on Audio, Speech, and Language Processing. 19 (5): 1231–1241. doi:10.1109/TASL.2010.2087755. S2CID 897622.
  13. Brandenburg, Karlheinz (1999). "MP3 and AAC Explained" (web site). Archived on 2017-02-13: https://web.archive.org/web/20170213191747/https://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf.
  14. "Video Developer Report 2019" (web site). 2019. Retrieved 5 November 2019.
  15. Britanak, V. (2011). "On Properties, Relations, and Simplified Implementation of Filter Banks in the Dolby Digital (Plus) AC-3 Audio Coding Standards". IEEE Transactions on Audio, Speech, and Language Processing. 19 (5): 1231–1241. doi:10.1109/TASL.2010.2087755. S2CID 897622.
  16. Stanković, Radomir S.; Astola, Jaakko T. (2012). "Reminiscences of the Early Work in DCT: Interview with K.R. Rao". Reprints from the Early Days of Information Sciences. 60. Retrieved 13 October 2019.
  17. Xiph.Org Foundation (2009-06-02). "Vorbis I specification - 1.1.2 Classification" (web site). Xiph.Org Foundation. Retrieved 2009-09-22.
  18. Terriberry, Timothy B. "Presentation of the CELT codec" (presentation). http://www.celt-codec.org/presentations/misc/lca-celt.pdf.
  19. Valin, Jean-Marc; Maxwell, Gregory; Terriberry, Timothy B.; Vos, Koen (October 2013). "High-Quality, Low-Delay Music Coding in the Opus Codec". 135th AES Convention. Audio Engineering Society. arXiv:1602.04845.
  20. Darko, John H. (2017-03-29). "The inconvenient truth about Bluetooth audio" (web site). DAR__KO. Archived from the original (dead link) on 2018-01-14: https://web.archive.org/web/20180114020200/http://www.digitalaudioreview.net/2017/03/the-inconvenient-truth-about-bluetooth-audio/. Retrieved 2018-01-13.
  21. Ford, Jez (2015-08-24). "What is Sony LDAC, and how does it do it?" (web site). AVHub. Retrieved 2018-01-13.
  22. Ford, Jez (2016-11-22). "aptX HD - lossless or lossy?" (web site). AVHub. Retrieved 2018-01-13.
  23. "Digital Theater Systems Audio Formats" (web site). 27 December 2011. Retrieved 10 November 2019.
  24. Spanias, Andreas; Painter, Ted; Atti, Venkatraman (2006). Audio Signal Processing and Coding. p. 338. ISBN 9780470041963.