In audio engineering, joint encoding refers to a joining of several channels of similar information during encoding in order to obtain higher quality, a smaller file size, or both.
The term joint stereo has become prominent as the Internet has allowed for the transfer of relatively low bit rate, acceptable-quality audio with modest Internet access speeds. Joint stereo refers to any number of encoding techniques used for this purpose. Two forms are described here, both of which are implemented in various ways with different codecs, such as MP3, AAC and Ogg Vorbis.
This form of joint stereo uses a technique known as joint frequency encoding, which functions on the principle of sound localization. Human hearing is predominantly less acute at perceiving the direction of certain audio frequencies. By exploiting this characteristic, intensity stereo coding can reduce the data rate of an audio stream with little or no perceived change in apparent quality.
More specifically, the dominance of inter-aural time differences (ITD) for sound localization by humans is only present for lower frequencies. That leaves inter-aural amplitude differences (IAD) as the dominant location indicator for higher frequencies (the cutoff being ~2 kHz). The idea of intensity stereo coding is to merge the lower spectrum into just one channel (thus reducing overall differences between channels) and to transmit a little side information about how to pan certain frequency regions to recover the IAD cues. ITD is not lost completely in this scheme, however: the shape of the ear makes it such that the ITD can be recovered from IAD if the sound comes from free space, e.g. played through loudspeakers.[1]
This type of coding does not perfectly reconstruct the original audio because of the loss of information which results in the simplification of the stereo image and can produce perceptible compression artifacts. However, for very low bit rates this type of coding usually yields a gain in perceived quality of the audio. It is supported by many audio compression formats (including MP3, AAC, Vorbis and Opus) but not always by every encoder.
M/S stereo coding transforms the left and right channels into a mid channel and a side channel. The mid channel is the sum of the left and right channels, or
M=L+R
S=L-R
To reconstruct the original signal, the channels are either added or subtracted .
This form of coding is also sometimes known as matrix stereo and is used in many different forms of audio processing and recording equipment. It is not limited to digital systems and can even be created with passive audio transformers or analog amplifiers. One example of the use of M/S stereo is in FM stereo broadcasting, where
L+R
L-R
M/S is also a common technique for production of stereo recordings. See .
M/S encoding does not strictly require that the left and right channels use the same weight. In Opus CELT, M/S encoding is combined with an angle parameter, so that different weights can be used to maximize de-correlation.[3]
A similar form of joining multiple channels is seen in the ambisonics implementation of Opus 1.3. A matrix may be used to mix the spherical harmonic channels together, reducing redundancy.[4]
Parametric stereo is similar to intensity stereo, except that parameters beyond the intensity difference is used. In the MPEG-4 (HE-AAC) version, the intensity difference and time delay difference are used, allowing all bands to be used without hurting localization. HE-AAC also adds "correlation" information, which replicates ambience by synthesizing some difference between channels.[5]
Binaural cue coding (BCC) is the HE-AAC PS technique extended for many input channels, all downmixing to one. The very same ILD, ITD, and IC parameters were used. MPEG Surround is similar to BCC, but allows downmixing to multiple channels, and does not seem to use ITD.[6]
Joint frequency encoding is an encoding technique used in audio data compression to reduce the data rate.
The idea is to merge a given frequency range of multiple sound channels together so that the resulting encoding will preserve the sound information of that range not as a bundle of separate channels but as one homogeneous data stream. This will destroy the original channel separation permanently, as the information cannot be accurately reconstructed, but will greatly lessen the amount of required storage space. Only some forms of joint stereo use the joint frequency encoding technique, such as intensity stereo coding.
When used within the MP3 compression process, joint stereo normally employs multiple techniques, and can switch between them for each MPEG frame. Typically, a modern encoder's joint stereo mode uses M/S stereo for some frames and L/R stereo for others, whichever method yields the best result. Encoders use different algorithms to determine when to switch and how much space to allocate to each channel; quality can suffer if the switching is too frequent or if the side channel doesn't get enough bits. With some encoding software, it is possible to force the use of M/S stereo for all frames, mimicking the joint stereo mode of some early encoders like Xing. Within the LAME encoder, this is known as forced joint stereo.[7]
As with MP3, Ogg Vorbis stereo files can employ either L/R stereo or joint stereo. When using joint stereo, both M/S stereo and intensity stereo methods may be used. As opposed to MP3 where M/S stereo (when used) is applied before quantization, an Ogg Vorbis encoder applies M/S stereo to samples in the frequency domain after quantization, making application of M/S stereo a lossless step. After this step, any frequency area can be converted to intensity stereo by removing the corresponding part of the M/S signal's side channel. Ogg Vorbis' floor function will take care of the required left-right panning. Opus similarly has support for all three options in the CELT layer; the SILK layer is M/S-only.[8]