Audio Coding

Coding algorithms (compression algorithms) are the basis of so-called 'codecs'. A codec is software or hardware that encodes audio or video streams before they are transmitted to the remote peer and decodes them on reception. Encoding reduces the required transmission capacity or storage space by simplifying and summarizing the data; less important information can be discarded in the process.

Audio coding, and the compression built on it, exploits several properties of the human auditory system. The goal is to maintain the quality of the audio stream at the lowest possible data volume.

Pulse Code Modulation (PCM)
PCM is a procedure for converting analogue audio signals into digital data without compression. It is the basis of the ITU-T standard G.711.

In this procedure the analogue audio signal is sampled (multiplied, i.e. modulated, by a pulse train), and each sample is represented by a binary code whose word length is the so-called sampling depth (e.g. 16 bit). Sampling is repeated periodically at the sampling rate; G.711 uses 8 bit per sample at a sampling rate of 8 kHz, which gives 64 kbit/s. For µ-law, G.711 obtains these 8 bit by logarithmically companding 14-bit linear samples, which corresponds to a compression factor of 1.75:1.

The companding used in G.711 comes in two variants: µ-law in North America and Japan, and A-law in Europe.
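
The effect of logarithmic companding can be illustrated with a minimal Python sketch. It uses the continuous µ-law curve with µ = 255 and a plain uniform 8-bit quantizer; G.711 itself specifies a segmented approximation with its own bit layout, so the function names and the code range of -127..127 here are illustrative assumptions, not the standard's encoding.

```python
import math

MU = 255  # companding parameter of the continuous mu-law curve

def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto [-1, 1] along the mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to a signed code in -127..127.

    Assumption: this simple code layout is for illustration only and is
    not the bit format defined in G.711.
    """
    return round(mulaw_compress(x) * 127)

def decode_8bit(code: int) -> float:
    """Map a code back to an approximate sample value."""
    return mulaw_expand(code / 127)

# 8 kHz sampling rate * 8 bit per sample
BIT_RATE = 8000 * 8
```

Because the curve is steep near zero, quiet samples receive finer quantization steps than loud ones, which is why 8 bit are sufficient for telephone-quality speech.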

Differential Pulse Code Modulation (DPCM)
The DPCM method converts discrete-time signals into digital signals by encoding the difference between successive samples rather than the absolute value of each sample. DPCM is an extension of the PCM method and the precursor of the ADPCM technique.

This method is well suited to signals whose consecutive samples are highly correlated. This applies to digital audio signals, which can therefore be encoded with a high degree of data compression.
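
A minimal sketch of the idea in Python, assuming the simplest possible predictor (the previous reconstructed sample) and a fixed quantizer step; the function names and the step size of 4 are hypothetical choices for illustration.

```python
def dpcm_encode(samples, step=4):
    """Encode integer samples as quantized differences.

    The encoder predicts each sample from the previously reconstructed
    value, so it stays in sync with what the decoder will compute.
    """
    codes = []
    predicted = 0
    for sample in samples:
        code = round((sample - predicted) / step)
        codes.append(code)
        predicted += code * step
    return codes

def dpcm_decode(codes, step=4):
    """Rebuild the signal by accumulating the dequantized differences."""
    out = []
    value = 0
    for code in codes:
        value += code * step
        out.append(value)
    return out

samples = [100, 104, 109, 111, 108]
codes = dpcm_encode(samples)    # [25, 1, 1, 1, -1]
decoded = dpcm_decode(codes)    # [100, 104, 108, 112, 108]
```

After the first sample the codes are small numbers, because neighbouring samples are highly correlated; small numbers need fewer bits than the full sample values.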

Adaptive Differential Pulse Code Modulation (ADPCM)
ADPCM is a technique for audio signals that the ITU-T has standardized in several recommendations (G.721, G.722, G.725, G.726, G.727). G.721 is primarily intended for audio signals up to 4 kHz, G.722 and G.725 for signals up to 7 kHz.

ADPCM is a special form of Pulse Code Modulation (PCM) that tries to predict the signal waveform of the next segment. During quantization the difference between the predicted and the actual signal is calculated. This difference can be transmitted with 2 to 5 bits, so a narrow bandwidth is sufficient for the transmission. The method makes it possible to adapt the output data rate dynamically between 16 kbps and 64 kbps.
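
The adaptive part can be sketched as follows. This is a toy model, not the algorithm of G.721 or G.726: the predictor is simply the previous reconstructed sample, and the rule for doubling or halving the quantizer step is an assumption chosen to keep the example short.

```python
def _adapt(step, code, limit):
    """Hypothetical adaptation rule: widen the step after large codes,
    narrow it after small ones."""
    if abs(code) > limit // 2:
        step *= 2
    elif abs(code) <= 1:
        step //= 2
    return max(1, min(step, 4096))

def adpcm_encode(samples, bits=4):
    """Quantize prediction differences to `bits`-bit signed codes."""
    limit = 2 ** (bits - 1) - 1          # 7 for 4-bit codes
    predicted, step = 0, 16
    codes = []
    for sample in samples:
        code = round((sample - predicted) / step)
        code = max(-limit, min(limit, code))
        codes.append(code)
        predicted += code * step
        step = _adapt(step, code, limit)
    return codes

def adpcm_decode(codes, bits=4):
    """Mirror the encoder: same predictor, same step adaptation."""
    limit = 2 ** (bits - 1) - 1
    value, step = 0, 16
    out = []
    for code in codes:
        value += code * step
        out.append(value)
        step = _adapt(step, code, limit)
    return out

def bit_rate(bits, sampling_rate=8000):
    """Output data rate in bit/s for a given code width."""
    return bits * sampling_rate
```

Since the decoder derives the step size from the codes it has already received, no side information has to be sent. In this sketch, 4-bit codes at an 8 kHz sampling rate give `bit_rate(4) == 32000` bit/s.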

Linear Predictive Coding (LPC)
LPC is an encoding method for the efficient transmission of voice signals. Data reduction is achieved with prediction techniques that forecast each signal value as a linear combination of preceding values. Therefore only the difference between two consecutive short-term segments has to be transmitted.

LPC uses a simple synthetic model of speech based on loudness, pitch and the voicing of the output signal. With an LPC vocoder on the recipient's side, natural-sounding speech reproduction is possible, whereas speech reproduced with a phoneme synthesizer sounds strongly mechanical.
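
How the linear predictor is obtained can be sketched in Python. The example assumes the common autocorrelation method with the Levinson-Durbin recursion; the text above does not say which estimation method a given LPC coder uses, and the function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation values r[0..max_lag] of one short-term frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] such that
    x[n] is approximated by sum(a[k] * x[n - k]).

    Returns the coefficients and the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error                      # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error

def prediction_residual(frame, coeffs):
    """Difference between each sample and its linear prediction."""
    order = len(coeffs)
    return [frame[n] - sum(coeffs[k] * frame[n - 1 - k] for k in range(order))
            for n in range(order, len(frame))]
```

For a frame that the model fits well, the residual is much smaller than the signal itself, which is where the data reduction comes from.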

LPC is often incorrectly equated with the CELP algorithm.

Code Excited Linear Prediction (CELP)
CELP is a technique based on the LPC method that works with a fixed data rate of 4.8 kbps.

The CELP coder works similarly to the LPC coder. In addition, CELP calculates the error between the original signal and the synthetic model. These errors are transmitted as codes exchanged between encoder and decoder; every error code is defined identically on both sides in the so-called 'codebook'. At the cost of roughly one order of magnitude more effort, a significantly better voice quality can be achieved than with the LPC method.
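
The codebook principle can be illustrated with a toy search in Python. It is a heavily simplified sketch: real CELP coders also use perceptual weighting and an adaptive codebook, which are omitted here, and the tiny codebook, the function names and the single filter coefficient are assumptions for illustration.

```python
def synthesize(excitation, coeffs):
    """All-pole synthesis filter: y[n] = e[n] + sum(a[k] * y[n - 1 - k])."""
    out = []
    for n, e in enumerate(excitation):
        y = e + sum(a * out[n - 1 - k]
                    for k, a in enumerate(coeffs) if n - 1 - k >= 0)
        out.append(y)
    return out

def search_codebook(target, codebook, coeffs):
    """Try every codebook entry and return (index, gain, error) of the
    entry whose synthesized output is closest to the target frame."""
    best = None
    for index, vector in enumerate(codebook):
        synth = synthesize(vector, coeffs)
        energy = sum(v * v for v in synth)
        if energy == 0:
            continue
        # Optimal gain for this entry in the least-squares sense.
        gain = sum(t * v for t, v in zip(target, synth)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Both sides hold the same codebook; only index and gain are transmitted.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1]]
coeffs = [0.5]
target = synthesize([2 * v for v in codebook[2]], coeffs)
index, gain, error = search_codebook(target, codebook, coeffs)
```

The exhaustive search over all entries for every frame is the main source of the extra effort compared with plain LPC.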

CELP is included in Part 3 of the MPEG-4 standard adopted by the ISO. The so-called 'low-delay CELP', a variant with a delay of less than 2 ms, is the basis of the ITU-T audio standard G.728.
