Layer 2 encoding
================

This document describes what data is being stored in a Layer 2 encoded
MPEG audio frame. It also describes the layout of such a frame.

For more information refer to the literature, e.g.:
- ISO standard
- Peter Noll "MPEG Digital Audio Coding" IEEE Signal Processing Magazine,
  Sept. 1997, pp. 59-81
- Davis Pan "A Tutorial on MPEG/Audio Compression" IEEE Multimedia
  Vol. 2, No. 7, 1995, pp. 60-74
- chapter 4 on "Audio" in Haskell/Puri/Netravali "Digital Video: An
  Introduction to MPEG-2", Chapman & Hall, New York, 1997


1) What data is encoded

Each channel is encoded separately (but possibly simultaneously with the
other one in order to allow adaptive bit allocation between channels).
For each channel, the following processing is performed for frames of
1152 PCM input samples.
(In the graph, processes are put into boxes, data has no boxes.)

    PCM-Samples (36 x 32 samples)                > TIME domain
              |
              v
    ----------------------
    |polyphase Filterbank|
    ----------------------
              |
              v
    32 equally spaced subbands                   > FREQUENCY domain
    (containing 3 x 12 samples per subband,
     which form a block that is encoded together)
              |
              v
    --------------------------
    |dynamic bit allocation &|  (based on psycho-acoustic model)
    |scalefactor calculation,|
    |scalefactor selection   |
    |information             |
    --------------------------
              |
              v
    - 32 scalefactor selection values
      (1 per subband and block of 36 samples)
    - 3 x 32 scalefactors
      (1 per subband and block of 12 samples)
    - 32 bitallocation values
      (1 per subband and block of 36 samples)
    - 3 x 12 x 32 quantized and normalized samples
      (36 per subband)

The bitallocation tells the decoder the number of bits used to represent
each encoded sample. During bit allocation, each subband is treated
separately. According to the signal-to-mask-ratio calculated by the
psycho-acoustic model, bits are allocated for the sample quantization in
an iterative loop starting with 0 bits per block. The quantization is
linear.

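The iterative bit allocation described above can be illustrated with a
small Python sketch. It is not the procedure of the ISO standard: the
function name allocate_bits, the constant of about 6.02 dB of
signal-to-noise ratio gained per bit, and the cost model (36 samples per
subband, no side information for scalefactors or bitallocation) are
simplifying assumptions made for this illustration only. A real encoder
uses tabulated quantizer classes and also accounts for side information.

```python
def allocate_bits(smr_db, bit_budget, samples_per_subband=36, max_bits=16):
    """Greedy bit allocation sketch (illustrative, not the ISO procedure).

    smr_db      -- signal-to-mask ratio per subband in dB, as delivered by
                   a psycho-acoustic model
    bit_budget  -- bits available for the quantized samples of one frame
    Returns the number of bits per sample chosen for each subband.
    """
    DB_PER_BIT = 6.02  # assumed SNR gain per bit of a linear quantizer

    bits = [0] * len(smr_db)  # start with 0 bits in every subband
    remaining = bit_budget

    while True:
        # Mask-to-noise ratio = SNR - SMR. The subband with the lowest
        # value has the most audible quantization noise.
        candidates = [
            (bits[sb] * DB_PER_BIT - smr_db[sb], sb)
            for sb in range(len(smr_db))
            if bits[sb] < max_bits
        ]
        if not candidates:
            break
        _, worst = min(candidates)

        # One more bit per sample costs one bit for each sample in the
        # subband (side information is ignored in this sketch).
        cost = samples_per_subband
        if cost > remaining:
            break
        bits[worst] += 1
        remaining -= cost

    return bits


if __name__ == "__main__":
    # Four subbands with made-up SMR values and a budget of 5 * 36 bits.
    print(allocate_bits([20.0, 5.0, -10.0, 0.0], 36 * 5))
```

Each pass of the loop gives one additional bit per sample to the subband
whose noise is currently least well masked, and the loop stops when the
next step would no longer fit into the budget.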
The scalefactor selection tells the decoder how many scalefactors are
encoded per subband. If the 3 scalefactors of a subband do not differ
much, only one or two of them are transmitted.

The scalefactor of a block is chosen according to the largest absolute
sample value among the 12 samples of the block. The samples are scaled
with it such that the largest sample in the block is approximately
unity. The scalefactors therefore basically perform a normalization of
the 12 samples in a block.

Only the lowest 8, 12, 27 or 30 subbands are encoded in Layer 2. This
acts as a low-pass filter that removes high frequencies at lower
bitrates. (In MPEG2, 30 subbands are always encoded.)

Quantized samples may be grouped before they are encoded. Three
consecutive quantized samples in a block may then form a triplet which
is encoded in one codeword. The bitallocation information is actually a
pointer to a table that also stores whether triplet coding has been
used.


2) Layout of frame

Each MPEG audio frame contains a header after which the encoded audio
data is stored. A Layer 2 frame contains the following data:

    ------------------------
    | Header               |  (32 bit)
    ------------------------
    | optional CRC         |  (16 bit)
    ------------------------
    | Bitallocation index  |  (2-4 bit per subband dependent on table,
    | to table             |   giving 64-128 bit per channel)
    ------------------------
    | Scalefactorselection |  (0/2 bit per subband, giving
    | information          |   0-60 bit per channel)
    ------------------------
    | Scalefactors index   |  (3 x 6 bit per subband if used, giving
    | to scalefactor table |   0-576 bit per channel)
    ------------------------
    | quantized samples    |  (3-16 bit per sample)
    ------------------------
    | ancillary data       |  ("padding bits")
    ------------------------

The framesize depends on the samplingfrequency of the PCM samples and on
the desired bitrate for the MPEG audio stream.
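It may be calculated via the following formula:

    framesize = 144 * bitrate / samplingfrequency

Here the framesize is given in bytes, the bitrate in bit/s and the
samplingfrequency in Hz. The factor 144 is 1152 samples per frame
divided by 8 bits per byte. Allowed bitrates and samplingfrequencies
differ between MPEG1 and MPEG2.

Bits are allocated according to the desired framesize such that the
1152 PCM samples may be encoded with varying detail in streams of
different bitrate.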
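
As a worked example of the framesize formula, the following Python
sketch computes the size of a Layer 2 frame in bytes. The function name
layer2_frame_size and its parameters are chosen for this illustration
only. The integer division and the optional padding byte are not stated
in the text above; they are added here on the assumption that the
padding bit of the frame header extends a frame by one byte when the
formula does not yield a whole number.

```python
def layer2_frame_size(bitrate_bps, sampling_frequency_hz, padding=False):
    """Return the size of one Layer 2 frame in bytes.

    bitrate_bps            -- bitrate of the stream in bit/s
    sampling_frequency_hz  -- sampling frequency of the PCM samples in Hz
    padding                -- assumed: True if the frame carries one
                              extra padding byte
    """
    # 1152 samples per frame / 8 bits per byte = 144
    size = 144 * bitrate_bps // sampling_frequency_hz
    return size + (1 if padding else 0)


if __name__ == "__main__":
    # 192 kbit/s at 48 kHz divides evenly.
    print(layer2_frame_size(192000, 48000))
    # 128 kbit/s at 44.1 kHz does not, so frame sizes alternate
    # between the unpadded and the padded value.
    print(layer2_frame_size(128000, 44100))
    print(layer2_frame_size(128000, 44100, padding=True))
```

At 48 kHz one frame of 1152 samples lasts 24 ms, so a stream of
192 kbit/s has 4608 bits, or 576 bytes, available per frame.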