CD and MP3


The CD is a plastic disc 12 cm in diameter.

Data are recorded on it in the form of notches (pits) about 0.5 µm wide and from 0.8 to 3.2 µm long, depending on the type of digital code they carry (audio signal, synchronization, control); adjacent turns of the track are spaced 1.6 µm apart.

The notches are incised on just one side of the disc and are aligned along a spiral track that starts from the inner part of the disc. The incision has a depth of a fraction of a micrometre.

In DVDs all of these dimensions are roughly halved and, in addition, a DVD can have two separate recording layers.

The distance between one turn of the track and the next is about 1.6 µm (about half that in DVDs). The recorded area extends from a radius of 25 mm to a radius of 58 mm, so it is 33 mm wide, and the spiral makes 33 mm / 1.6 µm = 20,625 turns.
The total length of the track is obtained by multiplying this number by the average length of a turn (about 260 mm, the circumference at the mean radius of 41.5 mm), which gives 5.38 km.
The disc rotates at a variable angular speed, so that the track always moves past the reading head at a linear speed of 1.2 m/s. The maximum total duration of the sound reproduction is therefore 5.38 km / (1.2 m/s) ≈ 75 minutes.
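The arithmetic above can be checked with a short script. This is only a sketch: it assumes a track pitch of 1.6 micrometres and approximates the spiral as a set of concentric circles at the mean radius; the function and constant names are illustrative.

```python
import math

# Values taken from the text; names are illustrative only.
TRACK_PITCH_M = 1.6e-6     # distance between adjacent turns (1.6 micrometres)
INNER_RADIUS_M = 25e-3     # start of the recorded area
OUTER_RADIUS_M = 58e-3     # end of the recorded area
LINEAR_SPEED_M_S = 1.2     # constant speed of the track past the reading head


def spiral_turns(inner, outer, pitch):
    """Number of turns of the spiral across the recorded area."""
    return (outer - inner) / pitch


def track_length(inner, outer, pitch):
    """Approximate track length: turns times the circumference at the mean radius."""
    mean_circumference = 2 * math.pi * (inner + outer) / 2
    return spiral_turns(inner, outer, pitch) * mean_circumference


def playing_time_minutes(length_m, speed_m_s):
    """Playing time at constant linear speed, in minutes."""
    return length_m / speed_m_s / 60


turns = spiral_turns(INNER_RADIUS_M, OUTER_RADIUS_M, TRACK_PITCH_M)
length = track_length(INNER_RADIUS_M, OUTER_RADIUS_M, TRACK_PITCH_M)
minutes = playing_time_minutes(length, LINEAR_SPEED_M_S)
print(f"{turns:.0f} turns, {length / 1000:.2f} km, {minutes:.1f} minutes")
```

With these inputs the script gives roughly 20,625 turns, a track a little under 5.4 km long and a playing time just under 75 minutes.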


Reading is done with a thin beam of laser light (wavelength of the order of 780 nm in CDs and 650 nm in DVDs) that hits the track and is then reflected. If the beam hits a part without notches, it is reflected back almost completely; reflection from the notches is instead much weaker. The optical structure of a CD player is based on a prism that deflects the light beam. A photodiode detects the reflected light and an electronic circuit converts its variations into digital signals. A processor then decodes the signal and sends it to a D/A converter.

The main problem is keeping the beam on such a narrow track. Tracking is therefore not mechanical, as it is in traditional turntables, but electronic: it is based on the use of several beams (obtained from the same laser beam) and on a comparison of their respective brightness as a function of time. Correct focus is maintained in a similar way, again by comparing the brightness over time of different beams obtained from the same laser beam.


The audio encoding used in CDs (linear encoding) requires the storage of 1.41 Mbit for each second of music. This also means that real-time transmission (through the Internet, for example) of an audio file encoded in this way requires a data stream (bit rate) of 1.41 Mbit/s. Both these aspects of linear encoding (the need for a large amount of memory and for high-speed data streams) are problems that perceptual encodings try to overcome. These rely on the fact that the final result must be judged by its perceived quality rather than by its physical correspondence to the recorded sound. The main perceptual phenomenon exploited is masking.
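The figure of 1.41 Mbit per second follows from the CD audio parameters: 44.1 kHz sampling, 16 bits per sample and two channels (stereo). The sketch below reproduces it; the function names are illustrative.

```python
SAMPLE_RATE_HZ = 44_100   # CD sampling frequency
BITS_PER_SAMPLE = 16      # linear quantization
CHANNELS = 2              # stereo


def linear_bit_rate(sample_rate, bits_per_sample, channels):
    """Bit rate of linearly encoded audio, in bits per second."""
    return sample_rate * bits_per_sample * channels


def storage_megabytes(bit_rate, seconds):
    """Storage needed for the given duration, in megabytes (10**6 bytes)."""
    return bit_rate * seconds / 8 / 1e6


rate = linear_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(rate)                         # 1411200 bits per second
print(storage_megabytes(rate, 60))  # about 10.6 MB for one minute of music
```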

Masking acts on sounds that are relatively close in frequency to the masking sound. For this reason, the usual perceptual coding techniques begin by dividing the useful frequency band (half the sampling frequency) into subbands (in the case of MPEG-1 encoding, into 32 subbands of equal width).
A set of digital filters separates the subbands, and each subband is processed in "windows" corresponding to a fixed number of samples over a certain time interval.
A fast Fourier transform identifies the frequency components that are present and their intensity. For each subband the average value of the samples is calculated. The masking curves are calculated at the intensity peaks of each subband, also taking into account the effects of each subband on the others. Bands in which the signal is below the audibility level calculated in this way are not encoded. The others are normalized to the average value (which may also be calculated over a larger number of points) and encoded with a number of bits that depends on the ratio between the maximum signal peak and the new audibility level. It is important, in fact, to make sure that the quantization noise stays below the calculated audibility level.
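The idea of splitting a window into equal-width subbands and discarding the bands that fall below an audibility level can be sketched as follows. This is a deliberately simplified illustration, not the MPEG-1 algorithm: it uses a plain DFT instead of the filter bank and the fast Fourier transform, and a single fixed `threshold` (a hypothetical placeholder) instead of real masking curves.

```python
import cmath
import math

NUM_SUBBANDS = 32
WINDOW = 384  # samples per window, the MPEG-1 Layer 1 value


def dft_magnitudes(samples):
    """Magnitudes of the DFT bins from 0 Hz up to half the sampling frequency."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2)
    ]


def subband_peaks(magnitudes, num_subbands=NUM_SUBBANDS):
    """Largest spectral magnitude inside each equal-width subband."""
    per_band = len(magnitudes) // num_subbands
    return [
        max(magnitudes[b * per_band:(b + 1) * per_band])
        for b in range(num_subbands)
    ]


def bands_to_encode(peaks, threshold):
    """Indices of the subbands whose peak exceeds the (assumed) audibility level."""
    return [band for band, peak in enumerate(peaks) if peak > threshold]


# A pure tone placed exactly on DFT bin 9 (about 1034 Hz at 44.1 kHz sampling).
tone = [math.sin(2 * math.pi * 9 * i / WINDOW) for i in range(WINDOW)]
peaks = subband_peaks(dft_magnitudes(tone))
print(bands_to_encode(peaks, threshold=1.0))
```

For this tone only the second subband (index 1, covering roughly 689 to 1378 Hz) exceeds the threshold, so it is the only one that would be encoded.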

Therefore, where the ratio between the signal and the audibility level is higher, more bits are needed (recall the approximate relation signal-to-noise ratio ≈ 6n dB, where n is the number of bits); where it is lower, fewer bits suffice. But how many bits are available? This depends on the bit rate chosen for the encoded file. The bit rate for linear encoding is 1.41 Mbit/s. Perceptual encodings achieve a significant compression compared to this bit rate and, therefore, a corresponding saving in storage.
In the case of MPEG-1 encoding, the range goes from 64 kbit/s (even less for speech) up to 448 kbit/s. A quality perceptually indistinguishable from that of linear encoding is already achieved at 128 kbit/s. For each "time window" a certain number of bits is thus available, and it is allocated among the subbands (in a fixed or in a dynamic way). If some bits remain, they are held over for the next window. In certain types of encoding, the bit rate may also be variable.
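The two quantities discussed above, the number of bits a subband needs and the number of bits a window has available, can be estimated with a few lines of code. This is a sketch under stated assumptions: it uses the approximate 6 dB per bit rule from the text, a 384-sample window at 44.1 kHz, and illustrative function names.

```python
import math

DB_PER_BIT = 6.0  # approximate rule quoted in the text: SNR is about 6 n dB


def bits_needed(signal_to_threshold_db):
    """Smallest n for which the quantization noise stays below the audibility level."""
    if signal_to_threshold_db <= 0:
        return 0  # the band is inaudible and is not encoded
    return math.ceil(signal_to_threshold_db / DB_PER_BIT)


def bits_per_window(bit_rate, window_samples=384, sample_rate=44_100):
    """Average number of bits available for one time window at a given bit rate."""
    return bit_rate * window_samples / sample_rate


def compression_ratio(linear_bit_rate, encoded_bit_rate):
    """How many times smaller the encoded stream is than the linear one."""
    return linear_bit_rate / encoded_bit_rate


print(bits_needed(30))                        # 5 bits for a 30 dB ratio
print(bits_per_window(128_000))               # about 1115 bits per window
print(compression_ratio(1_411_200, 128_000))  # about 11 to 1
```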


The MPEG-1 Audio Layer 1 encoding is defined by the Moving Picture Experts Group (MPEG), a joint working group of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). The ISO/IEC 11172 MPEG-1 standard dates back to 1992. It provides for an input window of 384 samples (corresponding to a time interval of 8.7 ms at a sampling frequency of 44.1 kHz). The bandwidth of about 20 kHz is divided into 32 subbands of equal width, and in each of them 12 values per window are calculated (one every 32 input samples). This operation does not by itself reduce the amount of data, because it is performed in each band separately: 32 subbands × 12 values give 384 values, as many as the input samples. Therefore, if the 12 samples of each subband were encoded in 16 bits and all bands were encoded, there would not be any reduction in the bit stream.
In fact, there would be 32 × 12 × 16 = 6144 bits for each input window, the same number as the bits of the input window. The MPEG-1 Audio Layer 1 encoding assigns 4 bits to each of the 12 values of each subband, normalized to the average value calculated over the entire window. This average value is assigned 6 bits. If all bands were encoded, the number of output bits for each input window would then be 32 × 12 × 4 + 32 × 6 = 1728 bits, that is, a fraction 1728/6144 = 0.28 of the input bits. In general, however, the ratio becomes much more favourable, because only some of the bands are encoded.
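The bit count above can be reproduced, and extended to the case in which only some of the bands are encoded, with the following sketch. It follows the simplified description given in the text (4 bits per value and 6 bits for the average value of each encoded band); the names are illustrative.

```python
SUBBANDS = 32
VALUES_PER_SUBBAND = 12     # values per subband in one 384-sample window
INPUT_BITS_PER_SAMPLE = 16  # linear encoding of the input
BITS_PER_VALUE = 4          # bits per normalized subband value (as in the text)
BITS_PER_AVERAGE = 6        # bits for the average value of each subband


def input_bits():
    """Bits in one input window: 32 x 12 samples of 16 bits."""
    return SUBBANDS * VALUES_PER_SUBBAND * INPUT_BITS_PER_SAMPLE


def output_bits(encoded_bands=SUBBANDS):
    """Bits produced for one window when only `encoded_bands` bands are kept."""
    return encoded_bands * (VALUES_PER_SUBBAND * BITS_PER_VALUE + BITS_PER_AVERAGE)


print(input_bits())                      # 6144
print(output_bits())                     # 1728
print(output_bits() / input_bits())      # 0.28125
print(output_bits(16) / input_bits())    # 0.140625 when half the bands are dropped
```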


The MP3 encoding corresponds to the third level of complexity (Audio Layer 3) of the MPEG-1 perceptual encoding. It includes everything described so far plus a series of improvements: the possibility of varying the time window over which the Fourier transform is performed (longer for stationary signals, shorter for rapidly varying signals), a more effective algorithm for allocating bits to the subbands, and a further reduction of the data stream through additional encodings, such as run-length coding of the kind used in fax transmissions.
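The principle of an encoding based on the length of repeated strings can be illustrated with a minimal run-length encoder. This only shows the general idea; it is not the coder specified for MP3, and the function names are illustrative.

```python
from itertools import groupby


def run_length_encode(values):
    """Replace each run of identical values with a (value, run length) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]


quantized = [0, 0, 0, 5, 5, 0]
encoded = run_length_encode(quantized)
print(encoded)  # [(0, 3), (5, 2), (0, 1)]
assert run_length_decode(encoded) == quantized
```

The saving comes from long runs of equal values (typically zeros after quantization), which are stored as a single pair.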
