Professional Documents
Culture Documents
MPEG Audio Compression-1
MPEG Audio Compression-1
MPEG Audio Compression-1
1. PSYCHOACOUSTICS
|
Psychoacoustics is the research where you aim to understand how the ear and brain interact as various sounds enter the ear. A perceptual audio codec is a codec that takes advantage of this human characteristic. The range of human hearing is about 20 Hz to about 20 kHz The easiest frequency range to perceive by humans is typically from about 2 kHz to 4 kHz The dynamic range, the ratio of the maximum sound amplitude to the quietest sound that humans can hear, is on the order of about 120 dB
EQUAL-LOUDNESS RELATIONS
|
Fletcher-Munson Curves
y
Equal loudness curves that display the relationship between perceived loudness ( (Phons Phons , in dB) for a given stimulus sound volume (Sound Pressure Level, also in dB), as a function of frequency
EQUAL-LOUDNESS RELATIONS
Fig 1: Flaetcher-Munson Fig.1: Flaetcher Munson Curves (re-measured (re measured by Robinson and Dadson)
THRESHOLD OF HEARING
|
0.8
6.5 e
+ 10 0 3 ( f / /1000) 000) 4
The threshold units are dB; the frequency for the origin (0,0) in formula (14.1) is 2,000 Hz: Threshold(f) = 0 at f =2 kHz
6
FREQUENCY MASKING
|
Experiments have shown that the human ear has 24 frequency bands (Basilar membrane). Frequencies in these so called critical bands are harder to distinguish by the human ear.
FREQUENCY MASKING
|
Suppose there is a dominant tonal component present in an audio signal. The dominant noise will introduce a masking threshold that will mask out frequencies in the same critical band (see Figure below). This frequency-domain phenomenon is known as simultaneous masking, which has been observed within critical bands.
Frequency masking is studied by playing a particular pure tone, say 1 kHz again, at a loud volume and determining how this tone affects our volume, ability to hear tones nearby in frequency
y
one would g generate a 1 kHz masking g tone, , at a fixed sound level of 60 dB, and then raise the level of a nearby tone, e.g., 1.1 kHz, until it is just audible
The threshold in Fig. 3 plots the audible level for f a single masking tone (1 kHz) Fig. 4 shows Fi h how h the h plot l changes h if other h masking ki tones are used
9
10
CRITICAL BANDS
|
Critical bandwidth represents the ears resolving power for simultaneous tones or partials
y
At the th low-frequency l f end, d a critical iti l band b d is i less l than th 100 Hz wide, while for high frequencies the width can be greater than 4 kHz
12
13
14
BARK UNIT
Bark unit is defined as the width of one critical band, for any masking frequency | The Th idea id of f the th Bark B k unit: it every critical iti l band b d width idth is roughly equal in terms of Barks (refer to Fig. 5)
|
15
f /100, for f < 500 , Critical band number (Bark) = og 2 ( f / /1000), 000), for o f 500.% 9 + 4 log
|
(2) ( )
Another formula used for the Bark scale: b = 13.0 13 0 arctan(0.76 arctan(0 76 f)+3.5 f)+3 5 arctan(f2/56.25) arctan(f2/56 25) where f is in kHz and b is in Barks (the same applies to all below) (3)
The critical bandwidth (df) for a given center frequency f can also be approximated by: df = 25 + 75 [1 + 1.4(f2)]0.69 (5)
16
TEMPORAL MASKING
|
Phenomenon: any loud tone will cause the hearing receptors in the inner ear to become saturated and require time to recover The following figures show the results of Masking experiments:
17
Fig. 6: The louder is the test tone, the shorter it takes for our hearing to get over hearing the masking.
TEMPORAL MASKING
|
Temporal masking occurs in the time-domain. A stronger tonal component (masker) will mask a weaker one (maskee) if they appear within a small interval of time. The masking threshold will mask weaker signals pre and post to the masker. Premasking usually lasts about 50 ms while postmasking will last from 50 to 300 ms, depending on the strength and duration of the masker as shown in Figure below.
18
TEMPORAL MASKING
Fig. 14.8: For a masking tone that is played for a longer time, it t k longer takes l before b f a test t t tone t can be b heard. h d Solid curve: masking tone played for 200 ms; dashed curve: masking tone played for 100 ms.
19