Acoustic structure of consonants

1. Revision of basic concepts

a) The source-filter model; quasi-periodic and aperiodic (noise) sources.

b) The filtering effect of the vocal tract, acting as a variable resonator.

c) The characterisation of vowel sounds by formants.

d) Other resonant sounds are also characterised by formants: the sonorant consonants, i.e. nasals and median (central) and lateral approximants. Obstruents (stops, fricatives and affricates) are characterised by a combination of intervals of noise, intervals of silence, and formant transitions.
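
As a rough illustration of (a) and (b), the sketch below passes either a quasi-periodic source (an impulse train) or an aperiodic source (random noise) through a single two-pole resonator standing in for one formant. The sampling rate, fundamental frequency, formant frequency and bandwidth are illustrative assumptions rather than values from this handout, and a real vocal tract has several resonances, not one.

```python
import math
import random

def impulse_train(f0_hz, duration_s, fs_hz):
    """Quasi-periodic source: one unit impulse per (rounded) glottal period."""
    n = round(duration_s * fs_hz)
    period = round(fs_hz / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def noise_source(duration_s, fs_hz, seed=0):
    """Aperiodic source: Gaussian noise from a seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(round(duration_s * fs_hz))]

def resonator(signal, centre_hz, bandwidth_hz, fs_hz):
    """Two-pole digital resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)   # pole radius from bandwidth
    theta = 2.0 * math.pi * centre_hz / fs_hz       # pole angle from centre frequency
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def magnitude_at(signal, freq_hz, fs_hz):
    """Magnitude of the signal's Fourier transform evaluated at one frequency."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    re = sum(s * math.cos(w * i) for i, s in enumerate(signal))
    im = sum(s * math.sin(w * i) for i, s in enumerate(signal))
    return math.hypot(re, im)

FS = 10000                                  # assumed sampling rate (Hz)
voiced = resonator(impulse_train(100, 0.2, FS), 500, 80, FS)
noisy = resonator(noise_source(0.2, FS), 500, 80, FS)
```

With the periodic source, the harmonic nearest the resonance (here 500 Hz) comes out strongest; with the noise source the same resonance shapes a continuous spectrum instead of a set of harmonics.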

2. Stops

a) In articulation, stops consist of:

i) if the stop is preceded by a vowel, a closing phase, or transition between the vowel and the stop;

ii) an interval of complete closure;

iii) an audible burst, as the closure is opened;

iv) in some stops, an interval of aspiration;

v) an interval of transitions from the stop into the following vowel or approximant, often overlapping with (iv).

b) Because of (ii), all stops have a period of silence or near-silence. If the stop is fully voiced (e.g. some intervocalic voiced stops in English), there will be a low level of quasi-periodic energy during the closure.

c) The burst is very brief, with energy across the spectrum. Unreleased stops do not show a burst, of course.

d) Aspiration is a longer interval of noise, with a broad-band frequency distribution that varies with the place of articulation of the stop.

e) Prevocalic stops have a formant frequency pattern which changes in time for an interval after the release of the stop, during the early part of the vowel - the transition.

f) Postvocalic stops have similar (but 'mirror-image') formant frequency transitions at the end of the preceding vowel, as the stop closure is formed.

For the most part, (b-d) are acoustic realizations of the manner of articulation; (e-f) realize the place of articulation.
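
The closure interval described in (b) can be located crudely from short-time energy. The following is a minimal sketch, assuming the signal is a plain list of float samples; the function names, frame length and threshold are hypothetical and would need tuning for real recordings. Note that a fully voiced closure still carries voice-bar energy, so the threshold has to sit above that level.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def longest_quiet_run(energies, threshold):
    """Return (start, end) frame indices, end exclusive, of the longest run
    of frames whose energy is below threshold, or None if there is none."""
    best, start = None, None
    # A sentinel of +infinity closes any run still open at the end.
    for i, e in enumerate(list(energies) + [float("inf")]):
        if e < threshold:
            if start is None:
                start = i
        elif start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

# Toy signal: "vowel", silent closure, "vowel" (made-up amplitudes).
toy = [1.0] * 40 + [0.0] * 30 + [1.0] * 30
closure = longest_quiet_run(frame_energies(toy, 10), threshold=0.01)
```

Here `closure` gives the frame span of the silent interval; multiplying by the frame length and dividing by the sampling rate converts it to seconds.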

2.1. Acoustic details

a) Voicing. A low-frequency voicing buzz is present in phonetically voiced sounds, seen on spectrograms as a 'voice bar'. Also, in English, the presence vs. absence of aspiration cues voicelessness vs. voicing, respectively.

b) Place of articulation. There are three major acoustic correlates:

(i) The spectrum of the burst:

Bilabials - energy predominantly from 600 - 800 Hz.

Alveolars - the energy is distributed across the frequency range, with more energy in the higher areas of the spectrum (4000 Hz and above).

Velars - there is a compact burst in the lower to mid part of the frequency range, typically about 1800 - 2000 Hz, but up to around 4700 Hz.

Irrespective of other variations, bursts tend to show a strength hierarchy of voiceless aspirated > voiceless unaspirated > voiced.
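
The burst figures above can be written down as a simple lookup. This is only a sketch: the table and the function name `candidate_places` are hypothetical, the ranges are the approximate ones quoted in this section, and treating the alveolar burst as a single peak is a simplification, since it is described above as diffuse. Because the velar and alveolar ranges overlap, the function returns every place consistent with a measured burst peak rather than a single answer.

```python
# Approximate burst energy regions (Hz) taken from the figures in this section.
BURST_RANGES_HZ = {
    "bilabial": (600.0, 800.0),
    "velar": (1800.0, 4700.0),
    "alveolar": (4000.0, float("inf")),
}

def candidate_places(peak_hz):
    """List the places of articulation whose quoted range contains peak_hz."""
    return [
        place
        for place, (low, high) in BURST_RANGES_HZ.items()
        if low <= peak_hz <= high
    ]

examples = {hz: candidate_places(hz) for hz in (700, 1900, 4200, 1200)}
```

A peak at 4200 Hz is compatible with both velar and alveolar bursts, and a peak at 1200 Hz matches none of the quoted ranges, which is one reason burst spectra are combined with transition information in practice.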

(ii) Formant transitions and (iii) Loci: Both the apparent origin (the locus) and the trajectory of the transitions of the first three formants reflect articulatory movements, and are therefore considered crucial for stop recognition.
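
One common way of estimating an 'apparent origin' for F2, not described in this handout, is a locus equation: across tokens of the same stop before different vowels, F2 at vowel onset is regressed on F2 in the middle of the vowel, giving F2onset = k * F2vowel + c. The locus is the frequency at which onset and vowel values coincide, i.e. c / (1 - k). The sketch below assumes that approach; the function names are hypothetical and the measurements are made-up numbers chosen to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def locus_from_tokens(f2_vowel_hz, f2_onset_hz):
    """Locus = point where the fitted line crosses F2onset == F2vowel."""
    slope, intercept = fit_line(f2_vowel_hz, f2_onset_hz)
    if slope == 1.0:
        raise ValueError("slope of 1 means no transition, so no finite locus")
    return intercept / (1.0 - slope)

# Hypothetical measurements (Hz) for one stop before five different vowels.
f2_vowel = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]
f2_onset = [1300.0, 1500.0, 1700.0, 1900.0, 2100.0]
locus_hz = locus_from_tokens(f2_vowel, f2_onset)
```

A slope near 0 means the onset frequency hardly depends on the vowel (a fixed locus), whereas a slope near 1 means the onset simply follows the vowel.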

c) Post-vocalic stops: In post-vocalic position (VC sequences) the trajectories are the opposite of those in pre-vocalic stops, although these may be less clearly visible on spectrograms in the case of voiceless stops.

3. Fricatives and affricates

a) Main features:

(i) frication noise - an aperiodic spread of sound over part of the spectrum

(ii) areas of 'anti-resonance', where there is little or no energy. The frequencies of these vary according to place of articulation.

b) Voicing. Again, phonetically voiced fricatives have low-level voicing along with frication noise. Phonemically voiceless fricatives may have a short period of aspiration before voicing onset in some languages.

c) Place of articulation. Two major acoustic features carry place distinctions:

(i) Formant transitions. These are similar to those of stops. Labiodentals are similar to labials and dentals are similar to alveolars, though the F2 locus is higher.

(ii) Spectral envelope. Mainly for sibilants, the distribution of energy across the spectrum shows the following:

Alveolars are characterised by random energy from approximately 2000 Hz, up to and above 8000 Hz. Peaks of energy at 4500 and 7500 Hz.

Post-alveolars have noise from about 2000 Hz to 6000 - 7000 Hz, with a peak at around 4000 Hz.

Both of the above have strong frication noise (sibilants). For non-sibilants, spectral features are less distinctive.

Labiodentals and dentals both have flat spectra with a slight emphasis in high frequency energy.

Bilabials have relatively more energy in the lower part of the spectrum.
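
Differences in spectral envelope are often summarised with a single number. One widely used measure, not mentioned in this handout, is the spectral centre of gravity: the power-weighted mean frequency. The sketch below assumes a spectrum supplied as parallel lists of bin frequencies and powers; the two toy spectra are invented shapes loosely following the alveolar and post-alveolar descriptions above, not measurements.

```python
def centre_of_gravity(freqs_hz, powers):
    """Power-weighted mean frequency of a spectrum."""
    total = sum(powers)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs_hz, powers)) / total

# Bin centres in Hz and made-up relative powers (hypothetical data).
bins_hz = [2000, 3000, 4000, 5000, 6000, 7000, 8000]
alveolar_like = [1, 2, 4, 4, 3, 5, 3]        # energy extends up to 8000 Hz
post_alveolar_like = [2, 4, 6, 3, 1, 0, 0]   # energy concentrated near 4000 Hz

cog_alveolar = centre_of_gravity(bins_hz, alveolar_like)
cog_post_alveolar = centre_of_gravity(bins_hz, post_alveolar_like)
```

On these toy spectra the alveolar-like shape has the higher centre of gravity, in line with its energy extending further up the spectrum. For the flat spectra of labiodentals and dentals the measure separates places much less well.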

d) [h] is acoustically a voiceless vowel. It has a weak formant structure appropriate to the following vowel, but with noise excitation rather than voicing.

e) Affricates have frication portions similar to the corresponding fricatives, preceded by a stop-like 'silent' portion.

4. Nasals

a) As a class, nasals share certain features. They are nearly always voiced, and they show a sudden change in formant structure from/to the adjacent vowel (a 'fault transition').

Furthermore:

(i) Presence of nasal formants, the frequencies of which vary from speaker to speaker;

(ii) F1 (sometimes called N1 in nasals) is low in all nasals - typical figures 250 - 300 Hz.

(iii) Formants much weaker in nasals than in vowel sounds.

(iv) Nasals have several antiformants. There is little energy above approximately 3500 Hz. Also, the closed oral cavity, acting as a side branch of the vocal tract, produces an antiformant between 800 and 2000 Hz - the region of F2 for most vowels.

(v) Locations of other formants and antiformants vary according to place of articulation (though they also vary between individuals).

b) Place of articulation. Formant, antiformant and transition information are all used in recognition, but transition information is apparently the most important.

[m] has an antiformant between 750 Hz and 1250 Hz. N2 is between 1000 Hz and 1300 Hz. F2 therefore generally rises into following vowels.

[n] Fujimura gives N2 at 1000 Hz and an antiformant at 1450 - 2200 Hz. On a spectrogram, the second visible formant is often N4, at about 2000 Hz, giving similar transitions to alveolar stops.

[ŋ] The main anti-resonance is above 3000 Hz. Theoretically, N2 is at 1100 Hz and N3 at 1900 Hz. In practice, N2 is rather weak. There is considerable speaker-to-speaker variation.
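
The effect of an antiformant can be seen by treating a formant as a pair of poles and an antiformant as a pair of zeros in a simple digital filter. The sketch below is an illustration under stated assumptions, not a model of any real nasal: it uses one low nasal formant at 250 Hz and one antiformant at 1000 Hz (within the ranges quoted above for [m]), with an arbitrary 100 Hz bandwidth and 10 kHz sampling rate, and the function names are hypothetical.

```python
import cmath
import math

def second_order(z_inv, centre_hz, bandwidth_hz, fs_hz):
    """Polynomial 1 - 2 r cos(theta) z^-1 + r^2 z^-2 for a conjugate root pair."""
    r = math.exp(-math.pi * bandwidth_hz / fs_hz)
    theta = 2.0 * math.pi * centre_hz / fs_hz
    return 1.0 - 2.0 * r * math.cos(theta) * z_inv + r * r * z_inv * z_inv

def magnitude(freq_hz, formant_hz, antiformant_hz,
              bandwidth_hz=100.0, fs_hz=10000.0):
    """|H| at freq_hz for a filter with one formant (poles) and one antiformant (zeros)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    zeros = second_order(z_inv, antiformant_hz, bandwidth_hz, fs_hz)
    poles = second_order(z_inv, formant_hz, bandwidth_hz, fs_hz)
    return abs(zeros / poles)

# Response in dB at a few frequencies for the assumed [m]-like settings.
response_db = {
    f: 20.0 * math.log10(magnitude(f, 250.0, 1000.0))
    for f in (250, 600, 1000, 1500)
}
```

The response peaks at the formant frequency and dips sharply at the antiformant frequency, with more energy on either side of the dip; this is the 'hole' in the spectrum that an antiformant produces.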

5. Liquids

a) Liquids are normally voiced (though they are sometimes devoiced or realised as voiceless fricatives e.g. following a voiceless obstruent) - hence clear formant structure, though with less energy than vowels.

b) Laterals have antiformants, though usually less strong than in nasals. For example, in /l/ there is one around 2500 Hz, between the apparent F2 and F3.

c) In both [ɹ] and [l], F1 and F2 are lower than in adjacent vowels: in [l], and laterals in general, the transitions are very sudden (cf. nasals); in [ɹ], less so.

d) In [ɹ], F3 also falls from adjacent vowels; in [l] this is much less pronounced.

e) Clear and dark /l/ differ in their formant structure; F2 is somewhat higher - about 1500 Hz - for a clear /l/ and lower for a dark /l/.

f) Trills are characterised acoustically by a 'pattern of pulses of closures and openings' (Lindau, 1985). Their spectral structures vary considerably.

Further reading

Fry, D.B. The Physics of Speech, esp. Chapters 10 & 11.

Johnson, K. (1997) Acoustic and Auditory Phonetics. Chapters 6-8.

Ladefoged, P. (1993) A Course in Phonetics. Chapter 8.

Lindau, M. (1985) The story of /r/. In V. Fromkin, ed. Phonetic Linguistics

Olive, Greenwood and Coleman (1993) Acoustics of American English Speech