The Source-Filter Model of Speech Production

1. The sources of sound in speech production

We consider the production of speech as consisting of two kinds of operations: (1) the generation of sound sources, at the glottis or at some point along the length of the vocal tract, and (2) the filtering of these sources by the vocal tract.

There are two principal kinds of sources:

a) turbulence noise (present in [s], [ʃ], [f], [tʃ], etc.), and

b) vocal-fold vibration (present in vowels, nasals, etc.).

Both sources can act together in some cases (e.g. [z], [ʒ], [v]).

2. Turbulence noise sources

Turbulence occurs in the rapid air flow at a constriction; the turbulence consists of fluctuations in the velocity of the air flow. These fluctuations give rise to a source of sound pressure in the vocal tract at the point where the turbulence occurs.

There are two ways of generating turbulence noise at a constriction above the glottis, and they produce noise sources of quite different intensity.

(1) The air stream at a constriction is directed against a surface or obstruction. For example, the tongue may be grooved to direct the air stream against the teeth or alveolar ridge. This leads to noise of rather high intensity. Consonants produced with this type of noise are sometimes called strident. The consonants [f], [s], [ʃ], [v], [z] and [ʒ] are strident.

(2) The air stream is directed in such a way that it does not impinge on any surface. The noise is of much lower intensity. Consonants produced with this type of noise are sometimes called non-strident or mellow. The consonants [ɸ], [θ], [ç], [β], [ð] and [ɣ] are non-strident.

Turbulence noise can be produced at a constriction at the glottis, or at a constriction made with the tongue or lips above the glottis.

Aspiration noise is produced at a glottal constriction. The sounds [h], [ɦ], [pʰ], etc. are produced with aspiration noise; this noise can exist together with vocal-fold vibration (as is the case for [ɦ], for instance).

Frication noise is produced at a supraglottal constriction. The constriction can occur at many different points along the vocal tract. Examples of sounds with frication noise are [s], [ʃ], [tʃ], [dʒ], [z], [ʒ], [f], [v].

3. Voice source

When the vocal folds vibrate, the rate of air flow through the glottis rises and falls. Typical average values for the frequency of voicing are:

adult male voice:		125 Hz
adult female voice:		220 Hz
child's voice:		300 Hz

During normal speech production, the frequency of voicing varies over an octave or more (e.g. 80‒160 Hz for an adult male voice).
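
The figures above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original notes: it converts each typical frequency of voicing into the duration of one cycle of vibration and confirms that 80 to 160 Hz is a range of one octave (a doubling of frequency). The function names are chosen here for illustration only.

```python
import math

# Typical average voicing frequencies from the list above, in Hz.
TYPICAL_F0_HZ = {
    "adult male": 125.0,
    "adult female": 220.0,
    "child": 300.0,
}

def period_ms(f0_hz):
    """Duration of one cycle of vocal-fold vibration, in milliseconds."""
    return 1000.0 / f0_hz

def range_in_octaves(low_hz, high_hz):
    """Size of a frequency range in octaves (one octave = a doubling)."""
    return math.log2(high_hz / low_hz)

for voice, f0 in TYPICAL_F0_HZ.items():
    print(f"{voice}: {f0:.0f} Hz, period {period_ms(f0):.2f} ms")

print(f"80-160 Hz spans {range_in_octaves(80.0, 160.0):.1f} octave(s)")
```

A higher frequency of voicing means a shorter cycle: the adult male value of 125 Hz corresponds to a period of 8 ms.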

[Figure: Spectrum of the voice source, plotted against frequency f (Hz).]

4. Some ideas concerning spectral analysis and filtering

For an acoustic resonator like the vocal tract, if we were to apply a sinusoidal source at the glottis (instead of the pulse-like source of voicing) then we would obtain an output at the mouth that was also sinusoidal with the same frequency, but with a different amplitude. If the amplitude of the input is A, and the amplitude of the output is B, then at this frequency f, we say that the transfer function is the ratio of output amplitude to input amplitude:

T(f) = B / A
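
The ratio of output amplitude B to input amplitude A can be illustrated numerically. The sketch below is an assumption-laden stand-in, not a model of the vocal tract: it uses a simple digital two-pole resonator with a single, arbitrarily chosen resonance at 500 Hz, drives it with a sinusoid, and estimates B / A once the output has settled. The function name, the resonance frequency, the bandwidth and the sample rate are all hypothetical values picked for the example.

```python
import math

def resonator_gain(freq_hz, resonance_hz=500.0, bandwidth_hz=80.0,
                   sample_rate=16000, input_amplitude=1.0, duration_s=0.5):
    """Drive a two-pole resonator with a sinusoid and estimate B / A.

    All default parameter values are illustrative assumptions.
    """
    # Pole radius and angle set the bandwidth and the resonant frequency.
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * resonance_hz / sample_rate
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r

    n_samples = int(duration_s * sample_rate)
    y1 = y2 = 0.0          # the two previous output samples
    output_peak = 0.0
    for n in range(n_samples):
        x = input_amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        y = x + a1 * y1 + a2 * y2
        y2, y1 = y1, y
        # Measure the output amplitude only in the second half of the signal,
        # after the start-up transient has died away.
        if n >= n_samples // 2:
            output_peak = max(output_peak, abs(y))
    return output_peak / input_amplitude

for f in (100, 300, 500, 700, 900):
    print(f"{f:4d} Hz: B/A is about {resonator_gain(f):.1f}")
```

Under these assumptions the ratio is largest for the input at 500 Hz, the resonant frequency of the filter, and doubling the input amplitude leaves the ratio unchanged because the filter is linear.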

At each input frequency the transfer function generally takes a different value, i.e. a different ratio of output amplitude to input amplitude. For the vocal tract with the source at the glottis, the transfer function has several peaks, at different frequencies. These are the resonant frequencies of the vocal tract. They appear as peaks in the spectrum of the sound pressure, e.g.

[Figure: Spectrum of the sound pressure p(f) of a vowel, plotted against frequency f (Hz).]

The peaks of this spectrum are called formants. For this particular vowel, what are the formant frequencies?

The formant frequencies are determined by the cross-sectional area of the vocal tract at different points along its length.
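
As a worked example of how shape determines the resonances, consider the simplest possible case, which is an addition to these notes: a tube of uniform cross-sectional area, closed at the glottis and open at the lips. The standard result for such a tube is that its resonances lie at odd multiples of c / 4L, where c is the speed of sound and L is the length of the tube. The sketch below assumes c = 350 m/s and L = 17.5 cm; both values and the function name are illustrative assumptions.

```python
SPEED_OF_SOUND_M_PER_S = 350.0   # assumed value for warm, moist air
TRACT_LENGTH_M = 0.175           # assumed length, roughly an adult male vocal tract

def uniform_tube_resonances(length_m, count=3, speed=SPEED_OF_SOUND_M_PER_S):
    """Resonances of a uniform tube closed at one end and open at the other.

    The n-th resonance is (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * speed / (4.0 * length_m) for n in range(1, count + 1)]

for n, f in enumerate(uniform_tube_resonances(TRACT_LENGTH_M), start=1):
    print(f"resonance {n}: about {f:.0f} Hz")
```

With these values the first three resonances fall near 500, 1500 and 2500 Hz. A shorter tube gives higher resonances, and narrowing or widening the tube at different points along its length moves the resonances away from this even spacing.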

Further reading

Johnson, K. (1997) Acoustic and Auditory Phonetics. Chapter 4.