1. The sources of sound in speech production
We consider the production of speech as consisting of two kinds of operations: (1) the generation of sound sources, at the glottis or at some point along the length of the vocal tract, and (2) the filtering of these sources by the vocal tract.
There are two principal kinds of sources:
a) turbulence noise (present in [s], [ʃ], [f], [tʃ], etc.), and
b) vocal-fold vibration (present in vowels, nasals, etc.).
Both sources can act together in some cases (e.g. [z], [ʒ], [v], etc.).
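As a rough illustration of the source side of this model, the Python sketch below builds an excitation signal from the two kinds of source: a train of unit impulses, one per glottal cycle, stands in for vocal-fold vibration, and Gaussian random noise stands in for turbulence. The function `excitation` and its parameters are hypothetical, and the sketch simplifies heavily: real glottal pulses are not impulses, and frication noise arises at a supraglottal constriction rather than being added at the glottis.

```python
import random

def excitation(n_samples, fs, f0=None, noise_amp=0.0, seed=0):
    """Hypothetical excitation signal for a source-filter sketch.

    f0: if given, add a unit impulse once per glottal period (crude voicing source).
    noise_amp: standard deviation of Gaussian noise (crude turbulence source).
    """
    rng = random.Random(seed)
    signal = [0.0] * n_samples
    if f0:
        period = fs / f0  # samples per glottal cycle
        k = 0
        while round(k * period) < n_samples:
            signal[round(k * period)] += 1.0
            k += 1
    if noise_amp > 0:
        for i in range(n_samples):
            signal[i] += rng.gauss(0.0, noise_amp)
    return signal

vowel_like = excitation(800, 8000, f0=125)              # voicing only
s_like = excitation(800, 8000, noise_amp=0.1)           # turbulence only
z_like = excitation(800, 8000, f0=125, noise_amp=0.1)   # both sources together
```

Passing such a signal through a filter representing the vocal tract would correspond to operation (2).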
2. Turbulence noise sources
Turbulence occurs in the rapid air flow at a constriction; the turbulence consists of fluctuations in the velocity of the air flow. These fluctuations give rise to a source of sound pressure in the vocal tract at the point where the turbulence occurs.
There are two ways of generating turbulence noise at a constriction above the glottis, and they produce noise sources of quite different intensity.
(1) The air stream at a constriction is directed against a surface or obstruction. For example, the tongue may be grooved to direct the air stream against the teeth or alveolar ridge. This leads to noise of rather high intensity. Consonants produced with this type of noise are sometimes called strident. The consonants [f], [s], [ʃ], [v], [z] and [ʒ] are strident.
(2) The air stream is directed in such a way that it does not impinge on any surface. The noise is of much lower intensity. Consonants produced with this type of noise are sometimes called non-strident or mellow. The consonants [ɸ], [θ], [ç], [β], [ð] and [ɣ] are non-strident.
Turbulence noise can be produced at a constriction at the glottis, or at a constriction made with the tongue or lips above the glottis.
Aspiration noise is produced at a glottal constriction. The sounds [h], [ɦ], [pʰ], etc. are produced with aspiration noise; this noise can exist together with vocal-fold vibration (as is the case for [ɦ], for instance).
Frication noise is produced at a supraglottal constriction. The constriction can occur at many different points along the vocal tract. Examples of sounds with frication noise are [s], [ʃ], [tʃ], [dʒ], [z], [ʒ], [f], [v].
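The classification in this section can be summarised as a small lookup table. The sketch below is illustrative only: the names `NOISE`, `VOICED` and `sources` are hypothetical, only the example sounds mentioned in these notes (plus a few vowels and nasals for voicing) are included, and stridency is left as `None` for the glottal sounds, since the strident/non-strident distinction above concerns supraglottal constrictions.

```python
# segment -> (location of the noise source, strident?)
NOISE = {
    "s": ("supraglottal", True), "ʃ": ("supraglottal", True), "f": ("supraglottal", True),
    "z": ("supraglottal", True), "ʒ": ("supraglottal", True), "v": ("supraglottal", True),
    "ɸ": ("supraglottal", False), "θ": ("supraglottal", False), "ç": ("supraglottal", False),
    "β": ("supraglottal", False), "ð": ("supraglottal", False), "ɣ": ("supraglottal", False),
    "h": ("glottal", None), "ɦ": ("glottal", None),
}

# Segments produced with vocal-fold vibration (illustrative subset).
VOICED = {"z", "ʒ", "v", "β", "ð", "ɣ", "ɦ", "a", "i", "u", "m", "n"}

def sources(segment):
    """Return the set of active sources: 'frication', 'aspiration', 'voicing'."""
    result = set()
    if segment in NOISE:
        location, _ = NOISE[segment]
        result.add("frication" if location == "supraglottal" else "aspiration")
    if segment in VOICED:
        result.add("voicing")
    return result

print(sources("z"), sources("h"), sources("ɦ"))
```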
3. Voice source
When the vocal folds vibrate, the rate of airflow through the glottis rises and falls with each cycle of vibration. Typical average values of the frequency of voicing (the fundamental frequency, F0) are:
adult male voice: 125 Hz
adult female voice: 220 Hz
child's voice: 300 Hz
[Figure: spectrum of the voice source; level (dB) against frequency f (Hz).]
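Because voicing is periodic, its period is the reciprocal of the frequency of voicing, and its spectrum has energy only at integer multiples (harmonics) of that frequency. A minimal Python sketch using the average values above (the function names are hypothetical):

```python
def glottal_period_ms(f0_hz):
    """Duration of one vocal-fold cycle in milliseconds."""
    return 1000.0 / f0_hz

def harmonics(f0_hz, max_hz):
    """Frequencies (Hz) of the harmonics of a periodic source, up to max_hz."""
    return [k * f0_hz for k in range(1, int(max_hz // f0_hz) + 1)]

for label, f0 in [("adult male", 125), ("adult female", 220), ("child", 300)]:
    print(f"{label}: T0 = {glottal_period_ms(f0):.1f} ms, "
          f"harmonics up to 1 kHz: {harmonics(f0, 1000)}")
```

The higher the frequency of voicing, the more widely spaced the harmonics, which is why a child's voice samples the vocal-tract transfer function more sparsely than an adult male voice.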
4. Some ideas concerning spectral analysis and filtering
For an acoustic resonator like the vocal tract, if we applied a sinusoidal source at the glottis (instead of the pulse-like source of voicing), we would obtain an output at the mouth that is also sinusoidal, with the same frequency but a different amplitude. If the amplitude of the input is A and the amplitude of the output is B, then at this frequency f we say that the transfer function is $T(f) = B/A$. Evaluating this ratio at every frequency gives the transfer function as a curve over f.
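The ratio B/A can be checked numerically. The sketch below assumes a single two-pole digital resonator (centre 500 Hz, bandwidth 100 Hz, sampling rate 10 kHz, all hypothetical values) as a stand-in for one vocal-tract resonance. It drives the resonator with a sinusoid of amplitude A, measures the amplitude B of the steady-state output, and compares B/A with the gain computed directly from the filter coefficients. This is a digital filter, not an acoustic model of the tract.

```python
import cmath
import math

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Feedback coefficients of a two-pole resonator centred on freq_hz."""
    r = math.exp(-math.pi * bw_hz / fs)              # pole radius set by the bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    a2 = -r * r
    return a1, a2

def measured_gain(f, amp_in, coeffs, fs, n=20000):
    """Drive y[i] = x[i] + a1*y[i-1] + a2*y[i-2] with x = A*sin(2*pi*f*i/fs)
    and return B/A, where B is the output amplitude after the transient dies out."""
    a1, a2 = coeffs
    y1 = y2 = 0.0
    s = c = 0.0
    count = 0
    for i in range(n):
        w = 2.0 * math.pi * f * i / fs
        y = amp_in * math.sin(w) + a1 * y1 + a2 * y2
        y2, y1 = y1, y
        if i >= n // 2:                              # skip the onset transient
            s += y * math.sin(w)                     # in-phase component
            c += y * math.cos(w)                     # quadrature component
            count += 1
    amp_out = 2.0 * math.hypot(s, c) / count         # amplitude B of the output sinusoid
    return amp_out / amp_in

def analytic_gain(f, coeffs, fs):
    """|H| = 1 / |1 - a1 z^-1 - a2 z^-2| evaluated at z = exp(j*2*pi*f/fs)."""
    a1, a2 = coeffs
    z1 = cmath.exp(-2j * math.pi * f / fs)
    return 1.0 / abs(1.0 - a1 * z1 - a2 * z1 * z1)

fs = 10000.0
coeffs = resonator_coeffs(500.0, 100.0, fs)
for f in (250.0, 500.0, 1500.0):
    print(f, round(measured_gain(f, 2.0, coeffs, fs), 3),
          round(analytic_gain(f, coeffs, fs), 3))
```

The two columns agree, and the ratio is largest for the input frequency nearest the resonance.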
[Figure: spectrum p(f) of a vowel, plotted against frequency f (Hz) up to 5500 Hz.]
The peaks of this spectrum are called formants. For this particular vowel, what are the formant frequencies?
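One way to read formant frequencies off a computed spectrum is to pick its local maxima. The sketch below assumes a hypothetical vowel modelled as a cascade of three two-pole digital resonators at 500, 1500 and 2500 Hz (bandwidth 80 Hz each, sampling rate 10 kHz), samples the combined gain every 10 Hz, and returns the frequencies of the peaks. For a measured speech spectrum, the harmonics of the voice source would also produce local maxima, so this naive picker would need the spectrum to be smoothed first.

```python
import cmath
import math

def resonator_gain(f, formant_hz, bw_hz, fs):
    """Gain at frequency f of a two-pole digital resonator centred on formant_hz."""
    r = math.exp(-math.pi * bw_hz / fs)
    theta = 2.0 * math.pi * formant_hz / fs
    z1 = cmath.exp(-2j * math.pi * f / fs)
    return 1.0 / abs(1.0 - 2.0 * r * math.cos(theta) * z1 + r * r * z1 * z1)

def spectrum(formants, bw_hz=80.0, fs=10000.0, step_hz=10.0):
    """Gain of a cascade of resonators, sampled every step_hz from 0 to fs/2."""
    freqs = [i * step_hz for i in range(int(fs / 2 / step_hz) + 1)]
    mags = []
    for f in freqs:
        g = 1.0
        for fm in formants:
            g *= resonator_gain(f, fm, bw_hz, fs)  # cascaded filters multiply
        mags.append(g)
    return freqs, mags

def pick_peaks(freqs, mags):
    """Frequencies of local maxima (candidate formants)."""
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i] > mags[i - 1] and mags[i] > mags[i + 1]]

freqs, mags = spectrum((500.0, 1500.0, 2500.0))  # hypothetical formant values
print(pick_peaks(freqs, mags))
```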
The formant frequencies are determined by the cross-sectional area of the vocal tract at different points along its length.
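The simplest case is a tube of uniform cross-sectional area, closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/(4L), where c is the speed of sound and L the tube length. The sketch below assumes L = 17.5 cm and c = 35 000 cm/s (typical textbook values, not taken from these notes); making the area non-uniform, as happens for real vowels, shifts the formants away from these values.

```python
def uniform_tube_formants(length_cm=17.5, c_cm_per_s=35000.0, max_hz=5000.0):
    """Resonances of a uniform tube closed at one end (glottis), open at the other (lips).

    F_n = (2n - 1) * c / (4 * L): odd multiples of the quarter-wavelength frequency.
    """
    base = c_cm_per_s / (4.0 * length_cm)
    formants = []
    n = 1
    while (2 * n - 1) * base <= max_hz:
        formants.append((2 * n - 1) * base)
        n += 1
    return formants

print(uniform_tube_formants())      # → [500.0, 1500.0, 2500.0, 3500.0, 4500.0]
print(uniform_tube_formants(14.5))  # shorter tube: every formant moves up
```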