Speech can be synthesized (or resynthesized) by providing
either the prediction residual, or a synthetic version of the residual, together
with the predictor coefficients. In the simplest method of synthesis, we
first work out which portions of the original signal are voiced and which
are unvoiced. For the voiced parts of the signal, the error can be modelled
by an impulse source that approximates the sequence of spikes seen in the
prediction residual. To do this, we mainly need to make sure that the spacing
between the spikes is right for the desired frequency of voicing (as described
in lecture 2, for the Klatt synthesizer; the idea is the same). In this way we can alter the
pitch contour to a fair degree, if we wish. The voiceless parts of the
signal can be modelled with a white noise signal of roughly the right amplitude.
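A minimal sketch of this voiced/unvoiced scheme, assuming the predictor coefficients drive an all-pole synthesis filter (the function names and parameter values here are our own, for illustration only):

```python
import numpy as np

def allpole_filter(a, x):
    """Run excitation x through the all-pole filter 1/A(z), where
    A(z) = 1 + a[1] z^-1 + a[2] z^-2 + ... and a[0] is taken to be 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y

def synthesize(a, n_samples, voiced, f0=120.0, fs=8000):
    """Excite the LPC filter with an impulse train (voiced) or noise (unvoiced)."""
    if voiced:
        excitation = np.zeros(n_samples)
        period = int(round(fs / f0))   # spike spacing sets the pitch
        excitation[::period] = 1.0
    else:
        excitation = np.random.randn(n_samples) * 0.1
    return allpole_filter(a, excitation)
```

Changing `f0` from frame to frame is what lets us impose a different pitch contour on the resynthesized speech.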
Generating voiced speech with an impulse source in LPC tends to produce synthetic
speech which is rather “buzzy”. Other forms of excitation have also been
investigated. Atal also demonstrated that an improvement could be made to
the quality of LPC-synthesized speech by using multipulse excitation, that is,
modelling each pitch period with several impulses rather than just one.
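As a rough illustration, a multipulse excitation can be built by placing a handful of impulses in each pitch period. The positions and amplitudes below are made-up values; a real multipulse coder chooses them by analysis-by-synthesis, minimising the error against the original speech:

```python
import numpy as np

fs = 8000                  # sampling rate (Hz)
period = fs // 100         # 100 Hz voicing -> 80 samples per period
# Hypothetical (offset, amplitude) pairs within one pitch period.
pulses = [(0, 1.0), (12, -0.4), (30, 0.2)]

n_samples = 400
excitation = np.zeros(n_samples)
for start in range(0, n_samples, period):
    for offset, amp in pulses:
        if start + offset < n_samples:
            excitation[start + offset] = amp
# `excitation` would then drive the all-pole LPC synthesis filter.
```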
For the voiceless parts, higher fidelity synthesis can be attained by spectral
analysis of segments of the prediction residual. If we use a library
of stored spectra, each of which is given a reference number, we could describe
the spectrum of each portion of the residual signal by the number of the
most similar stored spectrum. A synthetic version of the residual can then
be made from a sequence of spectral reference numbers, pulling each spectrum
out of storage by its code number. That method is the code-book excitation
method of linear prediction, usually referred to by the acronym CELP (code-book
excited linear prediction). It yields a fairly naturalistic encoding and
resynthesis of speech, for which reason it is used in a variety of communications
contexts, such as military communications. See, for example, Tremain (1982),
concerning LPC-10, “the US government standard linear predictive coding algorithm”,
or Campbell et al. (1991), concerning the US Federal Standard 1016 CELP coder.
Further useful information, including software, is available from the comp.speech
FAQ web site.
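The look-up idea can be sketched with a toy codebook. The stored spectra below are made-up three-bin magnitude vectors; a real CELP coder uses much larger codebooks and a perceptually weighted distance measure:

```python
import numpy as np

# Toy codebook of four stored spectra (one per row); values are invented.
codebook = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
])

def nearest_code(spectrum, codebook):
    """Reference number of the stored spectrum most similar to this one."""
    distances = np.sum((codebook - spectrum) ** 2, axis=1)
    return int(np.argmin(distances))

frame_spectrum = np.array([0.9, 0.1, 0.0])    # spectrum of one residual segment
idx = nearest_code(frame_spectrum, codebook)  # only this number is transmitted
reconstructed = codebook[idx]                 # the decoder looks it up again
```

Because only the reference numbers travel over the link, the bit rate is far lower than transmitting the residual itself.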
Coding of this kind is suitable for use in the transmission of speech down
telephone wires or other kinds of telecommunication links, such as mobile
telephony and “Internet phones”. In these applications, the listener isn’t
usually aware that the speech that they are hearing is a reconstruction of
an encoded version of the speaker’s speech.
One of the benefits of linear prediction, perhaps the main benefit, is that
it is easy to compute: getting the coefficients from natural speech is not
difficult. Contrast this with formant synthesis, or articulatory synthesis:
for both of these, obtaining the synthesizer parameters from natural speech
is extremely challenging, making these synthesis methods impractical for
many purposes. Finding linear predictor coefficients is simply a matter
of minimising the energy of the prediction residual. A number of algorithms
have been developed to do this, as Schroeder (1985) and Wakita (1976, 1996)
discuss.
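One standard approach is the autocorrelation method solved by the Levinson-Durbin recursion. The sketch below is our own illustrative implementation, not taken from the cited papers:

```python
import numpy as np

def lpc_autocorr(x, order):
    """Predictor coefficients a, where x_hat[n] = sum_k a[k] * x[n-1-k],
    chosen to minimise the mean squared prediction residual
    (autocorrelation method, Levinson-Durbin recursion)."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]
    for i in range(order):
        # Reflection coefficient for the next model order.
        k = (r[i + 1] - np.dot(a[:i], r[1:i + 1][::-1])) / err
        new_a = a.copy()
        new_a[i] = k
        for j in range(i):
            new_a[j] = a[j] - k * a[i - 1 - j]
        a = new_a
        err *= (1.0 - k * k)
    return a

x = 0.5 ** np.arange(32)   # a signal where each sample is half the previous
a = lpc_autocorr(x, 1)     # first-order predictor: a[0] comes out close to 0.5
```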
Next: Applications of LPC (2): spectral analysis