Applications of LPC (1): speech synthesis

Speech can be synthesized (or resynthesized) by providing either the prediction residual, or a synthetic version of the residual, together with the predictor coefficients. In the simplest method of synthesis, we first work out which portions of the original signal are voiced and which are unvoiced. For the voiced parts of the signal, the error can be modelled by an impulse source that approximates the sequence of spikes seen in the prediction residual. To do this, we mainly need to make sure that the spacing between the spikes is right for the desired frequency of voicing (as described in lecture 2 for the Klatt synthesizer; the idea is the same). In this way we can alter the pitch contour to a fair degree, if we wish. The voiceless parts of the signal can be modelled with a white noise signal of roughly the right amplitude.
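To make the procedure concrete, here is a minimal sketch of this simple scheme in Python (numpy and scipy assumed). The function name and the sign convention for the coefficients, s[n] ≈ a₁s[n−1] + … + aₚs[n−p], are illustrative assumptions, not a fixed standard.

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_frame(lpc_coeffs, n_samples, fs, voiced, f0=120.0, gain=1.0):
    """Resynthesize one frame from predictor coefficients and a
    synthetic excitation (impulse train if voiced, white noise if not)."""
    if voiced:
        # Impulse train: one spike per pitch period; the spacing of the
        # spikes sets the fundamental frequency of the voicing.
        excitation = np.zeros(n_samples)
        period = int(fs / f0)
        excitation[::period] = 1.0
    else:
        # White noise of roughly the right amplitude for voiceless sounds.
        excitation = np.random.randn(n_samples)
    # All-pole synthesis filter 1/A(z), A(z) = 1 - a_1 z^-1 - ... - a_p z^-p
    a_poly = np.concatenate(([1.0], -np.asarray(lpc_coeffs)))
    return gain * lfilter([1.0], a_poly, excitation)
```

Changing f0 from frame to frame moves the impulses closer together or further apart, which is exactly how the pitch contour can be altered without touching the predictor coefficients.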

Generating voiced speech with an impulse source in LPC tends to produce synthetic speech which is rather “buzzy”. Other forms of excitation have also been investigated. Atal also demonstrated that the quality of LPC-synthesized speech could be improved by using multipulse excitation, that is, modelling each pitch period with several impulses rather than one. For the voiceless parts, higher-fidelity synthesis can be attained by spectral analysis of segments of the prediction residual. If we use a library of stored spectra, each of which is given a reference number, we can describe the spectrum of each portion of the residual signal by the number of the most similar stored spectrum. A synthetic version of the residual can then be made from a sequence of spectral reference numbers, pulling each spectrum out of storage by its code number. This is the code-book excitation method of linear prediction, usually referred to by the acronym CELP (code-book excited linear prediction). It yields a fairly natural-sounding encoding and resynthesis of speech, for which reason it is used in a variety of communications contexts, such as military communications. See, for example, Tremain (1982) on LPC-10, “the US government standard linear predictive coding algorithm”, or Campbell et al. (1991) on the US Federal Standard 1016 CELP coder. Further useful information, including software, is available from the comp.speech FAQ web site.
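The core of the code-book idea, as described above, is just a nearest-neighbour search over the stored library. The sketch below illustrates that single step; the codebook itself is a hypothetical pre-built table, and real CELP coders select excitation by a more elaborate analysis-by-synthesis search rather than a plain spectral distance.

```python
import numpy as np

def nearest_code(segment, codebook):
    """Return the reference number of the stored spectrum most similar
    to this segment of the prediction residual.

    codebook: array of shape (n_entries, n_bins), a hypothetical library
    of reference spectra; only the winning index needs to be transmitted.
    """
    spectrum = np.abs(np.fft.rfft(segment))
    # Euclidean distance from this segment's spectrum to each stored one
    distances = np.linalg.norm(codebook - spectrum, axis=1)
    return int(np.argmin(distances))

# The decoder rebuilds a synthetic residual by looking each number up again:
# synthetic_spectrum = codebook[code_number]
```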

Coding of this kind is suitable for the transmission of speech down telephone wires or other kinds of telecommunication links, such as mobile telephony and “Internet phones”. In these applications, the listener is usually unaware that the speech they are hearing is a reconstruction from an encoded version of the speaker’s speech.

One of the benefits of linear prediction, perhaps the main benefit, is that it is easy to compute: getting the coefficients from natural speech is not difficult. Contrast this with formant synthesis or articulatory synthesis: for both of these, obtaining the synthesizer parameters from natural speech is extremely challenging, which makes those methods impractical for many purposes. Finding linear predictor coefficients is simply a question of minimising the energy of the prediction residual. A number of algorithms have been developed to do this, as Schroeder (1985) and Wakita (1976, 1996) discuss.
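As an illustration of how little machinery this takes, here is a minimal sketch of one standard approach, the autocorrelation (Yule-Walker) method, which minimises the residual energy by solving a Toeplitz system of normal equations. The Hamming window and the coefficient convention are assumptions matching the synthesis sketch above; production coders typically use the Levinson-Durbin recursion, for which the general solver here stands in.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coefficients(frame, order):
    """Predictor coefficients a_1..a_p that minimise the residual energy
    for one frame, via the autocorrelation (Yule-Walker) method."""
    frame = frame * np.hamming(len(frame))   # taper the frame edges
    # Autocorrelation of the frame at lags 0..order
    r = np.array([frame[: len(frame) - k] @ frame[k:]
                  for k in range(order + 1)])
    # Normal equations R a = r: R is Toeplitz, built from lags 0..order-1
    return solve_toeplitz(r[:-1], r[1:])
```

For example, lpc_coefficients(frame, order=12) on a 20-30 ms frame of speech sampled at 8 kHz gives a coefficient set that can be fed straight to the synthesis sketch above.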

Next: Applications of LPC (2): spectral analysis