Linear prediction equation

We estimate the value of the current sample as a linear combination of the previous p samples, as in figure 5.5. Equation (5.1) expresses this idea.

(5.1)    x[t] = –a1 x[t–1] – a2 x[t–2] – a3 x[t–3] – … – ap x[t–p] + e[t]

p is called the order of the predictor: it is typically 12 to 18, meaning we predict from the previous 12 to 18 samples. We predict that the current sample is the sum of the previous p samples, each multiplied by some weighting factor, the a coefficients, also called predictor coefficients. However, as the prediction is only an approximation to the actual value of the current sample, the difference between the predicted value and the current sample is an error quantity, e[t]. The sequence of error values obtained when equation (5.1) is used to model a signal is called the prediction residual.
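To make equation (5.1) concrete, here is a minimal sketch in Python (the use of NumPy and the function name prediction_residual are my own choices, not part of the text): given a set of predictor coefficients, it forms the weighted sum of the previous p samples at each point and returns the error signal e[t].

```python
import numpy as np

def prediction_residual(x, a):
    """Residual of equation (5.1): e[t] = x[t] + a1*x[t-1] + ... + ap*x[t-p].
    x: the signal (1-D array); a: the p predictor coefficients a1..ap.
    Samples before t = 0 are treated as zero."""
    e = x.astype(float).copy()
    for k, ak in enumerate(a, start=1):
        e[k:] += ak * x[:-k]   # add a_k * x[t-k] for every t >= k
    return e
```

If the coefficients fit the signal well, e will be much smaller in magnitude than x, which is exactly the property the next paragraph exploits.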

In principle we would have p coefficients, say 12, plus the prediction residual value, for every sample. On the face of it this may seem hardly an improvement in the economy of our representation of the signal, but it is not as problematic as it sounds, for two reasons. First, the magnitude of the prediction residual, as I said before, is very much smaller than the magnitude of the original signal, so the error signal can be stored with fewer bits of information than the original signal. Second, we don’t need to compute or store 12 coefficients for every sample in our representation of the speech, because the coefficients change very slowly, i.e. more slowly than the original signal changes. The coefficients collectively encode the slowly changing resonances of the vocal tract: slowly changing, that is, compared to the very fast rate at which the speech signal itself changes. So instead of saving 12–18 coefficients per sample, we only need to save that many coefficients at about 10 millisecond intervals (every 80th sample for signals at 8000 samples/s, the usual sampling rate used with this technique).
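The frame-by-frame analysis might look like the following sketch, again with names of my own choosing. It computes the a coefficients by the standard autocorrelation method (minimizing the mean squared residual over each frame), which the text has not described, so treat it as one plausible way of obtaining the coefficients rather than the procedure defined here. With p = 12 and frame = 80 it produces one coefficient set per 10 ms of 8000 samples/s speech.

```python
import numpy as np

def lpc_frames(x, p=12, frame=80):
    """One set of p predictor coefficients per frame of `frame` samples
    (80 samples = 10 ms at 8000 samples/s), via the autocorrelation method."""
    coeff_sets = []
    for start in range(0, len(x) - frame + 1, frame):
        w = x[start:start + frame] * np.hamming(frame)             # taper frame edges
        r = np.correlate(w, w, mode="full")[frame - 1:frame + p]   # autocorrelation at lags 0..p
        r[0] += 1e-9                                               # keep the solve stable on silent frames
        R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
        coeff_sets.append(np.linalg.solve(R, -r[1:]))              # normal equations: R a = -r(1..p)
    return np.array(coeff_sets)
```

For one second of 8000 samples/s speech this stores 100 sets of 12 coefficients plus one low-precision residual value per sample, rather than 12 coefficients per sample.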
