Linear prediction equation
We estimate the magnitude of the current sample as a linear combination
of the previous p samples, as in figure 5.5. Equation (5.1) expresses this idea.

(5.1)  x[t] = –a1 x[t–1] – a2 x[t–2] – a3 x[t–3] – … – ap x[t–p] + e[t]
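Equation (5.1) can be sketched directly in code. The following minimal Python illustration is mine, not from the text; the function name `predict` and the example values are assumptions for demonstration only:

```python
def predict(x, t, a):
    """Predicted value of x[t] from the previous p samples, following
    equation (5.1) with the error term e[t] left out.
    a = [a1, ..., ap] are the predictor coefficients."""
    return -sum(a[k] * x[t - 1 - k] for k in range(len(a)))

# With p = 1 and a1 = -0.5, the prediction is 0.5 * x[t-1]:
x = [1.0, 0.5, 0.25]
print(predict(x, 1, [-0.5]))  # 0.5
```

Note the sign convention: because equation (5.1) writes the weights with minus signs, the code negates the weighted sum.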
p is called the order of the predictor: it is typically 12
to 18 samples. We predict that the current sample is the sum of the previous
p samples, each multiplied by some weighting factor, the a coefficients,
also called predictor coefficients. However, as the prediction is only an
approximation to the actual value of the current sample, the difference between
the predicted value and the current sample is an error quantity, e[t].
The sequence of error values when equation (5.1) is used to model
a signal is called the prediction residual.
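The residual can be made concrete with a small sketch (my own illustration, not from the text). If a signal is generated by a known second-order recursion, then a predictor built from the matching coefficients reproduces every sample exactly, so the prediction residual vanishes:

```python
# Hypothetical illustration: a signal that obeys a second-order recursion,
# so the matching order-2 predictor leaves a zero residual.
p = 2
a = [-1.8, 0.9]          # predictor coefficients a1, a2 (assumed values)

# Build the signal: x[t] = 1.8*x[t-1] - 0.9*x[t-2], a damped resonance.
x = [1.0, 1.8]
for t in range(2, 50):
    x.append(1.8 * x[-1] - 0.9 * x[-2])

# Residual e[t] = x[t] minus the predicted value, per equation (5.1).
e = [x[t] + sum(a[k] * x[t - 1 - k] for k in range(p)) for t in range(p, 50)]

print(max(abs(v) for v in x))   # the signal itself peaks at a few units
print(max(abs(v) for v in e))   # the residual is ~0
```

Real speech never obeys its predictor exactly, but the same effect holds in weakened form: the residual is much smaller than the signal.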
We can compute p coefficients (say, 12) for each sample, and in
addition calculate the prediction residual for every sample. On
the face of it this may seem to be hardly an improvement in the economy of
our representation of the signal. But it is not as problematic as it sounds
for two reasons. First, the magnitude of the prediction residual, as I said
before, is very much smaller than the magnitude of the original signal, so
the error signal can be stored with fewer bits of information than the original
signal. Second, we don’t need to compute or store 12 coefficients for
every sample in our representation of the speech, because the coefficients
change very slowly, i.e. more slowly than the original signal changes. The
coefficients collectively encode the slowly changing resonances of the vocal
tract: slowly changing, that is, compared to the very fast rate at which the
speech signal changes. So instead of saving 12–18 coefficients per sample,
we only need to save that many coefficients at about 10 millisecond intervals
(every 80th sample for signals at 8000 samples/s, which is the usual sampling
rate used with this technique).
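The saving can be sketched with a quick back-of-the-envelope calculation. The bit depths below are my assumptions for illustration, not figures from the text; only the sampling rate, frame interval, and order come from the passage above:

```python
# Illustrative storage comparison (bit depths assumed, not from the text).
SAMPLE_RATE = 8000      # samples per second: the usual rate for this technique
FRAME_INTERVAL = 80     # save coefficients every 80th sample (~10 ms)
ORDER = 12              # predictor order p

frames_per_second = SAMPLE_RATE // FRAME_INTERVAL    # 100 frames/s
coeffs_per_second = frames_per_second * ORDER        # 1200 coefficients/s

# Assumed bit depths: 16-bit raw samples and coefficients, 4-bit residual.
raw_bits = SAMPLE_RATE * 16                              # 128000 bits/s
lpc_bits = coeffs_per_second * 16 + SAMPLE_RATE * 4      # 51200 bits/s
print(raw_bits, lpc_bits)
```

Even with these rough assumed bit depths, storing 12 coefficients per 10 ms frame plus a low-precision residual comes to well under half the raw data rate.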