Linear prediction equation
 
 
We estimate the value of the current sample as a linear combination of the previous p samples, as in figure 5.5. Equation (5.1) expresses this idea.
 
 (5.1)    x[t] = –a1 x[t–1] – a2 x[t–2] – a3 x[t–3] – … – ap x[t–p] + e[t]
 
 p is called the order of the predictor: it is typically 12 to 18 samples. We predict that the current sample is the sum of the previous p samples, each multiplied by a weighting factor; these weights are the a coefficients, also called predictor coefficients. However, since the prediction is only an approximation to the actual value of the current sample, the difference between the predicted value and the current sample is an error quantity, e[t]. The sequence of error values produced when equation (5.1) is used to model a signal is called the prediction residual.
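
To make this concrete, here is a minimal sketch in Python (assuming NumPy) of how the residual e[t] of equation (5.1) might be computed for a signal, given a set of predictor coefficients. The function name, and the choice to leave the first p residual values equal to the signal itself, are illustrative assumptions rather than anything specified in the text.

```python
import numpy as np

def prediction_residual(x, a):
    """Residual e[t] of equation (5.1):
    e[t] = x[t] + a1*x[t-1] + ... + ap*x[t-p].

    x: signal samples; a: predictor coefficients [a1, ..., ap].
    (Illustrative: the first p samples have no full prediction,
    so their residual is left equal to the signal itself.)
    """
    p = len(a)
    e = np.array(x, dtype=float)
    for t in range(p, len(x)):
        # Prediction: the negated weighted sum of the previous p samples.
        predicted = -sum(a[k] * x[t - 1 - k] for k in range(p))
        e[t] = x[t] - predicted
    return e
```

For speech, max(abs(e)) comes out much smaller than max(abs(x)) when the coefficients fit the signal well, which is the saving exploited in the next paragraph.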
 
 We can compute p coefficients, say 12, for each sample, and in addition calculate the prediction residual for every sample. On the face of it this may seem to be hardly an improvement in the economy of our representation of the signal, but it is not as problematic as it sounds, for two reasons. First, as I said before, the magnitude of the prediction residual is very much smaller than the magnitude of the original signal, so the error signal can be stored with fewer bits of information than the original signal. Second, we don’t need to compute or store 12 coefficients for every sample in our representation of the speech, because the coefficients change very slowly, i.e. more slowly than the original signal changes. The coefficients collectively encode the slowly changing resonances of the vocal tract: slowly changing, that is, compared to the very fast rate at which the speech signal changes. So instead of saving 12–18 coefficients per sample, we only need to save that many coefficients at about 10 millisecond intervals (every 80th sample for signals at 8000 samples/s, the usual sampling rate used with this technique), as in the sketch below.
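
As a rough sketch of that frame-by-frame saving, the loop below estimates one set of coefficients per 10 millisecond frame. How the coefficients are actually estimated is not covered here; librosa's Burg-method lpc routine stands in for whatever estimation method is used, and the function name and frame handling are illustrative assumptions.

```python
import numpy as np
import librosa  # librosa.lpc (Burg's method) stands in for coefficient estimation

def frame_coefficients(x, sr=8000, order=12, frame_ms=10):
    """Illustrative sketch: one coefficient set per frame, not per sample.

    At sr = 8000 samples/s and 10 ms frames, that is one set of `order`
    coefficients every 80 samples, instead of one set per sample.
    """
    hop = sr * frame_ms // 1000  # 80 samples per frame at 8000 samples/s
    coeffs = []
    for start in range(0, len(x) - hop + 1, hop):
        frame = np.asarray(x[start:start + hop], dtype=float)
        # librosa.lpc returns [1, a1, ..., ap]; the leading 1 is dropped
        coeffs.append(librosa.lpc(frame, order=order)[1:])
    return np.array(coeffs)  # shape: (number of frames, order)
```

For one second of speech this stores 100 × 12 coefficient values plus the residual, rather than 8000 × 12 coefficient values.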
 