Linear Prediction of Speech
The originators of the technique, Bishnu Atal and colleagues, were working
on television pictures, two dimensional images changing in time. One way of
representing such an image is as a two dimensional matrix of numbers, each
representing the brightness of one spot on the screen, each pixel. The representation
of a single image would require a very large matrix of numbers, and to represent
a changing sequence of images, many images per second, would require huge
quantities of data. However that approach overlooks a fact about images:
the brightness of any particular point is not independent of the brightness
of neighbouring points in the image, because on a television image there
are regions of light and dark and different shades in between. A pixel doesn’t
vary too much from the brightness of its neighbours. In other words there
is a strong correlation between the brightness of any given pixel and the
brightness of its neighbours. The same holds true for speech signals, as
we can see in figure 5.5.
Figure 5.5. Portion of a signal modelled by regression
Sample number
There is a strong correlation between the magnitude of a signal at any given
sample and the magnitude of the immediately preceding samples. The magnitude
of the signal at each sample is often predictable by considering the magnitude
of the signal for the preceding few samples. Of course if we predict the value
of the magnitude of the signal at one point in time on the basis of linear
regression from the preceding samples, the actual magnitude of the signal
in that sample might be something different from our prediction. The prediction
could be in error one way or the other, too high or too low. In figure 5.5,
for example, the equation of the linear regression line (dashed) is x
[t] = 308.35 t – 5928.8, so for the next point, t =
41, we predict that x[t] will be 6714. In fact, the actual
value is 8738. However if we take the difference between the predicted value
and the actual value, the size of the difference between the prediction and
the actual value is in general very much less than the magnitude of the signal
itself at that point. In this case, the difference is –2024, an underestimate.
As we go through the signal making predictions as to what the next value of
the signal is going to be based upon the previous handful of samples, our
prediction will be more or less in error. But if we store the errors as a
separate signal the combination of the predicted signal (according to the
predictor coefficients) and the stored error accurately encodes the original
signal. The amount of information that we need to store can be significantly
less than simply storing the value of each successive sample.
Next: Linear Prediction
equation