Computation of the cepstrum in C


The first noteworthy difference between cepstrum.c and spectrum.c is that the array in which the spectrum is stored, logpsd, has 1024 cells, not 512. This is because it is passed to the four1 function, which expects each of its 512 complex values as a pair of adjacent cells: the real part followed by the imaginary part. The values of the spectrum go into the even-numbered cells, logpsd[2*i], and the odd-numbered cells, logpsd[2*i+1], are set to 0.0.
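The interleaved layout described above can be sketched as a small helper. This is only an illustration, not code taken from cepstrum.c: the function name pack_real_as_complex and its parameters are hypothetical, and it assumes the spectrum values are already available in a separate array of floats.

```c
#include <stddef.h>

/* Hypothetical helper (not part of cepstrum.c).
   Packs n real values into an interleaved complex array of 2*n floats:
   real parts go into the even-numbered cells, and the imaginary parts
   in the odd-numbered cells are set to 0.0. */
void pack_real_as_complex(const float *values, float *packed, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        packed[2 * i]     = values[i];  /* real part */
        packed[2 * i + 1] = 0.0f;       /* imaginary part */
    }
}
```

With n = 512, a call such as pack_real_as_complex(spectrum, logpsd, 512) would fill all 1024 cells of an array like logpsd (spectrum here is an assumed name for the source array).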

The four1 function is used twice: first to calculate the spectrum and then, towards the end, to calculate the inverse Fourier transform. The third argument, -1, in the expression "four1(logpsd-1,512,-1);" indicates that the inverse transform is required. As before, four1 puts its result back in the array supplied to it, in this case logpsd. In the print-out of quefrency, the lower half of the inverse Fourier transform has 256 values, representing quefrencies from 1 sample to 256 samples. The duration of 1 sample (at 16000 samples/s) is 0.0625 ms, so the quefrency of the i'th value is i times 0.0625 ms, corresponding to a frequency of 16000/i Hz. As with spectrum.c, the textual output of cepstrum can be redirected to a text file. Figure 4.8 was made by importing the cepstrum of joe.dat around sample 1000 into a graph plotting program.
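The conversion from a quefrency index to milliseconds and to the corresponding frequency can be written out as two one-line functions. This is a minimal sketch rather than an excerpt from cepstrum.c: the names quefrency_ms and quefrency_to_hz are hypothetical, and the sample rate is passed in as a parameter (16000 samples/s for the recording discussed here).

```c
/* Hypothetical helpers (not part of cepstrum.c). */

/* Quefrency, in milliseconds, of the i'th cepstral value. */
double quefrency_ms(int i, double sample_rate_hz)
{
    return 1000.0 * i / sample_rate_hz;
}

/* Frequency, in Hz, corresponding to a quefrency of i samples.
   The index i must be at least 1. */
double quefrency_to_hz(int i, double sample_rate_hz)
{
    return sample_rate_hz / i;
}
```

At 16000 samples/s, quefrency_ms(1, 16000.0) gives 0.0625 ms and quefrency_to_hz(256, 16000.0) gives 62.5 Hz, the lowest frequency covered by the 256 printed values.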

A large peak is apparent in the upper panel of figure 4.8 at a quefrency of about 7 ms. The plot of the same cepstrum on a frequency scale is much less revealing. The spike is clearly very close to 0 Hz (because the scale goes up to 8 kHz!), but we cannot tell much more than that. Other peaks in the range 0-2 kHz correspond to vocal tract resonances, but we cannot tell their frequencies with precision. However, inspection of the quefrency data around the 7 ms point reveals that the biggest spike is actually at a quefrency of 6.5625 ms, that is, at 105 samples. This corresponds to a frequency of 16000/105 = 152.4 Hz, which is the fundamental frequency for that portion of the vowel.
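The inspection described above, looking for the biggest spike near a given quefrency, could also be automated. The following is a sketch under stated assumptions, not something cepstrum.c does: the function name peak_index is hypothetical, the cepstral values are assumed to be held in an array indexed by quefrency in samples, and the caller must choose the search range, since values at very low quefrencies may be larger than the pitch peak.

```c
/* Hypothetical helper (not part of cepstrum.c).
   Returns the index of the largest value in cepstrum[lo..hi], inclusive.
   The caller must ensure lo <= hi and that both are valid indices. */
int peak_index(const float *cepstrum, int lo, int hi)
{
    int best = lo;
    for (int i = lo + 1; i <= hi; i++) {
        if (cepstrum[i] > cepstrum[best])
            best = i;  /* new maximum found */
    }
    return best;
}
```

For a search between 6 ms and 8 ms at 16000 samples/s, lo would be 96 and hi would be 128. If the function returned 105, the fundamental frequency estimate would be 16000/105, or about 152.4 Hz.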
