Pitch tracking using cepstral analysis

To conclude this discussion of cepstral analysis, let us consider a modification of cepstrum.c that works right the way through a file, calculating the cepstrum every 80 samples (every 5 ms at 16,000 samples/s) and writing out the f0 each time.

Although this program, cepstral_f0.c, is longer than any of the other programs we've seen so far, there is hardly anything new in it. It can be divided into three sections. First, lines 1 to 26 are virtually the same as the beginning of spectrum.c and cepstrum.c. These lines read the input file into the array x_in.

Second, lines 33-61 package up the remainder of cepstrum.c into a function, cepstral_f0, that calculates the cepstrum at a particular frame (as in cepstrum.c), and then finds the biggest peak in the cepstrum below 400 Hz. So, instead of printing out the whole cepstrum, it just gives the cepstral peak, which is taken as the f0 of that frame. Lines 63-71 find the peak in the cepstrum. The variable max holds the maximum value of the cepstrum found so far, and max_f0 is the f0 corresponding to that peak. The loop set up in line 65 counts i through each quefrency, from 88 samples (182 Hz) to 256 samples (62.5 Hz); at 16,000 samples/s, a quefrency of i samples corresponds to a frequency of 16000/i Hz. To alter the range in which f0 is sought, therefore, these limits can be changed. Lines 67-69 can be paraphrased as "if the cepstral amplitude at the i'th quefrency is greater than the maximum found so far, make it the new maximum, and recalculate the peak f0 from the current quefrency".
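The peak search paraphrased above can be sketched as a small stand-alone function. This is only an illustration, not the listing of cepstral_f0.c itself: the function name peak_f0, its parameters lo and hi, and the constant SAMPLE_RATE are assumptions made for this example, while max and max_f0 follow the variable names used in the text. It assumes the cepstrum has already been computed into an array indexed by quefrency in samples.

```c
#include <assert.h>

#define SAMPLE_RATE 16000.0f  /* assumed sampling rate, samples per second */

/* Hypothetical helper: return the f0 (in Hz) of the largest cepstral value
   between quefrency indices lo and hi inclusive (both in samples). */
float peak_f0(const float *cepstrum, int lo, int hi)
{
    float max, max_f0;
    int i;

    assert(lo > 0 && lo <= hi);      /* quefrency 0 has no defined frequency */

    max = cepstrum[lo];              /* maximum cepstral value so far */
    max_f0 = SAMPLE_RATE / (float)lo; /* f0 corresponding to that maximum */

    for (i = lo + 1; i <= hi; i++) {
        if (cepstrum[i] > max) {
            max = cepstrum[i];                /* new maximum */
            max_f0 = SAMPLE_RATE / (float)i;  /* quefrency i -> Hz */
        }
    }
    return max_f0;
}
```

Called as peak_f0(cepstrum, 88, 256), this searches the same quefrency range as described above; widening or narrowing the range in which f0 is sought is then just a matter of passing different limits.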

Third, lines 28-30, at the end of the main function, are the main addition to cepstrum.c. The loop "for (i=319;i<*length-256;i+=80) " counts through the input signal from sample 319 as far as the sample which is 256 samples from the end, in jumps of 80 samples (5 ms). (Sample 319 is the 320'th sample, or the very end of the fourth 80-sample interval.) Each time, it calls the function cepstral_f0, thus computing and printing the peak f0 for that frame. Because the cepstrum computation uses a window of 512 samples centred on the current sample, the first cepstrum cannot be computed until at least 256 samples into the signal, and the last must be computed no later than 256 samples from the end of the signal. Since the frames are at 80-sample intervals, the first frame ends at the first multiple of 80 after 256, that is, at the 320'th sample, which is sample number 319 (taking account of the fact that the first sample is numbered 0). Figure 4.11 shows the result of applying cepstral_f0 to joe.dat. (Once again, the text stream printed out by cepstral_f0 was redirected to a text file and then used to plot this graph.)
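To see which samples the frame loop actually visits, the same loop bounds can be tried out in isolation. The sketch below is not part of cepstral_f0.c: the function frame_centres and the constants FIRST_FRAME, HALF_WINDOW and HOP are names invented for this example. It simply records each value of i instead of calling cepstral_f0.

```c
#include <assert.h>

#define FIRST_FRAME 319  /* first frame position, as in the loop quoted above */
#define HALF_WINDOW 256  /* half of the 512-sample analysis window */
#define HOP 80           /* frame interval: 80 samples = 5 ms at 16 kHz */

/* Hypothetical helper: store the sample index of each frame in centres[]
   (at most max_frames of them) and return how many frames were found. */
int frame_centres(int length, int *centres, int max_frames)
{
    int i, n = 0;

    assert(centres != 0 && max_frames >= 0);

    /* Same bounds as the loop in the text: stop 256 samples from the end. */
    for (i = FIRST_FRAME; i < length - HALF_WINDOW && n < max_frames; i += HOP)
        centres[n++] = i;

    return n;
}
```

For a signal of 1000 samples, for example, the loop condition is i < 744, so the frames fall at samples 319, 399, 479, 559, 639 and 719. A signal shorter than 576 samples yields no frames at all, since 319 is then already within 256 samples of the end.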

This method of f0 tracking clearly suffers from the limitation that it will find f0 values even during voiceless portions, causing the f0 track to fluctuate wildly within the prescribed limits. These measurements are quite useless, but if we had a way of working out which parts of the signal were voiced and which voiceless, we could set the f0 track to zero during the voiceless portions. Nevertheless, the rising pitch of the vowel of "joe" (c. samples 7000-10000) and the falling pitch of "(f)ather" (c. samples 14000-19000) are evident. Closer inspection shows that the f0 tracking works well in shorter voiced portions, too.
