A program for f0 estimation using the autocorrelation method
The program autocorr_f0.c calls the function “correl(data,data,512,ans);” to calculate the correlation between each 512-sample portion of the signal (pointed to by the variable data) and itself at every lag from –512 samples to +512 samples. This function is defined in correl.c, taken from Press et al. (1992: 546). It uses an FFT and an inverse FFT to calculate the autocorrelation function efficiently, but further explanation is beyond the scope of this course. The autocorrelation at each lag is returned in the array ans, so that ans[j] gives the “strength” of the autocorrelation at lag j. ans has 1025 cells: ans[0] is the autocorrelation at lag 0, ans[1] to ans[512] give the autocorrelations at lags 1 to 512, respectively, and ans[513] to ans[1024] give the autocorrelations at lags –512 up to –1. Because we are interested in f0, we inspect ans to find the maximum autocorrelation in the range of lags specified by the variables top and bot, which corresponds to the range of f0 values expected in speech. The lag at which that maximum occurs is the estimated pitch period in samples, and the sampling rate divided by that lag is the f0 estimate.
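The peak-picking step described above can be sketched as follows. This is only an illustration, not the code of autocorr_f0.c: the function names peak_lag and autocorr_direct are hypothetical, and autocorr_direct is a slow, direct sum standing in for the FFT-based correl, so its values need not match those returned by correl exactly.

```c
/* Hypothetical helper (not from autocorr_f0.c): return the lag j in the
   range top..bot (inclusive) at which ans[j] is largest. */
int peak_lag(const float *ans, int top, int bot)
{
    int best = top;
    for (int j = top + 1; j <= bot; j++) {
        if (ans[j] > ans[best])
            best = j;
    }
    return best;
}

/* Illustrative stand-in for correl(): compute the autocorrelation of
   data[0..n-1] at lags 0..maxlag by direct summation. This is much
   slower than the FFT method and is shown only to make the idea concrete.
   ans must have room for maxlag + 1 values. */
void autocorr_direct(const float *data, int n, float *ans, int maxlag)
{
    for (int lag = 0; lag <= maxlag; lag++) {
        float sum = 0.0f;
        for (int i = 0; i + lag < n; i++)
            sum += data[i] * data[i + lag];
        ans[lag] = sum;
    }
}
```

For a 512-sample frame containing a pulse every 100 samples, the autocorrelation has peaks at lags 0, 100, 200 and so on, and searching only between lags 88 and 200 picks out lag 100 as the pitch period.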
As with cepstral_f0.c, in order to use this method we must specify upper and lower limits of the f0 range: in this case the maximum f0 is set at 180 Hz and the minimum at 80 Hz. top, the smallest lag of interest, is defined earlier in the program as SR/MAXF0, and bot, the largest lag of interest, as SR/MINF0. SR (the sampling rate), MAXF0 and MINF0 are #define'd in the header, so that when the program is compiled these symbols are replaced by the numbers 16000, 180 and 80 respectively. These limits are not appropriate for all speakers, however, and may need to be altered. A drawback of autocorr_f0.c is that it estimates a value of f0 at every sample, not just every 80 samples. This makes it very slow to run (it takes 10 seconds to analyse 1 second of speech on my computer), though it would not be difficult to alter it so that it estimates an f0 value only every 80 samples (see exercise 5.1).
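To make the relation between the f0 limits and the lag limits concrete, here is a minimal sketch. It assumes the three symbols are defined as plain integers, so that the divisions are integer divisions; the actual header of autocorr_f0.c may differ. The function names top_lag, bot_lag and lag_to_f0 are hypothetical and are used here only for illustration.

```c
#define SR    16000   /* sampling rate in Hz */
#define MAXF0 180     /* highest f0 of interest in Hz */
#define MINF0 80      /* lowest f0 of interest in Hz */

/* Smallest lag of interest: one period of the highest f0, in samples.
   With integer division, 16000/180 is truncated to 88. */
int top_lag(void)
{
    return SR / MAXF0;
}

/* Largest lag of interest: one period of the lowest f0, in samples.
   16000/80 is exactly 200. */
int bot_lag(void)
{
    return SR / MINF0;
}

/* Convert a lag in samples back to a frequency in Hz. */
double lag_to_f0(int lag)
{
    return (double)SR / lag;
}
```

Under these assumptions the search covers lags 88 to 200. Because of the truncation, lag 88 corresponds to about 181.8 Hz, slightly above the nominal 180 Hz maximum, while lag 200 corresponds to exactly 80 Hz.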