A program for f0 estimation using the autocorrelation method
The program autocorr_f0.c uses the function “correl(data,data,512,ans);” to calculate the
correlation between each 512-sample portion of the signal (referred to by
the pointer variable data) and itself at every lag from –512 samples
to +512 samples. This function is defined in correl.c,
taken from Press et al. (1992: 546). It uses an FFT and an inverse
FFT to calculate the autocorrelation function efficiently, but further explanation
is beyond the scope of this course. The magnitude of the autocorrelation
at each lag is returned in the array ans, so that for a non-negative lag j, ans[j]
gives the “strength” of the autocorrelation at that lag. The array ans
has 1025 cells: ans[0] is the autocorrelation at lag 0, ans[1]
to ans[512] give the autocorrelations at lags 1 to 512, respectively,
and ans[513] to ans[1024] give the autocorrelations at lags
–512 up to –1. Because we are interested in f0, we inspect
ans to find the lag with the maximum autocorrelation in the range of lags
specified by the variables top and bot, a range corresponding to
the range of f0 values expected in speech. A peak at a lag of j samples
indicates a period of j samples, that is, an f0 of SR/j Hz, where SR is
the sampling rate.
As with cepstral_f0.c, in order to use this method we must specify
upper and lower limits of the f0 range: in this case
the maximum f0 is set at 180 Hz and the minimum at
80 Hz. The variable top, the smallest lag of interest, is defined earlier
in the program as SR/MAXF0, and bot, the largest lag of interest, as
SR/MINF0. SR (the sampling rate), MAXF0 and MINF0
are #define’d in the header, so that when the program is compiled
these symbols are replaced by the numbers 16000, 180 and 80 respectively.
These limits are not appropriate for all speakers, however, and may need
to be altered. A drawback of autocorr_f0.c is that it estimates
a value of f0 at every sample, not just every 80
samples. This makes it very slow to run (it takes 10 seconds to analyse
1 second of speech on my computer), though it would not be difficult to alter
it so that it only estimates an f0 value every 80 samples
(see exercise 5.1).