Obtaining speech parameters from sound files

Two example data files: FIFTEEN3.wav and FIFTEEN165.wav

Using a general-purpose package/tool such as WaveSurfer or Praat

Using a programme or script

ESPS (Entropic Signal Processing System)

ESPS uses a proprietary audio file format (".sd" format), so audio files in other formats must first be converted to it. First, find out the sampling rate, sample encoding and number of channels of your original audio file, e.g.:

$ sox --info FIFTEEN3.wav
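If sox is not installed, Python's standard wave module can read the same basic header fields from a WAV file. This is a minimal sketch: the function name wav_info is hypothetical, and it only handles uncompressed PCM WAV files, which is all the wave module supports.

```python
import wave

def wav_info(path):
    """Return the basic header fields of an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        frames = w.getnframes()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,                      # samples per second
            "bits_per_sample": 8 * w.getsampwidth(),  # getsampwidth() is in bytes
            "frames": frames,                         # samples per channel
            "duration_s": frames / rate,
        }
```

For example, print(wav_info("FIFTEEN3.wav")) shows the sampling rate and sample size that the later conversion steps need.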

Second, convert the audio file to .raw (headerless) format:

$ sox FIFTEEN3.wav FIFTEEN3.raw
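A .raw file is simply the sample data with the header removed. The sketch below does the same job as the sox command for an uncompressed PCM WAV file using only Python's standard library; wav_to_raw is a hypothetical name, and on a little-endian machine the bytes written are exactly those stored in the WAV file. It returns the sampling rate, channel count and sample width, since the raw file no longer records them.

```python
import wave

def wav_to_raw(wav_path, raw_path):
    """Write the PCM sample bytes of a WAV file with no header.

    Returns (sample_rate, channels, sample_width_bytes), which must be
    supplied again to any program that later reads the raw file.
    """
    with wave.open(wav_path, "rb") as w:
        data = w.readframes(w.getnframes())   # all frames as raw PCM bytes
        params = (w.getframerate(), w.getnchannels(), w.getsampwidth())
    with open(raw_path, "wb") as f:
        f.write(data)
    return params
```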

Third, convert the .raw file to .sd using the ESPS btosps program, with options that match what sox reported in the first step, e.g.:

$ btosps -f 16000 -n 1 -t SHORT -c "" FIFTEEN3.raw FIFTEEN3.sd
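Because a headerless file says nothing about itself, the btosps arguments have to match the raw data exactly. Assuming -f gives the sampling rate and -t SHORT means 16-bit signed integer samples, as the example values suggest, a quick sanity check before converting is that the file size divided by 2 bytes per sample gives a sample count, and hence a duration, consistent with what sox --info reported. The helper names below are hypothetical.

```python
import os
from array import array

def raw_sample_count(path, bytes_per_sample=2, channels=1):
    """Number of sample frames in a headerless PCM file."""
    size = os.path.getsize(path)
    frame_bytes = bytes_per_sample * channels
    if size % frame_bytes:
        raise ValueError(f"{path}: size {size} is not a multiple of {frame_bytes} bytes")
    return size // frame_bytes

def raw_duration(path, sample_rate=16000, bytes_per_sample=2, channels=1):
    """Duration in seconds implied by the assumed sampling rate and sample format."""
    return raw_sample_count(path, bytes_per_sample, channels) / sample_rate

def read_raw_shorts(path):
    """Load 16-bit signed samples stored in the machine's native byte order."""
    samples = array("h")          # "h" = signed short (2 bytes on common platforms)
    with open(path, "rb") as f:
        samples.frombytes(f.read())
    return samples
```

If raw_duration("FIFTEEN3.raw") disagrees with the duration sox reported, the assumed rate or sample size is wrong.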

Now you can run analyses such as voicing detection and f₀ tracking, e.g. with the ESPS get_f0 program:

$ get_f0 -i 0.005 FIFTEEN3.sd FIFTEEN3.f0
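get_f0 is generally described as an implementation of Talkin's RAPT algorithm, which is considerably more elaborate than what follows. To show the basic idea behind f₀ estimation, the sketch below swaps in a much simpler technique: pick the strongest autocorrelation peak within a plausible pitch-period range for one analysis frame. autocorr_f0, its default 60 to 400 Hz search range and its voicing threshold are hypothetical choices, not get_f0's parameters. A tracker repeats an estimate like this at every frame step; assuming -i sets that step in seconds, -i 0.005 means every 5 ms, or 80 samples at 16 kHz.

```python
def autocorr_f0(frame, sample_rate, fmin=60.0, fmax=400.0, threshold=0.3):
    """Crude f0 estimate (Hz) for one frame from its autocorrelation peak.

    Searches lags corresponding to fmax..fmin Hz and returns 0.0 when the
    best normalised peak is below `threshold` (treated as unvoiced).
    """
    if not frame:
        return 0.0
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)         # lag-0 autocorrelation
    if energy == 0.0:
        return 0.0                         # silent frame
    lag_min = int(sample_rate / fmax)      # shortest period considered
    lag_max = min(int(sample_rate / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < threshold:
        return 0.0                         # no convincing periodicity
    return sample_rate / best_lag
```

The frame must span at least two periods of the lowest f₀ searched for, which is why practical trackers use windows of several tens of milliseconds even when the frame step is only 5 ms.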

Formant frequencies and bandwidths:

$ formant FIFTEEN3.sd FIFTEEN3.fb
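LPC-based formant estimators typically obtain formant frequencies and bandwidths from the complex roots (poles) of the LPC polynomial: a pole z = r·exp(jθ) corresponds to a formant at F = θ·fs/(2π) Hz with bandwidth B = −ln(r)·fs/π Hz, where fs is the sampling rate. The sketch below applies that mapping; it is not the ESPS formant program's code, which also has to decide which poles are formants and track them over time. For a single resonance the second-order polynomial can be solved with the quadratic formula; higher orders need a general root finder. The function names are hypothetical.

```python
import cmath
import math

def pole_to_formant(pole, sample_rate):
    """Map a pole z = r*exp(j*theta) to (frequency Hz, bandwidth Hz)."""
    r, theta = abs(pole), cmath.phase(pole)
    freq = theta * sample_rate / (2 * math.pi)
    bandwidth = -math.log(r) * sample_rate / math.pi   # wider for poles nearer 0
    return freq, bandwidth

def formant_to_pole(freq, bandwidth, sample_rate):
    """Inverse mapping: (frequency, bandwidth) in Hz to the upper-half-plane pole."""
    r = math.exp(-math.pi * bandwidth / sample_rate)
    theta = 2 * math.pi * freq / sample_rate
    return cmath.rect(r, theta)

def second_order_formant(a1, a2, sample_rate):
    """Formant of the all-pole filter 1 / (1 + a1 z^-1 + a2 z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4 * a2)   # imaginary for a complex pole pair
    pole = (-a1 + disc) / 2
    if pole.imag < 0:
        pole = pole.conjugate()           # keep the positive-frequency pole
    return pole_to_formant(pole, sample_rate)
```

For instance, a 500 Hz formant with an 80 Hz bandwidth at fs = 16000 Hz corresponds to a pole of radius exp(−π·80/16000) ≈ 0.984.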

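LPC analysis: reflection coefficients (refcof), converted with transpec to the "AFC" representation and printed as ASCII with pplain, and the LPC residual (get_resid):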

$ refcof -p '1:50000' -l 160 -S 80 -m "burg" -z FIFTEEN3.sd FIFTEEN3.rc
$ transpec -m "AFC" FIFTEEN3.rc - | pplain - >FIFTEEN3.afc.csv
$ get_resid -a 2 FIFTEEN3.sd FIFTEEN3.rc FIFTEEN3-resid.d
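The commands above ask refcof for the Burg method (-m "burg"). To show what reflection coefficients, predictor coefficients and the residual are, the sketch below swaps in the simpler autocorrelation method with the Levinson-Durbin recursion, which yields the reflection coefficients as a by-product, and then inverse-filters the signal to obtain the kind of residual get_resid produces. It analyses a single frame with no window; refcof works frame by frame (assuming -l 160 and -S 80 give frame length and step in samples, i.e. 10 ms frames every 5 ms at 16 kHz). Sign conventions for reflection coefficients differ between texts and packages, so these values may be negated relative to ESPS output. The function names are hypothetical.

```python
def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n+k] for k = 0..order (no window applied)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Levinson-Durbin recursion.

    Returns (a, ks, err): predictor polynomial a[0..order] with a[0] = 1, so that
    A(z) = sum_j a[j] z^-j; the reflection coefficients ks[0..order-1]; and the
    final prediction-error energy. Assumes r[0] > 0.
    """
    a = [1.0] + [0.0] * order
    ks = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]   # update lower-order coefficients
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                   # error energy shrinks at each order
    return a, ks, err

def lpc_residual(x, a):
    """Inverse-filter x with A(z): e[n] = sum_j a[j] * x[n-j], x[m] = 0 for m < 0."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(p + 1) if n - j >= 0)
            for n in range(len(x))]
```

Usage: a, ks, err = levinson_durbin(autocorrelation(frame, 16), 16) gives a 16th-order analysis of one frame, and lpc_residual(frame, a) its residual; the order 16 here is an arbitrary illustrative choice.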

You can convert the various output files (.sd, .f0, .fb, .rc, etc.) to plain ASCII text with pplain, e.g.:

$ pplain FIFTEEN3.f0 >FIFTEEN3.f0.csv
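Assuming pplain writes one record per line with whitespace-separated fields (so the .csv name is only nominal), its output can be loaded with a few lines of Python. For get_f0 output the columns are assumed here to be F0, prob_voice, rms and ac_peak, with F0 = 0 in unvoiced frames; check the field order for your own files. The frame times assume the 5 ms step requested with -i 0.005 and ignore any start-time offset recorded in the ESPS header. The function names are hypothetical.

```python
def read_pplain(path):
    """Read whitespace-separated numeric columns, one record per line."""
    rows = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields:                        # skip blank lines
                rows.append([float(v) for v in fields])
    return rows

def voiced_f0(rows, frame_step=0.005):
    """(time_s, f0_hz) for frames with nonzero F0, assuming F0 is column 0."""
    return [(i * frame_step, row[0]) for i, row in enumerate(rows) if row[0] > 0]
```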

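Output .d or .sd audio files can be converted to .raw by "beheading" them (stripping the ESPS header) with bhd; sox can then turn the .raw file into .wav, provided you state the sampling rate, sample size and encoding explicitly, since a headerless file records none of them: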

$ bhd FIFTEEN3-resid.d FIFTEEN3-resid.raw
$ sox -r 16000 -b 16 -e signed-integer FIFTEEN3-resid.raw FIFTEEN3-resid.wav
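The sox step can also be done with Python's wave module, assuming, as the sox command does, that the beheaded file holds 16-bit signed integer samples at 16 kHz (mono is an additional assumption here). The wave module expects the samples in the machine's native byte order, so data in the opposite byte order would need swapping first (e.g. with array.byteswap()). raw_to_wav is a hypothetical helper.

```python
import wave

def raw_to_wav(raw_path, wav_path, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM samples in a WAV header.

    Assumes signed integer samples in the machine's native byte order
    (little-endian on x86 and most ARM systems).
    """
    with open(raw_path, "rb") as f:
        data = f.read()
    frame_bytes = channels * sample_width
    if len(data) % frame_bytes:
        raise ValueError(f"{raw_path}: {len(data)} bytes is not a whole number of frames")
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(sample_rate)
        w.writeframes(data)            # header sizes are filled in on close
```

For example, raw_to_wav("FIFTEEN3-resid.raw", "FIFTEEN3-resid.wav") mirrors the sox command above.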

Fitting a known function to some observed data