Practical Examples

Two very widely used toolkits for building automatic speech recognisers are:

    HTK - the Hidden Markov Model Toolkit, from Cambridge University Engineering Department
    Sphinx - from Carnegie Mellon University

As beginners on a short course, however, you won't want to (and don't need to) build your own complete system. If you just want to have a go at recognising some recorded speech, you could try:

    WebASR - an online service at http://www.webasr.org/ from the Speech and Hearing Research Group at Sheffield University

Full speech recognition is only called for when a transcript is unavailable. But very often in linguistics and phonetics research we do have an orthographic transcription (or, equally, a script that was used to make a recording). In that case, we can use an adaptation of speech recognition, forced alignment, to time-align the words of the orthographic transcription to the audio. As a by-product, forced alignment also gives us the time-aligned segmental (typically phonemic) transcription of the audio.
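To make the idea concrete, here is a toy sketch of the alignment step in Python. It assumes we already have, for every audio frame, a score saying how well that frame matches each unit (phone or word) of the known transcript; in a real aligner those scores come from trained acoustic models, which are not shown here. The function names `forced_align` and `to_segments` and the example scores are made up for illustration and do not come from any of the toolkits mentioned on this page.

```python
def forced_align(scores):
    """Assign each frame to one unit of a known transcript.

    scores[t][j] is an (assumed) log-likelihood that frame t belongs to the
    j-th unit of the transcript. The units must be visited in order, each
    covering at least one frame. Returns the unit index chosen for each frame.
    """
    n_frames, n_units = len(scores), len(scores[0])
    neg_inf = float("-inf")
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    advanced = [[False] * n_units for _ in range(n_frames)]
    best[0][0] = scores[0][0]  # the first frame must belong to the first unit

    for t in range(1, n_frames):
        for j in range(n_units):
            stay = best[t - 1][j]                          # remain in unit j
            move = best[t - 1][j - 1] if j > 0 else neg_inf  # step on from j-1
            if move > stay:
                best[t][j] = move + scores[t][j]
                advanced[t][j] = True
            elif stay > neg_inf:
                best[t][j] = stay + scores[t][j]

    if best[-1][-1] == neg_inf:
        raise ValueError("need at least one frame per transcript unit")

    # Trace back from the last frame, which must belong to the last unit.
    j = n_units - 1
    path = [j]
    for t in range(n_frames - 1, 0, -1):
        if advanced[t][j]:
            j -= 1
        path.append(j)
    path.reverse()
    return path


def to_segments(path, labels):
    """Collapse a per-frame path into (label, start_frame, end_frame) runs."""
    segments = []
    start = 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            segments.append((labels[path[start]], start, t))
            start = t
    return segments


# Invented scores: frames 0-2 match the first unit best, frames 3-4 the second.
example_scores = [[0, -5], [0, -5], [0, -5], [-5, 0], [-5, 0]]
example_path = forced_align(example_scores)
example_segments = to_segments(example_path, ["h", "i"])
```

Multiplying the frame indices by the frame shift (often 10 ms, though that depends on the system) turns the segments into start and end times. Because the transcript is fixed in advance, the search only has to decide where the boundaries fall, not which words were spoken.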

Two notable systems for forced alignment are:

    WebMAUS - from the Institute for Phonetics and Speech Processing, Munich
    FAVE-align - an online interface to the Penn Phonetics Laboratory Forced Aligner (P2FA)

Both of these return the time-aligned labels as Praat TextGrid files.
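If you want to work with the aligned labels outside Praat, a TextGrid saved in Praat's ordinary (long) text format is simple enough to read with a few lines of Python. The sketch below is a minimal reader, not a full parser: it assumes the long text format, interval tiers only, and labels that fit on one line. The function name `read_interval_tiers`, the tier name `words` and the sample contents are invented for illustration; the tier names in a real file depend on the aligner you used.

```python
import re

# A tiny hand-written TextGrid in Praat's long text format (invented data).
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.5
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 0.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.3
            text = "hello"
        intervals [2]:
            xmin = 0.3
            xmax = 0.5
            text = ""
'''


def read_interval_tiers(textgrid_text):
    """Return {tier name: [(start, end, label), ...]} for interval tiers."""
    tiers = {}
    tier_name = None
    current = None  # the interval being read, or None when outside one
    for raw in textgrid_text.splitlines():
        line = raw.strip()
        m = re.match(r'name = "(.*)"$', line)
        if m:  # a new tier starts here
            tier_name = m.group(1)
            tiers[tier_name] = []
            current = None
            continue
        if re.match(r'intervals \[\d+\]:?$', line):  # a new interval starts
            current = {}
            continue
        if current is None:
            continue  # skip header lines and tier-level xmin/xmax
        m = re.match(r'(xmin|xmax) = (\S+)$', line)
        if m:
            current[m.group(1)] = float(m.group(2))
            continue
        m = re.match(r'text = "(.*)"$', line)
        if m:
            label = m.group(1).replace('""', '"')  # Praat doubles inner quotes
            tiers[tier_name].append((current["xmin"], current["xmax"], label))
            current = None
    return tiers


tiers = read_interval_tiers(SAMPLE)
```

For a file on disk you would pass in the result of reading the file as text; note that Praat sometimes saves TextGrids as UTF-16 rather than UTF-8, so the encoding may need to be given explicitly when opening the file. Empty labels, like the second interval in the sample, usually mark silence or unlabelled stretches.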