Speech Processing

(Local) on-line resources

The domain for the book website, http://www.islp.org.uk/, is no longer supported, but all the content is still available from http://www.phon.ox.ac.uk/jcoleman/SLP/outline.htm

Core textbook: Coleman, J. (2005) Introducing Speech and Language Processing. Cambridge University Press.

Other useful textbooks

Johnson, K. (1997) Acoustic and Auditory Phonetics. Blackwell.

Jurafsky, D. and J. H. Martin (2000) Speech and Language Processing. Prentice-Hall. Note that this book is aimed at computer science students and focuses more on language than on speech.

Preparatory reading for class 1

Coleman chapters 1 and 2 and p. 14. Ignore all the C code, though; we'll be working in GNU Octave (which is largely compatible with Matlab).


Class 1. a. Digital signals

Coleman chapter 2; Johnson pp. 3-28.

b. Digital filters

Coleman chapter 3; Johnson pp. 28-44.


Class 2. Frequency analysis

Coleman chapter 4

More detail, if you really want it

Javkin, H. R. (1996) Speech analysis and synthesis. Chapter 7 of N. J. Lass, ed. Principles of Experimental Phonetics. Mosby.

Wakita, H. (1996) Instrumentation for the Study of Speech Acoustics. In N. J. Lass, ed. Principles of Experimental Phonetics. Mosby. Alternatively, an earlier version of the same paper: Wakita, H. (1976) Instrumentation for the Study of Speech Acoustics. In N. J. Lass, ed. Contemporary Issues in Experimental Phonetics. Academic Press. 3-40.

Schroeder, M. R. (1985) Linear Predictive Coding of Speech: Review and Current Directions. IEEE Communications Magazine 23 (8). 54-61.


Class 3. Finite-state machines

Coleman chapter 5

Background reading: Chomsky, N. (1957) Syntactic Structures. Mouton. pp. 18-20.

Jurafsky and Martin pp. 33-52, 105-110.

Prolog reference textbook: Clocksin, W. F. and C. S. Mellish (2003) Programming in Prolog. Springer.


Class 4. Probabilistic finite-state models

Preparatory reading for class 4: Charniak, E. (1993) Statistical Language Learning. MIT Press. Chapter 2.

Coleman chapter 7

Other reading: Charniak Ch. 3.

Manning, C. D. and H. Schütze (1999) Foundations of Statistical Natural Language Processing. MIT Press. Chapter 9.

Rabiner, L. R. (1989) A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE 77 (2). 257-286. Reprinted in A. Waibel and K.-F. Lee (eds) Readings in Speech Recognition. Morgan Kaufmann. 267-297.



Class 5. Parsing: a quick introduction

Coleman chapters 8 and 9