A few technicalities

(After Young et al. 1997)

Let each spoken word be represented by a sequence of speech vectors or observations O, defined as

$O = (o_1, o_2, o_3, \ldots, o_n)$

where o_t is the speech vector observed at time t. The isolated word recognition problem can then be regarded as that of computing

$\arg\max_i \{ p(w_i \mid O) \}$

(which means "the word w_i whose probability, given the observation sequence, is greatest"). Here w_i is the ith word in the dictionary, and p(w_i | O) is the probability that w_i is the word actually spoken, given O. This probability cannot be computed directly, but by Bayes' Rule we have:

$p(w_i \mid O) = \frac{p(O \mid w_i)\, p(w_i)}{p(O)}$
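As a minimal sketch of this decision rule, the Python below assumes a closed three-word vocabulary with invented likelihoods p(O | w_i) and priors p(w_i); the words, numbers and function names are hypothetical. Whatever value p(O) takes, it is a common factor for every candidate word, so dividing by it cannot change which word wins.

```python
# Toy isolated-word decision via Bayes' Rule.
# The likelihoods p(O | w_i) and priors p(w_i) are invented numbers; in a real
# recogniser they would come from the word HMMs and the language model.

def posteriors(likelihoods, priors):
    """Return p(w_i | O) for every word w_i."""
    joint = {w: likelihoods[w] * priors[w] for w in likelihoods}
    # Assumes a closed vocabulary, so p(O) = sum_i p(O | w_i) p(w_i).
    p_obs = sum(joint.values())
    return {w: j / p_obs for w, j in joint.items()}


def recognise(likelihoods, priors):
    """arg max_i p(w_i | O), computed without p(O) because it is a common factor."""
    return max(likelihoods, key=lambda w: likelihoods[w] * priors[w])


likelihoods = {"yes": 2e-5, "no": 5e-5, "maybe": 1e-5}  # p(O | w_i)
priors = {"yes": 0.5, "no": 0.1, "maybe": 0.4}          # p(w_i)

post = posteriors(likelihoods, priors)
best = recognise(likelihoods, priors)
print(best)  # yes
```

With these numbers "no" has the largest likelihood, yet "yes" wins because of its larger prior.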

p(w_i) is referred to as the prior probability of word w_i. It is related to that word's frequency of occurrence in some domain, and is normally estimated on the basis of the preceding sequence of words, using another probabilistic finite-state automaton, grandly termed the language model.
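One simple kind of probabilistic finite-state language model is a bigram model, whose state is just the previous word. The sketch below is an assumption for illustration, not necessarily the model the source has in mind: it estimates p(w_i | previous word) by unsmoothed maximum-likelihood counting over a hypothetical three-sentence corpus, with `<s>` marking the start of a sentence. The function names are also hypothetical.

```python
from collections import Counter


def train_bigram(sentences):
    """Count word pairs and contexts for a bigram model (no smoothing)."""
    pair_counts = Counter()     # (previous word, word) -> count
    context_counts = Counter()  # previous word -> count
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, cur in zip(words, words[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    return pair_counts, context_counts


def prior(word, prev, model):
    """Maximum-likelihood estimate of p(word | prev)."""
    pair_counts, context_counts = model
    if context_counts[prev] == 0:  # context never seen before another word
        return 0.0
    return pair_counts[(prev, word)] / context_counts[prev]


corpus = ["call home", "call the office", "call home now"]
model = train_bigram(corpus)
print(prior("home", "call", model))  # 2 of the 3 words after "call" are "home"
```

Unseen word pairs get probability 0 here; practical language models smooth these estimates.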

p(O), the observation probability, is the same for every candidate word w_i, so it does not affect which word maximises p(w_i | O) and can be ignored. Hence, for a given set of prior probabilities p(w_i), the most probable word depends only on the likelihood p(O | w_i). That is why we use an HMM, a generator of speech vectors such as the one in figure 7.6, to estimate p(O | w_i). Given a set of models M_i, corresponding to words w_i, we assume that

$p(O \mid w_i) = p(O \mid M_i).$

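For a particular state sequence X through the model in figure 7.6, for example X = (1, 1, 2, ...), the joint probability of O and X is

$p(O, X \mid M_i) = a_{01} b_1(o_1) \times a_{11} b_1(o_2) \times a_{12} b_2(o_3) \times \cdots$

where a_ij is the probability of a transition from state i to state j, and b_j(o_t) is the probability that state j emits observation o_t.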
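The product above can be checked numerically. The sketch assumes a hypothetical three-state left-to-right model (not necessarily the topology of figure 7.6) with a non-emitting entry state 0, no exit transition, and discrete symbols "x" and "y" in place of real speech vectors; `A[i][j]` plays the role of a_ij and `B[j][o]` of b_j(o).

```python
# Hypothetical left-to-right HMM. State 0 is a non-emitting entry state;
# states 1-3 emit discrete symbols "x"/"y" (a stand-in for speech vectors).
# A[i][j] = a_ij, B[j][o] = b_j(o). No exit transition is modelled.
A = {
    0: {1: 1.0},
    1: {1: 0.6, 2: 0.4},
    2: {2: 0.7, 3: 0.3},
    3: {3: 1.0},
}
B = {
    1: {"x": 0.8, "y": 0.2},
    2: {"x": 0.3, "y": 0.7},
    3: {"x": 0.5, "y": 0.5},
}


def path_probability(obs, states):
    """p(O, X | M): product of a_ij * b_j(o_t) along one state sequence X."""
    if len(obs) != len(states):
        raise ValueError("need exactly one state per observation")
    prob, prev = 1.0, 0  # start in the entry state 0
    for o, s in zip(obs, states):
        prob *= A[prev].get(s, 0.0) * B[s][o]  # missing arcs have probability 0
        prev = s
    return prob


O = ["x", "x", "y"]
X = [1, 1, 2]  # follows the pattern a01 b1(o1) x a11 b1(o2) x a12 b2(o3)
print(path_probability(O, X))  # approx 0.10752 = 0.8 * 0.48 * 0.28
```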

However, only the observation sequence O is known: the underlying state sequence X is hidden, which is why the model is called a Hidden Markov Model. Since X is unknown, the required likelihood is computed by summing over all possible state sequences, $p(O \mid M_i) = \sum_X p(O, X \mid M_i)$. Alternatively, the likelihood can be approximated by considering only the most likely state sequence, $\hat{p}(O \mid M_i) = \max_X p(O, X \mid M_i)$.
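To make the sum and the approximation concrete, the sketch below assumes a hypothetical toy model (entry state 0, emitting states 1 to 3, discrete symbols instead of speech vectors, no exit transition). It computes the likelihood by brute-force enumeration of every state sequence, then by the forward recursion, which gives the same sum far more cheaply, and finally finds the single most likely state sequence with the Viterbi algorithm. The algorithm names are standard; the model, numbers and function names are invented.

```python
import itertools

# Hypothetical toy HMM: entry state 0, emitting states 1-3,
# discrete symbols instead of speech vectors, no exit transition.
A = {0: {1: 1.0}, 1: {1: 0.6, 2: 0.4}, 2: {2: 0.7, 3: 0.3}, 3: {3: 1.0}}
B = {1: {"x": 0.8, "y": 0.2}, 2: {"x": 0.3, "y": 0.7}, 3: {"x": 0.5, "y": 0.5}}
STATES = [1, 2, 3]


def a(i, j):
    """Transition probability a_ij (0 if there is no arc)."""
    return A[i].get(j, 0.0)


def joint(obs, states):
    """p(O, X | M) for one state sequence X."""
    prob, prev = 1.0, 0
    for o, s in zip(obs, states):
        prob *= a(prev, s) * B[s][o]
        prev = s
    return prob


def brute_force_likelihood(obs):
    """Sum of p(O, X | M) over every state sequence (exponentially many in len(obs))."""
    return sum(joint(obs, X) for X in itertools.product(STATES, repeat=len(obs)))


def forward_likelihood(obs):
    """Same sum via the forward recursion, in time proportional to len(obs) * N**2."""
    # alpha[j] = p(o_1 .. o_t, in state j at time t | M)
    alpha = {j: a(0, j) * B[j][obs[0]] for j in STATES}
    for o in obs[1:]:
        alpha = {j: sum(alpha[i] * a(i, j) for i in STATES) * B[j][o] for j in STATES}
    return sum(alpha.values())


def viterbi(obs):
    """Probability and state sequence of the single most likely path."""
    # best[j] = (probability, path) of the best path ending in state j
    best = {j: (a(0, j) * B[j][obs[0]], [j]) for j in STATES}
    for o in obs[1:]:
        new_best = {}
        for j in STATES:
            i = max(STATES, key=lambda k: best[k][0] * a(k, j))
            new_best[j] = (best[i][0] * a(i, j) * B[j][o], best[i][1] + [j])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])


O = ["x", "x", "y"]
print(brute_force_likelihood(O), forward_likelihood(O))  # both approx 0.21504
print(viterbi(O))  # approx (0.10752, [1, 1, 2])
```

The Viterbi path probability can never exceed the full sum, since it is one of the terms being added.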

All this, of course, assumes that the state transition probabilities a_ij and the observation probabilities b_j(o_t) are known for each model. Herein lies the elegance and power of the HMM framework: given a set of training examples corresponding to a particular model, the parameters of that model can be determined automatically.
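As a hedged illustration of the simplest case, suppose each training example came with its state sequence already known (for example, obtained by Viterbi alignment). The maximum-likelihood estimates of a_ij and b_j(o) are then just relative frequencies of transitions and emissions. Baum-Welch re-estimation, as used in HTK, replaces these hard counts with expected counts over all state sequences; that refinement is not shown. The training data, the entry-state convention and the function names below are hypothetical.

```python
from collections import Counter, defaultdict


def normalise(counts):
    """Turn a Counter into relative frequencies."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def estimate(aligned_examples):
    """ML estimates of a_ij and b_j(o) from (observations, states) pairs.

    State 0 is a non-emitting entry state; no exit transition is counted.
    """
    trans = defaultdict(Counter)  # trans[i][j] = number of i -> j transitions
    emit = defaultdict(Counter)   # emit[j][o] = times state j produced o
    for obs, states in aligned_examples:
        if len(obs) != len(states):
            raise ValueError("need exactly one state per observation")
        prev = 0
        for o, s in zip(obs, states):
            trans[prev][s] += 1
            emit[s][o] += 1
            prev = s
    A = {i: normalise(row) for i, row in trans.items()}
    B = {j: normalise(row) for j, row in emit.items()}
    return A, B


# Hypothetical training data: symbol sequences with known state alignments.
examples = [
    (["x", "x", "y"], [1, 1, 2]),
    (["x", "y", "y"], [1, 2, 2]),
]
A, B = estimate(examples)
print(A)
print(B)
```

Here a_12 comes out as 2/3 and a_11 as 1/3. Symbols a state never produced in training get probability 0, which is one reason real systems use continuous densities or smoothing.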
