Dynamic Magnetic Resonance Imaging of speech movements

In the last decade or so, there have been a number of pioneering studies of vocal tract shapes using Magnetic Resonance Imaging. (For a page of links to other sites click here.) To begin with, conventional static images were acquired. Since the "exposure time" required in these early studies was rather long (many seconds), only "prolongable" articulations - vowels and continuant consonants (e.g. nasals, liquids and fricatives) - could be examined.

More recently, a number of research groups have attempted to acquire dynamic MRI image sequences, i.e. MRI "movies". One very successful technique involves the acquisition of single images from an utterance that is repeated over and over again. The single images, each taken at a different time in the production of the utterance, can be put together to form an animation, showing the movements of the speech organs with a very fine degree of temporal resolution. You can see a clip here (9.7 MB QuickTime movie) that is made from an MR image sequence obtained by a member of the Phonetics Laboratory, Greg Kochanski, at the University of Oxford Centre for Clinical Magnetic Resonance Research. It is of repeated utterances of "I'm a spotted chicken". (Note that the clip begins about half-way through the cycle, so that the sequence is actually "chicken ... I'm a spotted".) If the images are displayed too large, it might be worthwhile downloading the file to your disk and then playing it at a smaller scale using e.g. the Apple QuickTime player.
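The assembly step can be sketched roughly as follows. This is an illustration only, not the procedure used to make the clip: the function name `assemble_cycle`, the (time offset, image) input format, the `start_fraction` parameter and the example timings are all assumptions introduced here.

```python
def assemble_cycle(snapshots, cycle_s, start_fraction=0.0):
    """Order single images, each taken on a different repetition, into one cycle.

    snapshots: list of (offset_s, image) pairs, where offset_s is the time
        after the start of the utterance at which the image was acquired
        (hypothetical input format).
    cycle_s: duration of one repetition of the utterance, in seconds.
    start_fraction: point in the cycle at which the animation begins
        (0.5 means half-way through, so the end of the utterance plays first).
    """
    start = start_fraction * cycle_s
    # Sort by time elapsed since the chosen starting point, wrapping round
    # at the end of the cycle.
    ordered = sorted(snapshots, key=lambda s: (s[0] - start) % cycle_s)
    return [image for _, image in ordered]


# Four images acquired out of order on four different repetitions
# (labels stand in for image data; the timings are made up).
shots = [(0.9, "spotted"), (0.1, "I'm"), (1.5, "chicken"), (0.4, "a")]
print(assemble_cycle(shots, cycle_s=2.0))
print(assemble_cycle(shots, cycle_s=2.0, start_fraction=0.5))
```

With `start_fraction=0.5` the ordering starts in the middle of the cycle, which is the kind of offset that makes the clip play as "chicken ... I'm a spotted".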

However, the animation technique is just that - animation, not a real-time record of the movements of a single speech event. It assumes that the movements produced on each repetition are identical, or at least do not vary significantly. Therefore, other research groups (including ours) are investigating the use of very fast MR image acquisition ("real time" MRI).

Our first real time investigation, illustrated in the following movie clip, shows the movements of the speech organs in a phonetically untrained speaker who repeats the word "Elgar" at a moderate rate several times in succession. The images were acquired at a rate of 6 frames per second. Although this is fast enough to give the visual impression of continuous, fluid movement, at about 167 ms per frame it is incapable of capturing some very rapidly changing articulatory details, such as the fast movements associated with transitions between consonants and vowels.
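The arithmetic behind this limitation is simple enough to sketch. The helper names below and the 50 ms event duration are assumptions chosen purely for illustration; they are not measurements from our data.

```python
def frame_duration_ms(frames_per_second):
    """Time covered by one frame, in milliseconds."""
    return 1000.0 / frames_per_second


def frames_spanning(event_ms, frames_per_second):
    """How many frames an articulatory event of a given duration occupies."""
    return event_ms / frame_duration_ms(frames_per_second)


# At 6 frames per second each frame covers roughly 166.7 ms.
print(round(frame_duration_ms(6), 1))

# A hypothetical 50 ms movement would occupy well under one frame,
# so it cannot be resolved at this rate.
print(round(frames_spanning(50, 6), 2))
```

Any movement that occupies less than one frame is at best blurred into a single image, which is why fast consonant-vowel transitions are lost.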

(Click on the image to view or download the movie [2.1 MByte QuickTime].)


In order to attain the high frame rate of this movie (high for MRI, that is), we acquire data in only a single plane, and we have to accept a relatively poor level of contrast between different types of tissue (cartilage, bone, fat etc.). Nevertheless, the contrast between tissue and air is quite clear, which makes such images quite adequate for studying certain aspects of articulation.

In a recent study, we have examined the suitability of this method for investigating articulations of three degrees of complexity: i) single vowels, ii) CVC syllables, and iii) whole sentences. The second movie clip is of the syllable sequence "peat, pit, pet, pat, part":

Click on one of the following to view or download the movie: 1.3 MByte AVI clip (for Windows Media Player) or 5 MByte QuickTime clip.

This sequence was acquired on a newer scanner, at a lower frame rate (3 frames per second), with a corresponding improvement in the resolution of tissue contrasts. At this frame rate, vowels are easily discriminated from one another, and some details of the consonants can be made out: the raising of the tongue tip for the final [t]'s, for instance, but not the lip movements, which are not quite in the frame.

A table of tongue positions for British English vowels (as uttered by one speaker, at least) is available here.

Larynx position in vowels

Even though 3 frames per second is quite a slow frame rate, some aspects of speech articulation are slow enough to be perfectly visible at this rate. In acquiring image sequences of vowel articulations, we were particularly interested in the way in which the position of the larynx is altered for vowels of different pitch. It has been known for some time that the whole larynx moves as part of the process of pitch control, but it is difficult to observe these movements with other instruments. With dynamic MRI, however, it is relatively straightforward to observe and measure them. For example, in the 1.7 MByte QuickTime video clip associated with the figure below, it is easy to see the fall and rise of the larynx during the production of a vowel [i], spoken with a falling-rising pitch contour:

Click here for QuickTime video clip

By using the "frame forward" button on the QuickTime player, it is possible to observe the fall and rise of the larynx, frame-by-frame. (To view the sequence to best effect, it may be helpful to set the QuickTime player to display the images at half normal size.) Two blue lines have been superimposed on each image. The upper, horizontal line segment marks the base of the first cervical vertebra. The lower end of the sloping line is positioned at the base of the epiglottis, which moves up and down (and forwards and backwards) along with the thyroid cartilage and other tissues of the larynx.
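One way in which the two superimposed lines could be turned into a number per frame is sketched below. It is a minimal illustration under stated assumptions, not a description of our measurement procedure: the function name `larynx_drop`, the use of pixel coordinates with y increasing downwards, and all of the coordinate values are hypothetical placeholders rather than data.

```python
def larynx_drop(reference_y, epiglottis_base):
    """Vertical distance, in pixels, of the epiglottis base below the
    horizontal reference line. A larger value means a lower larynx.

    Assumes image coordinates in which y increases downwards.
    """
    _, y = epiglottis_base
    return y - reference_y


# Placeholder (x, y) positions of the lower end of the sloping line in five
# successive frames. These are invented values, not measurements.
landmarks = [(120, 210), (121, 216), (122, 223), (121, 215), (120, 209)]
reference_y = 95  # placeholder height of the horizontal reference line

drops = [larynx_drop(reference_y, point) for point in landmarks]
lowest_frame = max(range(len(drops)), key=drops.__getitem__)

print(drops)
print(lowest_frame)
```

Measuring relative to a bony landmark rather than to the edge of the image means that small movements of the whole head between frames do not masquerade as larynx movement.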

By making measurements of all the frames in a number of such sequences, as spoken by a single phonetically-trained subject, we have determined that, for this chap at least, the larynx is significantly higher and more advanced for high-pitched vowels than for the same vowels spoken with low pitch.

We have recently completed a study of such larynx movements in 6 phonetically untrained subjects that corroborates this pattern of larynx raising and lowering for high vs. low-pitched vowels. A summary of the project report can be inspected here. (This study was supported by British Academy grant SG-36269.)