1. What is intonation?

Jones (1960) - "the variations which take place in the pitch of the voice in connected speech, i.e. the variations in the pitch of the musical note produced by vibration of the vocal cords."

Unlike lexical tone (as in tone languages), changing intonation does not change the lexical identity/meaning of individual words, though it may alter the meaning of the sentence as a whole.

Pitch accent languages (e.g. Japanese, Swedish) used to be regarded as an intermediate case: superficially like lexical tone languages, but phonologically pitch functions like stress in these languages. In most stress-accent languages, pitch is an important correlate of stress, so the dividing lines between tone, stress and pitch-accent are fuzzy.

2. Early treatments

Early authors, e.g. Steele (1775) and Jones (1960), recorded intonation for whole sentences. Jones, following Kingdon (1958), analysed English intonation in terms of two sentence tunes. Refer to the attached extracts from Jones for examples of the two tunes in use. It was recognised that the tunes might be distributed over a larger or smaller number of syllables, and that an utterance with several "sense groups" might have a multiply-peaked pitch contour, but the syntax of tunes was not explored deeply.

O'Connor and Arnold (1973) divided intonation groups into four parts:

1. The pre-head - all the initial unaccented syllables.

2. The head - between the pre-head and the nucleus.

3. The nucleus - the main accented syllable.

4. The tail - all the syllables after the nucleus.

They identified 10 tunes.
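
The four-part division can be sketched in code. This is a minimal Python illustration, not O'Connor and Arnold's own formalism: the names split_group and IntonationGroup and the example syllables are hypothetical, and the sketch assumes that the head starts at the first accented syllable and that the nucleus is the last accented syllable of the group.

```python
from typing import NamedTuple


class IntonationGroup(NamedTuple):
    pre_head: list[str]  # initial unaccented syllables
    head: list[str]      # from the first accent up to (not including) the nucleus
    nucleus: str         # main accented syllable
    tail: list[str]      # everything after the nucleus


def split_group(syllables: list[str], accented: list[bool]) -> IntonationGroup:
    """Partition one intonation group into pre-head, head, nucleus and tail.

    Assumption (not stated in the handout): the nucleus is the LAST accent.
    """
    if len(syllables) != len(accented):
        raise ValueError("syllables and accented must have the same length")
    accents = [i for i, is_accented in enumerate(accented) if is_accented]
    if not accents:
        raise ValueError("an intonation group needs at least one accent")
    first, nucleus = accents[0], accents[-1]
    return IntonationGroup(
        pre_head=syllables[:first],
        head=syllables[first:nucleus],
        nucleus=syllables[nucleus],
        tail=syllables[nucleus + 1:],
    )


# Hypothetical example: accents on "JOHN" and "SUL".
group = split_group(
    ["and", "then", "JOHN", "in", "SUL", "ted", "him"],
    [False, False, True, False, True, False, False],
)
print(group.pre_head, group.head, group.nucleus, group.tail)
```

With a single accent the head is empty, since the first accent is then also the nucleus.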

3. Tonetic stress marks

Kingdon, O'Connor and Arnold and others employed a variety of diacritic symbols known as tonetic stress marks to denote various intonational events. Accents were held to be dynamic (contour) tones. The most important accents in English are:

[Figure: tonetic stress marks]

(Current IPA tone marks include: high (level) tone: é, low (level) tone: è, (high) falling tone: ê, rising tone: ě)

This approach, characteristically of structuralist methodology, concentrates on compendious exemplification and collection of large, annotated, orderly corpora of categorized examples, rather than the formulation of inviolable rules for determining the intonation patterns and their alignment with text.

4. Origins of the autosegmental approach to intonation

Goldsmith (1981) proposed that English lexical stress could be characterised by a MHL autosegmental melody, in which the H tone corresponds with the strongest stress, marked with a *:

[Figure: English stress as tone]

Liberman (1975) pursued the same approach to characterise English intonation more generally. For example, he identified a LHM "calling" intonation, in which the H tone docks onto the main stress, and the initial L tone spreads in the usual autosegmental fashion to all pre-stress syllables:

[Figure: calling intonation]
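
The association of the LHM melody with a text can be sketched as follows. This is only an illustrative Python sketch: the function name associate_calling_tune is hypothetical, and the treatment of M (covering all post-stress syllables, or sharing the final syllable with H when the stress is final) is an assumption made here, since the description above only specifies the docking of H and the spreading of L.

```python
def associate_calling_tune(n_syllables: int, stress: int) -> list[str]:
    """Return one tone string per syllable for an L H M melody.

    stress is the 0-based index of the main-stressed syllable.
    """
    if not 0 <= stress < n_syllables:
        raise ValueError("stress must index one of the syllables")
    # The initial L spreads to all pre-stress syllables.
    tones = ["L"] * stress
    if stress == n_syllables - 1:
        # Assumption: with final stress, H and M share the last syllable.
        tones.append("HM")
    else:
        # H docks onto the main stress.
        tones.append("H")
        # Assumption: M is realised on every post-stress syllable.
        tones.extend(["M"] * (n_syllables - stress - 1))
    return tones


print(associate_calling_tune(4, 2))  # ['L', 'L', 'H', 'M']
print(associate_calling_tune(1, 0))  # ['HM']
```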

The fruition of this line of research is seen in Pierrehumbert (1980) and subsequent work from the same perspective (e.g. Liberman and Pierrehumbert 1984, Beckman and Pierrehumbert 1986, Pierrehumbert and Beckman 1988).

5. The phonetic basis

Pre-Liberman approaches to intonation were based on impressionistic pitch records, supplemented by some instrumental analysis of f0. Pierrehumbert (1980: 3):

Liberman, Pierrehumbert and Beckman were engaged in the construction of speech synthesis systems for English and Japanese, which required explicit control of f0 and segmental durations (including pauses); see Pierrehumbert (1981). All other phonetic parameters were generated by a scheme for concatenation of LPC-encoded diphones. Unlike much other research in linguistics, such work permits no hand-waving.

Some properties of f0:

a) f0 corresponds to rate of vibration of the vocal cords.

b) Therefore, f0 = 0 during unvoiced speech e.g. during voiceless consonants as well as pauses.

c) f0 is therefore discontinuous, though there may be an underlying appearance of continuity (see fig. 1.5).

d) The overall shape of the f0 contour is under the conscious control of the speaker, but some speech sounds introduce fine-scale "microprosodic" perturbations, often due to aerodynamic factors. In particular, high vowels tend to raise f0; voiceless obstruents tend to raise f0 at the start of the following vowel; and voiced consonants and the glottal stop are associated with a drop in f0. It is important not to mistake such perturbations for accents.

e) Speakers do not usually use their full pitch range in speech. The actual range may vary e.g. be larger in more animated speech. In addition, speakers may employ a higher or lower "register" within their normal spoken pitch range. In some languages, register appears to be phonological.

f) A speaker's pitch range may fall or rise during speech, independently of the falls and rises of f0:


This phenomenon is called downdrift or declination.

g) When the top line appears to step down, rather than gradually drift, we have the related phenomenon of downstep, catathesis or tone terracing:


In tone languages, downstep typically affects H tones after a L. "List intonation" is similar, e.g. "Blueberries, bayberries, raspberries, mulberries and brambleberries". The high-pitched "calling" intonation in fig. 1.1 C shows two high peaks. Pierrehumbert analysed such cases as an instance of downstep, and thus analysed the first accent as not just a simple H tone, but as a H on the stressed syllable, combined with a L target at the end of the first syllable, which conditions downstep of the following H tone. As in other areas of autosegmental phonology, Pierrehumbert treated dynamic accents as a sequence of two tones (bitonal accents).
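
The stepwise lowering of successive H peaks can be illustrated with a toy calculation. The sketch below is an assumption-laden illustration rather than a model taken from this handout: it supposes that each downstepped peak keeps a constant fraction (ratio) of the previous peak's height above a fixed reference level, and the function name and all numerical values are invented for the example.

```python
def downstepped_peaks(
    first_peak_hz: float,
    n_peaks: int,
    ratio: float = 0.5,
    reference_hz: float = 100.0,
) -> list[float]:
    """Return n_peaks H-tone peak values, each stepped down from the last.

    Assumed rule (illustrative only):
        next_peak = reference + ratio * (previous_peak - reference)
    """
    if n_peaks < 1:
        raise ValueError("n_peaks must be at least 1")
    peaks = [float(first_peak_hz)]
    for _ in range(n_peaks - 1):
        peaks.append(reference_hz + ratio * (peaks[-1] - reference_hz))
    return peaks


# Three successive accents in a list, with made-up values.
print(downstepped_peaks(200.0, 3))  # [200.0, 150.0, 125.0]
```

Under this rule the peaks form a descending staircase that approaches, but never crosses, the reference level, which is the qualitative pattern of tone terracing described above.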

6. Functions of intonation

A. Intonation and syntactic structure

1a) Here's a word you can look ûp. ("Up" is a particle.)

b) Here's a chimney you can lóòk up. ("Up" is a preposition.)

2 a) Bond had instructions to léàve. (So he left.)

b) Bond had instrûctions to leave. (So he left them.)

In the preceding examples, placement of the accent encodes a difference in syntactic structure. In the following examples, the major intonational phrase may be broken into two intermediate phrases, to denote a higher syntactic boundary.

3 a) Have you seen any Martians who have green nôses? (One phrase: restrictive relative.)

b) Have you seen any Mârtians, who have green nôses? (Two phrases: non-restrictive relative.)

4 a) He can't see cléàrly. (One accent, one phrase.)

b) He can't sèe, clèarly. (Two accents, two phrases.)

In earlier descriptive studies, this phrasing was regarded as a question of two intonational boundaries:

In contemporary approaches, intonation is characterized by a constituent structure (the prosodic hierarchy). In its simplest form, this is a simple two level structure:

[Figure: prosodic structure]

Richer hierarchical structures were developed in Pierrehumbert and Beckman (1988).

B. Intonation and meaning

1 a) Johni called Billj a Republican, and then héj insulted hîmi. (To call someone a Republican is an insult.)

b) Johni called Billj a Republican, and thén hei insûlted himj. (To call someone a Republican is not an insult.)

2 a) I didn't go, because my hâir was dirty.

b) I didn't go because my hâir was dǐrty. (I went for some other reason.)

C. Intonation and discourse structure, specifically focus

Refer to fig. 1.5 A-C. The text is the same in each case. In fig. 1.5 A, vitamins is accented, and hence focussed. This intonation might be a suitable reply to the preface "Legumes aren't good for anything, are they?". In fig. 1.5 B, good is accented, hence focussed. This pattern might be a suitable retort to "Aren't legumes a lousy source of vitamins?". In fig. 1.5 C, legumes is accented. Preface: "What's a good source of vitamins?".

7. Types of tones

Pierrehumbert distinguished between different types of tonal targets. We have seen various examples of dynamic accents, which are the head elements of intonational phrases. In addition, Pierrehumbert proposed to use H and L boundary tones at the beginning and end of major phrases, as well as a H or L phrase accent at the end of each intermediate phrase. Unlike standard autosegmental theory, Pierrehumbert did not employ spreading to derive the tone of unaccented syllables, but saw that as a matter of phonetic interpolation between phonologically-specified targets. In other words, the phonological representation of intonation is phonetically underspecified.
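
The idea that unaccented stretches get their pitch by interpolation between targets, rather than by tone spreading, can be sketched as follows. This is a minimal illustration under stated assumptions: the interpolation is taken to be linear (a simplification chosen here, not a claim about Pierrehumbert's interpolation rules), and the function name, target times and f0 values are all hypothetical.

```python
from bisect import bisect_right


def interpolate_f0(targets: list[tuple[float, float]], t: float) -> float:
    """Estimate f0 (Hz) at time t (s) from sparse tonal targets.

    targets: (time_s, f0_hz) pairs with strictly increasing times.
    Before the first and after the last target the nearest value is held.
    """
    if not targets:
        raise ValueError("at least one target is required")
    times = [time for time, _ in targets]
    if t <= times[0]:
        return targets[0][1]
    if t >= times[-1]:
        return targets[-1][1]
    i = bisect_right(times, t)  # index of the first target after t
    (t0, f0_a), (t1, f0_b) = targets[i - 1], targets[i]
    # Assumption: a straight line between adjacent targets.
    return f0_a + (f0_b - f0_a) * (t - t0) / (t1 - t0)


# Made-up targets: a low start, a high accent peak, a low boundary.
targets = [(0.0, 120.0), (0.5, 200.0), (1.0, 100.0)]
print(interpolate_f0(targets, 0.25))  # 160.0
print(interpolate_f0(targets, 0.75))  # 150.0
```

Only three points are specified, yet every instant between them receives a pitch value; this is the sense in which the phonological representation is phonetically underspecified.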

Pitch accents were marked with a *

Phrase accents were marked with a -

Boundary tones were marked with a %

*, - and % are just diacritics, unrelated to f0 value. They only show how the tone is related to the text.

[Figure: H* and L*]

Phrase accents and boundary tones are not associated to segmental material, like pitch accents, but to prosodic nodes:

[Figure: association to prosodic structure]

8. Same text, different tunes

Refer to figs. 1.1 and 1.2:

9. Same tune, different texts

Compare figs. 1.4 A and B.

10. Boundary tones

11. Pierrehumbert's Bitonal Pitch Accents

L* + H- "Scoop". A low tone with sharp rise to a high peak. See fig. 1.1 D.

L- + H* "Rising peak". A high peak preceded by a sharp rise from a valley in the lowest part of the pitch range. (Not illustrated here.)

H* + L- A H* that induces following downstep. (Abandoned since Silverman et al. 1992). See fig. 1.1 C.

H- + L* Downstepped H that induces downstep on later H's. Characteristic of catathesis e.g.

H* + H- (Abandoned after Liberman and Pierrehumbert 1984).

12. The grammar of tonal sequences

Each English intonational phrase, then, has the following structure:

Optional initial boundary tone   One or more pitch accents   A phrase accent   A final boundary tone
(one of)                         (one of)                    (one of)          (one of)
H%                               H*                          H-                H%
L%                               L*                          L-                L%
None                             L* + H-
                                 L- + H*
                                 H- + L*

Pierrehumbert (1980) characterised this structure by a finite-state transition network.
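
A finite-state recogniser for the phrase structure in section 12 can be sketched in a few lines. The tone inventories below are copied from the table in that section; the state names and the function is_well_formed are hypothetical, and this is an illustration of the general technique, not a reproduction of Pierrehumbert's network.

```python
INITIAL_BOUNDARY = {"H%", "L%"}
PITCH_ACCENTS = {"H*", "L*", "L* + H-", "L- + H*", "H- + L*"}
PHRASE_ACCENTS = {"H-", "L-"}
FINAL_BOUNDARY = {"H%", "L%"}


def is_well_formed(tones: list[str]) -> bool:
    """Accept: [initial boundary] pitch-accent+ phrase-accent final-boundary."""
    state = "start"
    for tone in tones:
        if state == "start" and tone in INITIAL_BOUNDARY:
            state = "after_initial"      # optional initial boundary tone
        elif state in ("start", "after_initial", "accents") and tone in PITCH_ACCENTS:
            state = "accents"            # one or more pitch accents
        elif state == "accents" and tone in PHRASE_ACCENTS:
            state = "phrase_accent"      # exactly one phrase accent
        elif state == "phrase_accent" and tone in FINAL_BOUNDARY:
            state = "end"                # exactly one final boundary tone
        else:
            return False                 # no legal transition for this tone
    return state == "end"


print(is_well_formed(["H*", "L-", "L%"]))              # True
print(is_well_formed(["H%", "L*", "H*", "H-", "H%"]))  # True
print(is_well_formed(["H*", "L%"]))                    # False: no phrase accent
```

Because the pitch-accent state loops back to itself, the network accepts phrases with any number of accents while still requiring exactly one phrase accent and one final boundary tone.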

13. Work in progress

a) Extension to languages other than English:

b) Intonational typology - Ladd

c) Standardization of intonation corpora for English: ToBI (Silverman et al. 1992, Beckman and Ayers 1994, Pitrelli et al. 1994)

d) Completely new views of intonation (e.g. Taylor 1994)

Recommended Reading:

Ladd (1992, 1996), Beckman and Pierrehumbert (1986)

References:


Beckman, M. E. and G. M. Ayers (1994) Guidelines for ToBI labelling (version 2.0) Electronic document /opt/tobi/TOBI-TRAINING/labelling_guide-V2.ASCII on OUPLSun.

Beckman, M. E. and J. B. Pierrehumbert (1986) Intonational structure in Japanese and English. Phonology Yearbook 3. 255-309.

Bolinger, D. (1972) Accent is predictable (if you're a mind-reader). Language 48. 633-44.

Crystal, D. (1969) Prosodic Systems and Intonation in English. Cambridge University Press.

Goldsmith, J. (1981) English as a Tone Language. In D. Goyvaerts, ed. Phonology in the 1980's. Ghent: Story-Scientia. Circulated in 1974.

Halliday, M. A. K. (1967) Intonation and Grammar in British English. The Hague: Mouton.

Inkelas, S. and W. R. Leben (1990) Where phonology and phonetics intersect: the case of Hausa intonation. In J. Kingston and M. E. Beckman, eds. Papers in Laboratory Phonology I: Between the Grammar and Physics of Speech. Cambridge University Press. 17-34.

Jones, D. (1960) An Outline of English Phonetics. Ninth edition. Cambridge: Heffer.

Kingdon, R. (1958) The Groundwork of English Intonation. London: Longman.

Ladd, D. R. (1992) An introduction to intonational phonology. In G. J. Docherty and D. R. Ladd, eds. Papers in Laboratory Phonology II: Gesture, Segment, Prosody. Cambridge University Press. 321-334.

Ladd, D. R. (1996) Intonational Phonology. Cambridge University Press.

Liberman, M. (1975) The Intonation System of English. PhD dissertation, MIT. [IULC edition, 1978]

Liberman, M. and J. Pierrehumbert (1984) Intonational Invariance under Changes in Pitch Range and Length. In M. Aronoff and R. T. Oehrle, eds. Language Sound Structure: Studies in Phonology Presented to Morris Halle by His Teacher and Students. MIT Press. 157-233.

O'Connor, J. D. and G. Arnold (1973) Intonation of Colloquial English. 2nd edition. London: Longman.

Pierrehumbert, J. B. (1980) The Phonology and Phonetics of English Intonation. PhD dissertation, MIT. [IULC edition, 1987].

Pierrehumbert, J. B. (1981) Synthesizing intonation. Journal of the Acoustical Society of America 70 (4). 985-995.

Pierrehumbert, J. B. and M. E. Beckman (1988) Japanese Tone Structure. MIT Press.

Pitrelli, J., M. Beckman and J. Hirschberg (1994) Evaluation of prosodic transcription labelling reliability in the ToBI framework. International Conference on Spoken Language Processing, Yokohama, Japan.

Silverman, K., M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. Pierrehumbert and J. Hirschberg (1992) ToBI: A Standard for Labeling English Prosody. In J. J. Ohala, T. M. Nearey, B. L. Derwing, M. M. Hodge and G. E. Wiebe, eds. ICSLP 92 Proceedings: 1992 International Conference on Spoken Language Processing. Volume 2. Department of Linguistics, University of Alberta. 867-870.

Steele, J. (1775) An Essay towards Establishing the Melody and Measure of Speech. [Scolar Press Facsimile Edition, 1969.]

Taylor, P. (1994) The rise/fall/connection model of intonation. Speech Communication 15. 169-186.

't Hart, J., R. Collier and A. Cohen (1990) A perceptual study of intonation: An experimental-phonetic approach to speech melody. Cambridge University Press.

van den Berg, R., C. Gussenhoven and T. Rietveld (1992) Downstep in Dutch: implications for a model. In G. J. Docherty and D. R. Ladd, eds. Papers in Laboratory Phonology II: Gesture, Segment, Prosody. Cambridge University Press. 335-35.