Vowel reduction and elision
In unstressed prefixes in words such as suppose, it appears that the vowel has been deleted.
In YorkTalk/IPOX, this is achieved not by deletion, but by compressing a syllable to such an extent that the vowel is "eclipsed" by the surrounding consonants (Coleman 1994:318ff). The "underlying" presence of the vowel can still be noted in the consonantal transitions.
This is demonstrated below by several versions of the word suppose, generated with varying amounts of compression for the first syllable.
- with an underlying full vowel: /s^powz/; first syllable at 100%, 90%, 80%, 70%, 60%, 52%.
- with an underlying schwa: /s@powz/; first syllable at 100%, 90%, 80%, 70%, 60%.
Even with radical compression, the items in the two sets remain notably different.
The two screen pictures below illustrate the effects of syllable compression on formant transitions when /s^powz/ is generated with the first syllable compressed to 60% of its duration. The first panel displays all parameter tracks for the first three formants (F1-F3) entered into the database during phonetic interpretation. The second panel shows the formant tracks as presented to the synthesizer. As can be seen, the targets of the vowel /^/ are never reached, but they do serve their purpose in the calculation of consonant-vowel transitions.
A traditional analysis of these phenomena in terms of deletion cannot easily distinguish between the onset cluster /sp/ in a word such as sport and the heterosyllabic cluster /s-p/ in reduced forms of the word support. As a result, it is predicted that the two forms are phonetically identical. Our analysis in terms of compression, by contrast, correctly predicts subtle, but perceptually salient, differences in temporal as well as spectral structure. Compare:
The following examples illustrate the use of compression to generate an appropriate rhythm for disyllabic words, and to create the impression of a syllabic sonorant. While it is possible to analyze a sonorant as the nucleus of a syllable, one would also need to write rules that assign a phonetic interpretation to such a structure.
- /bot@l/ bottle (compression: 80% - 62%)
- /boston/ Boston (compression: 90% - 70%)
- /k^zin/ cousin (compression: 70% - 62%)
Although in the present examples the values for syllable compression were optimized by hand, it is possible (as well as desirable) to derive these values by rule. A reasonable approximation is given by the following:
- monosyllabic foot: 100%
- disyllabic foot:
  - 90% - 80% (first syllable is heavy)
  - 80% - 80% (first syllable is light)
- polysyllabic foot: 80% ... 60% ... 70%
However, other factors would need to be taken into account as well (see Local and Ogden 1994).
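The approximation above can be written down as a small function. This is only a sketch, not the YorkTalk/IPOX implementation: the name `foot_compression` and its parameters are hypothetical, and reading the polysyllabic pattern "80% ... 60% ... 70%" as "first syllable 80%, every medial syllable 60%, final syllable 70%" is an assumption about what the ellipses mean.

```python
def foot_compression(n_syllables, first_is_heavy=True):
    """Return per-syllable compression percentages for one metrical foot.

    Hypothetical helper: the values come from the rule of thumb in the
    text, not from the actual synthesizer code.
    """
    if n_syllables < 1:
        raise ValueError("a foot needs at least one syllable")
    if n_syllables == 1:
        # Monosyllabic foot: no compression.
        return [100]
    if n_syllables == 2:
        # Disyllabic foot: the weight of the first syllable decides.
        return [90, 80] if first_is_heavy else [80, 80]
    # Polysyllabic foot. ASSUMPTION: first = 80, all medial = 60, last = 70.
    return [80] + [60] * (n_syllables - 2) + [70]


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, foot_compression(n))
```

The hand-optimized values quoted for bottle, Boston and cousin differ from what this function returns, which is consistent with the remark that other factors would need to be taken into account.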
Syllable compression is also used to model apparent changes in vowel quality in a stem such as photograph when it appears in a latinate derivation. For example:
- photograph /fowt@graf/
- photography /f@togr@fi/
- photographical /fowt@grafik@l/
On the face of it, we would need to use different vowels, depending on the distribution of primary and secondary stresses. However, quite reasonable approximations of these words can be synthesized by assigning different amounts of compression to the various positions within a metrical foot, while using identical vowels in each case:
- /fowtograf/ (72% - 62% - 75%)
- /fowtografi/ (60% - 85% - 68% - 75%)
- /fowtografikal/ (72% - 62% - 85% - 75% - 68%)
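To make the arithmetic concrete, the sketch below applies the three compression patterns listed above to a nominal syllable duration. The 200 ms base duration and the names `PATTERNS` and `compressed_durations` are purely illustrative assumptions; the actual durations are computed during phonetic interpretation and are not given here.

```python
# Compression patterns quoted in the text. All three forms keep the same
# full vowels in the stem; only the per-syllable percentages differ.
PATTERNS = {
    "photograph": ("/fowtograf/", [72, 62, 75]),
    "photography": ("/fowtografi/", [60, 85, 68, 75]),
    "photographical": ("/fowtografikal/", [72, 62, 85, 75, 68]),
}


def compressed_durations(percentages, base_ms=200):
    """Scale a nominal syllable duration by each compression percentage.

    base_ms is a hypothetical placeholder, not a value from the synthesizer.
    """
    return [base_ms * p / 100 for p in percentages]


for word, (transcription, percentages) in PATTERNS.items():
    print(word, transcription, compressed_durations(percentages))
```

In photography the first syllable receives the strongest compression (60%) and the second the weakest (85%), whereas in photograph the first syllable is compressed to only 72%. Each form receives its own pattern, while the vowels in the transcription stay the same.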