Experimental Phonetics, TT 2016

Week 1. Experimental hygiene

Forced choice: subjects have tacit knowledge of acoustic cues that they are not explicitly aware of:

West, P. (1999) Perception of distributed coarticulatory properties of English /l/ and /r/. Journal of Phonetics 27, 405-426. http://dx.doi.org/10.1006/jpho.1999.0102

Categorical perception: boundary between two categories is attracted towards real words and away from nonsense words:
Ganong III, W. F. (1980) Phonetic categorization in auditory word perception. Journal of Experimental Psychology: Human Perception and Performance 6, 110-125.


Categorical perception: boundary between two categories is attracted towards higher frequency words and away from lower frequency words:
Connine, C. M., D. Titone and J. Wang (1993) Auditory word recognition: extrinsic and intrinsic effects of word frequency. Journal of Experimental Psychology: Learning, Memory and Cognition 19, 81-94.


Subjects' behaviour in speech perception experiments is biassed by what they think the experiment wants to discover:
Goldinger, S. D. and T. Azuma (2003) Puzzle-solving science: the quixotic quest for units in speech perception. Journal of Phonetics 31, 305-320.

Subjects' behaviour in speech perception experiments is biassed by cues in the situation where the experiment is carried out:
Hay, J. and K. Drager (2010) Stuffed toys and speech perception. Linguistics 48 (4), 865–892.


Week 2. Experimental process & debugging

Week 6. How the stimuli were created

The endpoints of our four continua are synthetic copies of sound recordings (some natural, some edited), from the following sources:

acht - The original recording is of a female speaker of the Low Saxon (East Frisian) dialect of German, spoken by Frau Drost, b. Borkum, Leer 1921 (westernmost of the East Frisian islands); she pronounces the number eight as [axt@]. The source is recording ZW--_E_05401_SE_01_T_01 of the Zwirner corpus, downloaded from the Datenbank für Gesprochenes Deutsch, Institut für Deutsche Sprache, Mannheim. I deleted the final schwa using a standard signal editing tool (wavesurfer, as it happens, but any signal tool would do).
OCTO-gem-ang-DE-NI-aehta-U0003-F0011-1P.wav

echt - The original recording is of a male speaker of "Doric" (Aberdeenshire) Scots, speaker M1042 "Bob", in the SCOTS corpus; he gives the number eight as [ext]. Source: http://www.scottishcorpus.ac.uk/av/1448.mp3 (http://www.scottishcorpus.ac.uk/document/?documentid=1448)
OCTO-gem-sco-GB-ABD-echt-U0005-M0006-1.wav

eight - The original recording is of a male speaker of RP recorded in the Phonetics Laboratory, University of Oxford
OCTO-gem-eng-GB-ENG-eight-U0007-M0015.wav

penkwe - This is a manually-edited hybrid of two original recordings. The initial "penk" part is from a recording of Lithuanian "penki" ENLT005-penki.wav, edited from http://www.50languages.com/book2/EN/ENLT/ENLT-all.zip
The final "que" part is from a recording of Italian "cinque", http://www.single-serving.com/Italian/numbers/05.au

pente - The original recording is of a female speaker from Crete, recorded in Mary Baltazani's "Vocalect" project.
CR-IO-f-pente1.wav

thre - The original recording is of speaker M1012, "Ian", a 52 year-old male from Hawick, in the Scottish Borders
Source: http://www.scottishcorpus.ac.uk/av/1430.mp3 (http://www.scottishcorpus.ac.uk/document/?documentid=1430)
TRES-gem-enm-GB-ELN-thre-U0010-M0031-1P.wav

three - The original recording is of Scots speaker M1042 "Bob" in the SCOTS corpus (same speaker and source as for "echt", above)
TRES-gem-sco-GB-ABD-three-U0005-M0006-1.wav

The recordings used in each pair/continuum were chosen because they were very similar in duration. All sound recordings were converted to .wav PCM format, and downsampled (if necessary) to a rate of 16,000 Hz, at a resolution of 16 bits, monophonic. The two recordings at the ends of a desired continuum were then time-aligned by trimming the leading and trailing silences so that the medial spoken portions were well synchronized, and their amplitudes were normalized using sox. The sound files were then converted to cepstrograms (vectors of 39 cepstral coefficients at 5 ms intervals) using "ahocoder" software, written and supplied by Daniel Erro. (For details, see e.g.
D. Erro, I. Sainz, I. Saratxaga, E. Navas, I. Hernaez, "MFCC+F0 extraction and waveform reconstruction using HNM: preliminary results in an HMM-based synthesizer", Proc. VI Jornadas en Tecnologia del Habla & II Iberian SLTech (FALA), pp. 29-32, Vigo, November 2010.)
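
The trimming and normalization steps can be pictured with a minimal Python sketch. It is not the procedure actually used: the amplitudes were normalized with sox, and nothing here reproduces sox or ahocoder. The function names, the silence threshold and the target peak are all hypothetical values chosen for illustration; only the 16,000 Hz sample rate and the 5 ms frame interval come from the description above.

```python
from typing import List

SAMPLE_RATE_HZ = 16_000  # from the text
FRAME_SHIFT_S = 0.005    # 5 ms between cepstral vectors, from the text


def trim_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples quieter than `threshold`.

    `threshold` is an assumed value, not one taken from the experiment.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]


def peak_normalize(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale the signal so its largest magnitude equals `target_peak` (assumed)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def samples_per_frame_shift() -> int:
    """Number of audio samples between successive cepstral vectors."""
    return round(SAMPLE_RATE_HZ * FRAME_SHIFT_S)
```

At 16,000 Hz, a 5 ms interval corresponds to 80 samples, so two recordings trimmed to the same length yield cepstrograms with matching numbers of frames, which is what makes frame-by-frame comparison possible.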

Let E1 denote the cepstrogram at one end of the desired continuum and E2 the corresponding cepstrogram at the other end. The increment of difference, delta = (E2-E1)/20, is thus 5% of the acoustic change needed to move from E1 to E2. A succession of intermediate cepstrograms is then easily obtained:

E1 + 0.delta   = E1
E1 + 1.delta
E1 + 2.delta
E1 + 3.delta
...
E1 + 19.delta
E1 + 20.delta  = E2
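
The steps listed above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming each cepstrogram is held as a list of frames, each frame a list of cepstral coefficients, and that E1 and E2 have the same number of frames; the function name and data layout are hypothetical and are not part of ahocoder.

```python
from typing import List

Cepstrogram = List[List[float]]  # frames x coefficients (assumed layout)


def interpolate_continuum(
    e1: Cepstrogram, e2: Cepstrogram, n_increments: int = 20
) -> List[Cepstrogram]:
    """Return n_increments + 1 cepstrograms: E1 + k.delta for k = 0..n_increments."""
    if len(e1) != len(e2) or any(len(a) != len(b) for a, b in zip(e1, e2)):
        raise ValueError("E1 and E2 must have the same shape")

    # delta = (E2 - E1) / n_increments, computed coefficient by coefficient
    delta = [
        [(b - a) / n_increments for a, b in zip(frame1, frame2)]
        for frame1, frame2 in zip(e1, e2)
    ]

    continuum = []
    for k in range(n_increments + 1):
        step = [
            [a + k * d for a, d in zip(frame, dframe)]
            for frame, dframe in zip(e1, delta)
        ]
        continuum.append(step)
    return continuum
```

With the default of 20 increments the function returns 21 cepstrograms, the first equal to E1 and the last equal to E2 up to floating-point rounding. Passing n_increments=10 would give an 11-step continuum in 10% increments instead.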

For each continuum, these 21 derived cepstrograms were then converted back into an audio signal using Daniel Erro's "ahodecoder" synthesizer, which implements the inverse of the "ahocoder" analysis routine.

The general technique of this interpolation method is briefly described in Coleman, J., J. Aston and D. Pigoli. 2015. Reconstructing the sounds of words from the past. In The Scottish Consortium for ICPhS 2015 (Ed.), Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, UK: the University of Glasgow. ISBN 978-0-85261-941-4. Paper number 0296 http://www.icphs2015.info/pdfs/Papers/ICPHS0296.pdf
Note that in that paper we refer to 11-step continua (i.e. 10 x 10% increments) synthesized using LPC analysis/synthesis, whereas the stimuli used in this experiment are 21-step continua made using MFCC analysis/synthesis. Other than that (which is just a choice about what acoustic parameters we like best, and how finely to slice the continua), the method is exactly the same.