Mixing acoustic phonetics, statistics and comparative philology to bring speech back from the past

Oxford University Phonetics Laboratory

University of Cambridge Statistical Laboratory

Since before the start of the project, I've been tweeting about it at @sounds_ancient. Here I preserve a selection of the main highlights.

7 December 2017. Grimm's Law for Thursday: *t > θ. The "th" sound of "thorn" (Old English "þorn") likely comes from Proto-Indo-European [t], preserved in the Sanskrit cognate [tŕ̩ɳam], as pronounced here by @suhasm: 

11 October 2017. Kicking off “Introducing Ancient Scripts” seminar series this afternoon: 

5 July 2017. Chinese 蜜, honey, is a loan-word from Indo-European Tocharian mjət, mit (cognate with "mead"). In Min Dong Chinese: 

Japanese 蜜, mitsu, was borrowed from Chinese. Proto-Indo-European *médʰu has travelled a long way!

14 December 2016. John Coleman presented an invited seminar at the Institute of Phonetics and Speech Processing, Ludwig-Maximilians-Universität, München

17 November 2016. Symposium on Statistics, Language Change and Variation

Taylor Institution, Oxford. This symposium brought together a local network of researchers and visitors from various disciplines with a common interest in the quantitative modelling of language evolution, variation and change, and of the causes of and impediments to change.

1:00 Buffet lunch
1:30 Welcome and introductions

1:45 Alex BOUCHARD-CÔTÉ, Department of Statistics, University of British Columbia, currently visiting Oxford
Probabilistic models of diachronic phonology and computational reconstruction methods
Can computational methods provide accurate protoform reconstructions? How can such methods be evaluated and brought to bear on data-intensive questions such as the role of functional load in sound change?
2:15 Luke KELLY and Geoff NICHOLLS, Department of Statistics, Oxford
Bayesian phylogenetic analysis of lexical traits with borrowing, and statistical issues arising in the estimation of language trees
Languages adopt words from other languages as they diversify, so that word-trees are networks over language-trees. How do we quantify uncertainty in estimated language-trees when the lexical trait data we are using includes borrowed words? Are our estimates and our estimated uncertainties reliable?
2:45 John COLEMAN, Phonetics Laboratory, Oxford and John ASTON, Statistical Laboratory, Cambridge
Statistical acoustic modelling of historical and prehistoric sound changes
How can we model (simulate) the acoustic sounds of words in the past, and the changes they have gone through over time? How can we measure and visualize the rate and direction of those sound changes?
3:15 Break

3:45 Davide PIGOLI, Statistical Laboratory, Cambridge
Spatial modeling of sound change: dialectal variation in the spoken part of the British National Corpus
Can we improve over the hard boundaries of isoglosses using speech recordings to study dialectal variation? How does the co-variability between frequency intensities in speech relate to dialectal variation in the UK?
4:15 Janet PIERREHUMBERT, Oxford e-Research Centre and Faculty of Linguistics
Modelling the spread of arbitrary innovations in language
How can novel forms, which are rare at their inception, become widely adopted? How do cognitive and social factors interact to determine the spread of linguistic innovations?
4:45 Inés PEÑA NOVAS, Linguistics, CUNY, and Marco ARCHETTI, School of Biological Sciences, UEA (Norwich)
Phonetic robustness and the evolution of phoneme usage
Why are certain phonemes used more than others? Is phoneme usage arbitrary, or is there an underlying tendency for certain phonemes to be used more often?
5:30 Open session, final thoughts and questions

25 July 2016. Local TV channel Cambridge TV did a nice piece about the project, focussing on the statistics and starring our photogenic colleagues Davide Pigoli and Shahin Tavakoli.

20 July 2016. carried an article which was a reasonable paraphrase of the Cambridge University news article.

19 July 2016. Cambridge University Office of External Affairs and Communications published a nice news article about our project, "Time Travelling to the Mother Tongue". This piece was widely copied (sometimes as a slightly garbled paraphrase) on various online news sites.

16 July 2016. LabPhon15, Cornell University, Ithaca, USA. "Acoustic-phonetic modelling of historical and prehistoric sound change" (abstract and poster).

9 May 2016. 5:15, Taylor Institution, Oxford, Graduate Linguistics Seminar: "Acoustic modelling of historical and prehistoric sound change"

30 March-1 April 2016. "Modelling the changing rate and direction of historical and prehistoric sound changes." British Association of Academic Phoneticians, University of Lancaster.

Coleman lecturing at BAAP 2016

12 January 2016. The AHRC Science in Culture Innovation Award that has been supporting J. Coleman's research time for the past year has now ended. The summary final report is available from here. Even so, the Ancient Sounds research goes on ... (and on and on, we hope!)

8 January 2016. One-day teach-in in Oxford on acoustic modelling of sound change. Details here. All the materials (presentations, software, data etc.) from the workshop can be downloaded from here.

25-29 November 2015. London Mathematical Society at the Science Museum, London

12 November 2015. "Statistical Acoustic-Phonetic Historical Linguistics: a short introduction". Cambridge Language Sciences annual symposium. Slides (without audio) are here, and a nice video recording is here:

23 October 2015. Previously I posted a demo of "five" from Lithuanian "penki". But PIE has *penkwe, not penki, so here it is done better:

Starting to fill a new table of Indo-European digit sounds. New tokens of *treies, *ksweks, quinque and Ancient Gk, and *penkwe, *septm (wrong stress, but hey), quattuor (a hybrid of Ladin kwater and Welsh pedwar, maybe too prominent). Comments +/- welcomed.

2 October 2015. A 78 rpm record of Sogdian from the Sorbonne's Archives de la Parole, 1911-1914 [Bibliothèque nationale de France]:

Here's the official link: [Archives de la parole]. Langue sogdienne : [énoncé de syllabes] : [chant] ("Sogdian language: [spoken syllables]: [chant]").
My guess is that it could be a liturgical chant.

24 September 2015. For the slides and audio from the talks I gave at the British Science Festival and the Oxford Alumni Weekend, look here (NB: big file):

22 September 2015. The great thing about acoustic (spectral) modelling of sound change is that you can quantify the rate, and perhaps the direction, of change. At what rate did English "one" change from Old to Middle to Modern English? From 34 to 122 microradians per year.
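The post doesn't spell out how a rate in microradians per year is obtained. One plausible reading, which is an assumption here and not the project's documented method, is that each stage of a word is summarised as a vector of spectral features, the change between two stages is measured as the angle between their vectors, and the rate is that angle divided by the years elapsed. A minimal Python sketch under those assumptions; the function names and toy vectors are hypothetical, not project data:

```python
import math


def angle_between(u, v):
    """Angle in radians between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp the cosine into [-1, 1] so rounding error can't make acos fail.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def rate_microradians_per_year(earlier, later, years):
    """Hypothetical rate of change: angle between two stages per year, in microradians."""
    return angle_between(earlier, later) / years * 1e6


# Toy example: two 2-D "spectral" vectors 0.01 rad apart, separated by 100 years.
earlier = [1.0, 0.0]
later = [math.cos(0.01), math.sin(0.01)]
print(rate_microradians_per_year(earlier, later, years=100))  # about 100
```

An angle, unlike a raw distance, is unaffected by scaling a vector (for example, by overall loudness), which may be one reason an angular unit like microradians suits this kind of measure.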

8 September 2015. My university press office ran a blog post about the project today:

1 July 2015. Balochi for "three" is [se:], from *tre(yes) via *te:. In this demo, [te:] is from Sindhi:

tre(ye)s has lost its final s at least three times: *treyes > PGer tre, Latin tres > Italian tre, and Iberian Spanish tres > American Spanish tre. We model the loss of final -s as a kind of fade-out rather than as an all-or-nothing deletion. This gives an intermediate stage: tres > treh.

(Not sure how convincing this one is; maybe we can improve it.)
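To make the contrast between deletion and fade-out concrete, here is a toy Python sketch. It assumes, hypothetically, that a word is just a list of audio samples and that we know the sample index where the final [s] begins; the fade scales that stretch by a gain that falls from 1 to 0 as the change progresses. The function names are invented for illustration, and this is not the project's code:

```python
def delete_final_segment(samples, seg_start):
    """All-or-nothing deletion: the final segment simply disappears."""
    return list(samples[:seg_start])


def fade_final_segment(samples, seg_start, stage):
    """Gradual fade-out: scale the final segment by (1 - stage).

    stage = 0.0 leaves the segment intact; stage = 1.0 silences it completely.
    Values in between give progressively weaker versions of the segment.
    """
    if not 0.0 <= stage <= 1.0:
        raise ValueError("stage must be between 0.0 and 1.0")
    gain = 1.0 - stage
    return list(samples[:seg_start]) + [x * gain for x in samples[seg_start:]]


# Toy "word": four samples of vowel followed by four samples of final [s].
word = [0.5, -0.5, 0.5, -0.5, 0.2, -0.2, 0.2, -0.2]
for stage in (0.0, 0.5, 1.0):
    print(stage, fade_final_segment(word, seg_start=4, stage=stage))
```

This toy fade keeps the segment's duration and changes only its amplitude; a fuller model would presumably also alter its spectrum and timing.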

8 June 2015. dw > b (e.g. in *dwoH) in a number of I-E languages: Lycian kbi, Avestan bae, bitya, Latin bi-, bis, and Sindhi

tw > p in Ossetic tsuppor (from *kʷetwóres)

26 May 2015. "Three" comes from Proto-Indo-European "*treyes" (not from Spanish "tres", but that's the nearest form I've got). Listen: Here's the MP3 version:

25 May 2015. "One" comes from Proto-Indo-European *oinos, via Middle English "oon", Anglo-Saxon "an", Germanic "oin(s)". Listen:

"One" from "oin(s)", MP3 format:

Previously [12th May] we generated a continuum of sounds from "two" to "twa" and vice versa. Now we follow "two" all the way back to Proto-Indo-European *dwo(H), in WAV and MP3 formats:
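The post doesn't say how the continuum was made. A minimal Python sketch of one common approach, assuming (hypothetically) that both words are already time-aligned sequences of spectral frames with the same number of frames and the same frame size, and that each intermediate form is a linear interpolation between them. The function names and numbers are invented for illustration:

```python
def interpolate(a, b, t):
    """Linear interpolation between two equal-length frames, t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def continuum(start, end, steps):
    """Return `steps` evenly spaced forms from `start` (t = 0) to `end` (t = 1).

    Each form is a list of frames; each frame is a list of numbers.
    """
    if steps < 2:
        raise ValueError("need at least two steps")
    if len(start) != len(end) or any(len(fa) != len(fb) for fa, fb in zip(start, end)):
        raise ValueError("start and end must have matching frame shapes")
    return [
        [interpolate(fa, fb, k / (steps - 1)) for fa, fb in zip(start, end)]
        for k in range(steps)
    ]


# Toy frames standing in for "two" and "twa" (invented numbers, not real spectra).
two = [[0.0, 1.0], [2.0, 2.0]]
twa = [[1.0, 0.0], [2.0, 4.0]]
forms = continuum(two, twa, steps=3)
print(forms[1])  # [[0.5, 0.5], [2.0, 3.0]]
```

Swapping the arguments, as in continuum(twa, two, 3), runs the same continuum in the other direction, from "twa" back to "two".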

18 May 2015. "Eight" came from Proto-Indo-European *Hokto, via changes something like this (MP3 version):


15 May 2015. "Four" comes from Anglo-Saxon "feower":

15 May 2015. "Five" comes via fif, fimf, pemp from Proto-Indo-European *penkwe. Lithuanian penki is the nearest living word. Listen:

Or if you prefer going forwards in time from Anglo-Saxon "twa" to Modern English "two":

1 April 2015. For , we made a ": to show Anglo-Saxon pronunciations (or something close to them) that survived until quite recently in various German dialects.

14 January 2015. AHRC Science in Culture Innovation Awards meeting, London. My presentation is available under "Presentations" (see link to the left).

John Coleman is supported by a Science in Culture Innovation Award from the AHRC.

John Aston is supported by a Fellowship from the EPSRC.