Ancient Sounds: mixing acoustic phonetics, statistics and comparative philology to bring speech back from the past

AHRC Science in Culture Innovation Award AH/M002993/1

January 2015-January 2016

Summary Final Report

John Coleman


Background and Project summary


In this project, we examine an old question, "What did words sound like in the past?", in a revolutionary new way. Since the 19th century, historical linguists have studied in detail the forms of words in many languages at different points in history, the varieties and mechanisms of sound change, and, for the Indo-European language family in particular, they have used that knowledge to infer the forms of words from a time before writing. For example, from word-forms as diverse as Old English weorc, Old High German werc, Latin orgia, Greek ergon, and Armenian gorc, philologists infer a Proto-Indo-European stem u̯erg̑-, a formula that hints at a pronunciation something like werg. But what did it actually sound like? The innovation of this project is that, rather than reconstructing written forms of ancient words, we have been developing methods to triangulate backwards from contemporary audio recordings of simple words in modern Indo-European languages to regenerate audible spoken forms from earlier points in the evolutionary tree.

In 2015, with a grant from the Arts and Humanities Research Council, we extended this work to some Germanic languages, together with Modern Greek, to try to advance the horizon of audio reconstruction into the deeper past of the Indo-European language family. Although it was not part of the original proposal, we also generated a number of simulations of sound change in some Indo-Iranian and Indo-Aryan languages of Pakistan, to explore the eastern extent of the Indo-European language family. The research opened up a wide range of new questions, which were the focus of the research carried out under the Science in Culture Innovation Award:
How far back in time can extrapolation from contemporary recordings progress? How wide and diverse must a language family tree be in order to triangulate to sounds that are plausible, i.e. reasonably consistent with written forms from antiquity? Are any attested sound changes outside the limits of the acoustic transformations we can currently model, and if so, how can we address that? How do we deal with changes that are not acoustically continuous or gradual, such as analogical formations and loanwords? We also began to be able to address questions of rate of change, e.g. do sound changes proceed at a uniform, gradual rate? If not, how can we model varying rates of sound change in different branches of a language family, or in different periods? Some preliminary answers to these questions are presented in the following sections.

Findings


How far back in time can extrapolation from contemporary recordings progress?

We focused on modelling the processes of change, and the rate and direction of change, rather than extrapolation per se. Therefore, we cannot yet answer this question. However, we did succeed in modelling chains of sound change (in some words) from Proto-Indo-European right through to the present day, e.g. from *penkwe to Modern English five, a time span normally estimated as at least 8000 years. Our ability to do so crucially rests on artificial "proxy" recordings of Proto-Indo-European words. In this case, the sound of *penkwe is produced from a combination of the first half of the Modern Greek form pende and the second half of the Italian form cinque.
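The report does not say how the two halves of the proxy recording were joined. The following is a minimal sketch, not the project's own code, assuming both recordings are mono sample sequences at the same sampling rate and that "half" means the sample midpoint. The function name `splice_halves` and the short linear crossfade (the hypothetical `fade` parameter, added to avoid an audible click at the join) are assumptions.

```python
def splice_halves(first, second, fade=64):
    """Join the first half of `first` to the second half of `second`.

    Both arguments are mono sample sequences at the same sampling rate.
    A linear crossfade of `fade` samples smooths the join (an assumption;
    the report does not specify how the halves were joined).
    """
    head = list(first[: len(first) // 2])
    tail = list(second[len(second) // 2 :])
    fade = min(fade, len(head), len(tail))
    if fade == 0:
        return head + tail
    # Keep the head up to the start of the overlap region.
    out = head[: len(head) - fade]
    for i in range(fade):
        w = (i + 1) / (fade + 1)  # weight of `tail` ramps from near 0 to near 1
        out.append((1 - w) * head[len(head) - fade + i] + w * tail[i])
    # Keep the rest of the tail after the overlap region.
    out.extend(tail[fade:])
    return out
```

For example, `splice_halves(pende_samples, cinque_samples)` would yield a *penkwe-like proxy from the opening of pende and the close of cinque, where the two sample lists are hypothetical, already-loaded recordings. The crossfade shortens the result by `fade` samples.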

How wide and diverse must a language family tree be in order to triangulate to sounds that are plausible, i.e. reasonably consistent with written forms from antiquity?

In posing this question in the original project proposal, we were wondering whether it would be more difficult to model sound change in branches of Indo-European represented today by only a single modern language, e.g. Greek, than in branches represented today by a wide range of sister languages, e.g. Romance or Germanic. In fact, since we model the process of sound change one word at a time, rather than over the whole lexicon of a language, it has proved no more difficult to model acoustic etymologies of Greek than those of any other language, as the Ancient Greek reconstructions in our online database illustrate.

Are any attested sound changes outside the limits of the acoustic transformations we can currently model, and if so, how to address that?


Our method of modelling sound change using incremental continua of interpolants between endpoints has proved to be surprisingly robust and flexible. Below we tabulate all the sound changes that we have modelled to date:
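The report does not specify which acoustic representation is interpolated. The following minimal sketch assumes that each endpoint recording has already been time-aligned and reduced to a sequence of equal-length feature frames (e.g. spectra), and builds each interpolant as a frame-by-frame linear blend. The function name `continuum` and the `steps` parameter are hypothetical, and resynthesis of audio from the interpolated frames is left out.

```python
def continuum(start, end, steps):
    """Return steps + 1 interpolants from `start` (weight 0) to `end` (weight 1).

    `start` and `end` are time-aligned lists of frames; each frame is a list
    of floats of the same length. Linear interpolation is an assumption.
    """
    if len(start) != len(end):
        raise ValueError("endpoints must be time-aligned to the same number of frames")
    out = []
    for k in range(steps + 1):
        w = k / steps  # position along the continuum, 0.0 to 1.0
        out.append([
            [(1 - w) * a + w * b for a, b in zip(fa, fb)]
            for fa, fb in zip(start, end)
        ])
    return out
```

Each intermediate member of such a continuum is one incremental stage of the modelled change; the first and last members reproduce the two endpoints.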

Romance sound changes (generated in 2013, prior to the AHRC Ancient Sounds project)

Vowel raising (open-mid to close-mid) and diphthongization [ɛ] > [ei] in [trɛs] > [treiʃ] (3.2-tres-tres.wav)
Vowel raising (mid to close) [e] > [i] in [des] > [dis] (10.3-dez-dix.wav)
Vowel raising (mid to close) and diphthongization [o] > [oi] in [dos] > [dois] (2.6-dos-dois.wav)
Vowel raising (close-mid to close)/monophthongization [ei] > [i] in [seis] > [sis] (6.5-seis-six.wav)

Vowel lowering (close to mid) and fronting [ũ] > [̃] in [ũŋ] > [̃] (1.7-um-un.wav)
Vowel lowering (mid to low) [̃] > [ɛ̃] in [̃] > [ɛ̃] (1.8-un-un.wav)

Alveolar-to-postalveolar backing [s] > [ʃ] in [trɛs] > [treiʃ] (3.2-tres-tres.wav)
Alveolar-to-postalveolar backing [s] > [ʃ] in [seis] > [seiʃ] (6.2-seis-seis.wav)
Alveolar-to-postalveolar backing and affrication [t] > [tʃ] in [set] > [setʃ] (7.2-sept-sete.wav)
Alveolar-to-postalveolar backing and affrication [t] > [tʃ] in [otto] > [otʃo] (8.2-otto-ocho.wav)

Alveolar-to-velar backing [n] > [ŋ] in [un] > [ũŋ] (1.3-un-um.wav)

Germanic sound changes
Vowel raising (open to mid) [a] > [e] in [axt] > [eçt] (aeht-to-echt.wav)
Vowel raising (open to mid) [] > [eɪ] in [dʃmʔ] > [teɪçn] (deshimt-to-teichn.wav)
Vowel raising (open to mid) and rounding [ɑ:] > [o:] in [ɑ:n] > [o:n] (an-to-one.wav)
Vowel raising (mid to close)/monophthongization [ei] > [i:] in [nein] > [ni:n] (neGn-to-niin.wav)
Vowel raising (mid to close)/glide formation [o] > [w] in [o:n] > [won] (oon-to-one.wav)
Vowel raising (open to close) and monophthongization [wɑ:] > [u:] in [twɑ:] > [tu:] (twa-to-two.wav)

Monophthongization [jaʊ] > [ɔ] in [fjaʊəɾ] > [fɔə] (feower-to-four.wav)
Monophthongization [aʊə] > [ɔ:] in [faʊəɾ] > [fɔə] (fowre-to-four.wav)

Mid-centralization with unrounding [o] > [ə] in [oxto] > [oxtə] (okto-to-oxte.wav)

De-frication and voicing [ç] > [ɪ] in [eçt] > [eɪt] (echt-to-eight.wav)
De-frication/vocalization and monophthongization [eɪʝ] > [e:] in [teɪʝnə] > [te:nə] (teichne-to-tene.wav)
De-frication/vocalization [ɣ] > [i] in [neɣn] > [nein] (neGn-to-nain.wav)

Vowel opening (close to open)/diphthongization [i:] > [aɪ] in [fi:f] > [faɪv] (fif-to-five.wav)
Vowel opening (close to open)/diphthongization [i:] > [ai] in [ni:n] > [nain] (niin-to-nain.wav)
Vowel opening (close-mid to open-mid) [e:] > [ɛ] in [te:nə] > [tɛn] (tene-to-ten.wav)
Vowel opening (close-mid to open-mid) and monophthongization [ei] > [ɛ] in [treis] > [trɛ] (treis-to-tre.wav)
Vowel opening (mid to open) with unrounding [o] > [ʌ] in [won] > [wʌn] (an-to-one.wav)
Vowel opening (mid to open) with unrounding [o] > [ɑ] in [oxtə] > [ɑxtə] (oxte-to-aehta.wav)
Vowel opening (mid to open) and unrounding [o:] > [ɑ:] in [two:] > [tʋɑ:] (twoa-to-twa.wav)
Vowel opening (open-mid to open) with unrounding/monophthongization [ɔi] > [ɑ:] in [ɔin] > [ɑ:n] (oin-to-an.wav)

Vowel fronting [ɑ] > [a] in [oxtə] > [ɑxtə] (oxte-to-aehta.wav)
Velar-to-palatal fronting [x] > [ç] in [axt] > [eçt] (aeht-to-echt.wav)

Bilabial-to-alveolar backing [m] > [n] in [dʃmʔ] > [teɪçn] (deshimt-to-teichn.wav)
Postalveolar-to-palatal backing [ʃ] > [ç] in [dʃmʔ] > [teɪçn] (deshimt-to-teichn.wav)
Labio-dentalization [w] > [ʋ] in [two:] > [tʋɑ:] (twoa-to-twa.wav)
Frication [p] > [f] in [pɪmp] > [fɪmf] (pimp-to-fimf.wav)
Frication [t] > [θ] in [trɛ] > [θrɛ] (tre-to-thre.wav)
Frication [k] > [x] in [okto] > [oxto] (okto-to-oxte.wav)

Loss of weak final [ə] in [te:nə] > [tɛn] (tene-to-ten.wav)
Loss of final [ħ] in [dwoħ] > [two:] (dwo-to-twoa.wav)
Loss of postvocalic [ɾ] in [fjaʊəɾ] > [fɔə] (feower-to-four.wav)
Loss of final [s] in [ɔins] > [ɔin] (oins-to-oin.wav)
Loss of final [s] in [treis] > [trɛ] (treis-to-tre.wav)

Voicing [ç] > [ʝ] in [teɪçnɐ] > [teɪʝnɐ] (teichne-to-tene.wav)
Devoicing [d] > [t] in [dwoħ] > [two:] (dwo-to-twoa.wav)
Denasalization and vocalization [m] > [i] in [fimf] > [fi:f] (fimf-to-fif.wav)
Stress movement in [oxtó] > [óxtə] (okto-to-oxte.wav)

Greek sound changes
Labialized velar to alveolar fronting [kʷw] > [t] in [penkʷwe] > [pente] (PIE-penkwe-to-Greek-pente.wav)

Indo-Aryan and Indo-Iranian sound changes
Vowel raising (open-mid to close-mid) [ɛ] > [e:] in [trɛ] > [te:] (tre-to-Sindhi-te.wav)

Vowel lowering (mid to open)/monophthongisation [ɛə] > [ɑ] in [kɛhər] > [tʃɑr] (Irish-ceair-to-Balochi-char.wav)
Vowel lowering (close to mid)/monophthongisation [wo] > [o:] in [dwo] > [do:] (PIE-dwoH-to-Balochi-doo.wav)

Monophthongisation [ʊwʌ] > [o:] in [dʊwʌħ] > [do:h] (Avestan-duwaH-to-Balochi-doo.wav)
Vowel unrounding [o] > [ʌ] in [dwoħ] > [dʊwʌħ] (PIE-dwoH-to-Pashto-duwa.wav)
Vowel unrounding and mid-centralization [o] > [ə] in [dwoħ] > [ɓə] (PIE-dwoH-to-Sindhi-buh.wav)
Velar-to-postalveolar fronting and affrication [k] > [tʃ] in [kʲɛər] > [tʃɑr] (Irish-ceair-to-Balochi-char.wav)

Frication [t] > [s] in [te:] > [se:] (Sindhi-te-to-Balochi-se.wav)
Fusion of alveolar stop + labial-velar glide into labial implosive [dw] > [ɓ] in [dwoħ] > [ɓə] (PIE-dwoH-to-Sindhi-buh.wav)
Loss of [r] from [tr] in [trɛ] > [te:] (tre-to-Sindhi-te.wav)
Loss of final [ħ] in [dwoħ] > [ɓə] (PIE-dwoH-to-Sindhi-buh.wav)

Celtic (Irish and Welsh) sound changes
Loss of medial [h] in [kʲahɚɹ] > [kʲɛər] (Irish-ceathair-to-ceair.wav)
Vowel raising (mid to close) [e] > [ɪ] in [penkwe] > [pɪmp] (penkwe-to-pimp.wav)
Mid-centralization of final [e] > [ᵊ] in [penkwe] > [pɪmpᵊ] (penkwe-to-pimp.wav)
Alveolar-to-bilabial fronting [n] > [m] in [penkwe] > [pɪmp] (penkwe-to-pimp.wav)
Labialized-velar-to-bilabial fronting [kw] > [p] in [penkwe] > [pɪmp] (penkwe-to-pimp.wav)

There are, however, some attested kinds of sound change that we have not yet attempted (or had cause to attempt) to model, e.g. metathesis or epenthesis.

How do we deal with changes that are not acoustically continuous or gradual, such as analogical formations and loanwords?

A phylogenetic tree has a single root and multiple leaves. Although the root is the ancestor of all features that are found in all daughter languages, that does not imply that the root is the source of all traits at the leaves. In phylogenetic linguistics, novel forms that arise in a language through processes other than gradual sound change are treated as if they arise spontaneously at some interior node in the tree. For example, the initial [kw] in Latin quinque is thought not to have evolved from the initial [p] of its Proto-Indo-European ancestor, but to have arisen in anticipation of the [kw] in the second syllable of quinque. So we treat quinque as if it were just a new word. From that point onwards, the word-initial [kw] evolved into the modern Romance languages (variously as [tʃ], [s], [θ] etc.) by processes of gradual, incremental sound change.

This method does not quite do justice to the fact that in Latin quinque, the -[inkwe] part is indeed inherited, not borrowed. So to treat such cases properly, we need first to model quinque as if it did evolve from *penkwe, but then treat the change of the initial [p] > [kw] as spontaneous. It therefore has a hybrid history, with one part evolved and one part an innovation, like a random mutation.
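One way to picture such a hybrid history is sketched below under stated assumptions; it is not the project's implementation. Along a continuum of `steps` stages, frames belonging to the inherited part (here -[inkwe]) change gradually by linear interpolation, while frames belonging to the innovated part (the initial [p] > [kw]) jump from the old form to the new one at a single stage. The function name `hybrid_continuum`, the per-frame `innovated` mask and the `switch_at` parameter are all hypothetical, and the frames are assumed to be time-aligned feature vectors.

```python
def hybrid_continuum(start, end, innovated, steps, switch_at):
    """Build a continuum in which some frames change gradually and others abruptly.

    start, end : time-aligned lists of frames (lists of floats).
    innovated  : list of booleans, True for frames in the innovated part.
    switch_at  : stage at which innovated frames jump from `start` to `end`.
    """
    stages = []
    for k in range(steps + 1):
        w = k / steps
        stage = []
        for fa, fb, new in zip(start, end, innovated):
            if new:
                # Innovation: no intermediate forms, just a switch.
                stage.append(list(fb) if k >= switch_at else list(fa))
            else:
                # Inherited material: gradual, incremental change.
                stage.append([(1 - w) * a + w * b for a, b in zip(fa, fb)])
        stages.append(stage)
    return stages
```

The choice of `switch_at` stands in for the interior node of the tree at which the innovation is assumed to arise.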

We also began to be able to address questions of rate of change, e.g. do sound changes proceed at a uniform, gradual rate? If not, how can we model varying rates of sound change in different branches of a language family, or in different periods?

Do sound changes proceed at a uniform, gradual rate? No.

How can we model varying rates of sound change in different branches of a language family, or in different periods?

We measure the amount of sound change from one sound file to another in terms of the cosine distance between the spectra of the two time-aligned sound files in the portion of their greatest difference. For example, in the development from Anglo-Saxon fīf to Modern English five, the change from [i:] to [ɑɪ] is greatest in the early part of the diphthong. It is at this point that the cosine distance between the spectra of the time-aligned words is greatest (about time point 40 in the following figure):

[Figure: cosine distance between the spectra of time-aligned fīf and five, plotted against time (5 ms steps).]

The arccosine of the cosine similarity (i.e. of 1 minus the cosine distance) is an angle that relates to how similar or different the sounds are at that point: an angle of 0 means that there is no difference, i.e. the sound has not changed at all. In the figure above, it can be seen that towards the beginning of the graph the distance is less than 0.1 for the first 25 time steps, because the initial [f] of five is hardly different from the initial [f] of fīf. (It is a little different, though, perhaps because the initial [f] of fīf is slightly palatalized, [fʲ], on account of the following [i:].)
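The measurement just described can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the two recordings have already been time-aligned and converted into equal-length sequences of magnitude-spectrum frames (one frame per 5 ms step), takes cosine distance as 1 minus cosine similarity, and converts the largest distance into an angle via arccos(1 - d). All function names are hypothetical.

```python
import math

def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two spectral frames."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cannot compare an all-zero frame")
    return 1.0 - dot / norm

def distance_profile(frames_a, frames_b):
    """Frame-by-frame cosine distance between two time-aligned spectrograms."""
    if len(frames_a) != len(frames_b):
        raise ValueError("recordings must be time-aligned to the same number of frames")
    return [cosine_distance(u, v) for u, v in zip(frames_a, frames_b)]

def change_angle(distance):
    """Angle in degrees for a cosine distance d, i.e. arccos(1 - d)."""
    similarity = max(-1.0, min(1.0, 1.0 - distance))  # clamp rounding error
    return math.degrees(math.acos(similarity))

def greatest_change(frames_a, frames_b):
    """Return (frame index, distance, angle) where the two words differ most."""
    profile = distance_profile(frames_a, frames_b)
    i = max(range(len(profile)), key=profile.__getitem__)
    return i, profile[i], change_angle(profile[i])
```

As a point of arithmetic, a distance of 0.1, the level mentioned for the initial [f], corresponds to an angle of about 25.8°.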

Estimating the length of time each sound change takes is rather more difficult, as it rests not on acoustic measurement but on historical texts and a certain degree of educated guesswork concerning the prehistoric period.

A sound change of angle x that took about y years can be represented on a graph as a vector, pictured as an arrow or line segment of length y units at an angle of x from the direction previously taken. (For the first sound change in a sequence, we assume that the "direction previously taken" is rightwards, i.e. the conventional way of portraying an angle of 0.) In this way, we can build up a picture of a sequence of sound changes, such as the following (which is, incidentally, the first such figure ever to have been devised):
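The vector construction described above amounts to turtle geometry, and can be sketched as below. This is a minimal illustration under stated assumptions: angles are in degrees and are applied as anticlockwise turns from the previous heading (the report does not say which way each turn goes), and the path starts at the origin heading rightwards. The function name `change_trajectory` is hypothetical.

```python
import math

def change_trajectory(changes):
    """Chain sound changes into a path of 2-D points.

    changes : list of (angle_degrees, years) pairs, one per sound change.
    Each change is drawn as a segment of length `years`, turned by
    `angle_degrees` (anticlockwise, by assumption) from the previous heading.
    The path starts at the origin with a heading of 0 degrees (rightwards).
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for angle, years in changes:
        heading += math.radians(angle)
        x += years * math.cos(heading)
        y += years * math.sin(heading)
        points.append((x, y))
    return points
```

With illustrative (not measured) input, `change_trajectory([(0, 100), (90, 50)])` draws a 100-unit segment to the right followed by a 50-unit segment turned a right angle upwards, ending at approximately (100, 50).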

Lessons learnt

The opportunity that the award gave me to work intensively on these problems enabled us to make quite rapid incremental progress on all of the research "fronts" identified in the proposal, as described above.
Finding suitable audio recordings on which to build my models and simulations led from the outset to a survey of a wide variety of (modern) Indo-European language resources, and to the collection of a database of recordings of spoken digits in languages and dialects spanning the whole Indo-European family. This spectrum of languages, and an incrementally growing selection of these recordings, are published at http://www.phon.ox.ac.uk/jcoleman/ancient-sounds-database.html (which, contrary to its name, is just a large web page).

Learning how to measure the changing rate and direction of sound change, as explained above, was probably the most novel technique to come out of this project (since we had already figured out how to model sound change using interpolation/morphing before 2015). Although we have only plotted a couple of such figures, it is evident that the change trajectory is not a straight line, nor a simple curve such as a great circle, an ellipse or a parabola; these early hypotheses have since been shown to be wrong. Rather, the curves undulate gently. They may exhibit reversals in direction, but in what we have seen so far these are more like gentle inflections than complete U-turns. In any case, since we have learned that the trajectory of change is not a straight line, we are in a better place to model sound changes that were not well modelled in our previous experiments. For example, we showed in 2013 that the development of Modern French trois [tʁwɑ] from Latin [tre:s] does not follow the shortest path along a straight line.

We learned that the interpolation-resynthesis method can be used not only in modelling paths of sound change, but also as a means of generating "hybrid" pronunciations from two given recordings. This is a useful addition to the range of techniques at our disposal for simulating pronunciations from the past where no modern proxy is available. For example, no modern Indo-European language now has a pronunciation like [dwoħ] (PIE *dwoH, "two"). Elfdalian [two] is pretty close, but it has a voiceless initial consonant, not a voiced one, and lacks final laryngealization. Russian [dva] has completely the wrong vowel, but it does have a voiced initial consonant, and in one of our Russian recordings, the speaker produces the vowel with some final laryngealization. By interpolating a continuum between Elfdalian [two] and Russian [dva], we were able to generate a very passable token of [dwoħ]. (Whether this is a good Proto-Indo-European *dwoH may be impossible to test, but I hesitate to say that, because most of what we are doing in this project would have been considered impossible some years ago.)

Future plans

Although the 12 months of the AHRC Science in Culture Innovation Award have come to an end, research on the Ancient Sounds project goes on. Under that grant we proposed to write and publish two papers, one for a general science readership and one for an academic readership with philological interests. These will be completed and submitted for publication in the early months of 2016.

Our database of digit pronunciations across the Indo-European languages is published online at http://www.phon.ox.ac.uk/jcoleman/ancient-sounds-database.html but is incomplete. We shall complete the table and put in all the links to audio clips. We shall also publish the key pieces of software that we have developed.

The new methods for measuring and graphing rate and direction of change have so far been applied to just a few words in one branch of the Indo-European family tree: the branch from Proto-Indo-European at the root to Modern English at the tip. We shall map other words in a wider range of languages in order to discover the detailed shape of the resulting acoustic-phylogenetic tree.

Methods for statistical regression of sounds over a phylogenetic tree have been explored by Hadjipantelis (2013) for one set of words (cognates of Latin unus, unum) in Romance, but that was an initial experiment that needs refinement in order to separate the speaker-specific components of the acoustics from the language-related ones. The technical basis for doing that has been developed in recent work by Pigoli, Hadjipantelis, Coleman and Aston (under review).

Now that we have developed and demonstrated techniques for acoustic-phonetic modelling of sound change over several thousand years, on a small vocabulary across a wide range of languages, in the next phase of my research, from 2016 onwards, I will work on modelling the sound changes from Proto-Indo-European to Modern English over a much larger vocabulary of about 300 words.

References

Hadjipantelis, P. Z. (2013). Functional Data Analysis in Phonetics. PhD thesis, University of Warwick.

Pigoli, D., P. Z. Hadjipantelis, J. S. Coleman and J. A. D. Aston (under review). The analysis of acoustic phonetic data: exploring differences in the spoken Romance languages. Preprint at http://arxiv.org/abs/1507.07587
 

Note
1. The AHRC Science in Culture Innovation Award supported my (John Coleman's) research time during 2015, and the findings presented in this document relate mostly to my own work. However, the Ancient Sounds project has been, since its inception, a collaboration with John Aston and colleagues at the Cambridge University Statistics Laboratory. In 2015 I had a number of short stays at the StatsLab, during which we worked on the problems discussed above, plus others relating to Aston and colleagues' statistical research (not discussed here, but presented in papers that we also completed in 2015). At the end of the project, on January 8th, 2016, we collectively ran a one-day teach-in workshop on the new methods and discoveries we have made in the Ancient Sounds project. It is only right, therefore, for me to retain the collective "we" in this report.