Phonetics at Oxford University

The Oxford Aesop Corpus 2010

Overview

This corpus contains the data from the Speech rhythm project funded by the ESRC. (See also Greg Kochanski's page.)

The corpus consists of short paragraphs and children's poetry read by native speakers of Southern British English, Russian (Moscow and St. Petersburg), Greek (Athens), Taiwanese Mandarin and French (Paris). The paragraphs were selected so that they did not contain any dialogue. Most poems contained 8 syllables per line.

Speakers were 20-28 years old, born to monolingual parents, and had grown up in their respective countries. At the time of the recording, all speakers were living in Oxford, UK. Non-English participants had lived outside their home country for less than 4 years. Recordings were made with a condenser microphone and a lapel microphone in a soundproofed room in the Oxford University Phonetics Laboratory and saved directly to disk at a 16 kHz sampling rate. Texts were presented on a screen in the standard orthography of each language. The data collection software for this project can be downloaded from the releases page or Greg Kochanski's website.

All speakers of Greek, French and Russian read the same 45 texts and retold Cinderella. Mandarin speakers read 73 shorter texts and also retold Cinderella. English speakers were divided into 2 groups (12 speakers each). The first group read the same 45 texts as the speakers of other languages and retold Cinderella. The second group did not read or retell Cinderella. Instead, these speakers repeated 4 texts three times, with each repetition recorded on a separate day.

In addition to the short texts, all speakers also read up to 700 randomly selected short sentences, which were intended for training an automatic speech recognition system. These sentences are offered in separate archives.

The experimental data consists of speech recordings, which are stored in subdirectories. The corpus also contains the orthographic texts, automatically generated transcriptions, and metadata files with information about each file.

Metadata files

The corpus has one metadata file for each tar.gz file. The metadata files are in FIAT format. FIAT files may be read with the fiatio Python module in the gmisclib package, available from SourceForge, with documentation at http://kochanski.org/gpk/code/speechresearch/gmisclib/gmisclib.fiatio-module.html. Each metadata file contains a line-by-line description of each utterance. Each field is explained in a Readme file included with the archive.

Data Files

The corpus contains a large number of directories. There are several files inside each:

raw.wav
The original recording, in Microsoft WAV format. It is a two-channel file. The first (lower-numbered) channel, which is channel 0 or 1 depending on whether your software counts from 0 or from 1, contains the recording made with a lapel microphone (5 mm diameter); the second (higher-numbered) channel contains the same recording made with a condenser microphone (15 mm diameter).
The lapel microphones sometimes malfunctioned, so we recommend using the recordings obtained from the condenser microphone. The malfunctions should be detectable, as they led either to high-amplitude noise or to near silence; about 20% of the lapel recordings have malfunctions.
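
The channel layout described above can be illustrated with a minimal Python sketch that uses only the standard library. It assumes the samples are 16-bit PCM, which this page does not state, and the function names and the two RMS thresholds are hypothetical values chosen for illustration only; they are not part of the corpus or its tools and would need tuning against real recordings.

```python
import array
import sys
import wave


def read_channels(path):
    """Return (sample_rate, lapel, condenser) from a two-channel WAV file.

    Assumption: samples are 16-bit PCM (not stated in the corpus description).
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 2:
            raise ValueError("expected a two-channel file")
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch assumes 16-bit PCM samples")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved: first channel (lapel), second channel (condenser).
    lapel = samples[0::2]
    condenser = samples[1::2]
    return rate, lapel, condenser


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def lapel_looks_faulty(lapel, silence_rms=50.0, noise_rms=20000.0):
    """Crude check for near silence or high-amplitude noise.

    Both thresholds are hypothetical placeholders, not corpus-defined values.
    """
    level = rms(lapel)
    return level < silence_rms or level > noise_rms
```

A typical use would be `rate, lapel, condenser = read_channels("raw.wav")`, followed by analysis of `condenser` only, in line with the recommendation above.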

text.txt
The orthographic text which was presented to the speaker. The files use Unicode encoding (UTF-8).

text.phones
This is an ASCII file containing an automatically generated transcription in X-SAMPA format. The transcriptions do not always reflect false starts, hesitations or repetitions.
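
As a rough illustration of the per-directory layout described above, the following Python sketch collects every directory that holds all three files and reads the two text files with the encodings named on this page. It assumes the archives have already been unpacked under a single root directory; the function names and the `corpus_root` argument are hypothetical. The internal layout of text.phones is not described here, so the sketch returns it as an unparsed string.

```python
from pathlib import Path

# File names as listed in the corpus description.
EXPECTED = ("raw.wav", "text.txt", "text.phones")


def find_utterances(corpus_root):
    """Yield (directory, files) for each directory containing all three files.

    `corpus_root` is a hypothetical path to the unpacked archives.
    """
    for wav_path in sorted(Path(corpus_root).rglob("raw.wav")):
        directory = wav_path.parent
        files = {name: directory / name for name in EXPECTED}
        if all(p.is_file() for p in files.values()):
            yield directory, files


def load_text_and_phones(files):
    """Read the orthographic text (UTF-8) and the transcription (ASCII)."""
    text = files["text.txt"].read_text(encoding="utf-8")
    phones = files["text.phones"].read_text(encoding="ascii")
    return text, phones
```

Directories that lack one of the three files are skipped silently in this sketch; a stricter version might report them instead.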

Downloads

You can download the corpus here.

References

Please refer to one of our papers.

This page and the Oxford Aesop Corpus 2010 by Anastassia Loukina and Greg Kochanski are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales License. Based on a work at the Phonetics Lab. You may copy and/or use this file (and referenced files) for noncommercial purposes so long as the authors are properly acknowledged. For commercial licensing, contact Isis Innovation.