The 2008 Oxford Tick1 Corpus
The 2008 Oxford Tick1 Corpus by Greg Kochanski is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 UK: England & Wales License.
Based on a work at http://www.phon.ox.ac.uk/tick1_info. This data (and the data it refers to) is copyright 2007, 2008 by Greg Kochanski 2010. For commercial licensing, contact Isis Innovation.
This corpus contains the data from the "Tick1" experiment from ESRC grant "Articulation and Coarticulation in the Lower Vocal Tract" with G. Kochanski and J. Coleman as principal investigators. Data is courtesy of the UK's Economic and Social Research Council, derived from project RES-000-23-1094, 7/2005 through 3/2008.
The files DB.fiat, DBsub.fiat and DBsent.fiat contain metadata describing the recordings. This FIAT file format is described at http://www.phon.ox.ac.uk/files/pdfs/fiat.pdf.
The experimental data itself consists of speech recordings, which are stored in subdirectories. The corpus also contains hand-checked files that mark the beginning and end of utterances, and hand-checked positions for finger taps and metronome ticks.
This corpus of data consists partly of short files of repetitive speech: phrases like "Nothing Matters. Nothing Matters. Nothing Matters. ..." (There are 75 different phrases.) The remainder consists of the same phrases (and a few others) spoken in a more standard laboratory phonology context: a randomized list of phrases.
It also includes some longer, rhythmic passages from Dr. Seuss.
There are 14 speakers, all speakers of Southern British English. The corpus contains 1308 audio files and totals 2.6 gigabytes of uncompressed audio.
The corpus contains a large number of directories. Each directory contains several files of interest, described below.
A fraction of the corpus is available here: this is the "tick1 taster" corpus. It has the metadata describing the entire corpus, but only has data from one subject called "cw". It is laid out to be browseable, so you can walk through the tree of folders and look and listen to some of the data.
If you need the entire corpus, it is downloadable as a 1.3 gigabyte compressed tar file here. The compression is via bzip2; on Linux, you would open it with "tar xvjf filename.tar.bz2" or "bzcat filename.tar.bz2 | tar xvf -".
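On systems without the tar and bzcat commands, the same archive can be unpacked with Python's standard tarfile module. The following is a minimal sketch, not part of the corpus distribution; the function name extract_corpus and both of its arguments are illustrative.

```python
import os
import tarfile


def extract_corpus(archive_path, dest_dir):
    """Unpack a bzip2-compressed tar archive into dest_dir.

    Returns the list of member names found in the archive.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # "r:bz2" tells tarfile to expect bzip2 compression,
    # matching the "j" flag of "tar xvjf".
    with tarfile.open(archive_path, "r:bz2") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names
```

For example, extract_corpus("filename.tar.bz2", "tick1") would unpack the archive into a directory called tick1. As with any downloaded archive, it is prudent to extract into a fresh, empty directory.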
The Oxford library system also maintains this corpus here.
The original recording, in Microsoft WAV format. It is a two-channel file. One channel contains the recorded speech, and the other channel contains either metronome ticks or an audio channel from a microphone positioned to pick up finger taps. (The subject's finger tapped on a hardcover book about 2cm from the microphone.) The finger tap channel will pick up some speech, but faintly, and the speech channel will pick up some finger tap sounds. However, metronome ticks were coupled in electronically and are completely isolated from the speech channel.
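As an illustration of working with the two-channel layout, the sketch below separates the channels using only the Python standard library. It assumes 16-bit PCM samples, which this page does not state, and it makes no claim about which channel holds the speech; check that by listening or by inspecting the metadata. The function name split_channels is illustrative.

```python
import array
import sys
import wave


def split_channels(path):
    """Return (sample_rate, channel0, channel1) for a two-channel WAV file.

    Assumption: samples are 16-bit PCM. Other sample widths are rejected.
    """
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 2:
            raise ValueError("expected a two-channel file")
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        rate = w.getframerate()
        frames = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved: ch0, ch1, ch0, ch1, ...
    return rate, samples[0::2], samples[1::2]
```

A call such as rate, ch0, ch1 = split_channels("raw.wav") then gives two equal-length sample sequences, one per channel.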
These are the start and end-points of the speech in the utterance, automatically generated but checked for accuracy by a human. A small amount of silence (probably <100ms) is included within the marked endpoints on either side of the utterance. See the above publication for details. The data files are in a format suitable for reading by the ESPS package Xwaves, and can be read by Wavesurfer. Python 2.5 code for reading these files is available on Sourceforge, in the speechresearch project, in file gmisclib/xwaves_lab.py . In brief, the format contains a bunch of header lines of basically useless information, then a line consisting of a single hash mark ('#'), then two relevant lines. The one containing an asterisk in the third field marks the utterance start (the time is in the first field). Likewise, the line containing '%' marks the end. Times are relative to the beginning of the raw.wav files.
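The format description above can be sketched in a few lines of modern Python. This is not the gmisclib/xwaves_lab.py code; it is a minimal reader written from the description alone, and it assumes that fields are separated by whitespace. The function name read_utterance_bounds is illustrative.

```python
def read_utterance_bounds(lines):
    """Return (start, end) times in seconds from the lines of a label file.

    Header lines are skipped up to a line consisting of a single '#'.
    After that, a line whose third field contains '*' gives the start
    and a line whose third field contains '%' gives the end.
    Either value is None if the corresponding line is missing.
    """
    start = end = None
    in_body = False
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated fields
        if not in_body:
            in_body = fields == ["#"]
            continue
        if len(fields) < 3:
            continue  # not a label line
        time, label = float(fields[0]), fields[2]
        if "*" in label:
            start = time
        elif "%" in label:
            end = time
    return start, end
```

Because the function accepts any iterable of lines, an open file can be passed directly, for example: with open("ue.lbl") as f: start, end = read_utterance_bounds(f). The returned times are relative to the beginning of the corresponding raw.wav file.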
This file contains experimental tick or tap events. For the metronome data, it contains the times at which metronome ticks occur. For the "tick" data, if it exists, it lists the times at which the subject's finger tapped to mark a stressed syllable. This is computed from one of the channels of the raw.wav file, but manually checked. This file is in the Xwaves label format, same as ue.lbl.
This file contains computed tick or tap locations. It is meaningful only for metronome data, where it simply marks the metronome ticks.
This file (and other files with the ".dat" extension) is stored in the GPK ASCII Image format. This can be read by code available on Sourceforge, in the speechresearch project, in files gpkio/ascii_read.c and gpkio/read.c. (Note: the gpklib library is required for this code; it can be found in the gpklib subdirectory of the same project.) A Python interface to these libraries is available in the gpk_img_python subdirectory of the same project, and is documented at:
This data format consists of a header, followed by data. The header consists of lines in the form attribute=value and the data section is a two-dimensional array of values, either in ASCII, in IEEE-754 binary format for floating-point values, or in binary integer formats. The header information loosely follows NASA's FITS standard (Flexible Image Transport System). (Incidentally, the same software will read and write FITS format images, too.)
Other files are computed from the raw data, and are preserved for convenience. These were used in the "What marks the beat of speech?" paper.
An irregularity measure that separates voiced speech from unvoiced. It quantifies speech that is not fully voiced.
The perceptual loudness.
A measure of duration for the current syllable. Essentially, it measures how far one can go (in time) before the spectrum changes substantially.
The RMS (intensity or power).
A standard computation of the speech fundamental frequency.
A measurement of the average slope of the speech spectrum.
When using the data with "rep*" in the "text" field, the appropriate publication to reference is DOI: 10.1121/1.2890742, "What Marks the Beat of Speech?" G. Kochanski and C. Orphanidou, Journal of the Acoustical Society of America, ISSN 0001-4966, Volume 123(5), pages 2780-2791.
Files whose text field is in the form "sent" are long lists of randomized sentences. These "sent" files were used, along with the "rep*" files, in another publication: "Testing the Ecological Validity of Repetitive Speech", Greg Kochanski and Christina Orphanidou, presented at the 2007 International Congress of Phonetic Sciences (ICPhS2007), 6-10 August 2007. It is available on the web at:
Utterances with "rep*" in the text field are repetitive speech; each phrase is repeated 10-15 times in succession. Files where the text field equals "fox", "king", or "lucky" are longer texts that were not used. They are from three books by Dr. Seuss (Geisel).
More detailed documentation is in the DB.fiat file that contains the bulk of the metadata. Some comments by the Oxford Library system on putting this corpus on the web are here and here. A near-copy of this page is available here.