Progress Report 16/08/2001

Release of speech data from the IViE corpus

We had a very positive response to the release of the speech data from the IViE corpus last month; we received over 70 requests in the first week. We have now run out of CD-packs, but the complete corpus is available on the web (see URLs below).

NB: Information for colleagues who have ordered a CD-pack: all CD-sets have been burned and packaged (12/08/01) and they'll be sent out this week.

Here are the URLs for the on-line versions of the corpus:

Audio Page: Searching the corpus, listening to individual files and downloading individual files

Download Page: Downloading packs of data from the corpus, sorted by variety and speaking style (.tar)

Update on the annotated IViE CD

Labelling is (again) in progress. The annotated CD, which we will publish early next year, will contain a selection of data from seven varieties (Belfast, Bradford, Cambridge, Dublin, Leeds, Liverpool, London and Newcastle) and five speaking styles.

Where we are now:

We have completed the labelling of
(1) the sentence data produced by three male and three female speakers from the seven varieties listed above (approximately 90 minutes of speech)
(2) the read speech data (three male and three female speakers, one section from the Cinderella passage, seven varieties; approximately 35 minutes of speech).

Additionally, we have labelled complete Cinderella passages from six Belfast and six Cambridge speakers (approximately 50 minutes of speech).

A one-minute file of read or semi-spontaneously produced speech data can be labelled in approximately one hour; one minute of interactive speech data requires approximately two hours.
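
As a rough illustration of what these rates imply, the sketch below estimates the labelling effort for the data completed so far. It assumes that the sentence data are labelled at the same rate as read speech, which the report does not state explicitly; the function and variable names are illustrative only.

```python
# Approximate labelling rates from the report, in hours of work per minute of speech.
HOURS_PER_MINUTE = {
    "read_or_semi_spontaneous": 1.0,
    "interactive": 2.0,
}


def labelling_hours(minutes, style):
    """Return the estimated labelling effort in hours for `minutes` of speech."""
    return minutes * HOURS_PER_MINUTE[style]


# Approximate durations quoted above, in minutes of speech.
# Assumption: all three sets are labelled at the read-speech rate.
labelled_so_far = {
    "sentences": 90,
    "read_passage_sections": 35,
    "complete_cinderella_passages": 50,
}

total_minutes = sum(labelled_so_far.values())
estimate = labelling_hours(total_minutes, "read_or_semi_spontaneous")
print(f"{total_minutes} minutes of speech: roughly {estimate:.0f} hours of labelling")
```

The same function can be called with "interactive" to see how quickly the effort grows for interactive speech data; all figures are approximate.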


Esther and Brechtje gave a talk about the IViE project at UKLVC in York.