ICT Tools for Searching, Annotation and Analysis of Audiovisual Media

Alan Marsden*, Harriet Nock, Adrian Mackenzie*, Adam Lindsay*, John Coleman, and Greg Kochanski

* Lancaster Institute for the Contemporary Arts, and
Institute for Cultural Research,
Lancaster University

Phonetics Laboratory,
University of Oxford

AHRC ICT Strategy Project report
October 2006

Executive Summary

  1. This report concerns the use of ICT tools in arts and humanities research that works with speech, music, video and film in digital form, hereafter referred to as AV (audiovisual material).
  2. The quantity of AV available to researchers is now massive and rapidly expanding, far exceeding the quantity of available print material in sheer number of bytes.
  3. The main problem for researchers is no longer a paucity of AV but how to locate the material of interest in the vast quantity available, and how to organise material once collected.
  4. Metadata and tagging continue to be important to facilitate search. Standards for metadata for AV do exist but are not yet widely adopted.
  5. Content-based search is becoming possible for speech, but is still beyond the horizon for music, and even more distant for video and film. Mixed speech, music and noise is very hard to search.
  6. Copyright protection hampers research with AV, and digital rights management (DRM) systems threaten to prevent research altogether.
  7. Once AV has been located and accessed, much research proceeds by annotation, for which many tools exist. Systems for reuse and sharing of annotations are in their infancy, however.
  8. Many researchers make some kind of transcription of AV, and would value tools to automate this process. For speech, such tools exist with important limits to their accuracy and applicability.
  9. Full music transcription tools do not exist, but researchers can benefit from other sorts of visualisations, for which tools do exist.
  10. Researchers could work more effectively with better knowledge of ICT. A common failing is not so much ignorance of how to use particular tools as a misunderstanding of the processes the computer carries out and the validity of its results.
  11. In Section 1.3, recommendations are made concerning:
    1. provision of ICT infrastructure for arts and humanities research,
    2. training for researchers,
    3. copyright law and digital rights management (DRM),
    4. resource development unlikely to receive commercial support,
    5. dissemination of expertise and examples in research on AV with ICT,
    6. standards and commercial tools,
    7. metadata and digitisation projects outside the research community,
    8. management of researchers' private collections of AV,
    9. deposit and sharing of AV, including annotations of AV.


We are very grateful to the following for their contributions to this survey: the Oxford Building a Virtual Research Environment for the Humanities project team (Ruth Kirkham, John Pybus and Alan Bowman); and Bill Byrne, Stanley Chen, Colin Connolly, Peter Enser, Thomas Hain, Jing Huang, Giridharan Iyengar, Sanjeev Khudanpur, Roger Moore, Jiri Navratil, Mari Ostendorf, Christine Sandom, Andrew Senior, Sue Tranter, Phil Woodland, Ed Whittaker and many others for informal conversations. We also gratefully acknowledge the generous amount of time and information given by all of the participants whose interviews are reported in Appendix C.

This project has been supported by a grant from the Arts and Humanities Research Council.


1 Project report. Audiovisual media, ICT tools, and humanities research

1.1 Introduction

1.1.1 Scope of the report

1.1.2 Report website and project weblog

1.1.3 Other relevant reports

1.2 Overview of the report

1.2.1 Organisation of the report

1.2.2 Accessing audiovisual materials

1.2.3 Technologies: state of the art, gaps, obstacles

1.2.4 User experience and expectations

1.3 Conclusions and Recommendations

2 Appendix A. Accessing: sources and types of audiovisual media

2.1 Digitisation

2.2 Quantity of data

2.3 Examples and sources of audiovisual data

2.4 Technology and formats

2.5 Platform survey

2.6 Availability

2.7 Access rights

2.8 Altered rights management

3 Appendix B. Technologies for researching speech, music and moving image

3.1 Other sources of information

3.2 Searching and collecting

3.2.1 Searching the spoken word

3.2.2 Searching for music and sound

3.2.3 Searching video and film

3.2.4 Searching for AV on the web

3.2.5 Content management systems

3.3 Annotation

3.3.1 Annotation and standards

3.3.2 Manual annotation

3.3.3 Collaborative annotation

3.3.4 Automatic annotation

3.4 Transcription

3.4.1 Speech-to-text transcription

3.4.2 Transcription-related annotation of speech

3.4.3 Music transcription

3.5 Analysis

3.5.1 Analysis of audio and music

3.5.2 Analysis of film

3.6 Presentation

3.6.1 Summarisation

3.6.2 Speech-to-Speech Translation

3.6.3 Visualisation

3.7 Integration

3.7.1 Malach (Multilingual Access to Large Archives)

3.7.2 Variations2

3.7.3 Informedia Digital Video Library project

3.7.4 National Gallery of the Spoken Word

4 Appendix C. Researchers: practices, possibilities and expectations

4.1 Snapshot of Current Humanities Uses of Audiovisual Media

4.2 User Needs Study

4.2.1 Methodology

4.2.2 Institutions represented

4.2.3 Subjects represented

4.2.4 Limitations of study

4.3 Interview Results

4.3.1 Obtaining research resources

4.3.2 Data preparation

4.3.3 Analysis and interpretation

4.3.4 Dissemination

4.3.5 Other uses

4.4 Technical expectations

4.4.1 Error

4.4.2 Robustness

4.4.3 A lack of appreciation for the demo effect

4.4.4 I can't do that [with that tool]

5 References

