
2     Appendix A. Accessing: sources and types of audiovisual media

There is a tremendous breadth of culturally interesting material in audiovisual form, and a constantly growing proportion of it can be accessed via the internet. Under the rubric of access, we consider where this material is located, its quantity, its nature, forms and formats, and the problems of its availability and of the right to use it in digital form for research.

2.1      Digitisation

The transformation of pre-existing audiovisual material into digital form is largely outside the scope of this report. However, user interviews clearly indicate that it remains an important issue. The Arts and Humanities Data Service (AHDS) offers a good practice guide, Creating Digital Audio Resources (AHDS, 2006a), which provides information and more specific technical guidance for those considering small or medium-scale audio digitisation projects. The guide is aimed at a non-technical audience and will interest holders of analogue collections considering digitisation, managers who need enough information to plan resources for a digitisation project, and those experimenting with or piloting digitisation on a small scale for research, teaching, promotion or creative projects. Plichta and Kornbluh (n.d.) give somewhat more technical guidance.

Many archives have digitised some or all of their collections, or plan to do so, often in conjunction with a programme to make items available online. One example is the Imperial War Museum's Collections Online (Imperial War Museum, 2006a). Another large UK effort is the JISC digitisation programme (JISC, 2006), funded by a £10 million grant from the Higher Education Funding Council for England. The programme covers many kinds of resource besides sound and moving pictures; its audiovisual strands include archival sound recordings at the British Library (3900 hours) and Newsfilm Online (6500 hours) (JISC, 2005).

A published interview with the project manager of Newsfilm Online (eGovMonitor, 2005) reveals some of the complexities of these digitisation projects, which go beyond the technical difficulties of choosing future-proof formats, converting data into them and cataloguing the results. The interview reports that the project will have a licence to access the data in perpetuity, bringing together hours of news film with supporting metadata: contextual information about each film, studio scripts and running orders, and, for some time spans, raw news feeds. Newsreel data will also be digitised. However, some of the data comes from third parties and, where copyright cannot be negotiated, part of the material will be fuzzed out and replaced with a caption that preserves ITN's commentary. More generally, decisions must be made about which data to include and which to leave out. The project has a steering group of academics as well as technical staff, and is holding regular focus groups with higher education users to make sure that their needs are met.

There is also interesting activity outside the UK. Google's efforts to digitise text collections are well known, but its plans also extend to video collections as part of Google Video. It recently began posting the results of a joint digitisation pilot project that aims to make as much as possible of the US National Archives' public domain video content available online (News.com, 2006; Google, 2006a).

There are many issues associated with the preservation of archived audiovisual material, both those that apply to all digital material needing to be preserved (e.g., Rosenzweig, 2003) and those that apply specifically to audio and moving image material (e.g., Besser, 2001). Interesting discussion of access and archive issues also arises in field-specific papers, such as Carson (2005) and Bignell (2005), but these issues are beyond the scope of this project.

2.2    Quantity of data

Although digitisation projects continue to be important, it is now more common for research projects to have problems arising from too much rather than too little data. The UC Berkeley survey How Much Information? 2003 (Lyman & Varian, 2003) gives some information on the volume of audiovisual information being created currently, including that outside archives. For example:

There are other significant sources of data, though these are flows rather than stored data. Lyman and Varian (2003) report that information flows through electronic channels (telephone, radio, TV, the Internet) are dominated by the information sent and received in telephone calls (including both voice and data, on fixed lines and wireless), which if represented digitally would amount to 17.3 exabytes (17,300,000 TB) of new information. Much of this information is ephemeral, but that does not prevent it from becoming a source for arts and humanities research. The capture and recording of ephemeral material is becoming increasingly common: for example, BBC radio's Listen Again facility makes many radio programmes available online for a period.

Lyman and Varian (2003) also give some indication of the amount of audiovisual data, reporting that approximately 370,000 motion pictures were made around the world between 1890 and 2002, and noting that it would take 2108 years to play the entire universe of original film and video titles continuously. To put these quantities into perspective, the digitised book collections of the US Library of Congress (19 million books and other printed items) would amount to 10 TB of information. Lyman and Varian also comment on the often-discussed movement towards digital technologies and born-digital data (i.e., data originally created in digital form, rather than converted to digital from some older recording format).
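The comparison can be made concrete with a little arithmetic. The Python sketch below assumes decimal (SI) units (1 TB = 10^12 bytes, 1 EB = 10^18 bytes), which is consistent with the equivalence of 17.3 exabytes and 17,300,000 TB quoted earlier; the function name is ours and purely illustrative.

```python
# Back-of-envelope comparison of the Lyman & Varian (2003) figures.
# Assumption: decimal (SI) units, matching "17.3 exabytes (17,300,000 TB)".
TB = 10**12  # bytes
EB = 10**18  # bytes


def perspective() -> dict:
    """Express the telephone-flow and Library of Congress figures in comparable units."""
    phone_flow = 173 * EB // 10   # 17.3 EB, kept as an exact integer byte count
    loc_books = 10 * TB           # digitised Library of Congress book collections
    loc_items = 19_000_000        # books and other printed items
    return {
        "phone_flow_tb": phone_flow // TB,           # flow expressed in terabytes
        "loc_equivalents": phone_flow // loc_books,  # number of LoC book collections
        "bytes_per_item": loc_books // loc_items,    # average size of one printed item
    }


print(perspective())
# {'phone_flow_tb': 17300000, 'loc_equivalents': 1730000, 'bytes_per_item': 526315}
```

On these figures the telephone flow is equivalent to about 1.73 million digitised Library of Congress book collections, while an average printed item comes to roughly half a megabyte.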

In the UK, some broadcast material of this kind is available to universities and colleges holding the appropriate Educational Recording Agency Licence or, for institutions holding BUFVC membership, through the BUFVC's Off Air Recording Backup Service. Connolly (2004) summarises the rights situation from a modern languages and film studies perspective.

2.3    Examples and sources of audiovisual data

Audiovisual collections in the UK and abroad include national or regional sound, film and television archives, television and radio company archives, newsreel archives, museum archives, stock libraries and academic collections, as well as numerous small collections held by local organisations, companies and private collectors. Some feel for the vast number of collections is given by the British Universities Film and Video Council's Researcher's Guide Online (BUFVC, 2005), which aims to be the most detailed, specialised, accessible and up-to-date database in the UK covering film, television, radio and related documentation collections. At the time of writing (early 2006) it lists 547 entries, including 118 core radio collections and 319 core moving image collections.

Examples of archives, selected relatively arbitrarily, include:

Such archives contain recorded speech of various forms, such as:

Collections of recorded music also exist, and are growing, but commercial interests and copyright mean that they are often not freely available.

Short extracts from recordings are available online from many different sources; for example, it is now common for composers to make extracts available on their web sites. Other kinds of sound recordings are also available online, such as the sound of the Churchill tank starting up and moving off (Imperial War Museum, 2006b).

The situation for film and other moving images is similar to that for recorded music. Freely available non-commercial recordings do exist: for example, All Go Margate (1970) is one of a number of seaside resort publicity films, dating from the 1920s to the 1980s, from the South-East Film and Video Archive, available via Moving History (AHRB Centre for British Film and Television Studies, 2005). The Prelinger Archives (Prelinger Archives, 2006) hold over 48,000 ephemeral (advertising, educational, industrial and amateur) films. CNN Image Source (CNN, 2006) contains CNN footage and makes it available to researchers for a fee. Video upload sites such as YouTube (Youtube, 2006) are currently growing rapidly and consist mainly of home videos. There are also collections of collections: the Moving Image Collections (MIC) (MIC, 2006) provides catalogue access to several dozen moving image collections.

Audiovisual data is also beginning to accumulate in new, digital institutional data centres and repositories. Some of these operate at the national level. For example, the JISC-funded Film & Sound Online service provides access (including downloading) to film and video collections relevant to teachers and students, and is hosted by EDINA, a JISC-designated national data centre (Edina, 2006). There are also subject-specific data repositories, such as those supported by the Arts and Humanities Data Service (AHDS, 2006b), whose holdings include the Designing Shakespeare collection: a text database of production and review information, an image database of production photographs, a collection of video interviews with designers and a collection of VRML theatre space models (AHDS, 2005). Some institutions are developing their own data repositories (JISC, 2005). One platform for doing so is the open-source DSpace system from MIT (MIT, 2006a), which supports the submission, management and preservation of digital research material, potentially including audio and video. The MIT iCampus OpenCourseWare initiative (MIT, 2006b) is currently archiving course materials in DSpace, and another iCampus project is investigating the audio recording of lectures for later search and retrieval (MIT, 2006c).

Audiovisual data is also increasingly being made available through web and pay-per-view services; with increasing broadband uptake, film and TV over the PC are becoming more popular. This extends to the performing arts: for example, the UK Theatre Network announced on 21 October 2005 its plans to produce the world's first pay-per-view theatre, building on technology trialled for film downloading and sports, which would allow users to log on and watch the latest play either live or recorded (OpenPress, 2005).

Other sources of audiovisual data include user-generated content, which is increasingly born digital. Lightweight, easy-to-use technologies such as webcams, computer microphones and digital video cameras (including those in mobile phones), together with new editing software, make collecting and manipulating data considerably easier for individuals than in the past. Once collected, some of this data is made available on the Web as podcasts (audio blogs), vlogs (video logs) and moblogs (content posted to the Internet from mobile devices, in this case devices capable of audiovisual capture), posted as art, or uploaded to sites such as Google Video (Google, 2006b) (to be discussed later).

The Google Video National Archives effort is one of several online archives of material that has passed into, or been given to, the public domain. Another is The Open Video Digital Library (OpenVideo, 2005), a publicly accessible digital video repository. It was developed partly as a testbed for video retrieval researchers, but also to serve the public's practical need for an open collection of video; the collection spans categories including documentary, educational, ephemeral, historical and lecture (Marchionini & Geisler, 2002).

2.4    Technology and formats

Audiovisual data is held in such archives and collections in a variety of formats. For example, the British Library supplied the following statistics for their recorded sound collection (Robinson, 2005):

A typical small video collection is that of the Oxford University Archive of Performances of Greek and Roman Drama (Oxford, 2005a), whose audiovisual holdings consist of 250 videotapes and perhaps a hundred CDs and audio tapes.

Some material is available only in streamed formats (e.g., the Naxos Music Library), which raises issues of reliability and fidelity. Other material (particularly film and sound) may be available in compressed formats, which might or might not discard information important to a research project. The Cylinder Preservation and Digitization Project takes the useful approach of making its material available both in compressed, restored formats and in a high-resolution raw (unrestored) format, since each is likely to suit different kinds of research project. Meanwhile, the use of analogue magnetic tape (audio and video tape) has decreased as digital storage has increased, and the production and sale of retail audio CDs have declined, whilst DVDs have achieved the fastest market penetration of any recent technology innovation.
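The trade-off between raw and compressed formats is largely one of size. As a rough sketch, the Python below estimates storage per hour for uncompressed PCM audio at CD parameters (44.1 kHz, 16-bit, stereo) and for compressed audio at an assumed constant bitrate of 128 kbit/s. These parameters are illustrative assumptions, not the formats used by any project named here, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600


def pcm_bytes_per_hour(sample_rate: int = 44_100, bits: int = 16, channels: int = 2) -> int:
    """Uncompressed PCM: every sample of every channel is stored in full."""
    return sample_rate * (bits // 8) * channels * SECONDS_PER_HOUR


def compressed_bytes_per_hour(kbit_per_s: int = 128) -> int:
    """Constant-bitrate compressed audio (assumed 128 kbit/s, a common lossy setting)."""
    return kbit_per_s * 1000 // 8 * SECONDS_PER_HOUR


raw = pcm_bytes_per_hour()
lossy = compressed_bytes_per_hour()
print(raw, lossy, raw / lossy)
# 635040000 57600000 11.025

# Example: the 3900 hours of British Library recordings in the JISC programme,
# stored uncompressed at CD parameters (an assumption), in terabytes.
print(3900 * raw / 10**12, "TB uncompressed")
```

At these settings about eleven hours of compressed audio fit in the space of one uncompressed hour, and 3900 hours come to roughly 2.5 TB uncompressed. Whether the detail discarded by compression matters depends on the analysis to be done; archival digitisation often uses higher sample rates and bit depths, which the parameters allow for.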

The issue of digitisation formats is much less critical than it was five years ago. Many recent desktop machines have the storage, memory, processing power and network bandwidth to handle consumer-grade audio and video without difficulty. Still, some long-standing principles remain, and appreciating them can help anyone who works with audiovisual resources:

Although it is not a general principle, it is worth noting that relying on streamed media for scholarship serves almost no one today. Streaming makes some sense in a mobile data scenario, but it is a step backwards for researchers, who frequently need to jump around the material, slice it up and focus on small sections.
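The advantage of a local copy is random access. A minimal Python sketch using the standard-library wave module shows how a researcher holding a recording can jump straight to a point and cut out a short excerpt without playing through what comes before. extract_clip is a hypothetical helper; the sketch assumes an uncompressed PCM WAV file, passed in as bytes for simplicity.

```python
import io
import wave


def extract_clip(wav_bytes: bytes, start_s: float, duration_s: float) -> bytes:
    """Cut [start_s, start_s + duration_s) out of a PCM WAV recording.

    The excerpt is clipped at the end of the recording and returned as a new WAV file.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = params.framerate
        start = int(start_s * rate)
        if start >= params.nframes:
            raise ValueError("start lies beyond the end of the recording")
        count = min(int(duration_s * rate), params.nframes - start)
        src.setpos(start)              # random access: seek directly to the frame
        frames = src.readframes(count)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:  # same channels, sample width and rate as the source
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(rate)
        dst.writeframes(frames)
    return out.getvalue()
```

With a whole file available, excerpts can be cut repeatedly and from anywhere in the recording, which is the kind of slicing a streamed-only source makes awkward.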

2.5     Platform survey

As previously mentioned, desktop computers are currently capable of playing high-quality audio and video, and can store and manage a moderate collection of audiovisual resources. Desktop digital video editing is well within modern computers' capabilities. Future computing capacity could be devoted to even higher-quality video and/or to viewing multiple streams at once. Typical current algorithms for querying by audio or video content, however, run at roughly real time (an hour of material takes on the order of an hour to process), so they are very difficult to apply to large collections. For this, collaboration, a pooling of resources and/or Grid technologies might be the way forward. Bringing massively shared computing resources to bear on batch processing of a known corpus of audiovisual resources could be the next step in creating online audiovisual resources.
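The scale problem is simple arithmetic. The Python sketch below assumes an analysis that runs at a fixed multiple of real time and a collection that can be split perfectly across independent machines with no overhead; both are simplifying assumptions. wall_clock_hours is a hypothetical helper, and the collection sizes are those quoted earlier in this appendix.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours


def wall_clock_hours(content_hours: float, real_time_factor: float = 1.0,
                     workers: int = 1) -> float:
    """Elapsed time to analyse a collection.

    real_time_factor: processing time divided by media duration (1.0 = real time).
    workers: machines sharing the batch; assumes perfect, overhead-free splitting.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    return content_hours * real_time_factor / workers


newsfilm = 6500                    # hours of Newsfilm Online material
universe = 2108 * HOURS_PER_YEAR   # Lyman & Varian's "entire universe" of film and video

print(wall_clock_hours(newsfilm) / 24)                   # days on one machine
print(wall_clock_hours(newsfilm, workers=100))           # hours on 100 machines
print(wall_clock_hours(universe, workers=10_000) / 24)   # days on 10,000 machines
```

A single machine would need the best part of a year for Newsfilm Online alone, while a hundred pooled machines would finish in under three days; this is the case for shared, batch processing.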

The other trend, parallel to the increased computing power on the desktop, is the increased computing power in mobile devices. This points to a more significant change. Audiovisual resources have begun to accompany researchers throughout their work and personal lives (e.g. using a video iPod to store film collections).

2.6    Availability

Not all audiovisual archive material is readily available for research purposes. Firstly, not all of the data held in archives is catalogued: Sandom and Enser (2003) report that many film archives have large and growing backlogs of items for which there is no content description. The Presto Project, which examined archives of broadcast material, found that the content is unavailable to the general public and often unavailable even to national archives and educational institutions. Much of the content is unique (e.g., master material that cannot be allowed to circulate generally), and all of it has rights issues (Presto, 2006). (But see the comments on the Creative Archive later.)

Some archives maintain private catalogues; others make catalogues available via the World Wide Web but require visitors to make an appointment and travel to the archive to view resources. Others offer some or all of their collections online, as discussed above under Digitisation. Where data is catalogued, it may not be catalogued consistently across archives (Sandom & Enser, 2003), although there are efforts to address this, such as the Open Archives Initiative (OpenArchives, 2006), which formed the basis of the Open Language Archives Community (Simons & Bird, 2003).
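The Open Archives Initiative's Protocol for Metadata Harvesting (OAI-PMH) is what lets a service such as OLAC gather catalogue records from many archives in a common form: every OAI-PMH repository must be able to supply records in simple Dublin Core (oai_dc). As a sketch only, the Python below builds a ListRecords request and pulls identifiers and titles out of a response. The base URL and function names are placeholders, and a real harvester would also need to handle resumption tokens, error responses and richer metadata formats.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and simple Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for a (placeholder) repository."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})


def identifiers_and_titles(response_xml: str) -> list:
    """Return (identifier, title) pairs from a ListRecords response in oai_dc."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        ident = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext("oai:metadata/oai_dc:dc/dc:title", default="", namespaces=NS)
        pairs.append((ident, title))
    return pairs
```

Because every participating archive answers the same request in the same base format, records from different catalogues can be merged without per-archive parsing code.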

2.7     Access rights

Access to material can be restricted for commercial and copyright reasons, particularly in the case of film and recorded music. The Naxos Music Library referred to above, for example, is available only at a cost, and the JISC collection Film & Sound Online is currently freely available only until 31 July 2007. The Cylinder Preservation and Digitization Project, by contrast, makes its materials freely available: although there are currently no copyright restrictions on the original material, copyright does apply to the restored digitisations, but these controls are waived for non-commercial use.

Looming large over all issues with audiovisual content in the humanities is the development of digital rights management (DRM): multimedia content owners want to protect their content and profits, and for distribution of digital-only content, insist upon some form of anti-copying technology. When it comes to research, especially as aided by data analysis tools, this becomes a grave concern and a major obstacle.

DRM is the general term for a variety of technical solutions (reinforced by legislation) designed to allow the rights owner of content to determine how a consumer may use it. In the case of digital audio, rights may be limited to listening to the content on a limited number of computers and/or associated compatible portable devices. In some cases the rights may be time-limited, as when the right to listen is tied to a monthly subscription. Content protected by a DRM system (DRM'd content) that consumers pay for no longer resembles the ownership that people are accustomed to with physical media. Lacking any tangible form, DRM'd content is licensed or leased, ultimately subject to the will of the rights owner.

The general technical method for implementing DRM is to encrypt the file and tie the encryption key to the purchaser, the computer and/or the date. A specialised, trusted application on the computer or portable device can decrypt and play the file; no other application may do so. This causes difficulties for research, because DRM'd content is incompatible with data analysis methods. Only applications trusted by the DRM scheme provider have access to the decrypted content; such applications typically do not offer the analyses that researchers need, and even if they did, their algorithms are unknown and undocumented, so the results are of uncertain value. A DRM scheme provider will want to know what an application does with the decrypted content before granting it trusted status, so research applications cannot be trusted a priori: the researcher cannot know in advance everything that will be done with the decrypted content. Data analysis programs are therefore shut out of working with DRM'd content directly. Cumbersome workarounds may be possible but are often impractical for large amounts of content.
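The key-binding idea can be illustrated with a toy model; it is not how any real DRM scheme is implemented. In the Python sketch below, a provider derives a content key from a secret it alone holds, combined with the purchaser, the machine and an expiry date. Only the "trusted" player object holds that secret, so a separate analysis tool cannot obtain the decrypted audio. The SHA-256 keystream stands in for real encryption and is not secure; all names are hypothetical.

```python
import hashlib
import hmac
from datetime import date


def derive_key(secret: bytes, purchaser: str, machine_id: str, expires: date) -> bytes:
    """Bind the content key to purchaser, computer and expiry date (HMAC-SHA256)."""
    binding = f"{purchaser}|{machine_id}|{expires.isoformat()}".encode()
    return hmac.new(secret, binding, hashlib.sha256).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode stream cipher: illustration only, NOT secure.
    XOR is symmetric, so the same function encrypts and decrypts."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)


def package(secret: bytes, content: bytes, purchaser: str, machine_id: str,
            expires: date) -> bytes:
    """Provider side: encrypt a purchased file under the bound key."""
    return keystream_xor(derive_key(secret, purchaser, machine_id, expires), content)


class TrustedPlayer:
    """Stand-in for the scheme provider's trusted application; only it holds the secret."""

    def __init__(self, secret: bytes, machine_id: str):
        self._secret = secret
        self.machine_id = machine_id

    def play(self, ciphertext: bytes, purchaser: str, licensed_machine: str,
             expires: date, today: date) -> bytes:
        if licensed_machine != self.machine_id:
            raise PermissionError("licence is bound to another computer")
        if today > expires:
            raise PermissionError("licence has expired")
        key = derive_key(self._secret, purchaser, licensed_machine, expires)
        # A real player would send this straight to the audio device, not return it.
        return keystream_xor(key, ciphertext)
```

A research tool that lacks the provider's secret can neither derive the key nor ask the player for the decrypted bytes, which is the position the paragraph above describes.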

The current trend toward strong protection for intellectual property will likely harm many research activities in the humanities and social sciences; it already restricts research in many areas. The especially damaging trend is the combination of strong intellectual property laws with DRM software. The problem arises because DRM software is typically not written to allow the fair dealing exceptions permitted by copyright law. Thus, in practice, researchers are losing their rights to access data.

For instance, in the UK, Section 29(1) of Part I of the Copyright, Designs and Patents Act 1988, as amended (2003), states: "Fair dealing with a literary, dramatic, musical or artistic work for the purposes of research for a non-commercial purpose does not infringe any copyright in the work provided that it is accompanied by a sufficient acknowledgement." This clause should allow access to films, music and documents for a wide variety of university research. However, the authors of this study are not aware of any DRM software that actually implements Section 29(1). Broadly speaking, DRM software is written to make it hard for consumers to pirate content, and researchers are incidentally treated as pirates.

DRM software is not just a technological trend; it is enforced by law. It is illegal to circumvent any copy protection scheme, and even illegal to construct or possess devices and computer programs that will be used to circumvent DRM. An example case in which the High Court upheld the law is (1) Kabushiki Kaisha Sony Computer Entertainment Inc (2) Sony Computer Entertainment Europe Ltd (3) Sony Computer Entertainment UK Ltd v (1) Gaynor David Ball & 6 Ors, [2004] EWHC 1738 (Ch), 19 July 2004. Such a law is required by treaty obligations. Consequently, one cannot circumvent DRM software to gain access to protected content, not even for permitted research purposes; and even if it were legal to break DRM protections in pursuit of a fair dealing use, one could not legally possess the required tools.

DRM technology is also converging with the efforts of a group called the TCPA (Trusted Computing Platform Alliance), which aims to build hardware that allows strong control over which software can access which data. Related names are TC (Trusted Computing) and NGSCB (Next-Generation Secure Computing Base). While this technology has its benefits, if adopted it will allow content providers to specify how software will present their product. For instance, a supplier of music could (and presumably would) require that Windows Media Player send the music only to the speakers and nowhere else. It would then be difficult, and illegal, to analyse the music with other software to understand the details of the musical performance.

Another related technology under the general heading of DRM is CPRM (Content Protection for Recordable Media). CPRM is a mix of hardware (within the disk drive) and software (in an application program) that aims to encrypt data as it is written to the disk in order to control the copying of sensitive data. It has been implemented since 2004 (CyberLink, 2004). If broadly implemented, it would check every disk access against rules provided by the content providers. The CyberLink press release makes clear that the technology is intended for controlling the playing of videos on DVD. Such a technology would be a severe problem for someone in film studies or someone studying advertisements: such a researcher would probably need to collect excerpts (or adverts), but the technology would prevent him or her from copying parts of the DVD.

We want to emphasise that there are many technologies under the general heading of DRM, and by the time this report is read the details may have changed. However, entertainment companies have a strong economic incentive to implement DRM, so we expect that DRM will not disappear. Conversely, companies have no significant economic incentive to preserve the fair dealing exceptions specified in copyright law, so researchers cannot expect unhindered access. We note that the same problem also arises in using copyrighted materials for instructional purposes. The only practical solution seems to be an exception to DRM legislation that would allow the use and possession of circumvention tools for non-commercial research purposes.

2.8    Altered rights management

The most well-publicised response to the assertion of strict copyright is the Creative Commons model (CreativeCommons, 2006) and related UK variants. These legal instruments allow authors to reserve some, rather than all, rights (e.g., the right to benefit if the material is reused commercially), providing a compromise between the extremes of full copyright and the public domain. This model has spurred the development of sites storing audiovisual data for certain kinds of reuse (particularly creative, non-commercial purposes), such as the BBC Creative Archive and The Freesound Project. Other similar sites exist, for example the movies section of the Internet Archive (InternetArchive, 2006). As with The Open Video Digital Library and similar projects, these archives may prove useful for technological development as well as stimulating creative artistic works.

The BBC Creative Archive (BBC, 2003a) was first announced in 2003 and is intended to increase licence payers' access to the archives through the Creative Commons-inspired Creative Archive licence (BBC, 2003b): clips can be downloaded for non-commercial use, stored on PCs, edited and shared. Releases so far include clips from Radio 1, 1Xtra and BBC News; planned releases for the 2005-2006 pilot programme cover science and nature. Other participants in the scheme include the British Film Institute, the Open University and Teachers TV (BBC, 2003a). The Freesound Project (FreeSound, 2006) is an Internet-based project supporting the free exchange of sound effects through a website that allows anyone to participate by contributing or downloading files. Sounds are made available under the Creative Commons Sampling Plus licence, which allows most uses of the sounds provided the source is acknowledged.

Public-good archives of data encounter obstacles of their own. The BBC's experiment with free downloads of Beethoven's nine symphonies was heavily criticised by classical music labels, which maintained that the BBC was diluting the commercial value of their products (Seltzer, 2005).
