Internet and Database Resources (Music)

At a recent Methods Network workshop, Michael Casey remarked that the ideal web resource would learn to serve the user’s requirements rather than force the user to learn the requirements of using the resource. The potential for building ‘intelligence’ into systems is currently a focus of much research across disciplines and can be seen in various applications of ‘probabilistic’ matching and tagging systems. In the field of linguistics, a team at Lancaster University are developing a tagging process that will apply probable variant spelling tags to a large dataset containing historical word forms. The process starts from a limited sample dataset that has been manually marked up, which provides the blueprint for a subsequent and more substantial automated pass. In the field of content-based image retrieval (CBIR), recent progress has been made by applying feedback to systems that attempt to supply similarity matches for a query image. Retrieving meaningful records from the formal properties of an image alone (rather than from any associated textual metadata) becomes increasingly difficult as the system requires more semantic understanding of the objects depicted. Relevance data gathered from sample searches gives CBIR systems an opportunity to apply probabilistic matching to the rest of the dataset.
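The feedback loop described for CBIR can be illustrated with a minimal sketch. It assumes each image has already been reduced to a numeric feature vector, and it uses a Rocchio-style update, which is one common relevance feedback technique and not necessarily the one used by the systems mentioned above. The function names, weights and sample vectors are all hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def refine_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: move the query towards the images the user
    marked as relevant and away from those marked as irrelevant.
    The weights alpha, beta and gamma are illustrative defaults."""
    def mean(vectors):
        if not vectors:
            return [0.0] * len(query)
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    positive = mean(relevant)
    negative = mean(irrelevant)
    return [alpha * q + beta * p - gamma * n
            for q, p, n in zip(query, positive, negative)]


def rank(query, collection):
    """Return image names ordered from most to least similar to the query."""
    return sorted(collection,
                  key=lambda name: cosine(query, collection[name]),
                  reverse=True)


# Hypothetical two-dimensional feature vectors for three images.
collection = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
query = [1.0, 0.2]

first_pass = rank(query, collection)
# The user marks image "b" as relevant and image "a" as irrelevant.
refined = refine_query(query, [collection["b"]], [collection["a"]])
second_pass = rank(refined, collection)
```

In this toy example the first ranking favours image "a", while the ranking after feedback moves "c" and "b" ahead of it, which is the kind of adjustment that user relevance judgements are meant to produce.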

Closer to the realm of musicology (in that there are significant holdings of music material there), the British Library uses a method that relates to these intelligent retrieval techniques. The website logs all the searches that users carry out, whether or not the attempts to retrieve information are successful, thereby building up a useful picture of how users intuitively try to find what they are looking for. By analysing these terms and the search results they return, it will then be possible to reformulate system ontologies to accommodate terms and ideas that were not originally perceived to be appropriate but which have subsequently proved to mean more to the general public than the system developers anticipated.
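As a rough illustration of this kind of log analysis, the sketch below counts the terms that appear in unsuccessful searches but are missing from the catalogue's controlled vocabulary. It is an assumption-laden example, not a description of the British Library's actual system: the log format (a query string paired with a hit count), the vocabulary set and the function name are all hypothetical.

```python
from collections import Counter


def failed_search_terms(log, vocabulary, top_n=5):
    """Count terms that occur in zero-hit queries and are absent from the
    controlled vocabulary. `log` is a list of (query, hit_count) pairs."""
    counts = Counter()
    for query, hits in log:
        if hits > 0:
            continue  # only unsuccessful searches are of interest here
        for term in query.lower().split():
            if term not in vocabulary:
                counts[term] += 1
    return counts.most_common(top_n)


# Hypothetical search log and vocabulary.
log = [
    ("tunes for fiddle", 0),
    ("fiddle music", 0),
    ("violin sonata", 12),
    ("Fiddle", 0),
]
vocabulary = {"violin", "sonata", "music", "for"}

candidates = failed_search_terms(log, vocabulary)
```

Terms that recur in failed searches, such as "fiddle" in this invented log, are candidates for being added to the ontology as synonyms or new entry points.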

During the recent Goldsmiths workshop (see above), Matthew Dovey offered further opinions about how some of the current shortcomings of the British Library catalogue might be addressed. These related to remarks made about the enormous amount of retrospective material in the library catalogues that has insufficient indexed metadata associated with the records. His proposal took the form of ‘moving the code rather than the data’ and involved writing new algorithms, using grid and agent technology, that would search available but unindexed data to answer specific and important research questions. This bespoke approach to finding data is clearly an unusual model for arts and humanities research but is more orthodox in the context of science and e-science disciplines, where it is more acceptable to allow substantial research queries to take hours or days to return results. Stuart Jeffrey from the Archaeology Data Service recently reported that it was routine for that organisation to run an overnight batch job that executed all possible queries that a user might make on a click-and-browse system containing over a million detailed archaeological records. This sort of data processing approach provides arts and humanities researchers with an alternative way of looking at information management. It may be of particular interest to musicologists because of the size and complexity of audio file formats and the processing overhead associated with carrying out even moderately complex analysis jobs.

As might be expected, the number of resources for musicologists on the web is prodigious. In broad terms, though, despite all of the musical resources held in significant collections such as the National Sound Archive, the OCLC Music Library and the SONIC Library of Congress catalogue, it is apparent from the frustrations voiced by academics working in the field that there is no accessible and coordinated repository of audio source material that acts in the same way as the large reference corpora that exist in the fields of textual studies and linguistics. Tim Crawford recently remarked, in the context of carrying out analysis using corpus search and retrieval techniques, that even if it were possible to analyse all of Mozart’s 41 symphonies, there is no reference corpus available against which one could compare and contrast the results.

One notable web resource that is enormously important to the study of an earlier period of music is the Digital Image Archive of Medieval Music (DIAMM), an AHRC and Mellon funded initiative that has amassed a collection of around 7000 high resolution images of polyphonic sheet music. It is significant that this resource is image-based rather than audio-based, but in terms of the standards of image capture that are employed and the value added features available on the website, it represents a benchmark implementation of how to represent historical music-related material on the Web. The highly effective image zoom function, the user annotation area, the contextual help feature, the amount and quality of image metadata associated with the records and the links to other resources such as RISM (Répertoire International des Sources Musicales) all combine to add functionality and usability to the system. The embedded expertise in digital restoration techniques that is apparent in the presentation of some of the more damaged documents, and the explicit demonstration of those techniques in online tutorial pages, are also extremely useful. (A Methods Network workshop organised by Julia Craig-McFeely on ‘Digital Restoration for Damaged Documents’ was a by-product of DIAMM’s expertise in this area and has resulted in a workbook detailing relevant techniques in Adobe Photoshop.)
