Music Information Retrieval (MIR)

Consideration of MIR tools separately from reflections on methods of musical representation is a rather arbitrary distinction, closely entwined as the two activities are. Nonetheless, MIR is still referred to as an ‘emerging discipline’, and as such conveys the sense that the tools and methods associated with this activity might sensibly and usefully form a sub-discipline in their own right. In practice, some of the broader issues that demand attention from MIR specialists replicate the questions already posed about how music can be adequately represented. In a paper at ISMIR 2000, Alain Bonardi enumerated the musical entities that the contemporary musicologist would need to consider, a list which echoes the domains laid out by Babbitt in 1965.[# Bonardi lists these as ‘graphical’, ‘sound’, ‘symbolic’ and ‘tool’ representations, all of which can be mapped onto what Babbitt refers to as ‘the three usual representations of musical experience’: the ‘acoustic’, ‘auditory’ and ‘graphemic’.
Bonardi, A., What the Musicologist Needs, International Symposium on Music Information Retrieval, 2000
(http://ismir2000.ismir.net/papers/invites/bonardi_invite.pdf)
Babbitt, M., ‘The Use of Computers in Musicological Research’, Perspectives of New Music, 3(2):74-83, 1965.
(http://www.jstor.org/).] However, where MIR practitioners are seeking ways of analysing pre-existing data that was captured without reference to the concerns currently being voiced within the MIR community, retrieval research can be considered a discrete practice, and one that requires complex and sophisticated toolsets, many of which are daunting to the non-specialist and reflect the historical involvement of computing science in the development of musical analysis tools.

The application that is widely regarded as the standard academic tool for music analysis is Humdrum, developed by David Huron well over ten years ago and originally based on the UNIX model of application development and command-line-driven functionality. The FAQ section on the main Humdrum website makes it clear who the target user group for the application is:

[quote]Humdrum is rooted in the UNIX "software tools" design philosophy. That is, each tool in the toolkit carries out a simple operation. However, by interconnecting the tools, the capacity for music processing is legion. In essence, assembling Humdrum command lines amounts to a form of computer programming. Learning Humdrum is comparable in complexity to learning pascal, perl, or kornshell programming.

http://www.lib.virginia.edu/dmmc/Music/Humdrum/humdrumFAQ.ht... [/quote]

That complexity is undoubtedly a disincentive for many musicologists, and it is clear that Humdrum remains an underused (if highly regarded) tool.

fig. 3 Computational Analysis Paradigm

The power and flexibility of the application come at the usual cost of intuitive usability, and this trade-off is echoed in any visualisation of the matrix of difficulty one faces in processing different kinds of signals, a paradigm that is central to the problems of MIR and reflected in a range of other disciplines (see fig. 3).

Low-level musical features such as loudness, pitch and brightness are relatively easy to analyse quantitatively, but aggregating that analysis to determine entities that might describe timbre properties or melodic passages quickly becomes difficult. One of the ‘holy grails’ of musicology is to devise an effective method of automatically transcribing music from a complex (polyphonic) acoustic source straight onto a page in musical notation. Despite a great deal of work in this area, current effectiveness begins to falter at the level of a solo piano, and the prospect of capturing simultaneous multiple instrumentation is still an open problem. High-level features (in common with so-called ‘level 3’ features in the realm of image retrieval) involve human perception and interpretation and as such are generally outside the scope of automated computational processes. This does not, of course, preclude methods of hybrid analysis that combine automated low-level analysis functions with the capacity to record manual interventions and annotations at higher perceptual levels, and applications do exist that include this kind of flexibility.
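To give a sense of how directly such low-level features can be computed, the following Python sketch extracts frame-wise loudness, brightness and a monophonic pitch track. It assumes the third-party librosa library and a hypothetical audio file path; anything beyond these frame-level measures (chord labels, timbre categories, full transcription) requires substantially more modelling.

[code]
# A minimal sketch of low-level feature extraction with librosa (assumed
# installed); the file path is hypothetical. Aggregating these frame-wise
# values into higher-level descriptors is where the difficulty begins.
import numpy as np
import librosa

y, sr = librosa.load("example_recording.wav", sr=None, mono=True)

# Loudness proxy: root-mean-square energy per analysis frame.
rms = librosa.feature.rms(y=y)[0]

# Brightness proxy: spectral centroid (higher centroid ~ brighter sound).
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]

# Monophonic pitch track via the YIN estimator; unreliable on polyphony.
f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"),
                 fmax=librosa.note_to_hz("C7"), sr=sr)

print(f"frames analysed: {len(rms)}")
print(f"mean RMS: {rms.mean():.4f}")
print(f"mean spectral centroid: {centroid.mean():.1f} Hz")
print(f"median f0 estimate: {np.median(f0):.1f} Hz")
[/code]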

A package that has been in development since around 1996 and has the status of an ISO standard is MPEG-7, a comprehensive, multi-level multimedia content description framework that uses metadata and is expressible in XML. As a flexible apparatus for carrying out the kind of research in question, it would appear to be ideal, and not only for musicology. As Adam Lindsay states, [quote]If someone were to invent a framework serving the arts and humanities research community for its metadata needs, it would resemble MPEG-7, at least conceptually.

Lindsay, A., Understanding the Capabilities of ICT Tools for Searching, Annotation and Analysis of Audio-Visual Media, Methods Network Expert Seminar: Modern Methods for Musicology, Royal Holloway, University of London, 3 March 2006 (http://www.methodsnetwork.ac.uk/redist/pdf/lindsay.pdf)[/quote]
Acceptance and wide usage of the standard have been inconsistent, however, undoubtedly due to the complexity of applying its mechanisms to actual applications. Some development has been carried out nonetheless; one example is the MPEG-7 Audio Encoder, which has an associated graphical user interface developed by Holger Crysandt. This tool provides users with a simple way of producing a large amount of very complex XML-encoded information, and this alone is instructive about the scale of the problem of processing audio information. Even when restricting the analysis to low-level features, the processing of a 5.7 MB mp3 file results in the generation of an XML text file of 1200 pages (11 MB) as viewed in a word processor application!
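A rough back-of-envelope calculation, sketched in Python below with assumed figures (none of them taken from the encoder itself), shows why frame-level description balloons in this way: a handful of scalar descriptors, sampled every ten milliseconds and serialised as verbose XML elements, reaches megabytes for a single track.

[code]
# Back-of-envelope estimate of the size of frame-level XML description.
# All figures below are assumptions for illustration only, not values
# taken from the MPEG-7 Audio Encoder.
duration_s = 4 * 60            # a typical four-minute track
frames_per_s = 100             # a 10 ms analysis hop
descriptors_per_frame = 12     # e.g. spectral envelope bins plus power
bytes_per_value_as_xml = 40    # tags, attributes and a formatted number

values = duration_s * frames_per_s * descriptors_per_frame
xml_bytes = values * bytes_per_value_as_xml

print(f"descriptor values: {values:,}")
print(f"approximate XML size: {xml_bytes / 1e6:.1f} MB")
# Roughly 288,000 values and 11-12 MB of XML, of the same order as the
# 1200-page file described above.
[/code]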

The development of the MPEG-7 standard was influenced by advances in ‘query by image content’ and ‘query by humming’ retrieval models being developed in the mid-1990s, and the remit of the standard grew as more applications of the technology were encompassed by the development group. The visual part of the standard concentrates on signal processing and very compact representations, whilst the Multimedia Description Schemes (MDS) subgroup opted for a very rich and complex set of description structures. MPEG-7 Audio took a middle path, with options for both high-level descriptors and generic, signal-processing-inspired ones.
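The ‘query by humming’ models mentioned above typically reduce a melody to a coarse contour before matching. The Python sketch below is a simplified illustration of that idea, not part of MPEG-7 itself: it converts made-up pitch sequences to Parsons code (up/down/repeat) and scores candidates against a hummed query by edit distance. The function names and example data are purely illustrative.

[code]
# Simplified query-by-humming idea: match melodic contours, not pitches.
# Illustrative only; not drawn from the MPEG-7 standard.

def parsons_code(pitches):
    """Reduce a pitch sequence (Hz or MIDI numbers) to a u/d/r contour."""
    code = []
    for prev, curr in zip(pitches, pitches[1:]):
        code.append("u" if curr > prev else "d" if curr < prev else "r")
    return "".join(code)

def edit_distance(a, b):
    """Standard Levenshtein distance between two contour strings."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

# A hummed query and two candidate melodies (MIDI note numbers, made up).
query = parsons_code([60, 62, 64, 62, 60])
candidates = {"candidate A": [60, 62, 64, 65, 64],
              "candidate B": [60, 59, 57, 59, 60]}

for name, notes in candidates.items():
    print(name, edit_distance(query, parsons_code(notes)))
[/code]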

Michael Casey has described the development of another tool based on the MPEG-7 standard, the MPEG-7 Audio Codec (MP7AC), which attempts to tackle the difficulty of analysing polyphonic music. The objective in the tests described is to identify similar musical sequences from two contrasting collections of classical piano works and pop songs; the features used in the analysis roughly correspond to measures of timbre and harmony. Retrieving relevant information across any collection greater than a few works is, however, computationally very onerous, and a further tool, MPEG-7 Audio Retrieval (MP7AR), was developed to enable searching using locality sensitive hashing (LSH), a process that Casey describes as statistically dividing a feature space into regions of similarity and allowing searching to occur only within those regions. As with most musical analysis, similarity searching is open to varying degrees of specificity depending on the nature of the research, from audio fingerprinting, where the concept of similarity has a very narrow definition, to genre labelling, where categorical boundaries are fuzzy and retrieval attributes are necessarily unspecific.
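Casey's MP7AR implementation is not reproduced here, but the core idea of locality sensitive hashing can be illustrated with a short Python sketch using random-hyperplane hashing: feature vectors that fall into the same hash bucket become candidate matches, so full comparison is restricted to one region of the feature space rather than the whole collection. All parameters and data below are arbitrary assumptions.

[code]
# A minimal sketch of locality sensitive hashing with random hyperplanes.
# It illustrates the general LSH idea described by Casey, not the actual
# MP7AR implementation; dimensions, counts and data are arbitrary.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
dim, n_planes = 20, 12            # feature dimensionality, hash bits

# Random hyperplanes define the hash: one bit per plane (which side?).
planes = rng.standard_normal((n_planes, dim))

def lsh_key(vec):
    """Hash a feature vector to a string of sign bits."""
    return "".join("1" if d > 0 else "0" for d in planes @ vec)

# Index a collection of (made-up) per-track feature vectors.
collection = rng.standard_normal((1000, dim))
buckets = defaultdict(list)
for idx, vec in enumerate(collection):
    buckets[lsh_key(vec)].append(idx)

# At query time, only tracks in the query's bucket are compared in full.
query = collection[42] + 0.01 * rng.standard_normal(dim)  # near-duplicate
candidates = buckets[lsh_key(query)]
print(f"candidates examined: {len(candidates)} of {len(collection)}")
print("index 42 among candidates:", 42 in candidates)
[/code]

A single hash table, as here, can occasionally miss a near neighbour whose perturbation crosses a hyperplane; practical LSH systems typically combine several tables to trade memory for recall.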

Both MPEG-7 and Humdrum are toolsets rather than specific resources, and this model is pervasive in musicology tools development. The Music Lab 2 (ML) project developed at IRCAM (L'Institut de Recherche et Coordination Acoustique/Musique) is a suite of software applications to support the study and teaching of music. ML-Annotation builds on the centre's research into a wide range of audio-related challenges and offers users a tool for visualisation and for semantic and syntactic annotation across multiple views of a work, which might include musical scores, composer’s sketches, recordings, other scholarly annotations, related textual material, and so on. It is designed to work as a standalone desktop resource or across distributed online libraries, and works in conjunction with the other tools in the suite, ML-Maquette and ML-Audio, which together provide representation, analysis, retrieval, playback and even compositional functionality.
