The Corpus Approach (Art History)

The amassing of detail and the application of a variety of methods resonate with the approach favoured by those working with corpus information, particularly in linguistics research, where the analysis of texts using a broad range of quantitative techniques has underpinned a number of areas of enquiry. Techniques for creating concordances, discovering collocations, counting word frequencies, clustering phrases and much else besides have been mainstays of humanities computing for decades, but as a broad approach this appears to have had little cross-over effect on the study of the visual arts. It is true that a number of visual arts related projects have adopted the word ‘Corpus’ as part of their title, but there appears to be little that formally connects the ways in which these corpora are designed to be interrogated with the ways in which linguists and others analyse a resource such as the ‘British National Corpus’ (BNC), a collection of texts totalling 100 million words taken from spoken and written sources of language in use during the later part of the twentieth century.
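As a rough illustration of the corpus techniques named above, the following minimal Python sketch computes word frequencies, a keyword-in-context concordance and simple window-based collocate counts. The function names, the tokenisation rule and the sample sentence are all assumptions made for this example; they are not drawn from the BNC or from any particular corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def collocates(tokens, keyword, window=2):
    """Count the tokens found within `window` positions of the keyword."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


# Invented sample text, used only to exercise the functions.
sample = ("The painter placed the figure near the window. "
          "The figure looks towards the window.")
tokens = tokenize(sample)

if __name__ == "__main__":
    print(word_frequencies(tokens).most_common(3))
    for left, key, right in concordance(tokens, "figure", width=2):
        print(f"{left:>20} [{key}] {right}")
    print(collocates(tokens, "figure", window=1))
```

Dedicated corpus software offers far more (lemmatisation, statistical measures of collocation strength, and so on); the sketch is only meant to show how little machinery the basic operations need.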

There are some interesting parallels to be drawn between the activity of corpus construction and that of bringing together images for the purpose of studying art history, not least the problem of obtaining rights to the material in the first place. It is customary these days to put together corpora from existing digitized texts, as the costs of checking optically scanned material or keying in data (which can still be required when gathering spoken language) are prohibitive. As with image data, however, permissions from authors are required where copyright restrictions are in place, and some authors will refuse to be represented in corpora or will insist on high fees. The use of metadata to define the components of the corpus is also of critical value, as patterns of language may become meaningless without the context in which they were originally used. At a slightly more abstract level, the components of language and the components of artistic imagery are both unlimitable phenomena, and both might be said to constitute a collection of discrete elements that are arranged together to form meaning or achieve an effect. With this in mind, the question arises: should we be constructing visual corpora in the same manner?

In some senses, this activity is already happening in projects such as the Public Catalogue Foundation and the National Inventory of European Painting, and it is hoped that these initiatives may eventually provide scholars with enough information from a wide enough range of sources to enable meaningful comparative research. At a recent Methods Network workshop on Corpus Approaches to the Language of Literature, participants were invited to discover the frequency of use of particular words and phrases first in a single chapter of a Dickens novel, then in the novel as a whole, and finally in relation to a corpus of 19th Century texts containing a large number of works by other writers. The outcome indicated that Dickens recycled phrases far more than many of his contemporaries, perhaps the product of a requirement to write quickly so that his serialised stories would meet their journalistic deadlines. Following this methodology and given the resources, similar comparisons across periods of art might provide evidence to support or refute existing theories, or contribute to the formulation of new ones.
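One way such a comparison of phrase recycling could be approximated is to measure what share of a text's n-grams (runs of n consecutive words) belong to a phrase that occurs more than once, and then to set that figure against the same measure for a reference corpus. The sketch below is an assumption about how this might be done, not a description of the method used at the workshop; the function names are hypothetical and the two sample strings are invented placeholders rather than quotations from Dickens or his contemporaries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def repeated_ngram_rate(text, n=3):
    """Share of n-gram occurrences belonging to an n-gram used more than once."""
    grams = ngrams(tokenize(text), n)
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(grams)


# Invented placeholder texts standing in for a target text and a reference corpus.
target_text = "his hands in his pockets and his hands in his pockets"
reference_text = "the quick brown fox jumps over a lazy dog"

if __name__ == "__main__":
    print("target   :", round(repeated_ngram_rate(target_text), 3))
    print("reference:", round(repeated_ngram_rate(reference_text), 3))
```

A higher rate for the target than for the reference would suggest more phrase recycling, although with real material the texts would need to be of comparable length, since longer texts naturally repeat more.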

It would be disingenuous not to acknowledge that returning meaningful data from automated searches across visual information is beyond the functionality of any system currently available to art history, as has already been noted in the context of CBIR (see Content-Based Image Retrieval - CBIR (Art History)). This does not render the concept meaningless, however; it may simply mean that manual and automated techniques would have to be used in tandem to provide useful analytical features on which to run comparisons. These might include: compositional layout, use of posture in figures, clustering of architectural elements, perspective, horizon lines, dimensions, geometry, sight lines, and use of colour and shape. Much of this manual annotation could of course be achieved with software packages such as Adobe Photoshop, but it is the type of analysis that becomes possible once the material has been made available in this format that makes the comparison with corpus techniques interesting.
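To suggest what running comparisons over manually annotated features might look like, the following sketch treats each annotated image as a record and counts features by period, much as word frequencies are counted in a text corpus. The record layout, the field names (`period`, `horizon_line`, `features`) and the three sample records are hypothetical and exist only for illustration; a real project would need a properly designed annotation scheme.

```python
from collections import Counter
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ImageAnnotation:
    """One manually annotated image (hypothetical record layout)."""
    title: str
    period: str
    horizon_line: float  # horizon height as a fraction of picture height
    features: set = field(default_factory=set)


def feature_frequencies(annotations, period):
    """Count how many images from a period carry each annotated feature."""
    counts = Counter()
    for record in annotations:
        if record.period == period:
            counts.update(record.features)
    return counts


def mean_horizon(annotations, period):
    """Average horizon height for a period, or None if no images match."""
    values = [r.horizon_line for r in annotations if r.period == period]
    return mean(values) if values else None


# Placeholder records, not descriptions of real works.
annotations = [
    ImageAnnotation("Image A", "period 1", 0.30,
                    {"single-point perspective", "seated figure"}),
    ImageAnnotation("Image B", "period 1", 0.40,
                    {"single-point perspective", "colonnade"}),
    ImageAnnotation("Image C", "period 2", 0.60, {"seated figure"}),
]

if __name__ == "__main__":
    print(feature_frequencies(annotations, "period 1").most_common())
    print(mean_horizon(annotations, "period 1"))
    print(mean_horizon(annotations, "period 2"))
```

Once annotations exist in a structured form like this, the same counting and comparison operations used on text corpora can in principle be applied across periods, schools or individual artists.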

On a less theoretical note, text analysis tools such as TextSTAT and WordSmith would provide art historians with alternative ways of looking at the textual information that has been created about art objects, whether that is text digitized from printed sources, metadata text exported from image databases, or born-digital commentaries on the web or in word-processed formats.

