IHR Seminar in Digital History: Magnus Huber (Giessen) 'The Old Bailey Corpus: Spoken English in the 18th and 19th Centuries'

20/02/2012 17:12
21/02/2012 21:00

Venue: ST276 (Stewart House, second floor) and streamed live on the web at historyspot.org.uk

Time: Tuesday, 21 February, 5.15 pm GMT

Magnus Huber (Giessen)

'The Old Bailey Corpus: Spoken English in the 18th and 19th Centuries'

Magnus Huber will be discussing the use of historical court records in the investigation of language change. A full abstract can be found below.


The 'Proceedings of the Old Bailey', London's central criminal court, were
published between 1674 and 1913 and constitute a large body of texts from
the beginning of Present Day English (almost 200,000 trials, ca. 134
million words). The 'Proceedings' were digitized by the social historians
Robert Shoemaker (University of Sheffield) and Tim Hitchcock (University of
Hertfordshire) and are searchable at the excellent 'Old Bailey Proceedings
Online' (http://www.oldbaileyonline.org/), which also provides detailed
background information on the Old Bailey and the publication history of the
'Proceedings'.

This talk reports on a project that turned the 'Proceedings' into the
linguistic 'Old Bailey Corpus' ('OBC'). Corpus linguistics relies on the
statistical analysis of large collections of electronic texts to
investigate language variation and/or language change. In the absence of
recorded speech samples before the invention of the phonograph, language
historians have turned to written text types that are close to spoken
language. The 'Proceedings of the Old Bailey' are particularly suitable for
the study of spoken English as they were taken down by shorthand scribes,
and their verbatim passages are arguably as near as we can get to the
spoken word of the 18th and 19th centuries. The 'OBC' identifies about 114
million words as direct speech from the 1720s onwards, of which 22 million
words have received detailed mark-up for sociolinguistic (sex, profession,
age, residence of speaker, role in the court-room) and textual variables
(the shorthand scribe and publisher of individual 'Proceedings').

After an overview of the creation of 'OBC' 1.0, to be released early in
2012, I will illustrate the potential of this corpus for fine-grained
sociohistorical-linguistic studies. One aim is to point out the added forms
of analysis that this linguistic corpus and corpus-linguistic methods make
possible as compared to the historical resource described above. Examples
will be drawn from the development of negative contraction (e.g. 'do not >
don’t') and of relative clauses over the two centuries covered by the
'OBC'.


The IHR Seminar in Digital History is actively engaged in presenting and discussing new methodologies made possible by the development of computational methods for the study of history. Further information can be found on the IHR Seminar page at http://www.history.ac.uk/events/seminars/321. Follow us on Twitter @IHRDigHist or join the mailing list for seminar announcements: http://groups.google.com/group/ihr-digital-history-seminar-a...

