Database Structures (History)

Of all the computer-related tools for historical research, database (and statistics) packages have seen the most consistent use by the widest constituency of researchers using digital methods in the discipline. Approaches in the past generally focused on information originating from social science research, and the resulting data often took the form of lists. The manipulation and analysis of such data (e.g. census records, state finance data, demographic data, mortality data) therefore lent itself primarily to being handled by database packages, and a considerable body of literature is devoted to the different approaches to structuring data and to the use of relational database management systems (RDBMS) such as dBase, Paradox, Clipper, FoxPro and Microsoft Access. One of the principal areas of discussion was how to most usefully represent historical data, and much of this debate focused on the relative merits of a ‘source-oriented’ versus a ‘model-oriented’ approach to structuring the information.

The ‘model-oriented’ structure required a great deal of preparatory data analysis and provided logical organizational entities into which defined data elements could be deposited (i.e. database fields). At the outset of a project, then, the researcher was obliged not only to work out a sensible way of dividing the information into rational and discrete entities, but also to think very carefully about what sort of information would need to be extracted from the system during its working lifecycle. This model was particularly effective where the data was relatively simple, complete and regular in its structure.
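As a minimal sketch of what a model-oriented structure can look like in practice (the table, field names and sample record below are hypothetical, not drawn from any particular project), a parish burial register might be reduced in advance to a fixed set of fields:

```python
import sqlite3

# Hypothetical model-oriented schema: the source is reduced, in advance,
# to a fixed set of discrete fields chosen during preparatory data analysis.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE burials (
        id          INTEGER PRIMARY KEY,
        surname     TEXT,
        forename    TEXT,
        occupation  TEXT,
        parish      TEXT,
        burial_date TEXT,   -- ISO date; irregular or uncertain dates do not fit easily
        age_years   INTEGER
    )
""")

# A sample (invented) record: anything in the source that has no field is lost.
conn.execute(
    "INSERT INTO burials (surname, forename, occupation, parish, burial_date, age_years) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("Smith", "John", "weaver", "St Mary", "1786-03-14", 62),
)

# Retrieval questions must match the fields chosen at the outset.
for row in conn.execute("SELECT surname, burial_date FROM burials WHERE age_years > 60"):
    print(row)
```

The approach works well precisely because every question that will later be asked of the data has been anticipated by a field defined at the outset; information in the source that does not fit the schema is simply not captured.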

The ‘source-oriented’ approach attempted to represent the source information more comprehensively in its original form, using a combination of textual markup and fielded data structures, thereby capturing as much of the subtlety and nuance of the original data as possible and enabling future researchers to formulate their own retrieval and analysis approaches to the information. As a leading exponent of the ‘source-oriented’ camp, Manfred Thaller developed the KLEIO system, which represented the original source data at two levels: first as uninterpreted strings of arbitrary characters and second as meaningful units such as numerical or calendar data. The more formally structured element of the system, arranged hierarchically and referred to as the knowledge or logical environment, accurately referenced the transcription (i.e. the full-text) layer, and all queries made of the transcription layer were interpreted via that logical environment; the net advantage was that the researcher still had a ‘machine-readable’ version of the original to refer to.
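The two-level idea can be sketched in a few lines of code. The example below is a loose illustration of the principle only, not KLEIO’s actual data format or syntax: the transcription layer is kept verbatim, while a separate structured layer points into it and supplies the interpretation used for querying.

```python
from datetime import date

# Level 1: the transcription layer, an uninterpreted string kept exactly as transcribed.
transcription = "Joh'es Smith, webster, sepult. xiiij die Martii anno 1786"

# Level 2: a structured (logical) layer that references spans of the transcription
# and supplies interpreted values; queries are answered via this layer.
annotations = [
    {"span": (0, 12),  "field": "name",       "value": "John Smith"},
    {"span": (14, 21), "field": "occupation", "value": "weaver"},
    {"span": (23, 57), "field": "burial",     "value": date(1786, 3, 14)},
]

def query(field):
    """Return interpreted values together with the original wording they rest on."""
    for a in annotations:
        if a["field"] == field:
            start, end = a["span"]
            yield a["value"], transcription[start:end]

# The researcher gets both the interpretation and the machine-readable original.
for value, original in query("burial"):
    print(value, "<-", original)
```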

Whilst contextually useful for talking about data modelling, the debate about ‘source-oriented’ versus ‘model-oriented’ approaches (which, very broadly speaking, might translate into ‘encoding’ versus ‘database’) has diminished in importance, owing to the potential of XML query processes (e.g. XPath and XQuery) and the relatively recent acceptance of the term ‘XML database’ to denote data in XML format that is interoperable with relational database packages such as DB2, Oracle and Microsoft SQL Server. Where the functionality of a database is required, projects will undoubtedly continue to incorporate one into their technical strategy, but after a long period of intensive discussion of standards it is clear that some form of XML data interchange mechanism is almost a sine qua non of project funding, mainly because of concerns about preservation, but also because of the growing awareness of the need for an effective way of harmonizing disparate information systems globally.
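To give a flavour of the XML query processes mentioned above, the fragment below runs an XPath-style query over encoded source data without the data ever leaving its XML form; the document structure is invented for illustration rather than taken from any real project schema.

```python
import xml.etree.ElementTree as ET

# A hypothetical fragment of encoded source data (invented structure, not a real schema).
doc = ET.fromstring("""
<register>
  <entry>
    <name>John Smith</name>
    <occupation>weaver</occupation>
    <burial when="1786-03-14"/>
  </entry>
  <entry>
    <name>Ann Wright</name>
    <occupation>spinner</occupation>
    <burial when="1787-11-02"/>
  </entry>
</register>
""")

# An XPath expression (the subset supported by ElementTree): find every entry
# containing a burial element and read off its fields, much as a database query would.
for entry in doc.findall(".//entry[burial]"):
    print(entry.findtext("name"), entry.find("burial").get("when"))
```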
