Information Systems and Digital Libraries (Library and Information Studies)

As Howard Besser (Besser, H. (2004), ‘The Past, Present, and Future of Digital Libraries’, in Schreibman, S., Siemens, R., Unsworth, J., (eds), A Companion to Digital Humanities, (pp.557 - 575)) points out, assembling a large collection of books and associated material in one location does not necessarily make it a ‘library’. He argues that skilled intervention is required to arrange and manage the material so that it can be stored and retrieved as effectively as possible, and the same logic can serve as a basis for defining how objects should be handled within digital libraries. Before addressing this issue, however, it is useful to remember that the idea of a library system holding the object itself, rather than merely a reference to an analogue item, is relatively recent. It is the incremental development of older systems, some with analogue origins, that has given information scientists the means to deliver the current generation of enormous and comprehensive catalogues.

Classification systems such as Dewey, Universal Decimal and Library of Congress have been in existence since the end of the nineteenth century, and their hierarchical structures and alphanumeric strings were ideally suited to inclusion within computer systems. National variations of the MARC (Machine Readable Cataloguing) format, first developed in the 1960s, are now used by libraries all over the world for defining bibliographic and related material in their catalogues, which greatly eases the transfer and exchange of records between organisations. A relatively recent iteration of the format, MARC21, combines the U.S. and Canadian versions and is based on the ANSI (American National Standards Institute) standard Z39.2. The British Library adopted MARC21 as its cataloguing format from June 2004, in the course of implementing a new integrated library system.
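The way a MARC-style record breaks a bibliographic description into tagged fields and coded subfields can be illustrated with a short sketch. This is a simplified in-memory model for illustration only: it does not implement the Z39.2 exchange encoding, the class and method names (DataField, Record, render, first) are hypothetical, and the record values are invented. It relies only on the widely documented MARC21 conventions that field 100 carries a personal-name main entry and field 245 the title statement, each with two indicator positions and lettered subfields.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DataField:
    """One variable field: a three-character tag, two indicators, subfields."""
    tag: str
    indicators: str                       # "#" stands for a blank indicator
    subfields: List[Tuple[str, str]]      # ordered (code, value) pairs

    def render(self) -> str:
        # Human-readable display form, e.g. "245 10 $a Title"
        body = " ".join(f"${code} {value}" for code, value in self.subfields)
        return f"{self.tag} {self.indicators} {body}"


@dataclass
class Record:
    """A bibliographic record as an ordered list of fields (hypothetical model)."""
    fields: List[DataField] = field(default_factory=list)

    def first(self, tag: str, code: str) -> Optional[str]:
        # Return the first matching subfield value, or None if absent.
        for data_field in self.fields:
            if data_field.tag != tag:
                continue
            for sub_code, value in data_field.subfields:
                if sub_code == code:
                    return value
        return None


# Invented example values, not a real catalogue record.
record = Record([
    DataField("100", "1#", [("a", "Author, Example")]),
    DataField("245", "10", [("a", "An example title /"), ("c", "Example Author.")]),
])

for data_field in record.fields:
    print(data_field.render())
```

Because every library that follows the format agrees on what each tag and subfield code means, a receiving system can pick out the title or author by tag rather than by guessing at free text, which is what makes record exchange between organisations practical.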

The use of classification systems and databases containing vast numbers of bibliographic records, in some cases joined together into very large union catalogues (e.g. the Copac Academic and National Library Catalogue, and SUNCat – the Serials Union Catalogue), is the accepted and expected method by which users access library indexes. These OPAC (Online Public Access Catalogue) systems are now almost invariably delivered through web-based interfaces, and as such can present results in much more flexible and user-configurable ways than before.

Focusing specifically on the creation and development of digital libraries (i.e. systems that store items as digital objects and facilitate delivery of those objects to users via computerised means), there are currently various platforms which organisations are using to manage and deliver resources at departmental or institutional levels.
The development of Digital Library systems such as DSpace, EPrints, Fedora, Greenstone and dLibra is an attempt to combat the sort of ad hoc practices, rife in all organisations, whereby users who are uncertain what to do with the files they acquire or create often consign them to oblivion in deep and unmanaged folder structures on institutional file or mail servers.

Taking one of these as an example, DSpace is a digital repository system jointly developed by MIT Libraries and Hewlett-Packard Labs that ‘captures, stores, indexes, preserves, and redistributes an organization's research data.’ It is a freely available open source software platform that accommodates all forms of digital material, including text, image, audio and video files, and handles the submission and re-delivery of those materials via a web interface. In addition to the user-friendly interface, DSpace offers the following benefits for those choosing to deposit materials using this tool:

  • The system formalises the requirement for metadata to be added to objects, allowing them to be indexed for browsing and searching and defined within groups of similar objects that are placed together in logical collections.
  • Groups are designated as belonging to a particular community which can correspond to specific parts of an organisation (e.g. department, research centre or school).
  • Communities are modular and can be extended across institutional boundaries.
  • Locations for objects can be specified with persistent URL names, allowing for sustainable and reliable referencing.
  • Preservation of file formats is managed by automatically updating material to avoid format and technology obsolescence.
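The relationship between communities, collections, mandatory metadata and persistent locations listed above can be sketched as a small data model. This is a hypothetical illustration, not DSpace's actual API or schema: the class names, the REQUIRED_METADATA set and the BASE_URL resolver address are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List

REQUIRED_METADATA = ("title", "creator", "date")   # assumed minimum set
BASE_URL = "https://repository.example.org/id/"    # hypothetical resolver


@dataclass
class Item:
    metadata: Dict[str, str]
    persistent_url: str = ""


@dataclass
class Collection:
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str
    collections: Dict[str, Collection] = field(default_factory=dict)


class Repository:
    """Toy repository: community -> collection -> item (illustrative only)."""

    def __init__(self) -> None:
        self.communities: Dict[str, Community] = {}
        self._next_id = 1

    def deposit(self, community: str, collection: str,
                metadata: Dict[str, str]) -> Item:
        # Refuse deposits that lack the metadata needed for indexing.
        missing = [key for key in REQUIRED_METADATA if not metadata.get(key)]
        if missing:
            raise ValueError(f"missing metadata: {', '.join(missing)}")
        comm = self.communities.setdefault(community, Community(community))
        coll = comm.collections.setdefault(collection, Collection(collection))
        # The identifier never changes, wherever the file is later stored.
        item = Item(dict(metadata), f"{BASE_URL}{self._next_id}")
        self._next_id += 1
        coll.items.append(item)
        return item

    def search(self, key: str, text: str) -> List[Item]:
        # Case-insensitive substring match on one metadata element.
        needle = text.lower()
        return [
            item
            for comm in self.communities.values()
            for coll in comm.collections.values()
            for item in coll.items
            if needle in item.metadata.get(key, "").lower()
        ]
```

A deposit such as repo.deposit("Example School", "Working papers", {"title": "...", "creator": "...", "date": "..."}) would return an item whose persistent_url can be cited, while a deposit without the required metadata is rejected rather than left unindexed in an unmanaged folder.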

The functionality described above is indicative of the features that other digital repository software systems may offer. Implementations of three of these systems have either been carried out or are planned in a number of UK Higher Education organisations, for instance:

DSpace – Loughborough University
EPrints – University of Southampton
Fedora – University of Hull

The planning required to set up these systems is significant, and they are often considered in conjunction with complementary systems. At the University of Hull, the browser front end that serves as a gateway into the system is provided by uPortal, a free open-standard collaborative portal tool which allows users to customize their access to the institution’s online resources. Hull is also looking at implementing the Sakai Collaboration and Learning Environment, another free and open source product, which is maintained by a largely U.S.-based community of developers and users. A set of generic collaboration tools forms the core of the system (e.g. wiki, forum, chat room, WebDAV, RSS, scheduling), but it is extensible to allow additional tools to be incorporated. Lancaster University Centre for e-Science is currently using a Sakai environment for its ReDress project, whose remit is to raise awareness of e-Science related technology initiatives amongst social scientists. One of the features of this site is a highly effective online archive of seminar and workshop presentations, featuring full audio and video versions of the papers synchronized with their related PowerPoint slides.

The EU-funded DELOS project (Network of Excellence on Digital Libraries) is an initiative to integrate and coordinate the ongoing work of major European efforts in the field of Digital Libraries. It is currently working on the development of a Digital Library Reference Model and a prototype implementation of a Digital Library Management System. This ambitious collaborative project seeks to define how the next generation of digital libraries will be built and used, and incorporates a middleware environment called OSIRIS (Open Service Infrastructure for Reliable and Integrated Process Support) and a set of services referred to as ISIS (Interactive Similarity Search). These components, and the applications and central functions that they translate between, are expected to support a huge range of functions covering the entire data life-cycle, from acquisition, through retrieval and analysis (for all data types, including automatic feature extraction from image and audio material), to archival and long-term preservation.

Fig. 2 Representation of the DelosDLMS

Such a proliferation of functions reflects the fact that a ‘digital library’ is an entity which requires attention to strategy on many fronts, not least on issues to do with data standards and information formats. One of a number of useful papers by the Arts and Humanities Data Service (AHDS) introduces issues relating to digital repository systems and refers to their adoption of the OAIS (Open Archival Information System) reference model [AHDS, Curl Case Study]. Originating from work done in scientific disciplines, OAIS is a framework that expresses concepts relating to the long-term preservation of, and access to, digital information. The paper also makes useful reference to the Storage Resource Broker (SRB), a data management tool which enables data that is distributed across multiple storage devices to be viewed as if it were part of a single file system.
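The idea of presenting data held on several storage devices as a single file system can be sketched as a catalogue that maps logical paths to physical locations. This is a minimal illustration of the general technique only: it is not the SRB interface, and the class name LogicalNamespace, its methods, and the resource names and paths used below are all hypothetical.

```python
from typing import Dict, List, Tuple


class LogicalNamespace:
    """Maps user-visible logical paths to (resource, physical path) pairs."""

    def __init__(self) -> None:
        self._catalogue: Dict[str, Tuple[str, str]] = {}

    def register(self, logical_path: str, resource: str,
                 physical_path: str) -> None:
        # Record where a logical file actually lives.
        self._catalogue[logical_path] = (resource, physical_path)

    def locate(self, logical_path: str) -> Tuple[str, str]:
        # Resolve a logical path; raises KeyError if it is not registered.
        return self._catalogue[logical_path]

    def listdir(self, logical_dir: str) -> List[str]:
        # List the immediate children of a logical directory,
        # regardless of which storage resource holds each one.
        prefix = logical_dir.rstrip("/") + "/"
        names = {
            path[len(prefix):].split("/", 1)[0]
            for path in self._catalogue
            if path.startswith(prefix)
        }
        return sorted(names)


# Hypothetical resources and paths, for illustration only.
ns = LogicalNamespace()
ns.register("/archive/seminars/talk1.avi", "tape-store", "/vol7/a81f.dat")
ns.register("/archive/seminars/talk1.ppt", "disk-array", "/data/0042/slides")
ns.register("/archive/reports/2005.pdf", "disk-array", "/data/0043/report")

print(ns.listdir("/archive/seminars"))
print(ns.locate("/archive/seminars/talk1.avi"))
```

The user browses one directory tree, while the catalogue decides which device to contact; files can therefore be moved between devices by updating the catalogue entry without changing the path that users cite.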
