Project start date: 2007-01
Project end date: 2010-06
Increasingly within archaeology, the Web is used for dissemination of datasets. This contributes to the growing amount of information on the ‘deep web’, which a recent Bright Planet study estimated to be 500 times larger than the ‘surface web’. However, Google and other web search engines are ill-equipped to retrieve information from the richly structured databases that are key resources for humanities scholars.

Important archaeological results and reports are also appearing as grey literature, before or instead of traditional publication. Typically these are not indexed or made available for searching other than as ordinary web documents. With conventional search engines it is difficult to link these reports to datasets, or indeed to search them using terminology other than that employed by the authors.

Cultural heritage and memory institutions generally are seeking to expose databases and repositories of digitised items, previously confined to specialists, to a wider academic and general audience. The mapping from lay (or related subject area) terminology to technical vocabularies in a particular domain is a critical problem. There is a need for tools to help formulate and refine searches and navigate through the information space of concepts used to describe a collection. Different people use different words for the same concept, or may employ slightly different concepts, and this ‘vocabulary problem’ is a barrier to widening scholarly access.

Aims
To investigate the potential of semantic terminology tools for widening and improving access to digital archaeology resources, including disparate datasets and associated grey literature.

Objectives
  • Open up the grey literature to scholarly research by investigating the combination of linguistic and KOS-based methods in the digital archaeology domain.
  • Develop new methods for enhancing linkages between digital archive database resources and associated grey literature, exploiting the potential of a high-level core ontology.
  • Apply multi-concept query generalisation techniques to archaeology cross-domain research.
  • Design and implement a demonstrator search system, in collaboration with English Heritage.
  • Evaluate the demonstrator with a view to cost/benefit issues and wider application in the archaeological domain.
  • Engage with the archaeological community to inform research and disseminate outcomes.
Methods used | Category
Data mining | Data analysis
Data modelling | Data structuring and enhancement
Documentation | Strategy and project management
Human factors analysis | Strategy and project management
Indexing | Data analysis
Iterative design | Strategy and project management
Prototyping | Strategy and project management
Searching and querying | Data analysis
Server scripting | Data publishing and dissemination
System quality assurance and code testing | Strategy and project management
Text encoding - descriptive | Data structuring and enhancement
Text mining | Data analysis
Usability analysis | Strategy and project management
Use of existing digital data | Data capture
Web browser scripting | Data publishing and dissemination
Funding sources: 
Arts and Humanities Research Council (AHRC)
Content types created: 
Software tools used: 
  • C#
  • JavaScript
Source material used:  
Various archaeological datasets and vocabularies from English Heritage and other archaeological units, and grey literature from the ADS (particularly OASIS grey literature).

The STAR project addressed these concerns by developing semantic and natural language processing techniques to link digital archive databases and the associated grey literature, via an overarching core ontology framework, the CIDOC Conceptual Reference Model (CRM), extended for archaeological purposes by English Heritage. The work has required developing methods for mapping datasets to the core ontology, extracting Semantic Web representations in RDF, and semantic search techniques that operate over RDF generated from various datasets (and also from grey literature, via Natural Language Processing techniques).

Terminology web services have been developed based upon SKOS thesaurus representations. These terminology services allow access to the SKOS thesauri and glossaries through a variety of browser-neutral user interface widgets. They can be employed in a wide variety of applications for both data entry and display purposes, wherever access to controlled terminology, browsing of concept structures or query expansion is required. SKOS concepts are linked to CRM entities.
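As an illustration of the query expansion mentioned above, the following is a minimal sketch in Python, not the STAR terminology services themselves. It assumes a toy in-memory thesaurus whose fields mirror the SKOS properties skos:prefLabel, skos:altLabel and skos:narrower; the concept identifiers, the terms and the function names `find_concept` and `expand_query` are all hypothetical.

```python
from collections import deque

# Hypothetical toy thesaurus. Each concept has a preferred label, alternative
# labels and narrower concepts, mirroring skos:prefLabel, skos:altLabel and
# skos:narrower. The terms are illustrative, not taken from the STAR thesauri.
THESAURUS = {
    "c1": {"pref": "ditch", "alt": ["trench"], "narrower": ["c2", "c3"]},
    "c2": {"pref": "boundary ditch", "alt": [], "narrower": []},
    "c3": {"pref": "drainage ditch", "alt": ["drain"], "narrower": ["c4"]},
    "c4": {"pref": "field drain", "alt": [], "narrower": []},
}


def find_concept(term):
    """Return the id of the concept whose preferred or alternative label
    matches the term (case-insensitive), or None if there is no match."""
    wanted = term.strip().lower()
    for concept_id, concept in THESAURUS.items():
        if wanted == concept["pref"] or wanted in concept["alt"]:
            return concept_id
    return None


def expand_query(term, max_depth=1):
    """Expand a search term into the labels of its concept and of all
    narrower concepts up to max_depth levels below it (breadth-first)."""
    start = find_concept(term)
    if start is None:
        # Unknown term: search with the user's own wording only.
        return [term]

    labels = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        concept_id, depth = queue.popleft()
        concept = THESAURUS[concept_id]
        labels.append(concept["pref"])
        labels.extend(concept["alt"])
        if depth < max_depth:
            for narrower_id in concept["narrower"]:
                if narrower_id not in seen:
                    seen.add(narrower_id)
                    queue.append((narrower_id, depth + 1))
    return labels


if __name__ == "__main__":
    # A lay term ("trench") is mapped to its concept and expanded one level.
    print(expand_query("trench"))
```

Under these assumptions, a search on the alternative label "trench" is widened to the preferred label and the labels of narrower concepts, which is one way a controlled vocabulary can bridge the gap between a user's wording and that of the record authors. Raising `max_depth` trades precision for recall.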
Digital resource created:  
Working on a research demonstrator that will cross-search archaeological datasets and grey literature. This should be available in Autumn 2010. The demonstrator operates over datasets and grey literature text, but access to the data is via the demonstrator. The resource is a demonstrator programme rather than the datasets themselves.
Access to digital resource:  
Open Access

Institutions affiliated with this project: 

UK HE institutions involved:
University of Glamorgan
Other institutions involved:
English Heritage
Royal School of Library and Information Science Denmark

Project staff and expertise: 

Principal staff member: Douglas Tudhope
Other staff: PhD student(s), Postdoctoral researcher(s) / Research assistant(s)
External expertise: archaeological domain expertise, Keith May, English Heritage

