Archaeotools: Data mining, facetted classification and E-archaeology

Project start date: 2007-09
Project end date: 2009-09
This two-year project built upon earlier Archaeology Data Service (ADS) tool development (the Common Information Environment Archaeobrowser project). It used advanced data mining and knowledge capture technologies to allow archaeologists to discover, share and analyse datasets and legacy publications that had hitherto been very difficult to integrate into digital frameworks. The project had three interrelated objectives, each represented by a distinct work package.
Subject domains: 
Methods used (method: category):
  • Accessibility analysis: Strategy and project management
  • Audio-visual interaction (synchronous): Communication and collaboration
  • Collaborative publishing: Data publishing and dissemination
  • Content analysis: Data analysis
  • Data mining: Data analysis
  • Documentation: Strategy and project management
  • General project management: Strategy and project management
  • General website development: Data publishing and dissemination
  • Human factors analysis: Strategy and project management
  • Indexing: Data analysis
  • Interface design: Data publishing and dissemination
  • Iterative design: Strategy and project management
  • Resource sharing: Communication and collaboration
  • Risk management: Strategy and project management
  • Searching and querying: Data analysis
  • Security planning: Strategy and project management
  • Spatial data analysis: Data analysis
  • Statistical analysis: Data analysis
  • Text mining: Data analysis
  • Usability analysis: Strategy and project management
  • Use of existing digital data: Data capture
Funding sources: 
Arts and Humanities Research Council (AHRC), Engineering and Physical Sciences Research Council (EPSRC), Joint Information Systems Committee (JISC)
Content types created: 
Software tools used: 
  • Aleph
  • Java
  • Java Server Faces
  • Runes
  • Solr
  • T-rex
Source material used:  
The project consists of three work packages, each dealing with a particular type of data.

Work Package 1: The underlying dataset comprises over 1,000,000 records (held in an Oracle RDBMS) aggregated from the National Monuments Records of Scotland, Wales and England, as well as Historic Environment Records from numerous local authorities and the ADS’s own archive holdings. The facets are the standard hierarchical ‘What’, ‘Where’ and ‘When’ facets, plus a ‘Media’ facet that allows particular subsets of resources to be selected. The facets are populated from existing thesauri in XML format (e.g. the Thesaurus of Monument Types), which are extended and integrated to allow for geographical differences, such as differences in monument and period terminology between Scotland and England. The Archaeotools project also integrates thesauri served in XML by web services based on the Simple Knowledge Organization System (SKOS), developed by the AHRC-funded Semantic Tools for Archaeology project (STAR) at the University of Glamorgan.

Work Package 2: This deals primarily with unpublished archaeological reports (grey literature): approximately 1,000 reports in total, ranging from 10 to 500 pages. These reports are produced by a wide range of archaeological organisations. For example, the OASIS project actively gathers digital versions of grey literature fieldwork reports and currently holds around 2,300. This total grows by around 50-100 reports a month, and all reports can be downloaded free of charge from the ADS.

Work Package 3: The system is extended to capture metadata from legacy historical documents, using the PSAS (the annual Proceedings of the Society of Antiquaries of Scotland, 1851 to 1999) as an exemplar corpus. The University of Edinburgh’s geoXwalk service is used to recast place names and locations extracted from the text as national grid references (NGRs), allowing enhanced geospatial searching of the data.
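To illustrate how a hierarchical facet populated from a thesaurus can behave, here is a minimal Python sketch. It assumes the thesaurus has been reduced to narrower-to-broader term links (the relation SKOS expresses as `skos:broader`), so that choosing a broad ‘What’ term also retrieves records indexed with any narrower term. The terms and records are hypothetical and are not taken from the Thesaurus of Monument Types; `ancestors`, `facet_counts` and `select` are illustrative names, not part of Archaeotools.

```python
from collections import defaultdict

# Hypothetical fragment of a hierarchical 'What' thesaurus, stored as
# narrower term -> broader term (in the spirit of SKOS skos:broader links).
BROADER = {
    "church": "religious",
    "chapel": "religious",
    "castle": "defence",
    "hillfort": "defence",
}

# Hypothetical records, each indexed with its most specific 'What' term.
RECORDS = [
    {"id": 1, "what": "church"},
    {"id": 2, "what": "chapel"},
    {"id": 3, "what": "castle"},
    {"id": 4, "what": "church"},
]


def ancestors(term):
    """Return the term followed by every broader term above it."""
    chain = [term]
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def facet_counts(records, field):
    """Count records under every node of the hierarchy, rolling each
    specific term up into all of its broader terms."""
    counts = defaultdict(int)
    for record in records:
        for term in ancestors(record[field]):
            counts[term] += 1
    return dict(counts)


def select(records, field, term):
    """Return records indexed with `term` or with any narrower term."""
    return [r for r in records if term in ancestors(r[field])]


counts = facet_counts(RECORDS, "what")
print(counts["religious"], counts["defence"], counts["church"])  # 3 1 2
print([r["id"] for r in select(RECORDS, "what", "religious")])   # [1, 2, 4]
```

Rolling counts up the hierarchy is what lets a browse interface show a meaningful total beside a broad term before the user drills down; regional variants (e.g. Scottish and English period terms) could in principle be mapped onto shared broader terms in the same table.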
Digital resource created:  
The ultimate goal of this project is to create a faceted search, browse and knowledge management system that allows archaeologists to access, share and re-use archaeological data. The working system will be online by early 2010. A demonstration system, which requires registration, is available at
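As a rough illustration of faceted search across the ‘What’, ‘Where’, ‘When’ and ‘Media’ facets, the following minimal Python sketch assumes a common convention: values chosen within one facet are OR-ed, and different facets are AND-ed. The record lists Solr among its software tools, and Solr offers faceting natively; this sketch does not use Solr or reproduce its API. All records and values are hypothetical, and `matches`, `search` and `drilldown_counts` are illustrative names.

```python
from collections import Counter

# Facet names follow the record; every value below is hypothetical.
FACETS = ("what", "where", "when", "media")

RECORDS = [
    {"id": 1, "what": "church", "where": "Yorkshire", "when": "Medieval", "media": "report"},
    {"id": 2, "what": "castle", "where": "Yorkshire", "when": "Medieval", "media": "image"},
    {"id": 3, "what": "church", "where": "Fife", "when": "Post-medieval", "media": "report"},
    {"id": 4, "what": "hillfort", "where": "Fife", "when": "Iron Age", "media": "report"},
]


def matches(record, selection, skip=None):
    """True if the record satisfies every selected facet except `skip`.
    Values chosen within one facet are OR-ed; separate facets are AND-ed."""
    return all(
        record[facet] in values
        for facet, values in selection.items()
        if facet != skip and values
    )


def search(records, selection):
    """Return the records that match the current facet selection."""
    return [r for r in records if matches(r, selection)]


def drilldown_counts(records, selection):
    """For each facet, count its values among records matching all *other*
    selected facets, i.e. how many hits each value would give if it were
    the only value chosen in that facet."""
    return {
        facet: Counter(r[facet] for r in records if matches(r, selection, skip=facet))
        for facet in FACETS
    }


selection = {"where": {"Yorkshire", "Fife"}, "media": {"report"}}
print([r["id"] for r in search(RECORDS, selection)])      # [1, 3, 4]
counts = drilldown_counts(RECORDS, selection)
print(counts["media"]["image"], counts["what"]["church"])  # 1 2
```

Skipping a facet's own selection when counting its values keeps alternative choices in that facet visible, which is the usual design for multi-select facets in browse interfaces.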
Access to digital resource:  
Open Access
Data Formats created: 
Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. (2009). The Archaeotools project, faceted classification and natural language processing in an archaeological context. UK e-Science All Hands Meeting 2008. Philosophical Transactions of the Royal Society A, 367, 2507-2519. doi: 10.1098/rsta.2009.0038

Jeffrey, S., Richards, J., Ciravegna, F., Waller, S., Chapman, S. & Zhang, Z. (2008). When ontology and reality collide: the Archaeotools project, facetted classification and natural language processing in an archaeological context. In 36th Annual Conference on Computer Applications and Quantitative Methods in Archaeology: On the Road to Reconstructing the Past.

Zhang, Z. & Iria, J. (2009). A Novel Approach to Automatic Gazetteer Generation using Wikipedia. In Proceedings of the ACL'09 Workshop on Collaboratively Constructed Semantic Resources, Singapore, August 2009.

Institutions affiliated with this project: 

UK HE institutions involved:
University of Sheffield
University of York

Project staff and expertise: 

Principal staff members: Prof. Julian Richards, Dr Stuart Jeffrey, Prof. Fabio Ciravegna, Stewart Waller, Ziqi Zhang, Sam Chapman, Tony Austin
Other staff:
External expertise:

Metadata on this record
Author(s) of record: Ziqi Zhang
Title: Archaeotools: Data mining, facetted classification and E-archaeology
Record created: 2010-02-01
Record updated: 2010-06-11 11:17
URL of record:
Citation of record: Ziqi Zhang: Archaeotools: Data mining, facetted classification and E-archaeology. <> created: 2010-02-01, last updated 2010-06-11 11:17