Topic Detection and Tracking

Topic Detection and Tracking (TDT) refers to systems that use algorithmic means to monitor topically related material and sources, for example news stories, and to track them as they change over time. The material can come in a variety of media formats, such as video, audio and text.

Applications of a TDT system include:

  • Story Segmentation: dividing a continuous stream of material into separate stories by detecting where the topic changes
  • Topic Tracking: following stories about particular topics, by comparison with a defined set of example stories
  • Topic Detection: compiling groups, or ‘clusters’, of stories about a particular topic
  • New Event Detection: determining whether a story is the first one written that relates to a particular topic
  • Link Detection: ascertaining whether two stories are linked by their topics
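Several of the tasks listed above come down to measuring how similar two stories are. The following is a minimal sketch of that idea, assuming a plain bag-of-words representation and cosine similarity; the function names, the sample stories and the threshold of 0.3 are illustrative assumptions, not part of any particular TDT system.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn a story into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def are_linked(story_a, story_b, threshold=0.3):
    """Link Detection: do two stories appear to share a topic?"""
    return cosine(vectorize(story_a), vectorize(story_b)) >= threshold

def on_topic(story, example_stories, threshold=0.3):
    """Topic Tracking: compare a story with a defined set of example stories."""
    topic = Counter()
    for example in example_stories:
        topic.update(vectorize(example))  # Counter.update adds the counts
    return cosine(vectorize(story), topic) >= threshold

def detect_topics(stories, threshold=0.3):
    """Single-pass Topic Detection combined with New Event Detection.

    Returns (clusters, is_new): clusters is a list of lists of story indices,
    and is_new[i] is True when story i started a new cluster.
    """
    clusters, centroids, is_new = [], [], []
    for i, story in enumerate(stories):
        vec = vectorize(story)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)
            is_new.append(False)
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
            is_new.append(True)
    return clusters, is_new

# Hypothetical sample stories, in order of arrival.
stories = [
    "Earthquake strikes coastal city, rescue teams deployed",
    "Rescue teams search coastal city after earthquake",
    "Central bank raises interest rates again",
    "Interest rates rise as central bank acts on inflation",
]
clusters, is_new = detect_topics(stories)
print(clusters)  # groups of story indices, one group per detected topic
print(is_new)    # which stories were the first on their topic
```

Real systems usually weight terms (for example with TF-IDF), handle time decay and tune the threshold on labelled data; the sketch leaves all of that out to keep the relationship between the tasks visible.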

Related methods include: Data mining and Text mining.

tool: Solr

Solr is an open source enterprise search platform from the Apache Lucene project. It operates as a standalone full-text search server within an appropriate servlet container, such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language.
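As a rough illustration of the HTTP API mentioned above, the sketch below builds the URL for a query against Solr's select request handler and asks for a JSON response. The host, the core name "news" and the field name "title" are assumptions made up for the example, and the exact URL layout can differ between Solr versions and deployments.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, core, query, rows=10):
    """Build the URL for a query against a Solr core's /select handler."""
    params = {
        "q": query,    # the query string, in Solr/Lucene query syntax
        "rows": rows,  # maximum number of documents to return
        "wt": "json",  # response writer: ask for JSON
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"

# Hypothetical server, core and field names.
url = solr_select_url("http://localhost:8983/solr", "news", 'title:"earthquake"')
print(url)
```

Because the request is an ordinary HTTP GET, the resulting URL can be fetched with any HTTP client, which is what makes the API usable from most programming languages.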
Methods relating to this tool (category in parentheses):
  • Cataloguing and indexing (Data structuring and enhancement)
  • Collating (Data analysis)
  • Collocating (Data analysis)
  • Content analysis (Data analysis)
  • Data mining (Data analysis)
  • Indexing (Data analysis)
  • Searching and querying (Data analysis)
  • Text mining (Data analysis)
  • Topic Detection and Tracking (Data analysis)
Alternate tool(s): Sphinx

tool: Lucene

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications.
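Full-text search libraries such as Lucene are built around an inverted index, which maps each term to the documents that contain it. The toy sketch below shows only that core idea; it is not Lucene's API or implementation, and the function names and sample documents are invented for the illustration.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents keyed by id.
docs = {
    1: "Earthquake strikes coastal city",
    2: "Coastal flooding after storm",
    3: "City council meets",
}
index = build_index(docs)
print(search(index, "coastal city"))
```

A production library adds analysis (stemming, stop words), relevance scoring and on-disk index structures on top of this basic lookup.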
Methods relating to this tool (category in parentheses):
  • Cataloguing and indexing (Data structuring and enhancement)
  • Collating (Data analysis)
  • Collocating (Data analysis)
  • Content analysis (Data analysis)
  • Data mining (Data analysis)
  • Indexing (Data analysis)
  • Parsing (Data analysis)
  • Text mining (Data analysis)
  • Topic Detection and Tracking (Data analysis)
Alternate tool(s): InQuira, Verity, dtSearch, ISYS

tool: CONTENTdm

CONTENTdm is digital collection management software that allows for the upload, description, management and access of digital collections. CONTENTdm is mostly used by libraries, archives, museums, government agencies, universities, corporations, historical societies, and other organizations that wish to host a digital collection.
Methods relating to this tool (category in parentheses):
  • Cataloguing and indexing (Data structuring and enhancement)
  • Content analysis (Data analysis)
  • Indexing (Data analysis)
  • Searching and querying (Data analysis)
  • Topic Detection and Tracking (Data analysis)
Alternate tool(s): Fedora Commons, DigiTool, MetaStar

tool: MantisBT

MantisBT is a popular, free, web-based bug tracking system written in the PHP scripting language. Its most common use is to track software defects, but users often configure it to serve as a more generic issue tracking system and project management tool.
Methods relating to this tool (category in parentheses):
  • Accessibility analysis (Strategy and project management)
  • General project management (Strategy and project management)
  • System quality assurance and code testing (Strategy and project management)
  • Topic Detection and Tracking (Data analysis)
Alternate tool(s): JIRA, Trac, Bugzilla

tool: Concordance

A software tool for performing concordance (the analysis of a set of words within their immediate context) on a body of text. The tool performs full concordance, reading and analysing every word in a text. It was initially written for the analysis of English texts, but has since been extended to cater for other Western languages. Limited support is also provided for text in East Asian scripts, such as Chinese and Korean.
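The idea of showing a word within its immediate context can be sketched as a simple keyword-in-context listing. This is a generic illustration, not the Concordance tool's own method; the function name, the context width and the sample text are assumptions for the example.

```python
import re

def concordance(text, keyword, width=2):
    """Return (left context, keyword, right context) for each occurrence.

    `width` is the number of words of context kept on each side.
    """
    words = re.findall(r"\w+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append((left, word, right))
    return lines

# Hypothetical sample text.
text = "The cat sat on the mat. The dog chased the cat."
for left, word, right in concordance(text, "cat"):
    print(f"{left:>12} [{word}] {right}")
```

A full concordance, as described above, would repeat this for every distinct word in the text rather than for a single keyword.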
Methods relating to this tool (category in parentheses):
  • Collocating (Data analysis)
  • Content analysis (Data analysis)
  • Text encoding - descriptive (Data structuring and enhancement)
  • Text recognition (Data capture)
  • Topic Detection and Tracking (Data analysis)