Text recognition

Text recognition, also known as OCR (Optical Character Recognition), is the conversion of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text. OCR can also be used to produce text files from other files containing images of alphanumeric characters, such as those produced by fax transmissions.

Converting a page image into a text file using OCR can dramatically reduce the storage space needed, and allows the text to be reformatted or searched. The resulting text file can also be used as input for text processing applications such as natural language processing.
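The scale of the storage saving can be illustrated with some rough arithmetic. The sketch below compares an uncompressed greyscale scan of one page with the plain text of the same page. All figures (A4 page, 300 dpi, one byte per pixel, about 3000 characters per page) are assumptions chosen for illustration, not values from any particular system, and compressed image formats would narrow the gap.

```python
# Illustrative, assumed figures (not taken from any real scanner or OCR tool).
DPI = 300                      # assumed scan resolution
PAGE_INCHES = (8.27, 11.69)    # A4 page, width x height in inches
BYTES_PER_PIXEL = 1            # 8-bit greyscale, uncompressed
CHARS_PER_PAGE = 3000          # rough guess for a page of prose, 1 byte per character


def image_bytes(dpi=DPI, size=PAGE_INCHES, bpp=BYTES_PER_PIXEL):
    """Size in bytes of an uncompressed raster scan of one page."""
    width_px = round(size[0] * dpi)
    height_px = round(size[1] * dpi)
    return width_px * height_px * bpp


def text_bytes(chars=CHARS_PER_PAGE):
    """Size in bytes of the same page stored as single-byte plain text."""
    return chars


ratio = image_bytes() / text_bytes()
print(f"scan: {image_bytes():,} bytes, text: {text_bytes():,} bytes, ratio about {ratio:,.0f}x")
```

Under these assumptions the scan is several megabytes while the text is a few kilobytes, which is why OCR output is so much cheaper to store and index.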

OCR works using pattern recognition, a form of artificial intelligence, to identify the individual characters on a page, including punctuation marks, spaces, and ends of lines.
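As a toy illustration of pattern recognition applied to characters, the sketch below matches a small bitmap glyph against stored templates and picks the closest one. The 3x5 grid, the four-letter template set and the function names are all hypothetical; production OCR engines are far more elaborate, so this should be read as a minimal nearest-template example rather than a description of how any real system works.

```python
# Hypothetical glyph templates on a 3x5 grid ('#' = ink, '.' = background).
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def distance(a, b):
    """Count the pixels at which two equally sized glyphs differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognise(glyph):
    """Return the template character whose bitmap is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))


def recognise_line(glyphs):
    """Recognise a sequence of already segmented glyphs as a string."""
    return "".join(recognise(g) for g in glyphs)


# A 'T' with one pixel of ink missing, as might result from a poor scan.
noisy_t = ["###", ".#.", ".#.", "...", ".#."]
print(recognise(noisy_t))
```

Because the match is by smallest pixel difference rather than exact equality, a glyph with a little scanning noise can still be assigned to the right character. The example assumes the page has already been segmented into individual glyphs, which is itself a substantial part of the problem.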

Although earlier systems had to be trained to recognise different fonts, most OCR systems today have a high degree of accuracy and do not require training to process Latin-script typewritten text. However, research into the recognition of handwritten text and printed text in other scripts is ongoing.

Related methods include 2D Scanning, and Searching and querying.

Tool: FocusOPEN Digital Asset Manager

An open-source digital asset management solution designed for medium-size preservation, cataloguing, media archiving and batch transcoding.
Methods relating to this tool           Category
Animation                               Practice-led research
Cataloguing and indexing                Data structuring and enhancement
Collaborative publishing                Data publishing and dissemination
Collating                               Data analysis
Content analysis                        Data analysis
Curation                                Strategy and project management
Data mining                             Data analysis
General project management              Strategy and project management
Graphical interaction (synchronous)     Communication and collaboration
Graphical rendering                     Data structuring and enhancement
Image feature measurement               Data analysis
Image manipulation                      Practice-led research
Image segmentation                      Data analysis
Indexing                                Data analysis
Manual input and transcription          Data capture
Overlaying                              Data analysis
Photography                             Practice-led research
Preservation                            Strategy and project management
Record linkages                         Data structuring and enhancement
Resource sharing                        Communication and collaboration
Server scripting                        Data publishing and dissemination
Statistical analysis                    Data analysis
Streaming media                         Data publishing and dissemination
Text encoding - presentational          Data structuring and enhancement
Text encoding - referential             Data structuring and enhancement
Text mining                             Data analysis
Text recognition                        Data capture
Textual interaction (asynchronous)      Communication and collaboration
Textual interaction (synchronous)       Communication and collaboration
Use of existing digital data            Data capture
User contributed content                Data publishing and dissemination
Video and moving image compression      Data structuring and enhancement
Video editing                           Practice-led research
Video post production                   Practice-led research
Video-based interaction (asynchronous)  Communication and collaboration