Comparison of optical character recognition software

This comparison of optical character recognition software includes:

OCR engines, which perform the actual character identification
Layout analysis software, which divides scanned documents into zones suitable for OCR
Graphical interfaces to one or more OCR engines
Software development kits that are used to add OCR capabilities to other software (e.g. forms processing applications, document imaging management systems, e-discovery systems, records management solutions)

Name	Founded year	Latest stable version	Release year	License	Online	Windows	Mac OS X	Linux	BSD	Android	iOS	Programming language	SDK?	Languages	Fonts	Output Formats	Notes
ABBYY FineReader	1989	16	2022									C/C++		192^[1]	All fonts	DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2^[2]	ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.^[3]
	1989											VBScript					Works with structured, semi-structured, and unstructured documents.
Asprise OCR SDK	1998	15	2015									Java, C#, VB.NET, C/C++/Delphi		20+^[4]		Plain text, searchable PDF, XML^[5]	Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and barcode recognition on Windows, Linux, Mac OS X and Unix.^[6]
	1996	1.1	2011									C/C++		28	Any printed font	HTML, hOCR, native, RTF, TeX, TXT^[7]	Enterprise-class system, can save text formatting and recognizes complicated tables of any structure
E-aksharayan	2010													14		RTF, TXT, BRL
GOCR	2000	0.52^[8]	2018		^[9]							C		20+
			2015		Yes	Browser	Browser	Browser	Unknown			Unknown	Yes	200+	All fonts	text	Google blog post^[10] ^[11]
		Office 2007	2007														Uses OmniPage
	2011		2007
	2009-03	0.8.5	2022									Python					Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad
Ocrad		0.29^[12]	2024									C++		Latin alphabet			Command line
OCRopus	2007	1.3.3	2017									Python		All languages using Latin script (other languages can be trained)	Normal Latin script and Fraktur (other scripts can be trained)	TXT, hOCR,^[13] PDF^[14]
OmniPage	1970s	19.2	2015									C/C++, C#^[15]		125^[16]	Machine and handprinted fonts	DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, Searchable PDF, HTML, Text, XML, ePUB, MP3	Product of Nuance Communications
			2009									C#		28	Any printed font		.NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications
			14														Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes.
																	For working with localized interfaces, corresponding language support is required.
	1991	10.5.8	2015														For musical scores
Tesseract	1985	5.3.3	2023									C++, C		100+^[17]	Any printed font	Text, ALTO, hOCR,^[18] PDF, others with different user interfaces^[19] or the API	Created by Hewlett-Packard; under further development by Google^[20]

Evaluation

A 2016 analysis compared the accuracy and reliability of four OCR packages: Google Docs OCR, Tesseract, ABBYY FineReader, and Transym. Using a dataset of 1227 images from 15 different categories, it concluded that Google Docs OCR and ABBYY FineReader performed better than the others.^[21]
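
The metric used by that study is not stated here. As an illustration only, one common way to score OCR accuracy is the character error rate (CER): the edit distance between the ground-truth text and the OCR output, divided by the length of the ground truth. The sketch below assumes that metric; the function names and sample strings are hypothetical and not taken from the cited analysis.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # reference character missing from the output
                current[j - 1] + 1,      # extra character in the output
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalised by the length of the ground-truth text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    truth = "optical character recognition"        # hypothetical ground truth
    ocr_output = "optica1 character recogniton"    # hypothetical OCR result
    print(f"CER: {character_error_rate(truth, ocr_output):.3f}")
```

A lower CER means fewer character-level mistakes. Comparing several engines in this way would require running each on the same images and averaging the scores, which this sketch does not do.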

Notes and References

ABBYY FineReader 14: Technical Specifications. Finereader.abbyy.com. 2017-02-23.
ABBYY FineReader 11: Technical Specifications. Finereader.abbyy.com. 2013-09-12.
Top OCR Software. Ocrworld.com. 2010-03-30. 2013-09-12. Archived at https://web.archive.org/web/20170223213719/http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html on 2017-02-23.
Asprise OCR SDK Features. asprise.com. 2014-06-21.
Asprise Java OCR Library Features. asprise.com. 2014-06-21.
Asprise Java, C#/VB.NET OCR API. asprise.com. 2015-11-19. 2015-11-19.
[Debian]
GOCR Homepage. wasd.urz.uni-magdeburg.de. 2018-10-17.
GOCR. Jocr.sourceforge.net. 2013-09-12.
Supported languages. Feb 11, 2022.
Popat, Ashok. IEEE SPS: Optical Character Recognition for Most of the World's Languages. Sep 4, 2015. Archived at https://ghostarchive.org/varchive/youtube/20211220/E0y41YU85tI on 2021-12-20.
Diaz, Antonio. GNU Ocrad 0.29 released. info-gnu. 2024-01-20.
OCRopus includes the ocropus-hocr tool which produces hOCR from the recognition results.
In combination with the hocr-tools.
OmniPage CSDK - OCR Document Capture Toolkit | Document Imaging & OCR. Nuance. 2013-09-12. Archived at https://web.archive.org/web/20100824000156/http://www.nuance.com/imaging/omnipage/omnipage-csdk.asp on 2010-08-24.
OmniPage Standard Document Conversion. Nuance. 2014-02-25. Archived at https://web.archive.org/web/20140313193719/http://www.nuance.com/for-business/by-product/omnipage/standard/index.htm on 2014-03-13.
Based on count of language training files for version 3.04. Available at the download page.
Usage explained in the Tesseract Readme and FAQ.
Such as ODF with OCRFeeder.
GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository). 2018-11-05.
Assefi, Mehdi. OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym. 2016-12-01. ResearchGate. 2019-01-31.