Comparison of optical character recognition software

This comparison of optical character recognition software includes:

Name | Founded year | Latest stable version | Release year | License | Online | Windows | Mac OS X | Linux | BSD | Android | iOS | Programming language | SDK? | Languages | Fonts | Output formats | Notes
1989 16 2022 C/C++ 192[1] All fonts DOC, DOCX, XLS, XLSX, PPTX, RTF, PDF, HTML, CSV, TXT, ODT, DjVu, EPUB, FB2[2] ABBYY also supplies SDKs for embedded and mobile devices. Professional, Corporate and Site License Editions for Windows, Express Edition for Mac.[3]
1989 VBScript Works with structured, semi-structured, and unstructured documents.
Asprise OCR SDK 1998 15 2015 Java, C#, VB.NET, C/C++/Delphi 20+[4] Plain text, searchable PDF, XML[5] Java, C#, VB.NET, C/C++/Delphi SDKs for OCR and barcode recognition on Windows, Linux, Mac OS X and Unix.[6]
1996 1.1 2011 C/C++ 28 Any printed font HTML, hOCR, native, RTF, TeX, TXT[7] Enterprise-class system, can save text formatting and recognizes complicated tables of any structure
E-aksharayan 2010 14 RTF, TXT, BRL
2000 0.52[8] 2018 [9] C 20+
2015 Yes Browser Browser Browser Unknown Unknown Yes 200+ All fonts Text Google blog post[10][11]
Office 2007 2007 Uses OmniPage
2011 2007
2009-03 0.8.5 2022 Python Features a full user interface and has a command-line tool for automatic operations. Has its own segmentation algorithm but uses system-wide OCR engines like Tesseract or Ocrad
0.29[12] 2024 C++ Latin alphabet Command line
2007 1.3.3 2017 Python All languages using Latin script (other languages can be trained) Normal Latin script and Fraktur (other scripts can be trained) TXT, hOCR,[13] PDF[14]
1970s 19.2 2015 C/C++, C#[15] 125[16] Machine and handprinted fonts DOC/DOCX, XLS/XLSX, PPTX, RTF, PDF, PDF/A, searchable PDF, HTML, text, XML, ePUB, MP3 Product of Nuance Communications
2009 C# 28 Any printed font .NET OCR SDK based on Cognitive Technologies' CuneiForm recognition engine. Wraps Puma COM server and provides simplified API for .NET applications
14 Scan, capture and classify business documents such as invoices, forms and purchase orders integrated with business processes.
For working with localized interfaces, corresponding language support is required.
1991 10.5.8 2015 For musical scores
1985 5.3.3 2023 C++, C 100+[17] Any printed font Text, ALTO, hOCR,[18] PDF, others with different user interfaces[19] or the API Created by Hewlett-Packard; under further development by Google[20]

Evaluation

A 2016 analysis compared the accuracy and reliability of four OCR packages (Google Docs OCR, Tesseract, ABBYY FineReader, and Transym) on a dataset of 1,227 images from 15 different categories, and concluded that Google Docs OCR and ABBYY FineReader performed better than the others.[21]
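
The source does not say which accuracy metric the cited study used. As an illustration only, the sketch below shows one common way such comparisons can be scored: character accuracy derived from the edit distance between a ground-truth transcription and each engine's output. The function names and the engine labels `engine_a` and `engine_b` are hypothetical and are not taken from the study or from any OCR package.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_accuracy(reference: str, recognized: str) -> float:
    """Return 1 minus the character error rate, clamped to the range [0, 1]."""
    if not reference:
        return 1.0 if not recognized else 0.0
    error_rate = levenshtein(reference, recognized) / len(reference)
    return max(0.0, 1.0 - error_rate)


def rank_engines(reference: str, outputs: dict) -> list:
    """Sort (engine name, accuracy) pairs from most to least accurate."""
    scores = {name: character_accuracy(reference, text)
              for name, text in outputs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical outputs from two unnamed engines for the same page image.
    truth = "hello world"
    results = {"engine_a": "hello world", "engine_b": "hcllo w0rld"}
    for name, score in rank_engines(truth, results):
        print(f"{name}: {score:.3f}")
```

Averaging this score over every image in a category gives a per-category figure. A real evaluation would also need to decide how to treat whitespace, letter case, and layout differences, none of which this sketch handles.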

Notes and References

  1. ABBYY FineReader 14: Technical Specifications. Finereader.abbyy.com. 2017-02-23.
  2. ABBYY FineReader 11: Technical Specifications. Finereader.abbyy.com. 2013-09-12.
  3. Top OCR Software. Ocrworld.com. 2010-03-30. 2013-09-12. Archived at https://web.archive.org/web/20170223213719/http://ocrworld.com/software/5-in-depth/149-top-ocr-software.html on 2017-02-23 (original URL dead).
  4. Asprise OCR SDK Features. asprise.com. 2014-06-21.
  5. Asprise Java OCR Library Features. asprise.com. 2014-06-21.
  6. Asprise Java, C#/VB.NET OCR API. asprise.com. 2015-11-19. 2015-11-19.
  7. [Debian]
  8. GOCR Homepage. wasd.urz.uni-magdeburg.de. 2018-10-17.
  9. GOCR. Jocr.sourceforge.net. 2013-09-12.
  10. Supported languages. Feb 11, 2022.
  11. Ashok Popat. IEEE SPS: Optical Character Recognition for Most of the World's Languages. Sep 4, 2015. Archived at https://ghostarchive.org/varchive/youtube/20211220/E0y41YU85tI on 2021-12-20 (original URL live).
  12. Diaz, Antonio. GNU Ocrad 0.29 released. info-gnu. 2024-01-20.
  13. OCRopus includes the ocropus-hocr tool, which produces hOCR from the recognition results.
  14. In combination with the hocr-tools.
  15. OmniPage CSDK - OCR Document Capture Toolkit | Document Imaging & OCR. Nuance. 2013-09-12. Archived at https://web.archive.org/web/20100824000156/http://www.nuance.com/imaging/omnipage/omnipage-csdk.asp on 2010-08-24 (original URL dead).
  16. OmniPage Standard Document Conversion. Nuance. 2014-02-25. Archived at https://web.archive.org/web/20140313193719/http://www.nuance.com/for-business/by-product/omnipage/standard/index.htm on 2014-03-13 (original URL dead).
  17. Based on a count of the language training files for version 3.04, available at the download page.
  18. Usage is explained in the Tesseract Readme and FAQ.
  19. Such as ODF with OCRFeeder.
  20. GitHub - tesseract-ocr/tesseract: Tesseract Open Source OCR Engine (main repository). 2018-11-05.
  21. Assefi, Mehdi. OCR as a Service: An Experimental Evaluation of Google Docs OCR, Tesseract, ABBYY FineReader, and Transym. 2016-12-01. ResearchGate. 2019-01-31.