Version 10 of OCR

Updated 2020-01-17 14:51:18 by APE

Optical Character Recognition -- typically the art of teaching a computer to read printed text (provided as scanned images).

[...]

Three principal open-source engines:

  • GOCR (appears to have a Tcl/Tk frontend)
  • Ocrad (GNU)
  • Tesseract OCR (originally Hewlett-Packard, but now released as open source)
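As a rough sketch of driving one of these engines from Tcl, the following assumes that a `tesseract` executable is on the PATH and that Tcl 8.5 or later is in use. The proc names `tesseract_cmd` and `ocr_tesseract` are made up for illustration; only the `tesseract image stdout -l lang` command-line form comes from Tesseract itself.

```tcl
# Hypothetical helper: builds the Tesseract command line as a Tcl list.
# "stdout" as the output base asks tesseract to print the recognized text
# instead of writing a file; -l selects the language.
proc tesseract_cmd {image {lang eng}} {
    return [list tesseract $image stdout -l $lang]
}

# Hypothetical wrapper: runs the command and returns the recognized text.
# -ignorestderr keeps diagnostics written to stderr from raising a Tcl error.
proc ocr_tesseract {image {lang eng}} {
    return [exec -ignorestderr -- {*}[tesseract_cmd $image $lang]]
}
```

Keeping the command construction separate from the `exec` call makes it easy to inspect the exact command line before running it.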

Recommended proprietary packages:

Examples:


OCRchie : Modular Optical Character Recognition Software in C++ with Tcl/Tk interface.


http://capture2text.sourceforge.net/ has a CLI, options to tune the OCR, and many language packages available. Windows only.

Example I used (with whitelist and blacklist options):

proc OCR {image {whitelist ""} {blacklist ""}} {
    global OCR_path
    # Build the command as a list and append only the options that were supplied.
    set cmd [list $OCR_path/Capture2Text_CLI.exe -i $image]
    if {$whitelist ne ""} {
        lappend cmd --whitelist $whitelist
    }
    if {$blacklist ne ""} {
        lappend cmd --blacklist $blacklist
    }
    # {*} expands the list into separate arguments (requires Tcl 8.5 or later).
    return [exec {*}$cmd]
}
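
Note that `exec` raises a Tcl error when the program exits with a non-zero status or writes to stderr, so a call to `OCR` can fail on an unreadable image. One possible way to handle that is sketched below; `OCR_try` is a hypothetical helper, not part of Capture2Text, and the path and file name in the usage comment are assumptions.

```tcl
# Hypothetical helper: evaluates a script in the caller's scope and returns
# its result with surrounding whitespace trimmed, or $fallback if the
# script raises an error.
proc OCR_try {script {fallback ""}} {
    if {[catch {uplevel 1 $script} result]} {
        return $fallback
    }
    return [string trim $result]
}

# Usage (assumed install path and image name):
#   set OCR_path {C:/Program Files/Capture2Text}
#   set digits [OCR_try {OCR scan.png 0123456789} "?"]
```

Returning a fallback value keeps a batch run going when a single image fails; drop the `catch` if failures should stop the script instead.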