OCR

Optical Character Recognition -- typically the art of teaching a computer to read printed text (provided as scanned images).

[[...]]

Three principal open-source engines:
   * [http://jocr.sf.net/%|%GOCR] (appears to have a Tcl/Tk frontend)
   * [http://www.gnu.org/software/ocrad/ocrad.html%|%Ocrad] (GNU)
   * [http://sf.net/projects/tesseract-ocr%|%Tesseract OCR] (originally Hewlett-Packard, but now released as open source)

Recommended proprietary packages:
   * [http://www.vividata.com/index.html%|%Vividata OCR Shop XTR Lite]
   * [http://www.hamrick.com/%|%VueScan]

Examples:
   * [Tcl does OCR with TWAPI and Microsoft Office]

-----
[http://www.cs.berkeley.edu/~fateman/kathey/ocrchie.html%|%OCRchie%|%]: Modular Optical Character Recognition Software in C++ with Tcl/Tk interface.
-----
[http://capture2text.sourceforge.net/%|%Capture2Text] has a CLI, options to tune the OCR, and many language packages available. Windows only.

Example I used (with whitelist and blacklist options):

======
# Run Capture2Text on an image file and return the recognized text.
# whitelist and blacklist are optional; an empty string omits the option.
proc OCR {image {whitelist ""} {blacklist ""}} {
    global OCR_path
    # Build the command as a list so each argument is passed as one word
    set cmd [list [file join $OCR_path Capture2Text_CLI.exe] -i $image]
    if {$whitelist ne ""} {
        lappend cmd --whitelist $whitelist
    }
    if {$blacklist ne ""} {
        lappend cmd --blacklist $blacklist
    }
    return [exec {*}$cmd]
}
======
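
Since Capture2Text is Windows only, the same wrapper idea might be carried over to Tesseract (listed above) on other platforms. What follows is only a minimal sketch under some assumptions: a `tesseract` executable is on the PATH, the language data for the requested language is installed, and the proc names `tesseractCommand` and `tesseractOCR` are made up for illustration. The `tessedit_char_whitelist` variable may not be honored by every Tesseract version or engine mode.

======
# Build the tesseract command line as a list, without running it.
# Assumption: "tesseract" is on the PATH. Using "stdout" as the output
# base makes Tesseract print the recognized text instead of writing a file.
proc tesseractCommand {image {lang eng} {whitelist ""}} {
    set cmd [list tesseract $image stdout -l $lang]
    if {$whitelist ne ""} {
        # -c sets a Tesseract config variable for this run
        lappend cmd -c tessedit_char_whitelist=$whitelist
    }
    return $cmd
}

# Run the command and return the recognized text, trimmed.
# 2>@stderr passes Tesseract's diagnostics through to our own stderr,
# so that they do not make exec raise an error; a non-zero exit
# status still raises one.
proc tesseractOCR {image {lang eng} {whitelist ""}} {
    set cmd [tesseractCommand $image $lang $whitelist]
    return [string trim [exec {*}$cmd 2>@stderr]]
}
======

Usage would then look like `puts [tesseractOCR scan.png eng 0123456789]`, where `scan.png` is a placeholder file name. Keeping the command construction in its own proc makes it easy to inspect the exact command line before anything is executed.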


-----
<<categories>> Glossary | Handwriting Recognition | Human Language | Image Processing | Word and Text Processing