Optical Character Recognition: the art of teaching a computer to read printed text, typically supplied as scanned images. [[...]]

Three principal open-source engines:

   * [http://jocr.sf.net/%|%GOCR] (appears to have a Tcl/Tk frontend)
   * [http://www.gnu.org/software/ocrad/ocrad.html%|%Ocrad] (GNU)
   * [http://sf.net/projects/tesseract-ocr%|%Tesseract OCR] (originally developed at Hewlett-Packard, now released as open source)

Recommended proprietary packages:

   * [http://www.vividata.com/index.html%|%Vividata OCR Shop XTR Lite]
   * [http://www.hamrick.com/%|%VueScan]

Examples:

   * [Tcl does OCR with TWAPI and Microsoft Office]

-----

http://www.cs.berkeley.edu/~fateman/kathey/ocrchie.html%|%OCRchie%|%: Modular Optical Character Recognition Software in C++ with a Tcl/Tk interface.

-----

http://capture2text.sourceforge.net/ has a CLI, options to tune the OCR, and many language packages. Windows only.

Example I used (with whitelist and blacklist options):

======
proc OCR {image {whitelist ""} {blacklist ""}} {
    # OCR_path: directory containing Capture2Text_CLI.exe (set elsewhere)
    global OCR_path
    # Build the command line, adding each optional filter only when it is given
    set cmd [list [file join $OCR_path Capture2Text_CLI.exe] -i $image]
    if {$whitelist ne ""} {
        lappend cmd --whitelist $whitelist
    }
    if {$blacklist ne ""} {
        lappend cmd --blacklist $blacklist
    }
    # Run Capture2Text and return the recognised text
    return [exec {*}$cmd]
}
======

-----

<> Glossary | Handwriting Recognition | Human Language | Image Processing | Word and Text Processing
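-----

The same wrapper pattern should carry over to Tesseract, which also runs on Linux and macOS. The sketch below is an assumption, not a tested recipe: the proc names `tesseractCmd` and `tesseractOCR` are hypothetical, and it assumes a `tesseract` executable on the PATH (or passed explicitly). It relies on the Tesseract CLI convention that an output base of `stdout` prints the text instead of writing a `.txt` file, and on `-l` to pick the trained language data. Building the command as a list keeps file names containing spaces intact.

```tcl
# Hypothetical helper: build the Tesseract command line as a Tcl list.
# "stdout" as the output base asks tesseract to print the recognised text;
# -l selects the language data (assumed default: eng).
proc tesseractCmd {image {lang eng} {tesseract tesseract}} {
    return [list $tesseract $image stdout -l $lang]
}

# Hypothetical wrapper: run tesseract and return the recognised text.
# tesseract may print progress or warnings on stderr, which would make a plain
# exec raise an error, so -ignorestderr (Tcl 8.5+) is used.
proc tesseractOCR {image {lang eng} {tesseract tesseract}} {
    return [exec -ignorestderr {*}[tesseractCmd $image $lang $tesseract]]
}
```

Usage might look like `puts [tesseractOCR scan.png deu]`, assuming the German language data is installed.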