ReadME - a simple online character recognizer

Richard Suchenwirth 2001-06-18 - "Mouseprinting" is short for "drawing handprint characters with a mouse". Here I show a simple recognizer for mouseprinting, "Reading Mouseprint Environment", or briefly ReadME. More officially, this technology is called "online character recognition" and an indispensable component e.g. for PDAs without keyboard. (They probably didn't do it in two pages of Tcl ;-)

WikiDbImage readme.jpg

You can draw on the little square canvas, and on leaving it, the strokes are skeletonized - displayed in red, with yellow boxes at corners - and subsampled to a 5 by 3 grid. The resulting normalized pattern is tried to match within a certain distance to the reference patterns. The normalized shape is displayed in red, the best matched template in blue, in the top corners of the drawing canvas. Small as this plaything is, it can reasonably well recognize A-Z 1-9, and even some Japanese Hiragana and Kanji thrown in...

Initially, you get a default set of templates. You can however record characters you wrote with the "Add" button (they will then be displayed in the big canvas at the bottom), and save such a set of drawings to a .raw file. Such files (with proper labels to the characters) can be loaded as learn or test sets. "Learn" adds the current character to the learn set; "Run" walks over all chars in the test set and computes read and error rates.


Anyone who creates a set of templates matching the PalmOS Graffiti strokes please consider adding them somewhere online and inserting the URL here!

 namespace eval readme {
  proc main {} {
    variable a
    foreach i [winfo children .] {destroy $i} ;# for repeated sourcing
    option add *Button.borderwidth 1
    option add *Button.padY 0
    
    label .0 -text "readme - a little online character recognizer"\
        -relief ridge -font {Times 14 italic} -fg blue
    label .111 -text "Learn set:"
    entry .112 -textvar readme::a(templatefile)
    label .113 -textvar readme::a(statistics) -height 4
    frame .115
    button .115.1 -text Load -command {readme::load learn}
    button .115.2 -text Save -command {readme::save learn}
    button .115.3 -text Learn -command readme::learn
    eval pack [winfo children .115] -side left
    
    label .121 -text Testset:
    entry .122 -textvar readme::a(rawfile)
    bind .122 <Return> {readme::load test}
    entry .123 -textvar readme::a(threshold)
    bind .123 <Return> readme::runAll
    message .124 -textvar readme::a(rates) -pady 0 -width 100
    frame .125
    button .125.1 -text Load    -command {readme::load test}
    button .125.2 -text Save    -command {readme::save test}
    button .125.3 -text " Run " -command readme::runAll
    eval pack [winfo children .125] -side left
    canvas .131 -width 100 -height 100 -bg white -relief raised -borderwidth 2
    set a(canvas) .131
    bind .131 <1>                {readme::start %W %x %y}
    bind .131 <B1-Motion>        {readme::move %W %x %y}
    bind .131 <B1-ButtonRelease> {readme::done %W %x %y}
    bind .131 <Leave>            {readme::recognize}
    frame .135
    button .135.0 -text Add -command readme::add
    entry  .135.1 -width 4 -just center -textvar ::readme::a(result)
    button .135.3 -text Clear    -command readme::clear
    eval pack [winfo children .135] -side left -fill y
    label .141 -text "Normalized:" -just left
    entry .142 -textvar readme::a(last) -bg white -fg red -just left
    bind  .142 <Return> {readme::recognize $readme::a(last)}
    label .143 -text "Best match:" -just left
    entry .144 -textvar readme::a(best) -fg blue -relief flat
    entry .145 -textvar readme::a(detail) -width 30 -relief flat

    frame .2
    canvas .2.c -bg white -height 150 -width 500 -yscrollcommand {.2.y set}
    scrollbar .2.y -ori vert -command {.2.c yview}
    pack .2.y -side right -fill y
    pack .2.c -side right -expand 1 -fill both
    set a(picker) .2.c

    grid .0     -    -    -  -sticky ew -padx 2
    grid .111 .121 .131 .141 -sticky w -padx 2
    grid .112 .122  ^   .142 -sticky we -padx 2
    grid .113 .123  ^   .143 -sticky w -padx 2
    grid   ^  .124  ^   .144 -sticky new -padx 2
    grid .115 .125 .135 .145 -sticky w -padx 2
    grid .2   -  -    -    -  -sticky news
    grid rowconfigure . {0 1 2 3 4 5} -weight 0
    grid rowconfigure . 6 -weight 1

    bind . <Control-c> {catch {console show}} ;# for inspecting debug output
    bind . <Control-r> [list source [info script]]
    wm protocol . WM_DELETE_WINDOW readme::exit
    init
  }
  proc init {} {
    variable a
    variable learn; set learn(last) 0
    variable test;  set test(last) 0
    set a(new) 0
    set a(threshold) 0.4 ;# threshold for auto-accept
    set a(pickn) 0; set a(pickdx) 10; set a(pickdy) 10
    set a(target) test
    set a(templatefile) "(default)"
    set a(fname) readme.txt
    set a(templates) {
        A {{0 2 2 0 4 2} {1 1 3 1}}
        B {{1 0 1 2 4 2 1 1 3 0 0 0}}
        C {{4 0 0 0 3 2}}
        D {{1 0 1 2 3 0 1 0}}
        E {{0 0 1 2 4 2} {1 1 4 1} {0 0 3 0}}
        F {{0 2 0 0 4 0} {0 1 3 1}}
        G {{3 0 0 1 1 2 4 1 2 1}}
        H {{0 0 0 2} {0 1 4 1} {4 0 4 2}}
        I {{0 0 0 2}}
        I {{4 0 0 2}}
        J {{0 0 4 0 2 2 1 1}}
        K {{0 0 0 2 4 0} {2 1 4 2}}
        L {{0 0 0 2 4 2}}
        M {{0 2 1 0 2 1 3 0 4 2}}
        N {{0 2 1 0 3 2 4 0}}
        O {{3 0 1 0 0 2 4 2 3 0}}
        P {{0 2 1 0 4 1 1 1}}
        Q {{4 0 0 0 1 2 3 1 3 0} {2 1 4 2}}
        R {{1 2 1 0 3 1 0 1 4 2}}
        S {{4 0 1 1 4 2 1 2}}
        T {{0 0 4 0} {2 0 2 2}}
        U {{0 0 1 2 3 2 4 0}}
        V {{0 0 2 2 4 0}}
        W {{0 0 1 2 2 0 3 2 4 0}}
        X {{4 0 1 2} {0 0 3 2}}
        Y {{0 0 2 1 4 0} {2 1 2 2}}
        Z {{0 0 4 0 0 2 3 2}}
        - {{0 0 4 0}}
        + {{0 1 4 1} {2 0 2 2}}
        1 {{0 1 4 0 2 2}}
        2 {{0 1 2 0 3 1 1 2 4 2}}
        3 {{1 0 3 0 4 1 3 1 2 2 0 2}}
        4 {{0 0 0 1 4 1} {2 0 2 2}}
        5 {{0 0 4 0} {0 0 0 1 3 1 3 2 0 2}}
        6 {{4 0 0 1 3 2 4 1 1 1}}
        7 {{0 0 3 0 3 2} {2 1 4 1}}
        8 {{4 0 1 0 3 2 0 1 3 0}}
        9 {{3 0 0 0 2 1 4 0 0 2}}
    }
    set a(statistics) [statistics]
  }
  ################################## Mouse event handlers
  proc start {w x y} {
    variable a
    set x0 [$w canvasx $x]
    set y0 [$w canvasy $y]
    set a(id) [$w create line $x0 $y0 $x0 $y0 -width 3 -tags line]
    set a(new) 1
  }
  proc move {w x y} {
    variable a
    set x0 [$w canvasx $x]
    set y0 [$w canvasy $y]
    eval $w coords $a(id) [concat [$w coords $a(id)] $x0 $y0]
  }
  proc done {w x y} {
    variable a
    set coords [$w coords $a(id)]
    if {[llength $coords]>4} {
        set skeleton [eval $w create line [straighten $coords] \
            -fill red -width 2 -tag skeleton]
        showPoints $w $skeleton
    }
  }
  proc showPoints {c id} {
    foreach {x y} [$c coords $id] {
        $c create rect [expr {$x-2}] [expr {$y-2}] \
            [expr {$x+2}] [expr {$y+2}] -fill yellow
    }
  }
  proc statistics {} {
    variable a
    variable statistics
    set templates $a(templates)
    catch {unset statistics}
    foreach {label template} $templates {
        if {![info exists statistics($label)]} {
            set statistics($label) 1
        } else {incr statistics($label)}
    }
    set total [expr [llength $templates]/2]
    set nclasses [array size statistics]
    foreach {label n} [array get statistics] {
        lappend t [list [subst $label] $n]
    }
    set top [lrange [lsort -integer -decr -index 1 $t] 0 2]
    return "$total templates, $nclasses classes\ntop: $top"
  }
  proc exit {} {
    variable a
    if [info exists a(learnt)] {save learn}
    if [info exists a(added)]  {save test}
    ::exit 0
  }
  proc load {mode} {
    variable a
    set filename [tk_getOpenFile -title "Select $mode set"\
         -filetypes {{"Raw files" .raw} {"All files" .*}}]
    if ![file readable $filename] return
    if {$mode=="learn"} {
        variable learn
        if [info exists a(learnt)] {save learn}
        set a(target) learn
        unset learn; set learn(last) 0
        set a(templates) {}
        source $filename
        set a(target) test
        set a(statistics) [statistics]
        set a(templatefile) $filename
    } ;#--------- fall through to test mode for self-test
    set fp [open $filename r]
    set a(rawdata) [split [read $fp [file size $filename]] \n]
    set a(rawfile) $filename
    close $fp
    pick
    runAll
  }
  proc save {mode} {
    variable a
    if {$mode=="learn"} {
        variable learn
        if {$a(templatefile)=="(default)"} return
        set n 0
        set fp [open $a(templatefile) w]
        foreach i [lsort [array names learn *,label]] {
            regsub ,label $i ,raw rawindex
            incr n
            puts $fp [list + $learn($i) $learn($rawindex)]
        }
        close $fp
        set a(detail) "saved $n samples to $a(templatefile)"
        catch {unset a(learnt)}
    } else {
        set filename $a(rawfile)
        set    fp [open $filename w]
        puts  $fp [join $a(rawdata) \n]
        close $fp
        set a(detail) "saved [llength $a(rawdata)] samples to $a(rawfile)"
        catch {unset a(added)}
    }
  }
  proc + {label lines} {
    # the funny name is so "+ X {...}" in .raw files can be evalled
    variable a; set c $a(canvas)
    if {$a(target)=="learn"} {
        variable learn
        set id [incr learn(last)]
        set learn($id,label) $label
        set learn($id,raw) $lines
        lappend a(templates) $label [preprocess $lines]
    } else {
        clear
        foreach line $lines {
            eval $c create line $line -width 4 -tag line
            set skeleton [eval $c create line [straighten $line] \
                -fill red -width 2 -tag skeleton]
            showPoints $c $skeleton
        }
        recognize
    }
  }
  proc runAll {} {
    variable a
    . config -cursor watch; update
    foreach i {auto autoerr man manerr rej} {set N($i) 0}
    set n 0
    set time [time {
      foreach i $a(rawdata) {
        lassign $i op label data
        if {$op == "+"} {
            incr n
            set decision [+ $label $data]
            set res1 [lindex $decision 0]
            set ldecision [llength $decision]
            if {$ldecision==0} {
                incr N(rej)
                $a(picker) itemconfig pick$n -fill gray60
            } elseif {$ldecision==1} {
                incr N(auto)
                if {$res1!=[subst $label]} {
                    incr N(autoerr)
                    $a(picker) itemconfig pick$n -fill red
                } else {
                    $a(picker) itemconfig pick$n -fill green3
                }
            } else {
                incr N(man)
                if {$res1!=[subst $label]} {
                    incr N(manerr)
                    $a(picker) itemconfig pick$n -fill brown
                } else {
                    $a(picker) itemconfig pick$n -fill orange
                }
            }
         }
    }}]
    if {$n} {
        set t "Auto $N(auto)/$n: "
        append t "[expr {round($N(auto)*100./$n)}]%\n"
        append t "Errors: $N(autoerr)/$N(auto): "
        catch {append t [expr {round($N(autoerr)*100./$N(auto))}]%}
        append t "\n"
        append t "Man $N(man)/$n: "
        catch {append t "[expr {round($N(man)*100./$n)}]%\n"}
        append t "Errors: $N(manerr)/$N(man): "
        catch {append t "[expr {round($N(manerr)*100./$N(man))}]%"}
        set a(rates) $t
        set a(detail) "[format %.1f [expr {[lindex $time 0]*0.001/$n}]] ms/char"
    }
    . config -cursor {}
  }
  ################################### routines for the picker canvas
  proc pick {} {
    variable a
    set c $a(picker)
    set a(pickdx) 10; set a(pickdy) 10
    set a(pickn) 0
    $c delete all
    foreach i $a(rawdata) {
        lassign $i op label data
        if {$op == "+"} {addThumbnail $label $data}
    }
  }
  proc addThumbnail {label data} {
    variable a; set c $a(picker)
    set n [incr a(pickn)]
    foreach i $data {eval $c create line $i -width 2 -tag pick$n}
    $c create text 50 120 -text $n:[subst $label] -fill blue -tag pick$n
    $c move pick$n $a(pickdx) $a(pickdy)
    incr a(pickdx) 100
    if {$a(pickdx)>1600} {set a(pickdx) 10; incr a(pickdy) 150}
    $c scale pick$n 0 0 0.3 0.3
    $c bind pick$n <1> [list readme::+ $label $data]
    $c config -scrollregion [$c bbox all]
  }
  ############################################# Button event handlers
  proc add {} {
    variable a; set c $a(canvas)
    set rawlines {}
    foreach i [$c find withtag line] {
        lappend rawlines [lrange [$c coords $i] 2 end]
    }
    if ![llength $rawlines] return
    regsub -all {\.0} $rawlines "" rawlines
    lappend a(rawdata) [list + $a(result) $rawlines]
    addThumbnail $a(result) $rawlines
    set a(added) 1
  }
  proc clear {} {
    variable a; set c $a(canvas)
    $c delete all
    $c config -bg white
    set a(result) ""
    set a(detail) ""
    set a(new) 0
  }
  proc learn {} {
    variable a
    variable statistics
    if {$a(result)==""} {error "Nothing to learn - specify a label!"}
    set res ""
    set templates $a(templates)
    set pos [lsearch $templates $a(last)]
    if {$pos>=0} {
        set oldlabel [lindex $templates [expr $pos-1]]
        if {$statistics($oldlabel)==1} {
            error "cannot remove the only sample for $oldlabel"
        }
        set res "removed a '$oldlabel', "
        set a(templates) [lreplace $templates [expr $pos-1] $pos]
        incr statistics($oldlabel) -1
    }   ;# remove conflicting competitor
    set label $a(result)
    lappend a(templates) $label $a(last)
    if ![info exists statistics($label)] {
        set statistics($label) 0
        append res "created a new class, "
    }
    incr statistics($label)
    set a(learnt)     1
    set a(detail)     [append res "learned a '$label'"]
    set a(statistics) [statistics]
  }
  ############################################## Preprocessing
  proc preprocess rawlines {
    set lines {}
    foreach i $rawlines {lappend lines [straighten $i]}
    normalize $lines [bbox [join $lines]]
  }
  proc bbox xys {
    foreach i {minx miny} {set $i 999999}
    foreach i {maxx maxy} {set $i -999999}
    foreach {x y} $xys {
        if {$x<$minx} {set minx $x} elseif {$x>$maxx} {set maxx $x}
        if {$y<$miny} {set miny $y} elseif {$y>$maxy} {set maxy $y}
    }
    list $minx $miny $maxx $maxy
  }
  proc straighten coords {
    lassign $coords x0 y0 x1 y1         ;# get first two points
    set res [list $x0 $y0]              ;# keep first point
    foreach {x2 y2} [lrange $coords 2 end] {
        if {abs($x2-$x1)<5 && abs($y2-$y1)<5} continue
        set d01 [expr {hypot($x0-$x1, $y0-$y1)}]
        set d02 [expr {hypot($x0-$x2, $y0-$y2)}]
        set d12 [expr {hypot($x1-$x2, $y1-$y2)}]
        if {$d02>0 && (($d01+$d12)/$d02)>1.05 && ($d01+$d12-$d02)>2} {
            lappend res $x1 $y1
            set x0 $x1; set y0 $y1
        }
        set x1 $x2; set y1 $y2
    }
    if {[llength $res]==2 || abs($x0-$x1)>3 || abs($y0-$y1)>3} {
        lappend res $x1 $y1
    }
    set res
  }
  proc normalize {lines bbox} {
    set xsteps 4.0; set ysteps 2.0
    lassign $bbox minx miny maxx maxy
    set xstep [expr {($maxx-$minx)<10? 100: ($maxx-$minx)/$xsteps}]
    set ystep [expr {($maxy-$miny)<10? 100: ($maxy-$miny)/$ysteps}]
    set res {}
    foreach line $lines {
        set t {}
        set lasty2 -; set lastx -; set lasty -
        foreach {x y} $line {
            set x [expr {round(($x-$minx)/$xstep)}]
            set y [expr {round(($y-$miny)/$ystep)}]
            if {($x!=$lastx && $y!=$lasty2) || $y!=$lasty} {
                lappend t $x $y
                set lasty2 $lasty; set lastx $x; set lasty $y
            } ;# maybe suppress duplicated or collinear points
        }
        lappend res $t
    }
    set res
  }
  ############################################### Recognition
  proc recognize {{lines ""}} {
    variable a; set c $a(canvas)
    $c delete template
    if {$lines == ""} {
        foreach i [$c find withtag line] {lappend lines [$c coords $i]}
        if {![llength $lines]} return
        set lines [preprocess $lines]
    }
    if {[llength [join $lines]]%2} {error "non-paired linelist"}
    set a(last) $lines
    set res [classify $lines $a(templates)]
    set a(detail) $res
    set decision [decide $res $a(threshold)]
    set a(result) [lindex $decision 0]
    switch [llength $decision] {
        0 {set a(detail) "No results"; $c config -bg gray60}
        1 {$c config -bg white}
        2 {$c config -bg yellow}
    }
    viewTemplate $lines 5 red
    viewTemplate [lindex $a(best) 2] 80 blue
    set decision
  }
  proc classify {lines templates} {
    variable a
    set res {}
    set best -9; set a(best) ""
    set clines  [join $lines]
    set llines [llength $lines]
    set lclines [llength $clines]
    foreach {label data} $templates {       
        set cdata [join $data]
        set slabel [subst $label] ;# useful for \u.. escaped Unicodes
        if {$lclines==[llength $cdata] && $llines==[llength $data]} {
            set d 0
            foreach i $clines j $cdata {set d [expr {$d+abs($i-$j)}]}
            set cred [expr {1.0 - double($d)/$lclines}]
            if {$cred>$best} {
                set best $cred
                set a(best) [list $slabel [format %.2f $cred] $data]
            }
        } else {set cred 0.0}
        if {$cred>0.0} {lappend res [list $slabel [format %.2f $cred]]}
    }
    lsort -real -decreasing -index 1 $res
  }
  proc decide {res th} {
    if [llength $res] {
        lassign [lindex $res 0] res1 cred1
        if {[llength $res]>1} {
            lassign [lindex $res 1] res2 cred2
        } else {set res2 ""; set cred2 0.0}
        if {$cred1>$th && (($cred1-$cred2)>0.07 || $res1==$res2)} {
            return $res1
        } else {return [list $res1 ?]}
    } else {return {}}
  }
  proc viewTemplate {template dx {color black}} {
    variable a; set c $a(canvas)
    foreach line $template {
        set t {}
        foreach {x y} $line {
            lappend t [expr {$x*4+$dx}] [expr {$y*7+5}]
        }
        if {[llength $t]<4} {set t [concat $t $t]}
        eval $c create line $t -fill $color -tag template
    }
  }
 } ;# -- end namespace readme

 readme::main

TV Wonderful.

I had trouble writing a recogniseable, B, maybe I should do my own reference set? (Is there a way to 'see' the reference characteristics). I'll probably look at the code at some point to figure it out.. - RS: Apologies for the intro that is a bit too short. Write a B into the square canvas; label it as B in the entry below, if not correctly recognized; click Add to add it to learn set; repeat as often as needed until they get correctly recognized. Save to a file with (Learn set:)Save button. Such files are plain text and contain for each character the label and the pixel coordinates. For classification these are subsampled, as shown in the "Normalized" and '"Best match" fields.


LV RS, I have a palm, with physical documents which describe PalmOS grafitti - what does it take to create templates for ReadME? RS: see a few lines above :^)

  • write them (in the square canvas)
  • label them (in the entry below)
  • add them (Add button)
  • save the set to a "raw" file

LV Ah - I saw that comment - but didn't see a mention of templates, so I didn't know if something more was needed. Thanks.


Arts and crafts of Tcl-Tk programming