Playing SHARK

if 0 {Richard Suchenwirth 2005-09-02 - SHARK is for Shorthand Aided Rapid Keyboarding, also known as Shapewriter, a research project by IBM Almaden Laboratory [L1 ] . Here's an experiment to play with this concept in Tcl. Check the IBM site whether this concept is patented (it probably is), so best use this only as an illustrative toy.

WikiDbImage shark.jpg

The idea is that you do a pen (or mouse, with button 1 down) movement on the virtual keyboard, from the first to the last letter. The sequence of "visited" keys is matched against a dictionary according to the rule: if the first and the last letter match, and each character of a word is contained in the sequence, then that word is considered a candidate. Words not in the dictionary can be typed letter by letter (though you get blanks between them).

"Sharked" words are inserted into the text widget on top (alternatives separated by slashes - I was too lazy to add popup menu selection. For the last word, you also see the input string and the result in the window's title bar.

AK Regarding popup-menus, they are not really required. We have a Tk text widget. We can 'tag' the alternatives and arrange for a click on an alternative to select it as the text, it would also removes the other alternatives in its group from the display.

package require Tk
set keys {
    b d k g / c a n i m q / f l e s y x / j h t o p v / . . r u w z
#-- frequent English words, borrowed from somewhere in the Web
set words {
    about after all am an and are as at
    back bad be because been before being between big but by
    came can come could country
    day days debts did different do dollars down
    even every  first for found from
    general get go going good got great
    had has have he hello her here him his
    if in input interest into is it its  just  kick kind
    large last like little look
    made make man many may me more most must my
    new no not now  of off old on one only or other our out over
    people please power public
    said saw see shark she since so some still such system
    take test than that the their them then there these they this through
    time to today two  under up upon  very
    war was we well went were what when where which while who will
    with world would writing years you your
foreach word $words {set dic($word) ""}

#------------------ Callback for bindings
set buffer ""
set last ""
proc down {w key} {
    $w itemconfig key    -fill white
    $w itemconfig k-$key -fill yellow
    set ::buffer $key
    set ::last   $key
proc pass {w x y} {
    set id  [$w find closest $x $y]
    set key [string index [lindex [$w gettags $id] 0] end]
    if {$::last ne "" && $key ne $::last} {
        $w itemconfig k-$key -fill orange
        append ::buffer $key
        set    ::last   $key
proc up {w x y t} {
    set id    [$w find closest $x $y]
    set ::last ""
    $t insert insert "[join [lookup $::buffer] /] "
#----------------------- dictionary "logic" 
proc lookup string {
    switch [string length $string] {
        1       {set candidates $string}
        2       {set candidates [array names ::dic $string]}
        default {
            set first [string index $string 0]
            set last  [string index $string end]
            set candidates {}
            foreach word [array names ::dic $first*$last] {
                if [contained [dedup $word] $string] {
                    lappend candidates $word
    wm title . $::buffer/$candidates
    set candidates
proc contained {word string} {
    string match [join [split $word ""] *] $string
if 0 {

In testing, I found that duplicated letters (as n,s,e in Tennessee) are a bit difficult to write, so I added this de-duplicator that reduces duplicate letters to a single one, e.g. dedup Tennessee -> Tenese :

proc dedup string {
    set res  ""
    set last ""
    foreach char [split $string ""] {
        if {$char ne $last} {append res $char}
        set last $char
    set res

foreach op {+ - * /} {proc $op {a b} "expr {\$a$op\$b}"}
#------------------------ Demo UI
set   w [canvas .c]
pack $w -fill both -expand 1
set t [text $w.t -width 40 -height 5 -wrap word]
$w create window 0 0 -window $t -anchor nw
set x 30
set y 138
foreach key $keys {
    switch -- $key {
        / {set x 30; incr y 28}
        . {incr x 28}
        default {
            set id [$w create rect $x $y [+ $x 26] [+ $y 26] -fill white]
            $w itemconfigure $id -tag [list t-$key k-$key key]
            $w create text [+ $x 13] [+ $y 13] -text $key -tag t-$key
            $w bind t-$key <1>               [list down $w $key]
            $w bind t-$key <B1-Motion>       [list pass $w %x %y]
            $w bind t-$key <ButtonRelease-1> [list up   $w %x %y $t]
            incr x 28
wm geometry . 240x280+0+0
bind . <Up> {exec wish $argv0 &; exit}
bind . <F1> {console show}

if 0 {

After downloading this page via GPRS, the Shark runs pretty nicely on my HTC Magician too. Pascal reported that on another Windows/CE device, the recognition stopped after a few letters. - 2005-09-03 Added the dedup function, as well as "hello" and "world" to the dictionary :^)

LES "Illustrative toy" indeed. Too small a dictionary will fail to recognize many useful words. Too long a dictionary will plague you with false positives. Here is a sample of my attempts with a 47,000-entry dictionary (less than 10% of the entire English language), to save some bypassers' time:

 this:          this/toss
 is:            is
 interesting:   interesting/insetting/inning
 power:         poor/pour/purr
 power:         putter/power/peer/poor/pour/per/purr/potter/pouter
 advertising:   allotting/adversing/averting/advertising/along/atoning/aping/adopting/altering/adding/alerting/adhering/advising/apprising/adoring/arising/averring/adorning/adapting/addressing
 introspection: in/icon/iron/inspection/inion/inn/ion/intonation/intention/introspection
 programming:   panning/potting/pining/priming/porting/pang/prong/ping/pig/poring/posing/pinning/prosing 
 frustration:   fain/fern/fusion/fatten/flatten/fan/fen/fin/fun 
 prototype:     pope/pore
 theory:        toy/try/tty 
 never:         never/nor/noter 
 forever:       four/fleer/floor/flour/forever/flutter/freer/for/fur/footer/fluter/floorer/fetter/fever

Enjoy your patent, whoever you patent holder are. :-)

RS Thanks for stress-testing :) Your examples show that the word matching obviously can do with some refinements... Also, I've started to play with designing the key layout differently from IBM's model, but no results yet.

escargo This may be one of those cases where either a known word-frequency distribution would be helpful, or a heuristic of moving the chosen word to (or toward) the front of the list would allow a self-organizing list to make support more likely choices.

Zarutian Would it be possible to use a mouse gestureing method instead of lexicon?

What I would have said is: would it be possible to use the endpoint coordinates from mouse gesture instead of relying on lexicon? RS: As the target is to write words, you'd have to spell them letter by letter, as e.g. demonstrated in ReadME - a simple online character recognizer. But the SHARK project claims that one (maybe complex) gesture should be enough to write one word, and then of course you need a lexicon.

Zarutian I think you misunderstood me.

I am not proposing an OCR system like ReadME - a simple online character recognizer.

But a system which uses endpoints(vertices?) of the stylus-drawn curve to determine which letter buttons on the virtual keyboard to activate and in which order. <- image missing 1, 2, 3 are the vertexes in that order drawn.

LES on 2005 November 02: This concept [L2 ] seems to work better.

escargo 3 Nov 2005 - The relative resource requirements ought to be markedly different, given the dynamic nature of Dasher.