if 0 {Richard Suchenwirth 2005-09-02 - SHARK is for Shorthand Aided Rapid Keyboarding, also known as Shapewriter, a research project by IBM Almaden Laboratory [L1 ] . Here's an experiment to play with this concept in Tcl. Check the IBM site whether this concept is patented (it probably is), so best use this only as an illustrative toy.
The idea is that you do a pen (or mouse, with button 1 down) movement on the virtual keyboard, from the first to the last letter. The sequence of "visited" keys is matched against a dictionary according to the rule: if the first and the last letter match, and each character of a word is contained in the sequence, then that word is considered a candidate. Words not in the dictionary can be typed letter by letter (though you get blanks between them).
"Sharked" words are inserted into the text widget on top (alternatives separated by slashes - I was too lazy to add popup menu selection. For the last word, you also see the input string and the result in the window's title bar.
AK Regarding popup-menus, they are not really required. We have a Tk text widget. We can 'tag' the alternatives and arrange for a click on an alternative to select it as the text, it would also removes the other alternatives in its group from the display.
} package require Tk set keys { b d k g / c a n i m q / f l e s y x / j h t o p v / . . r u w z } #-- frequent English words, borrowed from somewhere in the Web set words { about after all am an and are as at back bad be because been before being between big but by came can come could country day days debts did different do dollars down even every first for found from general get go going good got great had has have he hello her here him his if in input interest into is it its just kick kind large last like little look made make man many may me more most must my new no not now of off old on one only or other our out over people please power public said saw see shark she since so some still such system take test than that the their them then there these they this through time to today two under up upon very war was we well went were what when where which while who will with world would writing years you your } foreach word $words {set dic($word) ""} #------------------ Callback for bindings set buffer "" set last "" proc down {w key} { $w itemconfig key -fill white $w itemconfig k-$key -fill yellow set ::buffer $key set ::last $key } proc pass {w x y} { set id [$w find closest $x $y] set key [string index [lindex [$w gettags $id] 0] end] if {$::last ne "" && $key ne $::last} { $w itemconfig k-$key -fill orange append ::buffer $key set ::last $key } } proc up {w x y t} { set id [$w find closest $x $y] set ::last "" $t insert insert "[join [lookup $::buffer] /] " } #----------------------- dictionary "logic" proc lookup string { switch [string length $string] { 1 {set candidates $string} 2 {set candidates [array names ::dic $string]} default { set first [string index $string 0] set last [string index $string end] set candidates {} foreach word [array names ::dic $first*$last] { if [contained [dedup $word] $string] { lappend candidates $word } } } } wm title . $::buffer/$candidates set candidates } proc contained {word string} { string match [join [split $word ""] *] $string } if 0 {
In testing, I found that duplicated letters (as n,s,e in Tennessee) are a bit difficult to write, so I added this de-duplicator that reduces duplicate letters to a single one, e.g. dedup Tennessee -> Tenese :
} proc dedup string { set res "" set last "" foreach char [split $string ""] { if {$char ne $last} {append res $char} set last $char } set res } foreach op {+ - * /} {proc $op {a b} "expr {\$a$op\$b}"} #------------------------ Demo UI set w [canvas .c] pack $w -fill both -expand 1 set t [text $w.t -width 40 -height 5 -wrap word] $w create window 0 0 -window $t -anchor nw set x 30 set y 138 foreach key $keys { switch -- $key { / {set x 30; incr y 28} . {incr x 28} default { set id [$w create rect $x $y [+ $x 26] [+ $y 26] -fill white] $w itemconfigure $id -tag [list t-$key k-$key key] $w create text [+ $x 13] [+ $y 13] -text $key -tag t-$key $w bind t-$key <1> [list down $w $key] $w bind t-$key <B1-Motion> [list pass $w %x %y] $w bind t-$key <ButtonRelease-1> [list up $w %x %y $t] incr x 28 } } } wm geometry . 240x280+0+0 bind . <Up> {exec wish $argv0 &; exit} bind . <F1> {console show} if 0 {
After downloading this page via GPRS, the Shark runs pretty nicely on my HTC Magician too. Pascal reported that on another Windows/CE device, the recognition stopped after a few letters. - 2005-09-03 Added the dedup function, as well as "hello" and "world" to the dictionary :^)
LES "Illustrative toy" indeed. Too small a dictionary will fail to recognize many useful words. Too long a dictionary will plague you with false positives. Here is a sample of my attempts with a 47,000-entry dictionary (less than 10% of the entire English language), to save some bypassers' time:
ATTEMPT RESULT this: this/toss is: is interesting: interesting/insetting/inning power: poor/pour/purr power: putter/power/peer/poor/pour/per/purr/potter/pouter advertising: allotting/adversing/averting/advertising/along/atoning/aping/adopting/altering/adding/alerting/adhering/advising/apprising/adoring/arising/averring/adorning/adapting/addressing introspection: in/icon/iron/inspection/inion/inn/ion/intonation/intention/introspection programming: panning/potting/pining/priming/porting/pang/prong/ping/pig/poring/posing/pinning/prosing frustration: fain/fern/fusion/fatten/flatten/fan/fen/fin/fun prototype: pope/pore theory: toy/try/tty never: never/nor/noter forever: four/fleer/floor/flour/forever/flutter/freer/for/fur/footer/fluter/floorer/fetter/fever
Enjoy your patent, whoever you patent holder are. :-)
RS Thanks for stress-testing :) Your examples show that the word matching obviously can do with some refinements... Also, I've started to play with designing the key layout differently from IBM's model, but no results yet.
escargo This may be one of those cases where either a known word-frequency distribution would be helpful, or a heuristic of moving the chosen word to (or toward) the front of the list would allow a self-organizing list to make support more likely choices.
Zarutian Would it be possible to use a mouse gestureing method instead of lexicon?
What I would have said is: would it be possible to use the endpoint coordinates from mouse gesture instead of relying on lexicon? RS: As the target is to write words, you'd have to spell them letter by letter, as e.g. demonstrated in ReadME - a simple online character recognizer. But the SHARK project claims that one (maybe complex) gesture should be enough to write one word, and then of course you need a lexicon.
Zarutian I think you misunderstood me.
I am not proposing an OCR system like ReadME - a simple online character recognizer.
But a system which uses endpoints(vertices?) of the stylus-drawn curve to determine which letter buttons on the virtual keyboard to activate and in which order.
http://zarutian.cjb.net/2005/myndir/atomik-shark-vertexes-example.png <- image missing 1, 2, 3 are the vertexes in that order drawn.
LES on 2005 November 02: This concept [L2 ] seems to work better.
escargo 3 Nov 2005 - The relative resource requirements ought to be markedly different, given the dynamic nature of Dasher.