Version 3 of AutoCorrect

Updated 2003-06-01 22:37:28

An implementation of an MS Word-like AutoCorrect feature.

Here is a story you don't hear every day. You know that AutoCorrect feature in MS Word? You type 'tihs' and Word replaces it with 'this'. I really love that feature. Not so much to correct my typos, but as a shorthand tool. I type 'ill go,w or wo u' and it automatically expands to 'I'll go, with or without you'. The part of the story that you don't hear every day is that I am so addicted to it and have used it for so long that I've built up a 10,000-entry list. Writing in Word, I shorthand all the time.

So I tried to implement it in a Tk app. I load the entire list from a SQLite database into a Tcl array at startup. Every time I hit space or a punctuation key, the app looks up the last "word" I typed in that array, deletes the word and inserts its counterpart.

I still need to implement automatic capitalization at the beginning of sentences and some way to undo a correction. Ctrl+Z will not yield the expected(?) result. Apart from that, it is a perfect AutoCorrect feature, ready to be dropped into any Tcl/Tk-based text editor.
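
As a starting point for the missing capitalization step, here is a minimal sketch. The proc names myStartsSentence and myCapitalizeFirst are hypothetical and are not part of the script below. The sketch assumes that a sentence starts either at the very beginning of the text or after a period, exclamation mark or question mark followed by optional whitespace, so abbreviations such as "e.g." would be misread as sentence ends.

```tcl
# Hypothetical helper: does the text before the word start a new sentence?
proc myStartsSentence { myTextBefore } {
    # True when nothing but whitespace precedes the word, or when the
    # last non-blank character before it is . ! or ?
    return [ regexp {(^|[.!?])\s*$} $myTextBefore ]
}

# Hypothetical helper: capitalize only the first character of the replacement
proc myCapitalizeFirst { myReplacement } {
    return "[ string toupper [ string index $myReplacement 0 ] ][ string range $myReplacement 1 end ]"
}

# Example: the replacement follows a full stop, so it gets a capital letter
set myTextBefore "See you there. "
set myReplacement "this is fine"
if { [ myStartsSentence $myTextBefore ] } {
    set myReplacement [ myCapitalizeFirst $myReplacement ]
}
puts $myReplacement
```

In the real proc, the text before the word would presumably come from the same "trail" that is read from the text widget, minus the word being replaced.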

Thanks to Michael A. Cleverly for very useful hints.

 # First, you must have this text widget: $w.textframe.texto1

 # Now, the binding. We want to launch 'AutoCorrect' every time we hit the space
 # and/or punctuation keys. These four lines will launch the function whenever
 # the last key pressed (%K) is found in the 'myACkeys' list:

 set myACkeys        {space period comma colon semicolon question exclam slash backslash less greater equal asterisk plus minus parenleft parenright bracketleft bracketright braceleft braceright quotedbl quoteright}
 foreach key $myACkeys        {
         bind $w.textframe.texto1 <$key> { autocorrect }
 }
 # Note: these key names were obtained on Windows. They may vary on other platforms.

 # I have a database table with two columns: 'type' and 'replace'. Let's load them
 # into an array called 'myAClist' ('sq' is the handle of my open SQLite database)
         set myQuery {select type,replace from autocorrect}
         sq eval $myQuery {}        { array set myAClist [ list $type $replace ] }

 # Now the autocorrect proc

 proc        autocorrect        {}        {
 global w myAClist

 # get the 40 last characters every time you type, like a "trail"
 # 40 is, of course, an arbitrary limit
         set myTrail [ $w.textframe.texto1 get "insert -40c" insert ]
 # myTypeString is a regular expression to get the last word
 # its value may change along the script
         set myTypeString {[^,.;: ]+}


 # The loop. Here is what this loop does:
 # Get the last word in the "trail". If the last word ('type' string) is found,
 # replace it with the 'replace' counterpart. If it is not found, get the TWO
 # last words and search for the two-word string in the array. If it is not found,
 # get the THREE last words and search again. It can go on forever, but I 
 # set the limit to 10 words. More than that is very unlikely to be used and
 # might make everything run too slowly. My actual application uses only 7.
         for        { set myIteration 1 } { $myIteration <= 10 } { incr myIteration }        {
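 # \Z anchors the match at the very end of the trail. With -line, a plain $ would
 # also match at the end of an earlier line. If nothing matches, there is no word
 # to replace, so stop searching.
                 if        { ! [ regexp -line "($myTypeString)\\Z" $myTrail => myLastWord ] }        { break }
                 set myLastWordWipeSize [ string length $myLastWord ]
 # Note that on the first pass, the RE is [^,.;: ]+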

 # 'info exists' tests for the exact key. 'array get' would treat * ? [ ] in the
 # typed word as glob characters.
                 if        { [ info exists myAClist($myLastWord) ] }                {
                         $w.textframe.texto1 delete "insert -$myLastWordWipeSize c" insert
                         $w.textframe.texto1 insert insert $myAClist($myLastWord)
 # If the 'type' string is found, that's enough, so break the loop
                         break
                 }

 # What if what I just typed is a 'type' string, but I typed it in CAPITALS? It won't
 # be found in the array. Not unless we repeat the previous operation, but slightly
 # different:
                 set myLastWordLower [ string tolower $myLastWord ]
                 if        { [ info exists myAClist($myLastWordLower) ] }                {
                         $w.textframe.texto1 delete "insert -$myLastWordWipeSize c" insert
                         $w.textframe.texto1 insert insert [ string toupper $myAClist($myLastWordLower) ]
                         break
                 }
 # Good. Now, if we type 'TIHS', it will be replaced with 'THIS'.

 # Our single-word 'type' string is not found in the array. Now what? The loop
 # will be run again, of course. This time, let's look for [^,.;: ]+ [^,.;: ]+ 
 # i.e. the last two words. If it's not found either, the next iteration will look for 
 # [^,.;: ]+ [^,.;: ]+ [^,.;: ]+
 # Append one more word pattern, so the word count grows by exactly one on each pass
                 append myTypeString { [^,.;: ]+}
         }
 }
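
To try the search loop without a text widget or a database, the same idea can be sketched as a plain proc. This is only an illustration: the name myFindReplacement, its arguments and the two-entry demo array are assumptions, not part of the application above. It returns the number of characters to wipe and the replacement text, or an empty list when nothing is found.

```tcl
# Hypothetical, widget-free version of the search loop.
# Returns a two-element list {wipeSize replacement}, or an empty list.
proc myFindReplacement { myTrail myListName { myMaxWords 10 } } {
    upvar 1 $myListName myAClist
    set myWordRE {[^,.;: ]+}
    set myTypeString $myWordRE
    for { set myIteration 1 } { $myIteration <= $myMaxWords } { incr myIteration } {
        # \Z anchors at the very end of the trail, even with -line
        if { ! [ regexp -line "($myTypeString)\\Z" $myTrail => myLastWord ] } {
            break
        }
        # Exact match first
        if { [ info exists myAClist($myLastWord) ] } {
            return [ list [ string length $myLastWord ] $myAClist($myLastWord) ]
        }
        # Then the case-insensitive match, answered in capitals
        set myLastWordLower [ string tolower $myLastWord ]
        if { [ info exists myAClist($myLastWordLower) ] } {
            return [ list [ string length $myLastWord ] [ string toupper $myAClist($myLastWordLower) ] ]
        }
        # Not found: widen the pattern by exactly one word and try again
        append myTypeString " $myWordRE"
    }
    return {}
}

# Demo data, made up for this example
array set myDemoList { tihs this {w or wo u} {with or without you} }
puts [ myFindReplacement "I like tihs" myDemoList ]
puts [ myFindReplacement "ill go w or wo u" myDemoList ]
```

The first call should find the single word 'tihs'. The second only succeeds on the fourth pass, once the pattern covers four words.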

One final comment: isn't it great to read a script whose variable names make it dead clear what they represent, instead of something like foreach i $a { set c "lindex z end is not $i, but could be $b" }? If you're too lazy to type full names, heck, go write in Perl.

In case you're wondering, the 'my' prefix gives the variables the right color in my syntax highlighting scheme, even when they are not preceded by $.

Luciano Espirito Santo Santos - SP - Brasil

---