Version 17 of AutoCorrect

Updated 2003-06-01 23:17:04

An MS Word-like AutoCorrect feature implementation.

Here is a story you don't hear every day. You know the AutoCorrect feature in MS Word? You type 'tihs' and Word replaces it with 'this'. I really love that feature, not so much for correcting my typos as for using it as a shorthand tool. I type 'ill go,w or wo u' and it automatically expands to 'I'll go, with or without you'. The part of the story you don't hear every day is that I am so addicted to it, and have used it for so long, that I've built up a 10,000-entry list. When writing in Word, I use shorthand all the time.

So I tried to implement it in a Tk app. At startup I load the entire list from a SQLite database into a Tcl array, and every time I hit space or a punctuation key, the app looks up the last "word" just typed in the array, deletes that word and inserts its counterpart.
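The lookup itself needs nothing from Tk. Here is a minimal sketch that runs in a plain tclsh, assuming the list is already in a Tcl array. The sample entries and the proc name `lookupCorrection` are hypothetical. It also shows why an exact-key test with `info exists` is safer than `array get`, whose second argument is a glob pattern rather than a key:

```tcl
# Hypothetical sample entries; the real list is loaded from the SQLite table.
array set myAClist {
    tihs            this
    {south america} {South America}
}

# Hypothetical helper: return the replacement stored for exactly $word,
# or an empty string if there is none.
proc lookupCorrection {arrayName word} {
    upvar 1 $arrayName table
    # 'info exists' tests one exact key: no glob matching is involved.
    if {[info exists table($word)]} {
        return $table($word)
    }
    return ""
}

puts [lookupCorrection myAClist tihs]   ;# this
puts [lookupCorrection myAClist t*]     ;# prints an empty line: no key is literally "t*"
puts [array get myAClist t*]            ;# tihs this  ("t*" is a glob pattern here)
```

Since the typed text can contain `*`, `?` or `[`, the exact-key form avoids a word like `t*` silently matching some unrelated entry.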

I still need to implement automatic capitalization at the beginning of sentences and some way to undo an autocorrection: Ctrl+Z will not yield the expected(?) result. Apart from that, it is a perfect AutoCorrect feature, ready to be used in any Tcl/Tk-based text editor.
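For the missing sentence capitalization, one possible approach is to look at the text just before the word being corrected. This is a sketch on plain strings, not part of the original code. It assumes that only `.`, `!` and `?` end a sentence, and the proc names `atSentenceStart` and `capitalizeAtSentenceStart` are hypothetical:

```tcl
# Hypothetical helper: would a word typed right after $before start a
# sentence? Assumption: only '.', '!' and '?' end a sentence.
proc atSentenceStart {before} {
    set before [string trimright $before]
    # The start of the text (or nothing but blanks so far) counts too.
    if {$before eq ""} {
        return 1
    }
    return [expr {[string index $before end] in {. ! ?}}]
}

# Hypothetical helper: capitalize the first letter of $word when it starts
# a sentence, otherwise return it unchanged.
proc capitalizeAtSentenceStart {before word} {
    if {[atSentenceStart $before]} {
        return [string toupper $word 0 0]
    }
    return $word
}

puts [capitalizeAtSentenceStart "" this]                ;# This
puts [capitalizeAtSentenceStart "I went home. " this]   ;# This
puts [capitalizeAtSentenceStart "I went home, " this]   ;# this
```

Inside the Tk proc, `before` could be taken from the widget with something like `$w.textframe.texto1 get "insert linestart" "insert -${myLastWordWipeSize}c"`. That is an assumption: sentences that span a line break would need a wider range. The `in` operator requires Tcl 8.5 or later.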

Note: this looks messy online, but should look perfect if you paste it in your favorite text editor.

Thanks to Michael A. Cleverly for very useful hints.

 # First, you must have this text widget: $w.textframe.texto1

 # Now, the bindings. We want to run the 'autocorrect' proc every time we hit
 # the space or a punctuation key. The four lines below bind every keysym in
 # the 'myACkeys' list to it. Bindings on the widget itself fire before the
 # Text class binding that inserts the character, so 'autocorrect' sees the
 # word while the cursor is still right after it.

 set myACkeys {space period comma colon semicolon question exclam slash backslash less greater equal asterisk plus minus parenleft parenright bracketleft bracketright braceleft braceright quotedbl quoteright}
 foreach key $myACkeys {
     bind $w.textframe.texto1 <$key> {autocorrect}
 }

 # Note: these keysyms were obtained on Windows. They may vary on other platforms.

 # I have a database table called 'autocorrect' with two columns: 'type' and
 # 'replace'. Let's load them into an array called 'myAClist'. 'sq' is the
 # SQLite database command; for each row, eval sets $type and $replace.

 set myQuery {select type, replace from autocorrect}
 sq eval $myQuery {
     set myAClist($type) $replace
 }

 # Now the autocorrect proc

 proc autocorrect {} {
     global w myAClist

 # Get the last 40 characters every time you type, like a "trail".
 # 40 is, of course, an arbitrary limit.
     set myTrail [$w.textframe.texto1 get "insert -40c" insert]
 # myOneWord is a regular expression for one "word": a run of characters that
 # are neither separators nor whitespace. Excluding \s (not just the space)
 # keeps tabs and newlines out of words. myTypeString starts as one word and
 # grows by one word on each pass of the loop.
     set myOneWord {[^,.;:\s]+}
     set myTypeString $myOneWord

 # The loop. Here is what it does:
 # Get the last word in the "trail". If that word (the 'type' string) is found,
 # replace it with its 'replace' counterpart. If it is not found, get the TWO
 # last words and search for the two-word string in the array. If that is not
 # found either, get the THREE last words and search again, and so on. It could
 # go on forever, but I set the limit to 10 words: longer 'type' strings are
 # very unlikely to be used and might make everything run too slowly. My actual
 # application uses only 7.
 # If it is not clear, the purpose of multi-word 'type' strings is to let you
 # type, say, "south america" and have it corrected to "South America".
     for {set myIteration 1} {$myIteration <= 10} {incr myIteration} {
 # Match the last N words at the very end of the trail. No -line switch here:
 # with -line, $ would also match before each newline, so the leftmost match
 # could be a word from an earlier line of the trail.
         if {![regexp "($myTypeString)\$" $myTrail -> myLastWord]} {
 # The trail holds fewer words than we are looking for, so stop here.
             break
         }
         set myLastWordWipeSize [string length $myLastWord]

 # 'info exists' tests this exact key. 'array get' would treat the word as a
 # glob pattern, so a typed '*' or '?' could match some other key.
         if {[info exists myAClist($myLastWord)]} {
             $w.textframe.texto1 delete "insert -$myLastWordWipeSize c" insert
             $w.textframe.texto1 insert insert $myAClist($myLastWord)
 # If the 'type' string is found, that's enough, so break the loop.
             break
         }

 # What if what I just typed is a 'type' string, but I typed it in CAPITALS
 # or with a capital first letter? It won't be found in the array as-is, so
 # look it up in lowercase and adjust the case of the replacement:
         set myLowerWord [string tolower $myLastWord]
         if {[info exists myAClist($myLowerWord)]} {
             set myReplacement $myAClist($myLowerWord)
             if {$myLastWord eq [string toupper $myLastWord]} {
 # All capitals: now, if we type 'TIHS', it will be replaced with 'THIS'.
                 set myReplacement [string toupper $myReplacement]
             } elseif {[string is upper [string index $myLastWord 0]]} {
 # Capital first letter: 'Tihs' becomes 'This', and a correctly typed
 # 'South America' stays 'South America' rather than 'SOUTH AMERICA'.
                 set myReplacement [string toupper $myReplacement 0 0]
             }
             $w.textframe.texto1 delete "insert -$myLastWordWipeSize c" insert
             $w.textframe.texto1 insert insert $myReplacement
             break
         }

 # Our 'type' string is not found in the array. Now what? The loop runs again,
 # of course. This time, let's look for the last two words, i.e.
 # [^,.;:\s]+ [^,.;:\s]+ . If that is not found either, the next pass looks
 # for the last three words, and so on.
         append myTypeString " $myOneWord"
     }
 }
 # If the 10 last words don't match anything, we stop searching, of course.
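Because the search only needs the trail and the array, it can be pulled out of the Tk proc and tried in a plain tclsh. Below is a sketch of such a variant. The proc name `findCorrection`, its `maxWords` parameter and the sample array `ac` are hypothetical. It uses exact-key lookups and upper-cases the replacement only when the typed word is all capitals, capitalizing just the first letter when only the typed first letter is a capital:

```tcl
# Hypothetical Tk-free variant of the search loop in 'autocorrect'.
# Looks for the last 1, 2, ... maxWords words of $trail in the array named
# arrayName and returns a two-element list: the number of characters to
# delete before the cursor and the text to insert. Returns an empty list
# if nothing matches.
proc findCorrection {trail arrayName {maxWords 10}} {
    upvar 1 $arrayName table
    # One "word": anything except separators and whitespace.
    set word {[^,.;:\s]+}
    set pattern $word
    for {set n 1} {$n <= $maxWords} {incr n} {
        # Anchored at the end of the trail; fails once the trail has
        # fewer than n words separated by single spaces.
        if {![regexp "($pattern)\$" $trail -> typed]} {
            return {}
        }
        if {[info exists table($typed)]} {
            return [list [string length $typed] $table($typed)]
        }
        # Case-insensitive retry, keeping the case the user typed.
        set key [string tolower $typed]
        if {[info exists table($key)]} {
            set replacement $table($key)
            if {$typed eq [string toupper $typed]} {
                set replacement [string toupper $replacement]
            } elseif {[string is upper [string index $typed 0]]} {
                set replacement [string toupper $replacement 0 0]
            }
            return [list [string length $typed] $replacement]
        }
        append pattern " $word"
    }
    return {}
}

array set ac {tihs this {south america} {South America}}
puts [findCorrection "I typed tihs" ac]              ;# 4 this
puts [findCorrection "I typed TIHS" ac]              ;# 4 THIS
puts [findCorrection "We visited south america" ac]  ;# 13 {South America}
puts [findCorrection "nothing to fix" ac]            ;# prints an empty line
```

Inside the Tk proc, the result could drive the widget with something like `lassign [findCorrection $myTrail myAClist] mySize myText`, followed by the same delete/insert pair when the list is not empty. Keeping the search free of Tk makes it easy to test against the real 10,000-entry list without opening a window.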

One final comment: isn't it great to read a script whose variables' names make it dead clear what they represent, instead of foreach i $a { set c "[ lindex z end ] is not $i, but could be $b" }

In case you're wondering, the 'my' prefix gives the variables the right color in my syntax highlighting scheme, even when they are written without a leading $.

Luciano ES