AutoCorrect

A Microsoft Word-like AutoCorrect feature implementation.

Luciano ES 2003-06: Here is a story you don't hear every day. You know that AutoCorrect feature in MS Word? You type 'tihs' and Word replaces it with 'this'. I really love that feature. Not so much to correct my typos, but as a shorthand tool. I type 'ill go,w or wo u' and it automatically expands to 'I'll go, with or without you'. The part of the story that you don't hear every day is that I am so addicted to it and have used it for so long that I've built up a 13,000-entry list. Writing in Word, I shorthand all the time.

So I tried to implement it in a Tk app. I load the entire list from a text file (but I've also done that with an SQLite database) into a Tcl array at startup, and every time I hit space or punctuation, the app searches for the last "word" just typed in the array, deletes that word and prints its counterpart. The following features have been implemented, assuming you have 'sh = shorthand' in your Auto Correct list:

1) Simple auto correction: 'sh' becomes 'shorthand'.

2) Auto correction in upper case: 'SH' becomes 'SHORTHAND'.

3) Capitalized auto correction: 'Sh' becomes 'Shorthand'.

4) Capitalization of words in the beginning of lines or sentences (after a period).

5) Space is inserted automatically after certain punctuation: period, comma, colon, semicolon, question mark and exclamation mark. Can be annoying if you're not used to it. Can make your typing faster if you get used to it. :-)

I still need to implement some method to undo the auto correction. Ctrl+Z will not yield the expected(?) result. Apart from that, it is a perfect Auto Correct feature, ready to be implemented in any Tcl/Tk-based text editor.

Thanks to Michael A. Cleverly for very useful hints.

HOW TO USE IT: This proc assumes two variables: $argWidget (the text widget) and $argKey (the key that triggers the proc, usually space or punctuation). Supposing you have this text widget: $::w.textframe.text:

foreach i {
    space period comma colon semicolon question exclam asterisk 
    slash backslash less greater equal plus minus parenleft parenright 
    bracketleft bracketright braceleft braceright quotedbl quoteright} {  

    bind $::w.textframe.text <Key-$i> {autocorrect  %W  %K}
}

Note: these bindings were obtained in Linux. They may vary in other platforms.

You could have a database with two columns: 'type' and 'replace', and load them into an array called 'AC_ARRAY':

set myQuery {select type,replace from autocorrect}
sq eval $myQuery {}     { array set ::AC_ARRAY [ list $type $replace ] }

But I just use a plain text file, it's easier. I use the best format possible:

abbrev=abbreviation
smtg=something
monday=Monday

In contrast, Vi(m) requires an explicit command in each damn single line.

You'll need a proc to parse this beautifully simple file and load the auto correction pairs:

proc LoadAutoCorrect {} {
    array unset ::AC_ARRAY
    set _fp [open /path/tofile/autocorrect.txt r]
        while {-1 != [gets $_fp _line]} { 
            if {[regexp {([^=]+)=([^=]+)} $_line => _a _b]} {
                array set ::AC_ARRAY [list $_a $_b]
            } else { 
                puts "ERROR: $_line"
                puts {Fix your autocorrection file}
                return
            }
        }
    close $_fp
}

Now here comes the autocorrect proc:

proc  p.autocorrect {argWidget argKey} {
    # get the 40 last characters every time you type, like a "trail"
    set _trail [$argWidget get "insert -40c" insert]

    # kill newlines
    set _trail [lindex [split $_trail \n] end]

    # _lastWordRegex is a regular expression that finds words (no punctuation).
    # Its value may change along the script, i.e. it may become repeated.
    set _lastWordRegex  {[^,.;: ]+}

    # The loop. Here is what this loop does:
    # Get the last word in the "trail". If the last word ('typeString' regex) is found,
    # replace it with the 'replace' counterpart. If it is not found, get the TWO
    # last words and search for the two-word string in the array. If it is not found,
    # get the THREE last words and search again. It can go on forever, but I
    # set the limit to 4 words. More than that is very little likely to be used and
    # might make everything run unnecessarily slow (although I had it set at '7' for years).
    # If it is not clear, the purpose of having multi-word 'type' strings it to
    # type, say "south america" and have it corrected to "South America". So most
    # people should probably just go with 2 or 3.

    for {set _c 1} {$_c <= 4} {incr _c}{
        # these variables need to exist beforehand AND be reset in each iteration
        variable  _lastWord {}  _lastWordLower {}  _autoCorrected {}

        # locate _lastWord, then sanitize it 
        regexp -line ($_lastWordRegex)\$ $_trail => _lastWord
        set _lastWord [string map  {\\ \\\\}  $_lastWord]
        if {[ string index $_lastWord end] eq "?"}  return
        if {[ string index $_lastWord end] eq "*"}  return

        # if _lastWord is not found in ACLIST, abort
        # if it is, define _autoCorrected
        set _lastWordLower [string tolower $_lastWord]
        if {[array get ::ACLIST $_lastWordLower] eq {}} { 
            set _lastWordRegex "$_lastWordRegex \[^ \]+"
            continue 
        } else {set _autoCorrected "$::ACLIST($_lastWordLower)"}

        # check casing of _lastWord. Change _autoCorrected to UPPER or Cap if needed.
        if {[string is upper $_lastWord]} { 
            set _autoCorrected [string toupper $_autoCorrected]
        } else { 
            if {[string is upper [string index $_lastWord 0]] \
                    &&  [string is lower [string index $_lastWord  1]]} {

                    set  _autoCorrected  [ string totitle  $_autoCorrected ]
            }
        }
                
        # THIS IS WHERE THE WORD IS AUTO CORRECTED
        set _lastWordWipeSize [string length $_lastWord]
        $argWidget delete "insert -$_lastWordWipeSize c" insert
        $argWidget insert insert $_autoCorrected
        set _lastWord {} 
        
        # If the 'lastWord' string is found, that's enough, so break the loop
        break

        # But if our single-word 'type' string is not found in the array, now what? 
        # The loop will be run again, of course. This time, let's look for
        # [^,.;: ]+ [^,.;: ]+
        # i.e. the last two words. If it's not found either, the next iteration
        # will look for [^,.;: ]+ [^,.;: ]+ [^,.;: ]+  etc.
        set  _lastWordRegex  "$_lastWordRegex \[^ \]+"
        
        # if the 4 last words don't match anything, we stop searching, of course
    }

    # OPTIONAL: automatic space after certain punctuation
    set _stops {period comma colon semicolon question exclam}
    if {[lsearch -exact $_stops $argKey] >= 0} { 
        after idle [list $argWidget insert insert { }]
    }

    # OPTIONAL: automatic capitalization of words in beginning of lines and sentences.
    set _trailx [$argWidget get "insert -40c" insert]
    set _trailx [lindex [split $_trailx \n] end]
    if {![regexp -line  {(^|[.!?] )([^,.;:!? ]+)$} $_trailx => _null _lastWordx]} {
        return
    }

    set _lastWordx [string map {\\ \\\\} $_lastWordx]
    if {[string index $_lastWordx end] eq "?"}  return
    if {[string index $_lastWordx end] eq "*"}  return

    set _lastWordWipeSizex [string length $_lastWordx]
    $argWidget delete "insert -$_lastWordWipeSizex c" insert
    $argWidget insert insert [string totitle $_lastWordx]
    set _lastWordx {}

    # THE END
}

Discussion

anonymous coward: Just a guess, but I believe that if you slightly alter your insertion method to use event generate it will update the undo / redo queue; albeit on a character by character basis. Your other choice is to implement your own undo/redo stack using an alternate keybinding each time you perform a replacement.

Either way it should be fairly easy.

LV I use a similar feature in vi - I have a variety of strings set up to remap to correct spellings, or URLs, or whatever.

The one thing that has to be available, though, is a way to tell the process 'stop that - leave what I typed alone'. Sometimes vi makes it a bit cumbersome - I have to type a few letters, stop, backup and then type the rest, etc.

LES: What bugs me about Vi is that it takes almost 10 seconds to start if I let it load my 13,000-entry abbrev list. This Tcl implementation, in contrast, loads instantly both in Tkabber (see below) and in the text editor I have been working on.

On 2005-01-20, LES finally surrenders to Tkabber. But not before implementing auto correction. Here is the recipe:

1) Download Pat Thoyts' version of Tkabber from [L1 ].

2) Follow the instructions provided at [L2 ] and unwrap the Starkit.

3) Download this autocorrect list [L3 ] and save it somewhere.

4) Open /tkabber.vfs/main.tcl and add these two procs:

proc loadAC  {}  {
    array unset  ::ACLIST
    set _fp [open path/to/autocorrect.txt r]
         while  {-1 != [gets $_fp _line]}  { 
             if  {[regexp  {([^=]+)=([^=]+)}  $_line => _a _b]}  { 
                 array set ::ACLIST [list $_a $_b]
             } else  { 
                 tk_messageBox -title Error! -message $_line
                 tk_messageBox -title Error! -message "Fix your autocorrection file" 
                 return
             }
         }
    close $_fp
}

proc autocorrect argW  {
     set _trail [$argW get 1.0 insert]
     set _typeString {[^"'(\[ ]+}

     for {set _c 1} {$_c <= 5} {incr _c} {
         set _lastWord {} 
         regexp -line  ($_typeString)\$ $_trail => _lastWord

         if {[string index $_lastWord end] eq "?"} return
         set _lastWordWipeSize [string length $_lastWord]
         if  {[array get ::ACLIST $_lastWord] ne {}}   {
             $argW delete "insert -$_lastWordWipeSize c" insert
             $argW insert insert "$::ACLIST($_lastWord)" 
             break
         }
         if  {[array get ::ACLIST [string tolow $_lastWord] ] ne ""}  {
             $argW delete "insert -$_lastWordWipeSize c" insert
             $argW insert insert [
                string toupper $::ACLIST([string tolower $_lastWord])] 
             break
         }
         set _typeString  "$_typeString \[^ \]+"
     }
}

5) Now open the file /tkabber.vfs/tkabber/chats.tcl and look for the first bind $cw.input line and add these lines:

loadAC

foreach i  { space  .  ,  :  ;  ?  !  \"  '  =  (  )  [  ] }   { 
    bind $cw.input <Key-$i> {autocorrect %W}
}
bind $cw.input <Control_L><F11> {loadAC}

6) Now the following characters trigger the automatic expansion of the aliases: space . , : ; ? ! " ' = ( ) [ ]

7) Whenever you change the autocorrect.txt file, press Ctrl+F11 and the file/list will be reloaded.

8) Note that these instructions for Tkabber are based on a very old and very buggy version of my Auto Correct proc. You might want to change it and use the newest version, the one that is given at the beginning of this page.

AutoCorrect

Discussion

See Also