Another weekend fun project by [Richard Suchenwirth] - this is a toy or demo thing, not really fit for real-life use, but I still had some fun with it - it's amazing how little code it takes! Give it a text widget and possibly a tag (my "hilite" is -bg orange, which stands out pretty clearly; a curly red underline seems not to be possible in Tk), and it will march through the text contents and highlight all those words that don't match its expectations (i.e. are not in the dictionary and not resolvable by rules):

 proc text:spell {w {tag hilite}} {
    set lineno 1
    $w tag remove $tag 1.0 end
    foreach line [split [$w get 1.0 end-1c] \n] {
        foreach {from to} [string:wordindexes $line] {
            set word [string range $line $from [expr {$to - 1}]]
            if {![spell:ok $word]} {
                $w tag add $tag $lineno.$from $lineno.$to
                update idletasks
            }
        }
        incr lineno
    }
 }

Known bug: embedded images count as one character, but are not seen by the ''$text get'' command, so they shift the highlighting to the right.

The following helper produces a list of starting and ending indices of words (as defined by Tcl) in a string:

 proc string:wordindexes s {
    set i 0
    set res {}
    foreach c [split $s ""] {
        ##DKF## Use {$c ne " " && $i eq [string wordstart $s $i]}
        ##DKF## as test from Tcl 8.4 onwards!
        ##DKF## It's faster and less buggy
        if {$c != " " && $i == [string wordstart $s $i]} {
            lappend res $i [string wordend $s $i]
        }
        incr i
    }
    set res
 }

Here comes the word checker, returning 1 or 0 depending on whether it accepts a word (replace it with your own if you have a better one - I will at some point experiment with a graph parser):

 proc spell:ok s {
    global word ;# Faster to create local alias
    if {[string length $s] < 2} {return 1}
    if {![regexp {[A-Za-z]} $s]} {return 1}
    set s [string tolower $s]
    if {[info exists word($s)]} {return 1}
    foreach sfx {s ing ed es d} {
        if {[regexp ^(.+)$sfx$ $s -> stem]
            && [info exists word($stem)]
            && [lsearch $word($stem) $sfx] >= 0} then {return 1}
    }
    return 0
 }

The following two procs are for data preparation: they take a string with possible line breaks (it may be a whole text file) and extract just the words, resp. do a frequency count:

 proc string:words s {
    set res {}
    foreach line [split $s \n] {
        for {set i 0} {$i < [string length $line]} {incr i} {
            if {$i == [string wordstart $line $i]} {
                set w [string range $line $i [expr {[string wordend $line $i] - 1}]]
                if {$w != " "} {lappend res $w}
                incr i [expr {[string length $w] - 1}] ;# always loop incr
            }
        }
    }
    set res
 }

 proc words:count s {
    foreach i [string tolower [string:words $s]] {
        if {[string length $i] > 1} {
            if {[info exists a($i)]} {incr a($i)} else {set a($i) 1}
        }
    }
    set t {}
    foreach {i n} [array get a] {lappend t [list $i $n]}
    ##DKF## Efficient in 8.4, not crippling before
    return [lsort -integer -decreasing -index 1 $t]
 }

And here finally comes the "dictionary" (still pretty poor - it fits on less than a page).
It does a crude subcategorization based on possible endings (the value of the array entries), so more words are matched:

 ########## load dictionary, distinguish suffix distributions #####
 foreach i {
    about above after all already also always am an another any and are as
    at be been before below between body both but by child children could
    data different does doesn during each either empty found for from fully
    given got happy happily has have high his how however if in including
    into is isn it just later legal low may maybe more must never next no
    none not of on onto only or over perhaps same should since slow so some
    such tcl than that the their them then there these they this those three
    to too two under unless us using was we were what whatever when where
    whether which while who whom whose why with within would you zero
    automatic automatically
 } {set ::word($i) ""}
 foreach i {
    add accept allow append approach argument back book brief buffer button
    call check clear click color command consist contain convert count
    counter destroy display down end except exist export fill follow form
    import intend key last link list load look mark need number open order
    overview pair perform pick point position print reason represent return
    screen script second select shift show spell start style support test
    treat unit view want word work
 } {set ::word($i) "s ing ed"}
 foreach i {
    bind break do field find mean read see will window
 } {set ::word($i) "s ing"}
 foreach i {
    access focus index match search
 } {set ::word($i) "es ing ed"}
 foreach i {
    actual additional complete current definite direct exact frequent
    general immediate normal occasional optional previous proper quick
    recent silent symbolical total
 } {set ::word($i) "ly"}
 foreach i {
    action application area bar bottom can case center come context
    character computer content control current database effect element error
    even event example first font forget format friend get give global
    handler height her image information input
    it item left let make menu mouse new nothing one operation option other
    output package pattern procedure program real red refer region reset
    resolution right selection set simple single space special standard step
    stop string system table tag take text top up variable white widget
    width write your
 } {set ::word($i) "s"}
 foreach i {
    abbreviate associate change code coordinate create date declare define
    delete describe determine double execute file force generate ignore
    include indicate line like name note outline page remove rule size state
    terminate time type use value
 } {set ::word($i) "s d"}

----
''DKF:'' Modified to run faster. :^)
----
LV: Any of you familiar enough with the Wikit code to figure out how to add this code, so that after one edits a page there could be a button for spell-checking it, with the possible misspelled words highlighted in some manner?
----
RS: Before building this into the Wiki, remember I said this is a toy project. The problem is the dictionary, which has to be very much more comprehensive than the one above - otherwise you'll get so many false positives that it doesn't help much. So we need
   * data (10,000s of frequent English words)
   * an efficient access method (the one above will get slow with much data, because of the many regexps)
Highlighting via HTTP will also be very different, though not difficult: retransmit the received form contents, with dubious words braced in ... (Does HTML allow markup in a form?)

DKF: No, you can't provide HTML markup in a form. Delivering this sort of functionality would require an applet of some form (either Java or Tcl). Also, the ''ispell'' English dictionary is 300kB long (after I extract it from its storage format) with over 33 thousand words, omits many common prefixes and suffixes, and I still use a lot of words which it doesn't know about. I'm tempted to say that instead of writing our own spelling checker, we should just wrap up ispell instead...
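----
As a side note, the suffix subcategorization used above can be tried out in isolation. The following is a self-contained sketch, not part of the original code: ''demo:ok'' mirrors the suffix loop of ''spell:ok'', and the two dictionary entries are merely illustrative.

```tcl
# demo:ok mirrors spell:ok's suffix handling, using a tiny two-entry
# dictionary: the entry's value lists which endings the stem may take.
set ::word(check) "s ing ed"
set ::word(focus) "es ing ed"

proc demo:ok s {
    global word
    set s [string tolower $s]
    if {[info exists word($s)]} {return 1}
    foreach sfx {s ing ed es d} {
        # strip a candidate suffix, then check that the remaining stem
        # exists and is subcategorized for exactly that suffix
        if {[regexp ^(.+)$sfx$ $s -> stem]
            && [info exists word($stem)]
            && [lsearch $word($stem) $sfx] >= 0} then {return 1}
    }
    return 0
}
```

With this, [demo:ok checked] gives 1 (stem "check" allows "ed"), while [demo:ok focuss] gives 0, because "focus" is subcategorized for "es" rather than a bare "s".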
:^) FYI, the shell command I used to extract it was:

 strings /usr/common/lib/ispell/britishmed+.hash | sed '/[^A-Z]/d' | tr A-Z a-z | sort

----
RS: Could not find ispell on our Solaris or Linux boxes, but the old spell with a flat ASCII word list of 25143 entries. Not bad. I'd prefer a pure-Tcl solution, though, since my W95 box at home misses so many goodies...
----
[AK]: See http://freshmeat.net/appindex/console/text%20utilities.html for several spellcheckers, especially '''pspell''' [http://pspell.sourceforge.net/], the portable spell checker interface library. It contains an ispell module and appears to handle UTF-8.
----
[LV]: The aspell/pspell project has several word lists, as do the fsf.org people. So coming up with a word list isn't the problem. However, perhaps embedding such large word lists into a Wikit would be counter-productive...
----
NEM: Couldn't we write some Tcl scripts to trawl on-line dictionaries, writing the data to a MetaKit or other Tcl database? Some intelligent language parsing could pick out endings etc. and create a nice database. It could take a while to work, though - all those HTTP requests...
----
Instead of embedding the dictionary in the Wikit, we could create a [metakit] database in a tequila [http://www.equi4.com/metakit/wiki.cgi/19.html] server...
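----
For the pure-Tcl route, pulling a flat ASCII word list (one word per line, as produced by the ''strings'' pipeline above) into the ::word array takes only a few lines. A sketch - the proc name is made up, and every word is stored with an empty suffix list, so ''spell:ok'' would accept exact matches only:

```tcl
# Load a flat word list (one word per line) into the ::word array.
# Words are lowercased; the empty value means "no suffixes allowed".
proc word:loadlist {file} {
    set f [open $file]
    foreach w [split [string tolower [read $f]] \n] {
        if {$w != ""} {set ::word($w) ""}
    }
    close $f
}
```

Lookup is then a constant-time [info exists ::word($w)] array access, which sidesteps the many-regexps cost RS mentions - at the price of listing every inflected form explicitly in the file.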