Purpose: a pattern matching algorithm to how close two strings ''sound'' similar to one another. ---- DKF: This code has been greatly tightened and should be clearer too. Some idioms work better in Tcl than they do in other languages, so transcribing an algorithm from C is not always straight-forward... ## Be nice and friendly with namespaces namespace eval ::soundex {namespace export soundex} ## Set up some static data only once array set ::soundex::soundexcode { a 0 b 1 c 2 d 3 e 0 f 1 g 2 h 0 i 0 j 2 k 2 l 4 m 5 n 5 o 0 p 1 q 2 r 6 s 2 t 3 u 0 v 1 w 0 x 2 y 0 z 2 } proc ::soundex::soundex {string} { variable soundexcode ## force lowercase and strip out all non-alphabetic characters regsub -all {[^a-z]} [string tolower $string] {} letters ## the null string is code Z000 if {![string length $letters]} { return Z000 } set last -1 set key {} ## scan until end of string or until the key is built foreach char [split $letters {}] { set code $soundexcode($char) ## Fold together adjacent letters with the same code if {$last != $code} { set last $code ## Ignore code==0 letters except as separators if {$last} { append key $last ## Only need the first four if {[string length $key] >= 4} {break} } } } ## normalise by adding zeros to get four characters string range "[string toupper $key]0000" 0 3 } DKF: Are soundex codes all numeric except for the one for the empty string? Or should the append really be an append of ''$char'' instead? LV: There are some alternative algorithms to soundex that attempt to achive similar functionality. I recall seeing several coded for Perl. The benefit was that they achieved varying degrees of better matches. Anyone familar with alternatives? PSE: OK. I'm working on a pure Tcl metaphone engine. ---- [Category Command]