Version 13 of Anagrams

Updated 2006-01-18 19:59:33

With a given word list (such as from the 12-dicts [L1 ] project) formatted as a Tcl procedure that returns a list [L2 ], it is interesting to speculate on the fastest procedure for obtaining anagrams of a given set of letters.

The following procedure is my effort. It has some refinements: (1) it takes an optional template, as for string match and (2) it allows for a wildcard character (such as the blank in Scabble or the joker in card games), written as "_".

Code:

 proc anagram {style letters {template *}} {
  set anagrams [list]
  set usingTemplate [string match "template" $style]
  foreach word [wordList] {
    if {$usingTemplate} {
      if {![string match $template $word]} {continue}
    }
    set discard $letters
    set flag true
    for {set i 0} {$i<[string length $word]} {incr i} {
      set match [string first [string index $word $i] $discard] 
      if {$match>-1} {
        set discard [string replace $discard $match $match]
      } else {
        #-- We don't hold the char; but we may have a blank
        set match [string first _ $discard]
        if {$match>-1} {
          #-- We do have a blank, so we use that instead
          set discard [string replace $discard $match $match]
        } else {
          set flag false
          break
        }
      }
    }
    if {$flag} {
      lappend anagrams $word
    }
  }
  return $anagrams
 }

 # Examples: 
 catch {console show}
 puts [anagram anyof RETAINS]
 puts [anagram template RETAINS *A]
 puts [anagram template TO_UE ?????]

Alastair Davies - 17 January 2006


HJG I get the error 'Invalid command name "wordList"'. Maybe a Tcl 8.5 feature ? - RS No - looks like a function that reads the word list file, and returns them as a list. Left as an exercise for the reader :^)

Alastair responds: HJG and others can download [L3 ] the source code that defines "wordList". The start and end of this look like...

 proc wordList {{lang en}} {
  switch $lang {
    en {
      return {
        AARDVARK
        AARDVARKS
        ABACI
        ABACK
        ABACUS

        ZYGOTE
        ZYGOTES
        ZYGOTIC
        ZYMURGY
      }
    }
  }
 }

...but the whole file is 1359 kb. English only at present, but does anyone have a similar French or German word list that could be added to the source?


KPV By far the fastest way to do this is to preprocess the word list and create what's known as an anagram dictionary[L4 ]. Once that's done, figuring out anagrams takes O(1) time.

Over lunchtime sandwiches, Alastair created an anagram dictionary:

 set anagramDict [dict create]
 foreach word [wordList] {
   dict lappend anagramDict [join [lsort [split $word ""]] ""] $word
 }

In anagramDict, the key "AEINRST", for example, lists the words "ANTSIER", "NASTIER", "RETAINS", "RETINAS" and "RETSINA", and it takes (approximately) no time at all to access them. Thanks, KPV! Incidentally, there are 69894 keys in the dictionary and 75273 words in the original list.

Thinking of anagrams in the looser sense (using any, rather than all, of the letters), I came up with this recursive procedure to reduce a word letter by letter to find all possible keys:

 proc reduce {word} {
  lappend reduction $word
  if {[string length $word]>2} {
    for {set i 0} {$i<[string length $word]} {incr i} {
      set reduction [concat $reduction [reduce [string replace $word $i $i]]]
    }
  }
  return [lsort -uniq $reduction]
 }

These keys can be used to search the anagramDict (as defined above):

 proc anagram {word} {
  global anagramDict
  set word [join [lsort [split $word ""]] ""]
  set result [list]
  foreach key [reduce $word] {
    if {[dict exists $anagramDict $key]} {
      set result [concat $result [dict get $anagramDict $key]]
    }
  }  
  return $result
 }

KPV Speaking of anagram dictionaries, there are a couple of interesting questions you can ask. See if you answer them before looking at the dictionary.

  • What's the first entry after 'a' and 'aa'?
  • What's the last entry?
  • What's the longest entry that is itself a word?

Category Toys