From Wikipedia, the free encyclopedia In computer science, a ''trie'' [http://en.wikipedia.org/wiki/Trie], or ''prefix tree'', is an ordered tree data structure that is used to store an associative array where the keys are usually strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of any one node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that happen to correspond to keys of interest. ---- [NaviServer] uses a ''trie'' for url dispatching: * http://naviserver.cvs.sourceforge.net/naviserver/naviserver/nsd/urlspace.c?view=markup ---- [NEM] 2008-06-09: Here's a very simplistic ''trie'' implementation based on straight-forward use of nested [dict]s (typically a trie in [C] or [Java] would instead using a fixed-size array and indexing directly based on character (e.g. restricting words to contain only characters a-z)): ====== # trie.tcl -- # # Simple implementation of tries in Tcl. # package require Tcl 8.5 package provide trie 0.3 namespace eval ::trie { namespace export {[a-z]*} namespace ensemble create # create an empty trie proc create {} { dict create } # add a word to a trie contained in trieVar proc add {trieVar word} { upvar 1 $trieVar trie dict set trie {*}[split $word ""] END {} } # check if a given word is contained in a trie proc contains {trie word} { dict exists $trie {*}[split $word ""] END } # get the sub-trie of all words corresponding to a given prefix proc get {trie {prefix ""}} { if {$prefix eq ""} { return $trie } if {![dict exists $trie {*}[split $prefix ""]]} { return {} } dict get $trie {*}[split $prefix ""] } # iterate through all words in a trie calling a callback for each one. The # callback will be called with the string of each word. proc words {trie cmd {prefix ""}} { set tries [list [get $trie $prefix] $prefix] set i 0 while {[llength $tries] > $i} { set trie [lindex $tries $i] set prefix [lindex $tries [incr i]] # set tries [lassign $tries trie prefix] ;# VERY slow! if {[dict exists $trie END]} { uplevel 1 [linsert $cmd end $prefix] } dict for {k v} $trie { lappend tries $v $prefix$k } incr i } } # remove a word from a trie proc remove {trieVar word} { upvar 1 $trieVar trie if {![contains $trie $word]} { return } dict unset trie {*}[split $word ""] END # Could/should compact the trie at this point if no other words with # this word as a prefix. } # count the number of words in the trie proc size {trie {prefix ""}} { set count 0 words $trie count $prefix return $count } # private helpers proc count {args} { upvar 1 count var incr var } } ====== And a quick test/demo: ====== proc test {} { set t [trie create] foreach word {howdy hello who where what when why how} { trie add t $word } puts "t = $t" puts "words:" trie words $t puts puts "all wh- words:" trie words $t puts "wh" trie remove t how puts "now:" trie words $t {lappend words} puts [join $words ", "] } # A bigger test -- read all words in a text into the trie proc read-trie file { set t [trie create] set in [open $file r] while {[gets $in line] >= 0} { foreach word [regexp -all -inline {[a-zA-Z]+} $line] { trie add t $word } } return $t } set t [read-trie ~/Desktop/ulyss12.txt] ;# James Joyce's Ulysses puts "size = [trie size $t]" dict for {k v} $trie { puts "$k = [trie size $v]" } puts "Words beginning with 'the':" trie words $t puts "the" ====== Interestingly while testing this I noticed that it was taking a huge amount of time to calculate the number of distinct words in the trie (over a minute for just ~37000 words). Profiling revealed that the following idiom was to blame: set xs [lassign $xs x y] which is used to pop elements off the front of a queue. [Lassign] seems to be quite pathologically slow in this case... ---- !!!!!! %| [Category Data Structure] | [Category Glossary] |% !!!!!!