[AMG]: A critbit tree (a.k.a. Patricia tree, Radix tree, [trie]) is a binary search tree in which traversal proceeds to the left or right subtree depending on the value of certain critical bits in the key. See [http://en.wikipedia.org/wiki/Radix_tree] and [http://cr.yp.to/critbit.html] for more information. Critbit trees can only store (or index) prefix-free strings. They don't work when one key is the prefix of another key, for example "Bob" and "Bobby". Critbit trees work great for IP addresses and other structures that are all the same length. Critbit trees can be adapted for use with arbitrary strings, even strings with prefixes. The trick is to alter the key value to be prefix-free. One approach is to prefix each key with its length, though this requires establishing a maximum length beforehand, or establishing a way to encode lengths which is itself prefix-free. Another approach is to prefix each byte with a 1 bit and append a 0 bit to the key. This is known as kart, or key alteration radix tree [http://code.dogmap.org/kart/]; its advantage over the first approach is that it maintains the lexicographical sort order inherent in classic critbit trees. C strings are already prefix-free because they are NUL-terminated and contain no embedded NULs, so Critbit trees can store (or index) them directly. Tcl's internal string representation also has this property despite being able to contain embedded NULs since it represents them as the two-byte sequence `0xc0 0x80`. Implementations may choose to eliminate redundant (common) prefixes by storing them inside interior nodes, though this comes at the cost of needing to reconstitute the full strings in the process of walking the tree. For large strings this may be a benefit, and this can form the basis of a compression algorithm. Needless to say, if a critbit tree is used to index another data structure without itself storing the strings (only tracking their critical bits, etc.), this implementation option is not available. ---- [Stefan K]: I've implemented a critbit C extension here: [https://chiselapp.com/user/stefank/repository/qdsh]. [AMG]: Big thanks for bringing your code back online! ---- [AMG]: Here's my Tcl implementation of length-prefixed critbit and kart: ====== package require Tcl 8.5 set kart 1 if {$kart} { # Key alteration . proc Bits {key} { set bits {} foreach char [split $key ""] { lappend bits 1 {*}[split [binary scan $char B* b; set b] ""] } lappend bits 0 } } else { # Length prefix. proc Bits {key} { binary scan [binary format I [string length $key]]$key B* bits split $bits "" } } proc Walk {tree kb} { while {[llength $tree] == 3} { set tree [lindex $tree [expr {1 + [lindex $kb [lindex $tree 0]]}]] } lindex $tree 0 } proc insert {cbvar key} { upvar 1 $cbvar critbit if {![info exists critbit] || ![llength $critbit]} { set critbit [list $key] } else { set kb [Bits $key] set other [Walk $critbit $kb] if {$key ne $other} { set ob [Bits $other] for {set m 0} {[lindex $kb $m] == [lindex $ob $m]} {incr m} {} set tree $critbit set path {} while {[llength $tree] == 3 && [lindex $tree 0] < $m} { set child [expr {1 + [lindex $kb [lindex $tree 0]]}] set tree [lindex $tree $child] lappend path $child } if {![lindex $kb $m]} { lset critbit $path [list $m [list $key] $tree] } else { lset critbit $path [list $m $tree [list $key]] } } } } proc remove {cbvar key} { upvar 1 $cbvar critbit if {[info exists critbit]} { if {[llength $critbit] == 1} { if {$key eq [lindex $critbit 0]} { set critbit {} } } elseif {[llength $critbit] == 3} { set kb [Bits $key] set tree $critbit set path {} while {[llength $tree] == 3} { set child [expr {1 + [lindex $kb [lindex $tree 0]]}] set tree [lindex $tree $child] lappend path $child } if {$key eq [lindex $tree 0]} { lset critbit [lrange $path 0 end-1]\ [lindex $critbit {*}[lrange $path 0 end-1]\ [expr {3 - [lindex $path end]}]] } } } } proc find {cbvar key} { upvar 1 $cbvar critbit expr {[info exists critbit] && [llength $critbit] && $key eq [Walk $critbit [Bits $key]]} } ====== These procedures might be helpful for analyzing the resulting trees. ====== proc Descend {tree {path {}}} { if {[llength $tree] == 3} { Descend [lindex $tree 1] [concat $path [list [lindex $tree 0]]] Descend [lindex $tree 2] [concat $path [list [lindex $tree 0]]] } else { puts [format "%-60s %s" $path [join [Bits [lindex $tree 0]] ""]] } } proc Depth {tree {level 0}} { if {[llength $tree] == 3} { incr level expr {max([Depth [lindex $tree 1] $level], [Depth [lindex $tree 2] $level])} } else { return $level } } ====== Example usage, with $kart enabled: ====== % insert cb "Green Shell" {Green Shell} % insert cb Mario 5 {{Green Shell}} Mario % insert cb Mushroom 5 {{Green Shell}} {13 Mario Mushroom} % insert cb "Rainbow Road" 4 {5 {{Green Shell}} {13 Mario Mushroom}} {{Rainbow Road}} % insert cb "Mario Circuit" 4 {5 {{Green Shell}} {13 {45 Mario {{Mario Circuit}}} Mushroom}} {{Rainbow Road}} % find cb Mario 1 % find cb "Mario Circuit" 1 % find cb Luigi 0 % remove cb Mushroom 4 {5 {{Green Shell}} {45 Mario {{Mario Circuit}}}} {{Rainbow Road}} % find cb Mushroom 0 % insert cb Mushroom 4 {5 {{Green Shell}} {13 {45 Mario {{Mario Circuit}}} Mushroom}} {{Rainbow Road}} % Depth $cb 4 % Descend $cb 4 5 1010001111011100101011001011011001011011011101001000001010100111011010001011001011011011001011011000 4 5 13 45 1010011011011000011011100101011010011011011110 4 5 13 45 1010011011011000011011100101011010011011011111001000001010000111011010011011100101011000111011101011011010011011101000 4 5 13 1010011011011101011011100111011010001011100101011011111011011111011011010 4 1010100101011000011011010011011011101011000101011011111011101111001000001010100101011011111011000011011001000 ====== Example usage, with $kart disabled: ====== % insert cb "Green Shell" {Green Shell} % insert cb Mario 28 Mario {{Green Shell}} % insert cb Mushroom 28 Mario {30 Mushroom {{Green Shell}}} % insert cb "Rainbow Road" 28 Mario {29 {30 Mushroom {{Green Shell}}} {{Rainbow Road}}} % insert cb "Mario Circuit" 28 Mario {29 {30 Mushroom {{Green Shell}}} {31 {{Rainbow Road}} {{Mario Circuit}}}} % find cb Mario 1 % find cb "Mario Circuit" 1 % find cb Luigi 0 % remove cb Mushroom 28 Mario {29 {{Green Shell}} {31 {{Rainbow Road}} {{Mario Circuit}}}} % find cb Mushroom 0 % insert cb Mushroom 28 Mario {29 {30 Mushroom {{Green Shell}}} {31 {{Rainbow Road}} {{Mario Circuit}}}} % Depth $cb 3 % Descend $cb 28 000000000000000000000000000001010100110101100001011100100110100101101111 28 29 30 000000000000000000000000000010000100110101110101011100110110100001110010011011110110111101101101 28 29 30 000000000000000000000000000010110100011101110010011001010110010101101110001000000101001101101000011001010110110001101100 28 29 31 00000000000000000000000000001100010100100110000101101001011011100110001001101111011101110010000001010010011011110110000101100100 28 29 31 0000000000000000000000000000110101001101011000010111001001101001011011110010000001000011011010010111001001100011011101010110100101110100 ====== <> Data Structure