Version 3 of critbit

Updated 2016-07-13 17:32:29 by AMG

AMG: A critbit tree (a.k.a. Patricia tree, Radix tree, * trie) is a binary search tree in which traversal proceeds to the left or right subtree depending on the value of certain critical bits in the key. See [L1 ] and [L2 ] for more information.

Critbit trees can only store prefixes. They don't work when one key is the prefix of another key, for example "Bob" and "Bobby". Critbit trees work great for IP addresses and other structures that are all the same length.

Critbit trees can be adapted for use with arbitrary strings, even strings with prefixes. The trick is to alter the key value to be prefix-free. One approach is to prefix each key with its length, though this requires establishing a maximum length beforehand, or establishing a way to encode lengths which is itself prefix-free. Another approach is to prefix each byte with a 1 bit and append a 0 bit to the key. This is known as kart, or key alteration radix tree [L3 ]; its advantage over the first approach is that it maintains the lexicographical sort order inherent in classic critbit trees.

C strings are already prefix-free because they are NUL-terminated and contain no embedded NULs, so Critbit trees can store them directly. Tcl's internal string representation also has this property despite being able to contain embedded NULs since it represents them as the two-byte sequence 0xc0 0x80.


Stefan K: I've implemented a critbit C extension here: [L4 ].


AMG: Here's my Tcl implementation of length-prefixed critbit and kart:

package require Tcl 8.5

set kart 1

if {$kart} {
    # Key alteration <URL:http://code.dogmap.org/kart/>.
    proc Bits {key} {
        set bits {}
        foreach char [split $key ""] {
            lappend bits 1 {*}[split [binary scan $char B* b; set b] ""]
        }
        lappend bits 0
    }
} else {
    # Length prefix.
    proc Bits {key} {
        binary scan [binary format I [string length $key]]$key B* bits
        split $bits ""
    }
}

proc Walk {tree kb} {
    while {[llength $tree] == 3} {
        set tree [lindex $tree [expr {1 + [lindex $kb [lindex $tree 0]]}]]
    }
    lindex $tree 0
}

proc insert {cbvar key} {
    upvar 1 $cbvar critbit
    if {![info exists critbit] || ![llength $critbit]} {
        set critbit [list $key]
    } else {
        set kb [Bits $key]
        set other [Walk $critbit $kb]
        if {$key ne $other} {
            set ob [Bits $other]
            for {set m 0} {[lindex $kb $m] == [lindex $ob $m]} {incr m} {}

            set tree $critbit
            set path {}
            while {[llength $tree] == 3 && [lindex $tree 0] < $m} {
                set child [expr {1 + [lindex $kb [lindex $tree 0]]}]
                set tree [lindex $tree $child]
                lappend path $child
            }

            if {![lindex $kb $m]} {
                lset critbit $path [list $m [list $key] $tree]
            } else {
                lset critbit $path [list $m $tree [list $key]]
            }
        }
    }
}

proc remove {cbvar key} {
    upvar 1 $cbvar critbit
    if {[info exists critbit]} {
        if {[llength $critbit] == 1} {
            if {$key eq [lindex $critbit 0]} {
                set critbit {}
            }
        } elseif {[llength $critbit] == 3} {
            set kb [Bits $key]
            set tree $critbit
            set path {}
            while {[llength $tree] == 3} {
                set child [expr {1 + [lindex $kb [lindex $tree 0]]}]
                set tree [lindex $tree $child]
                lappend path $child
            }
            if {$key eq [lindex $tree 0]} {
                lset critbit [lrange $path 0 end-1]\
                    [lindex $critbit {*}[lrange $path 0 end-1]\
                    [expr {3 - [lindex $path end]}]]
            }
        }
    }
}

proc find {cbvar key} {
    upvar 1 $cbvar critbit
    expr {[info exists critbit] && [llength $critbit]
       && $key eq [Walk $critbit [Bits $key]]}
}

These procedures might be helpful for analyzing the resulting trees.

proc Descend {tree {path {}}} {
    if {[llength $tree] == 3} {
        Descend [lindex $tree 1] [concat $path [list [lindex $tree 0]]]
        Descend [lindex $tree 2] [concat $path [list [lindex $tree 0]]]
    } else {
        puts [format "%-60s %s" $path [join [Bits [lindex $tree 0]] ""]]
    }
}

proc Depth {tree {level 0}} {
    if {[llength $tree] == 3} {
        incr level
        expr {max([Depth [lindex $tree 1] $level],
                  [Depth [lindex $tree 2] $level])}
    } else {
        return $level
    }
}

Example usage, with $kart enabled:

% insert cb "Green Shell"
{Green Shell}
% insert cb Mario
5 {{Green Shell}} Mario
% insert cb Mushroom
5 {{Green Shell}} {13 Mario Mushroom}
% insert cb "Rainbow Road"
4 {5 {{Green Shell}} {13 Mario Mushroom}} {{Rainbow Road}}
% insert cb "Mario Circuit"
4 {5 {{Green Shell}} {13 {45 Mario {{Mario Circuit}}} Mushroom}} {{Rainbow Road}}
% find cb Mario
1
% find cb "Mario Circuit"
1
% find cb Luigi
0
% remove cb Mushroom
4 {5 {{Green Shell}} {45 Mario {{Mario Circuit}}}} {{Rainbow Road}}
% find cb Mushroom
0
% insert cb Mushroom
4 {5 {{Green Shell}} {13 {45 Mario {{Mario Circuit}}} Mushroom}} {{Rainbow Road}}
% Depth $cb
4
% Descend $cb
4 5       1010001111011100101011001011011001011011011101001000001010100111011010001011001011011011001011011000
4 5 13 45 1010011011011000011011100101011010011011011110
4 5 13 45 1010011011011000011011100101011010011011011111001000001010000111011010011011100101011000111011101011011010011011101000
4 5 13    1010011011011101011011100111011010001011100101011011111011011111011011010
4         1010100101011000011011010011011011101011000101011011111011101111001000001010100101011011111011000011011001000

Example usage, with $kart disabled:

% insert cb "Green Shell"
{Green Shell}
% insert cb Mario
28 Mario {{Green Shell}}
% insert cb Mushroom
28 Mario {30 Mushroom {{Green Shell}}}
% insert cb "Rainbow Road"
28 Mario {29 {30 Mushroom {{Green Shell}}} {{Rainbow Road}}}
% insert cb "Mario Circuit"
28 Mario {29 {30 Mushroom {{Green Shell}}} {31 {{Rainbow Road}} {{Mario Circuit}}}}
% find cb Mario
1
% find cb "Mario Circuit"
1
% find cb Luigi
0
% remove cb Mushroom
28 Mario {29 {{Green Shell}} {31 {{Rainbow Road}} {{Mario Circuit}}}}
% find cb Mushroom
0
% insert cb Mushroom
28 Mario {29 {30 Mushroom {{Green Shell}}} {31 {{Rainbow Road}} {{Mario Circuit}}}}
% Depth $cb
3
% Descend $cb
28       000000000000000000000000000001010100110101100001011100100110100101101111
28 29 30 000000000000000000000000000010000100110101110101011100110110100001110010011011110110111101101101
28 29 30 000000000000000000000000000010110100011101110010011001010110010101101110001000000101001101101000011001010110110001101100
28 29 31 00000000000000000000000000001100010100100110000101101001011011100110001001101111011101110010000001010010011011110110000101100100
28 29 31 0000000000000000000000000000110101001101011000010111001001101001011011110010000001000011011010010111001001100011011101010110100101110100