Version 7 of critbit

Updated 2020-05-03 09:24:23 by pooryorick

A critbit tree (a.k.a. Patricia tree, Radix tree, * trie) is a binary search tree in which traversal proceeds to the left or right subtree depending on the value of certain critical bits in the key. See also radix tree .

In a critbit tree redundant prefixes of each string added to the tree are removed and used form intermediate nodes under which the prefix-free part of the string is stored as a leaf node. Each intermediate node contains a left side and a right side, and a value indicating the first bit on the right side that is different from the left side. I.e. the prefix of the right side can be found somewhere in the structure on the left side. The implementation might store the prefix as part of the value of the leaf node furthest to the left, or it might store the prefix as auxiliary data in node where values diverge.

The original string can be reconstituted by following the path to the leaf node and prepending prefixes found along the way. Since each prefix is only stored once, a form of compression is achieved. Of course, an implementation might choose not to actually remove the prefixes, but just to use the redundant prefixes to build the structure, in which case it foregoes the compression but gains the benefit of keeping each original string intact in a leaf node.

To use a critbit tree as a key-value store, store the key and value as a single string separated by some delimiter. The delimiter, of course, must not occur in the key. For example, a streaming canonicl list containing the key and the value could be used. Alternatively, use fixed-width keys, e.g. IP addresses or other fixed-witdh values.

When implementing a critbit tree at the C level, a key-value pair could be represented as two adjacent C strings with the NULL that terminates the first C string forming the delimiter between the key and the value. A Tcl string could also be used since it also NUL-terminated and encodes any embedded nullsas the two-byte sequence 0xc0 0x80.

kart , or key alteration radix tree is a similar structure. It also features the lexicographical sort order inherent in classic critbit trees.


Stefan K: I've implemented a critbit C extension here: [L1 ].

AMG: Big thanks for bringing your code back online!


AMG: Here's my Tcl implementation of length-prefixed critbit and kart:

package require Tcl 8.5

set kart 1

if {$kart} {
    # Key alteration <URL:http://code.dogmap.org/kart/>.
    proc Bits {key} {
        set bits {}
        foreach char [split $key ""] {
            lappend bits 1 {*}[split [binary scan $char B* b; set b] ""]
        }
        lappend bits 0
    }
} else {
    # Length prefix.
    proc Bits {key} {
        binary scan [binary format I [string length $key]]$key B* bits
        split $bits ""
    }
}

proc Walk {tree kb} {
    while {[llength $tree] == 3} {
        set tree [lindex $tree [expr {1 + [lindex $kb [lindex $tree 0]]}]]
    }
    lindex $tree 0
}

proc insert {cbvar key} {
    upvar 1 $cbvar critbit
    if {![info exists critbit] || ![llength $critbit]} {
        set critbit [list $key]
    } else {
        set kb [Bits $key]
        set other [Walk $critbit $kb]
        if {$key ne $other} {
            set ob [Bits $other]
            for {set m 0} {[lindex $kb $m] == [lindex $ob $m]} {incr m} {}

            set tree $critbit
            set path {}
            while {[llength $tree] == 3 && [lindex $tree 0] < $m} {
                set child [expr {1 + [lindex $kb [lindex $tree 0]]}]
                set tree [lindex $tree $child]
                lappend path $child
            }

            if {![lindex $kb $m]} {
                lset critbit $path [list $m [list $key] $tree]
            } else {
                lset critbit $path [list $m $tree [list $key]]
            }
        }
    }
}

proc remove {cbvar key} {
    upvar 1 $cbvar critbit
    if {[info exists critbit]} {
        if {[llength $critbit] == 1} {
            if {$key eq [lindex $critbit 0]} {
                set critbit {}
            }
        } elseif {[llength $critbit] == 3} {
            set kb [Bits $key]
            set tree $critbit
            set path {}
            while {[llength $tree] == 3} {
                set child [expr {1 + [lindex $kb [lindex $tree 0]]}]
                set tree [lindex $tree $child]
                lappend path $child
            }
            if {$key eq [lindex $tree 0]} {
                lset critbit [lrange $path 0 end-1]\
                    [lindex $critbit {*}[lrange $path 0 end-1]\
                    [expr {3 - [lindex $path end]}]]
            }
        }
    }
}

proc find {cbvar key} {
    upvar 1 $cbvar critbit
    expr {[info exists critbit] && [llength $critbit]
       && $key eq [Walk $critbit [Bits $key]]}
}

These procedures might be helpful for analyzing the resulting trees.

proc Descend {tree {path {}}} {
    if {[llength $tree] == 3} {
        Descend [lindex $tree 1] [concat $path [list [lindex $tree 0]]]
        Descend [lindex $tree 2] [concat $path [list [lindex $tree 0]]]
    } else {
        puts [format "%-60s %s" $path [join [Bits [lindex $tree 0]] ""]]
    }
}

proc Depth {tree {level 0}} {
    if {[llength $tree] == 3} {
        incr level
        expr {max([Depth [lindex $tree 1] $level],
                  [Depth [lindex $tree 2] $level])}
    } else {
        return $level
    }
}

Example usage, with $kart enabled:

% insert cb "Green Shell"
{Green Shell}
% insert cb Mario
5 {{Green Shell}} Mario
% insert cb Mushroom
5 {{Green Shell}} {13 Mario Mushroom}
% insert cb "Rainbow Road"
4 {5 {{Green Shell}} {13 Mario Mushroom}} {{Rainbow Road}}
% insert cb "Mario Circuit"
4 {5 {{Green Shell}} {13 {45 Mario {{Mario Circuit}}} Mushroom}} {{Rainbow Road}}
% find cb Mario
1
% find cb "Mario Circuit"
1
% find cb Luigi
0
% remove cb Mushroom
4 {5 {{Green Shell}} {45 Mario {{Mario Circuit}}}} {{Rainbow Road}}
% find cb Mushroom
0
% insert cb Mushroom
4 {5 {{Green Shell}} {13 {45 Mario {{Mario Circuit}}} Mushroom}} {{Rainbow Road}}
% Depth $cb
4
% Descend $cb
4 5       1010001111011100101011001011011001011011011101001000001010100111011010001011001011011011001011011000
4 5 13 45 1010011011011000011011100101011010011011011110
4 5 13 45 1010011011011000011011100101011010011011011111001000001010000111011010011011100101011000111011101011011010011011101000
4 5 13    1010011011011101011011100111011010001011100101011011111011011111011011010
4         1010100101011000011011010011011011101011000101011011111011101111001000001010100101011011111011000011011001000

Example usage, with $kart disabled:

% insert cb "Green Shell"
{Green Shell}
% insert cb Mario
28 Mario {{Green Shell}}
% insert cb Mushroom
28 Mario {30 Mushroom {{Green Shell}}}
% insert cb "Rainbow Road"
28 Mario {29 {30 Mushroom {{Green Shell}}} {{Rainbow Road}}}
% insert cb "Mario Circuit"
28 Mario {29 {30 Mushroom {{Green Shell}}} {31 {{Rainbow Road}} {{Mario Circuit}}}}
% find cb Mario
1
% find cb "Mario Circuit"
1
% find cb Luigi
0
% remove cb Mushroom
28 Mario {29 {{Green Shell}} {31 {{Rainbow Road}} {{Mario Circuit}}}}
% find cb Mushroom
0
% insert cb Mushroom
28 Mario {29 {30 Mushroom {{Green Shell}}} {31 {{Rainbow Road}} {{Mario Circuit}}}}
% Depth $cb
3
% Descend $cb
28       000000000000000000000000000001010100110101100001011100100110100101101111
28 29 30 000000000000000000000000000010000100110101110101011100110110100001110010011011110110111101101101
28 29 30 000000000000000000000000000010110100011101110010011001010110010101101110001000000101001101101000011001010110110001101100
28 29 31 00000000000000000000000000001100010100100110000101101001011011100110001001101111011101110010000001010010011011110110000101100100
28 29 31 0000000000000000000000000000110101001101011000010111001001101001011011110010000001000011011010010111001001100011011101010110100101110100

Page Authors

AMG
PYK
Added a description of the critbit structure. Removed inaccurate statements.