This page is the result of a comment on the Better Arrays for Tcl9 page -- JCW
This demo illustrates an experimental "hash" command and datatype for Tcl, which uses the usual odd-even representation to store data, plus an internal integer "map" vector to represent a hashing layer placed on top.
The "hash" command has been renamed to "ihash" and is now part of CritLib [L1 ]
This sounds from an API point of view similar to TclX's keyed lists. The mapping of course improves performance.
Yes, but TclX's keyed lists are lists of 2-item lists, while the odd-even lists below are a single list... makes quite a difference in speed and mem-use --jcw
Differences between hash objects and arrays:
Performance-wise, the output below seems to indicate the following:
The current code does not support set/unset yet. Setting would either add or replace, depending on whether the key already exists.
Correction: the most recent upload [L2 ] now DOES support set and unset (and exists), see new examples in [L3 ] -jcw
Script:
puts "version = [package require jolt]" if {[info commands nop] != ""} { puts "10 nops = [time {nop;nop;nop;nop;nop;nop;nop;nop;nop;nop} 10000]" } proc hash_try {} { set a {1 one 2 two 3 three} puts "data = [hash a]" puts "length = [hash a length]" puts "keys = [hash a keys]" puts "values = [hash a values]" puts "map = [hash a map]" puts "get 1 = [hash a get 1]" puts "get 2 = [hash a get 2]" puts "get 3 = [hash a get 3]" foreach x {1 10 100 1000 10000 100000} { set v {} for {set y 0} {$y < $x} {incr y} { lappend v $y [expr {$y+$y}] } set n [expr {1000000/$x}] if {$n > 100} { set n 100 } set r [lindex [time {set a $v; hash a} ${n}0] 0] set s [lindex [time {array unset b; array set b $v} $n] 0] puts [format "create %6d = %5d hash, vs. %d array" $x $r $s] } foreach x {0 1000 abc} { catch {hash a get $x} e1 catch {set b($x)} e2 puts "get $x: $e1 <-> $e2" } set x 1 puts "10x h = [time { hash a get $x; hash a get $x; hash a get $x; hash a get $x; hash a get $x; hash a get $x; hash a get $x; hash a get $x; hash a get $x; hash a get $x} 10000]" puts "10x a = [time { set b($x); set b($x); set b($x); set b($x); set b($x); set b($x); set b($x); set b($x); set b($x); set b($x)} 10000]" puts "h all = [time {foreach x [hash a keys] { hash a get $x }}]" puts "a all = [time {foreach x [array names b] { set b($x) }}]" puts "same = [expr {[lsort [hash a]] == [lsort [array get b]]}]" puts "h list = [time {foreach {x y} [hash a] { }}]" puts "a list = [time {foreach {x y} [array get b] { }}]" }
Output (SuSE 7.1 Linux, PIII/650):
version = 2001.11.06.170201 10 nops = 7 microseconds per iteration HASH ==== data = 1 one 2 two 3 three length = 3 keys = 1 2 3 values = one two three map = 0 -1 0 -1 0 -1 0 -1 2105051955 2 -2061914958 4 0 -1 1977051568 0 get 1 = one get 2 = two get 3 = three create 1 = 5 hash, vs. 12 array create 10 = 5 hash, vs. 31 array create 100 = 6 hash, vs. 323 array create 1000 = 30 hash, vs. 2310 array create 10000 = 1370 hash, vs. 37315 array create 100000 = 15341 hash, vs. 415132 array get 0: 0 <-> 0 get 1000: 2000 <-> 2000 get abc: no such key <-> can't read "b(abc)": no such element in array 10x h = 29 microseconds per iteration 10x a = 20 microseconds per iteration h all = 557448 microseconds per iteration a all = 556954 microseconds per iteration same = 1 h list = 216386 microseconds per iteration a list = 542398 microseconds per iteration
This is pretty much the same as the new dictionary type that's in 8.5, non? Why rewrite it?
It's not a "rewrite" - it has been around since Nov 2001... -jcw
FW: Good point. ;) Come to think of it, frederic bonnet's interface (the one which I assume is the one that's going into the Tcl core) is pretty similar to yours. Is it (or at least the interface) based on your code?
I don't think so (ihash uses a Python-derived hash algorithm, not Tcl's), but that's not so important - just wanted to mention ihash as idea for forward-compatible dict. I really don't care to lobby for ihash, having it buried in wiki-dust is fine too! -jcw