Version 6 of Judy Arrays

Updated 2003-10-06 04:56:11

http://judy.sourceforge.net is a very fast, very scalable dynamic array facility. I wonder how it compares to Tcl's hashing and whether it is worth looking into as the basis for a more powerful array command.

CMcC they certainly look interesting, but it seems to me that they are trees which store their keys internally, and they distribute it within the tree (they seem to be based on a trie data structure) so I can't see how you could use them for storing objects with an arbitrary ordering unless you used the address of the object as they key (cast to an integer.) Hmm, I guess that would work after all, as a kind of hash. It's quite counter-intuitive that something giving a complete (if arbitrary) ordering would be more efficient than something like a hash, but since Judy is designed to play well with CPU caches, I guess such unusual results are to be expected.

CMcC Ok, I just tried to make a dict-like judy extension, ran into what seems to be a showstopper:

There are two (actually three) flavours of Judy: JudyL (mapping long->long) and JudySL (mapping char*->long).

Initially I'd thought to cast Tcl_Obj to long and use JudyL: that won't work because it's very possible for two different Tcl_Obj's to have identical string representations, which under this scheme would give two distinct entries with (from tcl's POV) identical keys. That breaks Hash/Array semantics.

The alternative, using JudySL won't work either: Judy stores the key value inside itself, that is it duplicates the string key and stores it internally. That's bad because it entails duplication of the string rep of the key, but it's fatal because there's no way for the Judy tree to be informed when that string representation becomes invalid.

Probably I'm missing something, and there's no problem with using a JudySL as a simple mapping from key's string rep to Tcl_Obj, and that suffices to provide Hash/Array semantics. Someone check my reasoning, please.

CMcC there's no real problem with using JudySL, but there is a hard limit on the maximum length of a string key: 65536, which would mean that keys would have to be unique in the first 65K characters. That's probably not too much of an impost, but I think it precludes Judy from being used as a generalised replacement for tcl Hash.

http://www.snookles.com/scott/publications/pli2003-slf.pdf talks about how much faster it is in Erlang as compared with Erlang's builtin hashing.

-- Todd Coram