This is a comment about managing data in Tcl. No code, just some thoughts for now... jcw
Tcl has arrays, and as of 8.5 dicts. Together, these let you manage tabular data. There are several ways to do this:
1) A list of lists (row-wise is the most common)
2) A list of dicts
3) An array of lists
4) An array of dicts
Can we do better, yet stay in the Tcl realm?
I think we could. The refinement we would need is a change in the way dicts are hashed. Instead of hashing across a list of interleaved keys and values, we need a hash which takes a key, and returns its position in the list. Then, instead of having each row carry its internal hash table, we could have one hash for all rows. After all, there is not necessarily a need to use different row orders across a table.
The required change to dicts would be that they do not lose their hash table when an lset is done on a non-key (i.e. odd) item in the list (or is that already the case? I have not checked it). Or if lset is not used: that when an unshared copy is made of a dict, they continue to share the hash table as long as no changes are made to the set of keys (again, have not examined dict's logic).
The other way to work around this issue and reduce memory use, is to store tabular data as an array of lists, and to manually manage the field-name-to-column-position mapping. Then access to row x field y would be:
puts [lindex $data(x) [dict get $fieldmap y]]
For comparison, the normal array-of-dict approach is:
puts [dict get $data(x) y]
The point of this all is to minimize memory usage with large numbers of rows or fields. Rows cannot be "less" than a Tcl list, but this way we could at least avoid the overhead of (shared) field names and the dict hash on each row.
Using arrays as the main store (and dicts or lists inside) also plays nice with Tequila, I expect.
NEM Interesting stuff. I was thinking about a similar thing just yesterday (looking at R and some of your Vlerq stuff). If I understand you correctly, the key idea here is to be able to set an element in a multiply-nested data structure without requiring that each level of data structure be of the same type? So, for instance you can have a dict of lists of dicts of... and some operation can query/set to arbitrary depth without causing shimmering? That would be useful, but generally goes against Tcl's (non-)typing. You could maybe do something with TOOT. Need to think more about this.
As an aside, is the following situation covered in your cases above:
Or perhaps I need some coffee to think about this more clearly.
DKF: Learning from the experience of metakit, I'd suggest:
5) A dict of lists
dict update data $columnName col { dict set data $columnName {} ;# Column refcount should now be 1 (i.e. efficient!) lset col $row $newValue } set col {} ;# Column refcount should now be 1 again
NEM: How do you handle missing data though?