Purpose: define what shimmering is, what causes it, why one wants it to occur or wants to avoid it, and how to cause it or avoid it.

----

"Shimmering" is a [Tcl] quirk or strength, depending on where you're coming from. It refers to the fact that internally, Tcl can keep two representations for each value. One of them is a string (which can always be regenerated when needed); the other is usually a faster internal equivalent, e.g. an integer, a floating-point number, or a list.

[Donald Arseneau]: "Shimmering ... is repetitive changing of the internal representation for some data. It has no effect on the program's results, but can slow it down dramatically."

[CL] notes that a few unusual [extension]s seem to have the ability to break Tcl's [EIAS] semantics, and then shimmering becomes functionally visible in an otherwise [pure-Tcl] application. [tcom], for example, takes "hints" from the internal representations when converting Tcl values to native machine values.

When you do "set x {a b c}", you store a 5-character string in x. When you then do "lappend x d", you end up with a 4-item list, which, as far as you're normally concerned, is simply "a b c d". ''And it is!'' But Tcl plays a clever trick and converts the string to an efficient internal representation of a list of things. It's all hidden. When you do "puts $x", you get the result you expect. What happens is that puts wants a string, Tcl sees it has none any more, and so it creates a proper string for you, on the fly.

At this point, you may decide to do "lappend x e". Tcl will happily detect that a list is there and quickly append. As a crucial side-effect, it will also discard the 7-character string, which is no longer appropriate. Keep in mind that no new string gets created at this point.

The point of all this is speed (lots of it). But conceptually, scripts can be built without caring one shred about this duality. Until "shimmering" sets in...
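The sequence just described can be watched from a script. This is a sketch that assumes Tcl 8.6 or later, where the command "tcl::unsupported::representation" describes a value's internal type and whether it currently holds a string. As its namespace says, that command is unsupported and its output format is not documented; the helper procs "repType" and "hasStringRep" are invented for this example, and the way they pick that description apart is an assumption that may not hold in other releases.

```tcl
# Assumes Tcl 8.6+. tcl::unsupported::representation returns text like
# "value is a list with a refcount of 2, object pointer at ..., no string representation"
# (format not guaranteed). The helpers below are hypothetical.
proc repType {value} {
    # The 4th word of the description is the internal type name.
    lindex [tcl::unsupported::representation $value] 3
}
proc hasStringRep {value} {
    expr {![string match "*no string representation*" \
              [tcl::unsupported::representation $value]]}
}

set x {a b c}            ;# a 5-character string
lappend x d              ;# converted to a 4-item list; string form dropped
set step1 [list [repType $x] [hasStringRep $x]]
puts $x                  ;# puts needs a string, so one is built on the fly
set step2 [list [repType $x] [hasStringRep $x]]
lappend x e              ;# fast append to the list; string form dropped again
set step3 [list [repType $x] [hasStringRep $x]]
puts "$step1 / $step2 / $step3"
```

If things work as described above, the three snapshots show the type staying "list" throughout while the string form is dropped, rebuilt by puts, and dropped again.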
This term describes the effect where Tcl keeps alternating between representations: it builds a string representation and discards the internal one, then parses the string back into an internal representation and discards the string, over and over.

Here's a non-shimmering loop (no string operations at all):

 set x {}
 for {set i 0} {$i < 10000} {incr i} { lappend x $i }

Here's one which shimmers a bit ([string bytelength] needs the string, which is regenerated on every iteration, but the list is never lost):

 set x {}
 for {set i 0} {$i < 10000} {incr i} { lappend x [string bytelength $x] }

This one shimmers badly (a list, a string, a list, ...):

 set x {}
 for {set i 0} {$i < 10000} {incr i} { lappend x $i; append x . }

This one also shimmers badly (at least at the time of writing), but for less obvious reasons: a side-effect of [string length] is that it builds an internal representation which would make [string index] efficient, replacing the list representation:

 set x {}
 for {set i 0} {$i < 10000} {incr i} { lappend x [string length $x] }

''Timing comparison left as an exercise for the reader... You may want to start out with 1000 as the upper limit instead of 10000.''

----

All conversions in Tcl currently go through the string representation. Consider:

 set i [expr {$n+1}]
 puts [llength $i]

Here, "i" will be an int first, which then needs to be turned into a list. To do so, a string is constructed, parsed, and converted to a one-item list. We know this is doing too much: an int cannot be anything but a one-item list.

This was a contrived case, but now compare the two below:

 uplevel 1 [list myproc $arg]
 uplevel [list myproc $arg]

The second one means trouble (a bit: only in terms of trying to achieve top performance), because without an explicit level, [uplevel] first has to find out whether its first argument is a level.

How about introducing a "typecast matrix", with a growing, perhaps dynamically extensible, set of smart conversions for special cases? If a converter is present, it gets called. If it fails or there is none, then conversion proceeds as usual, through an intermediate string rep.

Note that the uplevel example above requires yet more smarts. The argument is a list of two items, hence it cannot possibly be an int: there should be no need to convert it, fail, and lose the list in the process. A thought for Tcl9 perhaps?
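For anyone who wants to do the timing exercise, here is one minimal way to set it up, plus a look at the int-to-list case. It is a sketch under a few assumptions: Tcl 8.6 or later, proc names invented for this example, each loop starting from a fresh local x set to the empty list, and 1000 iterations as suggested. The [string bytelength] loop is left out for brevity. The last part relies on the unsupported "tcl::unsupported::representation" command and on its description starting with "value is a <type> ...", which is not a documented format.

```tcl
# Sketch, Tcl 8.6+. Proc names are hypothetical; each runs one of the
# loops above on a fresh local variable x.
proc noShimmer {n} {
    set x {}
    for {set i 0} {$i < $n} {incr i} { lappend x $i }
    return $x
}
proc shimmerBadly {n} {
    set x {}
    # lappend wants a list, append wants a string: every step converts.
    for {set i 0} {$i < $n} {incr i} { lappend x $i; append x . }
    return $x
}
proc shimmerViaLength {n} {
    set x {}
    # string length may swap the list intrep for a string intrep.
    for {set i 0} {$i < $n} {incr i} { lappend x [string length $x] }
    return $x
}

# 5 runs of 1000 iterations each; only the relative numbers matter.
foreach p {noShimmer shimmerBadly shimmerViaLength} {
    puts [format "%-18s %s" $p [time {$p 1000} 5]]
}

# The int -> list case: the 4th word of the (unsupported) description
# is the internal type name.
set n 41
set i [expr {$n + 1}]
set typeBefore [lindex [tcl::unsupported::representation $i] 3]
puts [llength $i]
set typeAfter [lindex [tcl::unsupported::representation $i] 3]
puts "$typeBefore -> $typeAfter"
```

Absolute timings depend on the machine and Tcl version; if the loops behave as described above, noShimmer should come out clearly ahead of the other two.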
-jcw

As to the specific case jcw cites: a list of more than one element can ''never'' be successfully converted to an int... would it not suffice to build that peephole optimisation into the C function which converts arbitrary objects to ints? In other words, rather than an extensible table of smart converters (which I suspect would be pretty sparse), use the C code itself to implement short circuits for failure. [CMCc]

'''[DGP]''' Along these lines, see Tcl Patch 738900.

----

[AMG]: I have a project to unify the internal representations for [list] and [dict]. However, I haven't touched this project in quite a while, since I've been massively busy with other stuff. Anyway, this unification would avoid shimmering when list commands are used to access dicts, and it would optimize the generation of the string representation of a dict. Here's a simple example of a case that would be improved:

======
set data [dict create a 1 b 2]                ;# must have a dict intrep
foreach {key val} $data {puts "$key = $val"}  ;# normally should use [dict for]
======

Of course, you could just use [dict for]. But that is less flexible than general [foreach].

A long-term goal of the project is to create a new data type for sets, which I think I'll have to call rings because [set] is very much taken. :^) The difference between a dict and a ring is that a dict has a value for each key, whereas a ring has only keys. Both would be unified with lists, in that they're really just indexes over the list intrep. Of course, a dict can currently be used to implement sets; just set all the values to the empty string. But it takes positive effort to generate such a data structure from a list of keys. If I get my way, the list of keys ''would be'' the ring.

One nice side effect of my approach is that the indexing data could be made available at the script level. This would make it possible to find which list index corresponds to a given key without having to do a search.
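Until something like rings exists, both uses can be approximated at script level with an ordinary dict built once from the list of keys: [dict exists] gives set membership, and [dict get] gives the position without an [lsearch]. This is a sketch of that workaround, not of AMG's design: "listToIndex" is a hypothetical helper, and the dict is a separate value rather than an index over the list's own intrep, so it has to be rebuilt whenever the list changes.

```tcl
# Hypothetical helper (not part of Tcl): map each key in a list to its
# position. The resulting dict works as a set (dict exists) and as an
# index (dict get); building it costs one linear pass over the list.
proc listToIndex {keys} {
    set index [dict create]
    set pos 0
    foreach key $keys {
        # On duplicate keys keep the first position, as lsearch -exact would.
        if {![dict exists $index $key]} {
            dict set index $key $pos
        }
        incr pos
    }
    return $index
}

set fields {first middle last url}
set idx [listToIndex $fields]
puts [dict get $idx last]       ;# column number of "last": 2
puts [dict exists $idx url]     ;# membership test: 1
puts [dict exists $idx email]   ;# not a field: 0
```

The trade-off is the one AMG's proposal avoids: the list and its index are two values that can drift apart.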
An example use is to look up the column number corresponding to a column name in a tabular data structure (here ''ring index'' is the proposed command, which does not exist in Tcl today):

======
set fields {first middle last url}
set data {
    {Donald G Porter http://math.nist.gov/~DPorter/contact/}
    {Robert L Hicks http://wiki.tcl.tk/11367}
    {Andy M Goth http://andy.junkdrome.org/}
}
foreach $fields [concat {*}[lsort -index [ring index $fields last] $data]] {
    puts [format "> %-24s : %s" "$last, $first $middle" $url]
}
> Goth, Andy M             : http://andy.junkdrome.org/
> Hicks, Robert L          : http://wiki.tcl.tk/11367
> Porter, Donald G         : http://math.nist.gov/~DPorter/contact/
======

Back to shimmering. This would create a new, lesser type of shimmering. Rather than blowing away the original list intrep in order to make the dict, the object type would remain list and a dict index would be generated over it. So long as the [Tcl_Obj] is modified using only dict commands, the dict index would remain valid. Using [lappend] would wipe out the index, but using [llength] would not.

'''[AK] - 2010-06-29 17:47:54''' Regarding sets/rings, have a look at the [struct::set] package in Tcllib. This package not only has the regular Tcl implementation, but also a C implementation based on [Critcl], which in essence creates a Tcl_ObjType. It uses a Tcl_HashTable to make the set operations fast (i.e. a quick check for the existence of set elements, which is at the core of operations like intersection and difference).

A general note: From time to time, multi-intrep musings seem to come up as an alternative to unified data structures for lists, dicts, and sets, i.e. instead of having a string and an intrep, a Tcl_Obj would be allowed to have a string plus multiple intreps at the same time. I do not remember anybody having worked out all the quirks and corner cases of such a system, although I seem to remember that [FB]'s [Colibri]/[Cloverfield] is an attempt at doing that (plus ropes for strings).
Another note: IIRC, some of the builtin commands, like [foreach], have been modified to work with dicts as well, without converting them to lists. However, this is done on a command-by-command basis, by checking for the various intrep types in each command.

<> Concept | Glossary | Internals | String Processing