This page is about (the features that should be produced within) an ongoing project of the Google Summer Of Code 2011.
Current version as of 2011-08-22 is 0.4, which is probably best downloaded from the GSoC 2011 Executed Projects download site, but can also be found here: http://abel.math.umu.se/~lars/tcl/stasher-0.4.tar.gz
Previous version from 2011-07-20 was 0.3.1, available here: http://abel.math.umu.se/~lars/tcl/stasher-0.3.1.tar.gz This fixes some segfaults in v0.3, which were caused by missing a few Tcl_IncrRefCount calls.
At mid-term (2011-07-15), the current version of this package was 0.3, which was the first to be coded entirely in C.
In many cases, there is a gap between the ways in which a human (or more concretely: a Tcler) might be comfortable defining or specifying something, and the ways in which a computer/program/algorithm might need it to be defined or specified if processing is to be reasonably efficient; see below for some examples of such gaps. It is usually not difficult to bridge these gaps, but there is a cost associated with moving something across a gap, so one does not want to do it more than necessary. The basic solution to that is to cache the costly alternative representation after it has been computed the first time, and thereby avoid having to repeat the same computation later; this "only" takes a bit of administration.
As it happens, Tcl has (and makes extensive use of) an elegant mechanism that facilitates this: the dual-ported Tcl_Obj structures that store every value in a Tcl program ? a value can have both a string representation suitable for human consumption and an internal representation with which the computer is more comfortable. While the most frequently mentioned applications of this are probably the low-level optimisations of making "a list" internally a C-style vector and storing "a number" internally in binary rather than decimal, the mechanism is by no means restricted to such applications. Indeed, one of the most important is that the bytecode to which a Tcl script gets compiled is stored in the internal representation of the Tcl_Obj whose string representation is the script in question! Other cases where the result of compiling something is cached in its Tcl_Obj internal representation include expressions (which is one reason you should brace your expr-essions), lambdas, and regexps.
Since the Tcl_Obj system is open for new types, it is not just the Tcl core that can benefit from caching results in internal representations. Traditionally, there has however been one restriction: One had to implement the computations in C, since there (by design) is no script level access to the internal representation of a Tcl_Obj. The stasher package changes this, by letting the computations whose results are to be stored be defined at script level, while it takes care of the C-level access to Tcl_Obj internal representations that helps avoid making the same computation more than once. The effect is that you can implement something like the regexp command in pure Tcl, without a per-use overhead for compiling the regular expression!
In terms of program flow, it is similar to having an array whose contents are computed by read traces (cf. Arrays as cached functions), so execution is demand-driven. Syntactically, it is of course more dictish than arrayish, with the main call being something like
somestasher get $value foobar
to retrieve the "foobar" of $value. Rather than being explicitly part of the $value (as would be the case with a dict) this "foobar" is supposed to be something that can be computed from the $value; concretely, it is the name of a function, and what the get call is supposed to return is the result of applying this function to the $value. However, this result need not be computed anew for every get call; stashers rather prefer to return what was computed upon some earlier call.
A Tcl_Obj internal representation which is used for storing data derived from the (string representation) value is called a stash. Individual data items stored in a stash are called properties (of the value in the string representation). Stashes are created and managed by commands called stashers. Stashers are created using the stasher command. Every stasher has an associated properties object, which is an oo::object whose exported methods define the properties for its stasher: when the stasher is asked for a property value that is not found in the stash, it will compute this by calling the corresponding method of the properties object.
As of version 0.3, a stasher has three subcommands that get property values from a stash:
$stasher eval $name $stash
eval [$stasher get $stash $name]
proc stasher_apply_helper {cmd propL stashVal args} { set call [list apply] foreach propName $propL { lappend call [$cmd get $stashVal $propName] } tailcall {*}$call {*}$args }
There are two subcommands (each with two forms) for configuring the properties object:
package require sha1 2.0.3 ; # From tcllib, if nothing else found. foo define { method sha1 {stash} { ::sha1::sha1 $stash } }
foo define forward sha1 ::sha1::sha1
$stasher inscope $prefix $arg1 $arg2
$stasher inscope [list {*}$prefix $arg1 $arg2]
Finally, there are three subcommands which are mainly useful when debugging or introspecting:
The stasher command has the syntax
Finally, there is (at least so far) also a sesame command that gives more direct access to stashes (e.g. the sesame keys command tells you what properties currently have entries in a particular stash). This command blatantly violates EIAS, so don't expect it to be available in a deployment version. (But it's certainly a cute name for a command giving magical access to stashes, don't you think?)
Polynomials written for humans (rather than compilers) to parse often look something like
3x + 0.1615 - 4/11x^2 + x^10
(Notable deviations from standard computer language expression syntax includes lack of explicit multiplication operations and coefficients that are general fractions. Relative to Tcl, there is also the lack of a $ in front of the variable.) Wouldn't it be nice if one could use notation such as the above within Tcl? And moreover have it be efficient? Well, with a stasher you can do that. Let's call the stasher command polynomial
package require stasher 0.3 stasher polynomial {
The main idea is to define properties of a polynomial (of the form shown above) that constitute more practical encodings of the same information. To begin with, one might tokenise the polynomial string as follows:
method tokens {str} { set prev -1 set res {} foreach pair [ regexp -all -inline -indices {(?x) [0-9]+ | # Integers [0-9]*\.[0-9]+(?:e[+-]?[0-9]+)? | # Floats [0-9]+/[1-9][0-9]* | # Quotients x(?:\^[0-9]+)? | # Variable [+-] # Sign } $str ] { foreach {start end} $pair break set junk [string trim [string range $str $prev+1 $start-1]] if {$junk ne ""} then { error "Strange character(s) in polynomial after\ position $prev: $junk" } lappend res [string range $str $start $end] set prev $end } set junk [string trim [string range $str $prev+1 end]] if {$junk ne ""} then { error "Strange character(s) at end (after\ position $prev): $junk" } return $res }
For example, the tokens of 3x + 0.1615 - 4/11x^2 + x^10 are 3, x, +, 0.1615, -, 4/11, x^2, +, and x^10. With that sorted out, one might construct as a more numerical representation of the polynomial a dictionary mapping variable exponent to numeric coefficient. In this case, that would be 1 3 0 0.1615 2 -0.36363636363636365 10 1, and it can be computed by the following method.
method coeffdict {str} { set res [dict create] set exp 0 set coeff 1 set hasterm 0 foreach token [polynomial get $str tokens] { switch -regexp -- $token { {^[0-9]+$} { # Integer token. Always decimal. # Multiply with the coefficient, which may contain the sign already. scan $token %d val set coeff [expr {$coeff*$val}] set hasterm 1 } {^[0-9]*\.[0-9]+(?:e[+-]?[0-9]+)?$} { # Float token. Ditto. scan $token %g val set coeff [expr {$coeff*$val}] set hasterm 1 } {^[0-9]+/[1-9][0-9]*$} { # Quotient token. # Multiply by numerator, divide by denominator. scan $token %d/%d numer denom set coeff [expr {$coeff*$numer/double($denom)}] set hasterm 1 } {^x$} { # Variable. Add 1 to exponent incr exp set hasterm 1 } {^x\^[0-9]+$} { # Variable with explicit exponent. scan $token x^%d val incr exp $val set hasterm 1 } {^(\+|-)$} { if {$hasterm} then { if {[dict exists $res $exp]} then { set sum [expr {$coeff + [dict get $res $exp]}] if {$sum != 0} then { dict set res $exp $sum } else { dict unset res $exp } } else { dict set res $exp $coeff } set coeff 1 set exp 0 set hasterm 0 } if {$token eq "-"} then { set coeff [expr {-$coeff}] } } default { error "This can't happen" } } } if {$hasterm} then { if {[dict exists $res $exp]} then { set sum [expr {$coeff + [dict get $res $exp]}] if {$sum != 0} then { dict set res $exp $sum } else { dict unset res $exp } } else { dict set res $exp $coeff } } return $res }
(It's a bit messy, but parsing tends to be like that.) Notable here is that this property is computed from the tokens property value, rather than directly from the main string representation.
While the coeffdict is fully numerical, many algorithms are more comfortable working with the list of coefficients. That can be another property: the coefflist.
method coefflist {str} { set D [polynomial get $str coeffdict] set res {} foreach exp [lsort -integer [dict keys $D]] { while {[llength $res] < $exp} {lappend res 0} lappend res [dict get $D $exp] } return $res }
The coefflist of 3x + 0.1615 - 4/11x^2 + x^10 is computed to be 0.1615 3 -0.36363636363636365 0 0 0 0 0 0 0 1.
One thing that people tend to do with polynomials is however to evaluate them, and the above does not give an obvious solution for that. Hence, it might be an idea to also turn it into some piece of Tcl code. Concretely, an apply lambda might be useful. If constructing an expression on Horner form [L1 ], the ** operation can be avoided. This suggests the following hornerlambda property:
method hornerlambda {str} { set expr 0 foreach c [lreverse [polynomial get $str coefflist]] { if {$c != 0} then { if {$expr ne "0"} then { set expr "($expr)*\$x+$c" } else { set expr $c } } elseif {$expr ne "0"} then { set expr "($expr)*\$x" } } return [list x [list expr $expr] ::] } } ; # End of [stasher] command
With this, the hornerlambda of 3x + 0.1615 - 4/11x^2 + x^10 is
x {expr {((((((((((1)*$x)*$x)*$x)*$x)*$x)*$x)*$x)*$x+-0.36363636363636365)*$x+3)*$x+0.1615}} ::
Saying
apply [polynomial get {3x + 0.1615 - 4/11x^2 + x^10} hornerlambda] 0.3
just to compute the polynomial in x=0.3 is however a bit awkward. By instead defining
interp alias {} applypoly {} polynomial apply hornerlambda
one can shorten that to
applypoly {3x + 0.1615 - 4/11x^2 + x^10} 0.3
The power of such a machinery gets more apparent when one works with many polynomials. For example, let's set up the first eight Legendre polynomials [L2 ]. This can be done as
set P0 "1 " ; # With just 1, it appears there is shimmering. set P1 x set P2 {3/2 x^2 - 1/2} set P3 {5/2 x^3 - 3/2 x} set P4 {35/8 x^4 - 15/4 x^2 + 3/8} set P5 {63/8 x^5 - 35/4 x^3 + 15/8 x} set P6 {231/16 x^6 - 315/16 x^4 + 105/16 x^2 - 5/16} set P7 {429/16 x^7 - 693/16 x^5 + 315/16 x^3 - 35/16 x}
Thanks to the miracle of stashing, there is no parsing overhead for using applypoly with these; timing as
for {set n 0} {$n<=7} {incr n} { set script [list applypoly [set P$n] 0.3] puts "P${n}(0.3), i.e., \[$script\]: [time $script 100]" }
produces the output
P0(0.3), i.e., [applypoly {1 } 0.3]: 2.08798 microseconds per iteration P1(0.3), i.e., [applypoly x 0.3]: 2.15502 microseconds per iteration P2(0.3), i.e., [applypoly {3/2 x^2 - 1/2} 0.3]: 2.34157 microseconds per iteration P3(0.3), i.e., [applypoly {5/2 x^3 - 3/2 x} 0.3]: 2.44839 microseconds per iteration P4(0.3), i.e., [applypoly {35/8 x^4 - 15/4 x^2 + 3/8} 0.3]: 2.65547 microseconds per iteration P5(0.3), i.e., [applypoly {63/8 x^5 - 35/4 x^3 + 15/8 x} 0.3]: 2.69351 microseconds per iteration P6(0.3), i.e., [applypoly {231/16 x^6 - 315/16 x^4 + 105/16 x^2 - 5/16} 0.3]: 2.9555399999999996 microseconds per iteration P7(0.3), i.e., [applypoly {429/16 x^7 - 693/16 x^5 + 315/16 x^3 - 35/16 x} 0.3]: 2.97874 microseconds per iteration
This is fast enough that one can draw graphs of the functions simply by passing the polynomial form to applypoly for a very dense set of points in the interval being plotted, and even redraw the whole thing whenever the window is rescaled.
Code to draw Legendre polynomial graphstoplevel .legendre pack [canvas .legendre.c -width 400 -height 400] -expand true -fill both variable colL {red sienna salmon gold green cyan blue purple} proc drawit {c} { set wd [winfo width $c] set ht [winfo height $c] set unit [expr {(($wd<$ht ? $wd : $ht) - 20)*0.5}] set x0 [expr {double($wd)/2}] set y0 [expr {double($ht)/2}] set xunit $unit set yunit [expr {-$unit}] # Delete old contents $c delete all # Draw coordinate axes $c create line [expr {$x0 - $xunit}] $y0 [expr {$x0 + $xunit + 10}] $y0 -arrow last -fill black $c create line $x0 [expr {$y0 - $yunit}] $x0 [expr {$y0 + $yunit - 10}] -arrow last -fill black $c create text [expr {$x0 - $xunit}] $y0 -anchor n -text "-1" $c create text [expr {$x0 + $xunit}] $y0 -anchor n -text "1" $c create text $x0 [expr {$y0 - $yunit}] -anchor e -text "-1" $c create text $x0 [expr {$y0 + $yunit}] -anchor e -text "1" # Draw graphs variable colL for {set n 0} {$n<=7} {incr n} { set coordL {} for {set x -1} {$x<=1.01} {set x [expr {$x + 0.02}]} { lappend coordL [expr {$x0 + $x*$xunit}] [expr {$y0 + $yunit*[applypoly [set ::P$n] $x]}] } $c create line $coordL -fill [lindex $colL $n] } } bind .legendre.c <Configure> {drawit %W}
The following are some examples of how humans and computers may have different opinions about what constitutes a good representation of some piece of data. Humans want the ability to edit the data, whereas computers are required to use it efficiently.
The idea (abstractly): Suppose you've got some value X, and you know you'll need to know f(X) quite often, where f is some function. Then a stasher can let you compute f(X) once and then "stash" the result within the Tcl_Obj that is X, so that when subsequently you need to know f(X), computing it is "free".
The Tcl core uses this idea in a number of cases, usually when X is a piece of code written by a programmer and f is the operation of compiling that code: Tcl scripts can be bytecompiled, expressions can be bytecompiled too, and regexps compile to a finite automaton. But traditionally, you could only do that if you had implemented f in C. A stasher makes it possible to use any Tcl command as an f here, and additionally it lets you cache g(X), h(X), etc. for as many functions g, h, … as you bother to define.
(The details below are preliminary.)
Small glossary:
Stashers are created using the stasher command, which has the syntax
A stasher has a number of subcommands, the most important of which are
that returns the property of the give value, e.g.
strange get {some stupid() #computer<program>using*strange@programming<language>} bytecode
that would return the bytecode for some stupid() #computer<program>using*strange@programming<language> in the strange programming language.
When called, the get subcommand does one of two things:
Doing 1 may be expensive, but doing 2 is essentially free (time-wise).
(To be continued)
First, an example to illustrate the idea of "properties" of a value.
stasher complexnumber { # Lots of code defining properties of complex numbers in various formats } complexnumber get 3+4i real ; # => 3 complexnumber get 3+4i imag ; # => 4 complexnumber get 3+4i abs ; # => 5 complexnumber get 3+4i arg ; # => 0.9272952180016122 complexnumber get 3+4i parsed ; # => rect 3 4 complexnumber get 5cis0.92 parsed ; # => polar 5 0.92 complexnumber get 5cis0.9272952180016122 real ; # => 3.0000000000000004 complexnumber get 5cis0.9272952180016122 imag ; # => 3.9999999999999996 complexnumber get 3 real ; # => 3 complexnumber get 3 imag ; # => 0 complexnumber get 3 parsed ; # => rect 3 0
Next, an example to illustrate how stashing can make a difference for speed
stasher book { # Lots of code defining properties of "book"s and how to compute or look them up } set pptt isbn:0-13-038560-3 book get $pptt title
Some delay when book retrieves a library catalog entry for the book with ISBN 0-13-038560-3, and then returns: Practical programming in Tcl/Tk.
book get $pptt authors
No delay before returning: {Welch, Brent B.} {Hobbs, Jeffrey} {Jones, Ken}, as the full catalog entry is stashed within the $pptt value.
set pptt
Returns: isbn:0-13-038560-3. We're not automagically mutating values, we're doing things to the internal representation that supplements that value.
book get isbn:0-13-038560-3 authors
Returns {Welch, Brent B.} {Hobbs, Jeffrey} {Jones, Ken} as above, but after a delay. Though the two strings are equal, they are not stored in the same Tcl_Obj, so when book is handed one it cannot access the data stashed within the other.
(What follows is the core of Lars's report; it represents a work in progress)
Attached are the two main source files. They belong within a TEA set-up however, so I suppose it is really overdue that we sort out a better way of sharing the code.
In this round I did as Donal suggested, and implemented a command that gives full script-level control of the data stored: sesame (what else could one call a command for accessing magic stashes?). I do not intend to keep this, but (as I might have written previously) some of its subcommands /could/ be made available as unsupported subsubcommands of a stasher, if the extension is compiled with a switch to provide them.
Q: Is there some idiom, on the C side, for letting the same block of code handle both of the following:
foo bar subcommand ?arg ...? baz subcommand ?arg ...?
([for bar {*}$args] and [baz {*}$args] are supposed to be equivalent, except that error messages should reflect how each was called.) Tcl_WrongNumArgs presumes knowledge of how many arguments to keep as prefix. But I suppose one could put the code in a helper function that receives "the number of words already parsed" in an extra argument.
When I say "it's working", it means I can load the package, source fibonacci.tcl, and then do
% set seed {0 1} 0 1 % Fibonacci get $seed 400 176023680645013966468226945392411250770384383304492191886725992896575345044216019675 % sesame keys $seed 35 286 36 287 37 288 38 289 290 300 39 40 291 301 41 292 302 42 293 303 43 294 304 44 295 305 45 296 306 46 297 307 47 298 308 48 299 309 310 49 50 311 51 312 52 313 53 314 54 315 55 316 56 317 57 318 58 319 320 59 60 321 61 322 62 323 63 324 64 325 65 326 66 327 67 328 68 329 330 69 70 331 71 332 72 333 73 334 74 335 75 336 76 337 77 338 78 339 340 79 80 341 81 342 82 343 83 344 84 345 85 346 86 347 87 348 0 88 349 350 1 89 90 351 2 91 352 3 92 353 4 93 354 5 94 355 6 95 356 7 96 357 8 97 358 9 98 359 360 99 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 400 391 392 393 394 395 396 397 398 399 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 aslist 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 200 191 201 192 202 193 203 194 204 195 205 196 206 197 207 198 208 199 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 10 261 11 262 12 263 13 264 14 265 15 266 16 267 17 268 18 269 270 19 20 271 21 272 22 273 23 274 24 275 25 276 26 277 27 278 28 279 280 29 30 281 31 282 32 283 33 284 34 285 % Fibonacci get $seed 800 69283081864224717136290077681328518273399124385204820718966040597691435587278383112277161967532530675374170857404743017623467220361778016172106855838975759985190398725 % Fibonacci get $seed 1200 27269884455406270157991615313642198705000779992917725821180502894974726476373026809482509284562310031170172380127627214493597616743856443016039972205847405917634660750474914561879656763268658528092195715626073248224067794253809132219056382939163918400
without much delay. However if I restart from scratch with (the sesame subcommands are documented in stasher.c)
% sesame unstashify $seed 0 1
I then get
% Fibonacci get $seed 900 too many nested evaluations (infinite loop?)
which seems entirely reasonable.
What turn out to not work with this version is the grade.tcl example file, since the score subcommand contains an expr which causes the stash internal representation to shimmer away. This possibility was foreseen, and can be handled quite easily in a pure-C implementation. (One could also make grade.tcl work by adding a "dup" subcommand to sesame and not passing the original Tcl_Obj to the property commmands, but this is not completely robust, and anyway I think time would be better spent continuing to the pure-C implementation.)
What I did not foresee, but which occurred to me while coding, was that putting arbitrary Tcl_Objs in a stash would risk creating reference loops, leading to memory leaks. An example of this could be
stasher foo { method quine {value} {list contains $value} }
since after [foo get whatever quine], a naive stasher implementation would have created the following data structure:
Tcl_Obj: A bytes -> "whatever" | internalRep.ptrAndLongRep.ptr | -> | Stash: | refCount = 1 | propertyTable -> ... -> ("quine" entry) | -> | Tcl_Obj: | bytes -> "contains whatever" | refCount = 1 | internalRep.twoPtrValue.ptr1 | -> | List: | refCount = 1 | elemCount = 2 | elements[0] -> Tcl_Obj ("contains") | elements[1] | | +--------------+
What I in the end did to prevent this was to have the sesame poke command, which handles all storing of data in stashes, never keep a reference to the Tcl_Obj it is handed, but rather make a pure-string duplicate of it, like so:
objPtr = Tcl_NewStringObj(Tcl_GetString(objv[4]), -1);
The new object goes into the stash, where it can later shimmer over to whatever internal representation it is practical for it to have. This has the advantage that the likes of the "aslist" property from fibonacci.tcl don't need to fear accidental reuse of the stash Tcl_Obj; the code can be simplified from
stasher Fibonacci { method aslist seed {concat "" $seed} ... }
to the not quite so mysterious
stasher Fibonacci { method aslist seed {return $seed} ... }
without ill effects. It's still [eval [list lappend L1] $L2]-hard to motivate, but probably within bounds.
Is this the best one can do, though? I considered two alternatives:
set stash whatever stasher foo { method thestash {dummy} {return $::stash} } foo get $stash thestash
With such considerations, one might of course also question whether the approach I picked can be proved foolproof. My best argument for claiming that it is is that every Tcl_Obj in a stash (whether on the surface level or referenced in some number of steps) must be strictly younger than the stash Tcl_Obj, and therefore cannot be equal to it. This follows from a postulate that any Tcl_Obj that some setFromAnyProc make use of in creating the internal representation for another Tcl_Obj are younger than the full Tcl_Obj. I think this holds, but I suppose it could be violated if some setFromAnyProc were to apply aggressive reuse of Tcl_Objs. (Possibly the bytecode compiler could be troublesome here.)
What else? I declared the STASHER_ID and epoch to be unsigned long since they're both just supposed to count up from a starting point. The Tcl APIs only seem to provide for getting signed values out of Tcl_Objs, however. Since the numeric values really aren't important, I'm simply ignoring the signedness mismatch, but is there some idiom for handling it robustly?
/* * stasher.c -- * * This file implements a Tcl interface to let scripts * stash data in the internal representations of objects. * * Copyright (c) 2011 Lars Hellstr\"om * * See the file "license.terms" for information on usage and redistribution * of this file, and for a DISCLAIMER OF ALL WARRANTIES. * */ #include <tcl.h> /* * #include <stdio.h> * #include <stdlib.h> * #include <string.h> * #include "sample.h" */ /* * FILL IN: Some kind of introduction. */ /* * A Stash is the struct that keeps track of the properties that have been * stashed in the internal representation of a stash Tcl_Obj. Most if it * is a Tcl_HashTable named propertyTable. */ typedef struct Stash { unsigned long epoch; /* Definition epoch of properties. */ int refCount; /* When 0 this struct can be freed. */ Tcl_HashTable propertyTable; } Stash; /* * The STASH macro accesses the pointer to the Stash of a Tcl_Obj. * * The STASHER_ID macro accesses the identity of the stasher that * set up this stash. * * The STASH_EPOCH macro accesses the epoch of the Stash of a Tcl_Obj. * * The STASH_TABLE macro produces a pointer to the propertyTable * of the Stash of a Tcl_Obj. */ #define STASH(objPtr) \ ((Stash*)((objPtr)->internalRep.ptrAndLongRep.ptr)) #define STASHER_ID(objPtr) ((objPtr)->internalRep.ptrAndLongRep.value) #define STASH_EPOCH(objPtr) (STASH(objPtr) -> epoch) #define STASH_TABLE(objPtr) (&(STASH(objPtr) -> propertyTable)) /* Let's start with cleaning up after ourselves. */ /* *---------------------------------------------------------------------- * * WipeStash -- * * Removes all entries from the propertyTable of a Stash struct, * releasing all references these have held. * * Results: * None. * * Side effects: * May do Tcl_DecrRefCount on various Tcl_Objs. * *---------------------------------------------------------------------- */ void WipeStash (Stash *ptr) { Tcl_HashEntry *entryPtr; Tcl_Obj *objPtr; Tcl_HashSearch search; entryPtr = Tcl_FirstHashEntry(&(ptr->propertyTable),&search); while (entryPtr != NULL) { objPtr = (Tcl_Obj*) Tcl_GetHashValue(entryPtr); if (objPtr != NULL) { Tcl_DecrRefCount(objPtr); } Tcl_DeleteHashEntry(entryPtr); /* Manpage says deleting an entry returned by search is OK. */ entryPtr = Tcl_NextHashEntry(&search); } } /* *---------------------------------------------------------------------- * * FreeStashObj -- * * The freeIntRepProc of the "stash" Tcl_ObjType. * * Results: * None. * * Side effects: * May do Tcl_DecrRefCount on various Tcl_Objs. * *---------------------------------------------------------------------- */ static void FreeStashObj (Tcl_Obj *objPtr) { STASH(objPtr)->refCount --; if (STASH(objPtr)->refCount == 0) { WipeStash(STASH(objPtr)); Tcl_DeleteHashTable(STASH_TABLE(objPtr)); Tcl_Free((void*)STASH(objPtr)); } } /* *---------------------------------------------------------------------- * * DuplicateStashObj -- * * The dupIntRepProc of the "stash" Tcl_ObjType. * * Results: * None. * * Discussion: * An explicit dupIntRepProc is needed, since the default copying * of the internalRep would copy a pointer without updating * reference counts accordingly. There are however two possible * approaches duplicating stash Tcl_Objs: * * 1. Copy the internalRep, increment the Stash refcount. * * 2. Invalidate the internalRep of the copy. * * This function implements approach 1, but at the moment there is * no clear reason why approach 2 would be inferior, so this * decision may end up being reversed. * *---------------------------------------------------------------------- */ static void DuplicateStashObj (Tcl_Obj *srcPtr, Tcl_Obj *copyPtr) { copyPtr->typePtr = srcPtr->typePtr; copyPtr->internalRep = srcPtr->internalRep; STASH(srcPtr)->refCount ++; } /* * Now the elements of the Tcl_ObjType struct can be given. * * The updateStringProc is NULL by design: Tcl_Objs of this type should * never be without a string representation. * * The setFromAnyProc is also NULL by design: There is no point in just * converting a Tcl_Obj to a stash -- you need a specific stasher to * fill that stash with something useful. */ static const Tcl_ObjType stasherObjType = { "stash", /* name */ FreeStashObj, /* freeIntRepProc */ DuplicateStashObj, /* dupIntRepProc */ NULL, /* updateStringProc */ NULL /* setFromAnyProc */ }; /* * The following macros can be used to test things about stashes. * * IS_STASH_OF would be what a stasher normally uses when it is handed * a Tcl_Obj. Other stashers' stashes should be treaded with the same * distance as Tcl_Objs of some completely different type. * * IS_STASH performs the basic test of whether an item is a stash at * all. */ #define IS_STASH(objPtr) ((objPtr)->typePtr == &stasherObjType) #define IS_STASH_OF( objPtr, stasherId ) ( \ IS_STASH(objPtr) && (STASHER_ID(objPtr) == (stasherId)) ) /* *---------------------------------------------------------------------- * * StashifyObj -- * * Turns a Tcl_Obj into a stash. Any existing internalRep will * shimmer away, so this will wipe an existing stash even if it * had the right stasherId and epoch. * * Results: * None. * * Side effects: * None beyond what follows from the main effect. * *---------------------------------------------------------------------- */ void StashifyObj ( Tcl_Obj *objPtr, unsigned long stasherId, unsigned long epoch) { Stash *stashPtr; /* * Ensure a string rep exists. */ Tcl_GetString(objPtr); /* * Get rid of any existing internal rep. */ if (objPtr->typePtr != NULL && objPtr->typePtr->freeIntRepProc != NULL) { objPtr->typePtr->freeIntRepProc(objPtr); } /* * Allocate and initialise the stash. */ stashPtr = (Stash*) Tcl_Alloc(sizeof(Stash)); stashPtr->epoch = epoch; stashPtr->refCount = 1; Tcl_InitHashTable(&(stashPtr->propertyTable), TCL_STRING_KEYS); objPtr->typePtr = &stasherObjType; objPtr->internalRep.ptrAndLongRep.ptr = stashPtr; STASHER_ID(objPtr) = stasherId; } /* * Discussion: * Using string keys rather than Tcl_Obj keys for the hash may seem * strange, but storing Tcl_Objs there could lead to situations * where entries of the propertyTable of a Stash struct holds * references to the Tcl_Obj it is the stash of. Such circular * references would break the refcounted memory management scheme, * causing it to leak memory. */ /* *---------------------------------------------------------------------- * * PokeProperty -- * * Turns a Tcl_Obj into a stash. Any existing internalRep will * shimmer away, so this will wipe an existing stash even if it * had the right stasherId and epoch. * * Results: * None. * * Side effects: * None beyond what follows from the main effect. * *---------------------------------------------------------------------- */ /* *---------------------------------------------------------------------- * * Sesame_Cmd -- * * Implements the "sesame" command, which gives script level * access to a stash (in complete violation of EIAS). * * The subcommands of this command are: * * sesame peek $stashObj $key * Return the value associated with $key in the $stashObj, * or throw an error if there is no such value. An error * is also thrown if $stashObj is not a stash. * * sesame poke $stashObj $key $value * Associate $key with $value in the $stashObj. An error * is thrown if $stashObj is not a stash. * * sesame clear $stashObj $key * Remove the entry for $key in the stash of $stashObj. * Returns boolean true uf there was such an entry, and * false otherwise. An error is thrown if $stashObj is not * a stash. * * sesame reserve $stashObj $key * Associate $key with a NULL pointer in the propertyTable * of the $stashObj. Returns boolean false if the entry * already existed (in which case the old value is freed), * and boolean true if the entry is created. An error is * thrown if $stashObj is not a stash. * * sesame status $stashObj $key * Returns boolean true if a value is associated with $key * in the stash of $stashObj. Returns "pending" if a $key * entry exists in the table, but it is not associated with * any value. Returns boolean false if no $key entry exists in * the table. An error is thrown if $stashObj is not a stash. * * sesame keys $stashObj * Return list of keys in the Stash of the $stashObj. An * error is thrown if $stashObj is not a stash. * * sesame isstash $obj * Returns boolean true if the $obj is a stash, and boolean * false otherwise. * * sesame stasher $stashObj ?$id? * If an $id is not included, then return $stashObj's * stasherId. If an $id is given, then set $stashObj's stasherId * to this value and return an empty string. An error is thrown * if $stashObj is not a stash. * * sesame epoch $stashObj ?$epoch? * If an $epoch is not included, then return $stashObj's * epoch. If an $epoch is given, then set $stashObj's epoch * to this value and return an empty string. An error is thrown * if $stashObj is not a stash. * * sesame wipe $stashObj * Remove all entries in the stash of $stashObj. * An error is thrown if $stashObj is not a stash. * * sesame stashify $obj $id $epoch * Turn the internal representation of $obj into a stash * with the specified stasherId and epoch. Returns the $obj. * * sesame unstashify $obj * Free the internal representation (if any) of the $obj. * * * Results: * A standard Tcl result * * Side effects: * Several subcommands modify the internal representation of * the Tcl_Obj being operated upon. * *---------------------------------------------------------------------- */ int Sesame_Cmd( ClientData dummy, /* Not used. */ Tcl_Interp *interp, /* Current interpreter. */ int objc, /* Number of arguments. */ Tcl_Obj *CONST objv[]) /* Argument objects. */ { int index; /* For parsing subcommand. */ Tcl_HashEntry *entryPtr; /* Entry operated upon. */ Tcl_Obj *objPtr; /* Scratch Tcl_Obj pointer. */ int new; /* For newPtr of Tcl_CreateHashEntry. */ /* * This list of constants should match the subcommands string array * below. */ enum subcommands { SCMD_CLEAR, SCMD_EPOCH, SCMD_ISSTASH, SCMD_KEYS, SCMD_PEEK, SCMD_POKE, SCMD_RESERVE, SCMD_STASHER, SCMD_STASHIFY, SCMD_STATUS, SCMD_UNSTASHIFY,SCMD_WIPE }; static CONST char *subcmdNames[] = { "clear", "epoch", "isstash", "keys", "peek", "poke", "reserve", "stasher", "stashify", "status", "unstashify", "wipe", NULL }; if (objc < 2) { Tcl_WrongNumArgs(interp, 1, objv, "subcommand ?arg ...?"); return TCL_ERROR; } if (Tcl_GetIndexFromObj(interp, objv[1], subcmdNames, "subcommand", 0, &index) != TCL_OK) { return TCL_ERROR; } if (objc < 3) { Tcl_WrongNumArgs(interp, 2, objv, "tclobj ?arg ...?"); return TCL_ERROR; } switch ((enum subcommands) index) { case SCMD_STASHIFY: if (objc != 5) { Tcl_WrongNumArgs(interp, 2, objv, "tclobj stasherId epoch"); return TCL_ERROR; } else { long id, epoch; /* Parsed arguments. * FIXME: Should be unsigned. * */ if (Tcl_GetLongFromObj(interp, objv[3], &id) != TCL_OK) { return TCL_ERROR; } if (Tcl_GetLongFromObj(interp, objv[4], &epoch) != TCL_OK) { return TCL_ERROR; } StashifyObj(objv[2], id, epoch); Tcl_SetObjResult(interp, objv[2]); return TCL_OK; } case SCMD_ISSTASH: if (objc != 3) { Tcl_WrongNumArgs(interp, 2, objv, "tclobj"); return TCL_ERROR; } Tcl_SetObjResult(interp, Tcl_NewBooleanObj(IS_STASH(objv[2])) ); return TCL_OK; case SCMD_UNSTASHIFY: if (objc != 3) { Tcl_WrongNumArgs(interp, 2, objv, "tclobj"); return TCL_ERROR; } /* * Ensure a string rep exists, then get rid of any existing * internal rep. */ Tcl_GetString(objv[2]); if (objv[2]->typePtr != NULL && objv[2]->typePtr->freeIntRepProc != NULL) { objv[2]->typePtr->freeIntRepProc(objv[2]); } objv[2]->typePtr = NULL; Tcl_SetObjResult(interp, objv[2]); return TCL_OK; default: ; /* Silence silly warning. */ } /* * Remaining subcommands all require objv[2] to be a stash already, * so make a unified check for that here. */ if (!IS_STASH(objv[2])) { Tcl_SetResult(interp, "This subcommand only operates on stashes", TCL_STATIC); Tcl_SetErrorCode(interp, "STASHER", "NOT_A_STASH", NULL); return TCL_ERROR; } switch ((enum subcommands) index) { case SCMD_CLEAR: if (objc != 4) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj key"); return TCL_ERROR; } entryPtr = Tcl_FindHashEntry(STASH_TABLE(objv[2]), Tcl_GetString(objv[3])); if (entryPtr != NULL) { objPtr = (Tcl_Obj*) Tcl_GetHashValue(entryPtr); if (objPtr != NULL) { Tcl_DecrRefCount(objPtr); } Tcl_DeleteHashEntry(entryPtr); } Tcl_SetObjResult(interp, Tcl_NewBooleanObj(entryPtr!=NULL) ); return TCL_OK; case SCMD_RESERVE: if (objc != 4) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj key"); return TCL_ERROR; } entryPtr = Tcl_CreateHashEntry(STASH_TABLE(objv[2]), Tcl_GetString(objv[3]), &new); if (!new) { objPtr = (Tcl_Obj*) Tcl_GetHashValue(entryPtr); if (objPtr != NULL) { Tcl_DecrRefCount(objPtr); } } Tcl_SetHashValue(entryPtr, (ClientData)NULL); Tcl_SetObjResult(interp, Tcl_NewBooleanObj(new!=0) ); return TCL_OK; case SCMD_STATUS: if (objc != 4) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj key"); return TCL_ERROR; } entryPtr = Tcl_FindHashEntry(STASH_TABLE(objv[2]), Tcl_GetString(objv[3])); if (entryPtr == NULL) { Tcl_SetObjResult(interp, Tcl_NewBooleanObj(0) ); } else { objPtr = (Tcl_Obj*) Tcl_GetHashValue(entryPtr); if (objPtr != NULL) { Tcl_SetObjResult(interp, Tcl_NewBooleanObj(1) ); } else { Tcl_SetResult(interp, "pending", TCL_STATIC); } } return TCL_OK; case SCMD_PEEK: if (objc != 4) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj key"); return TCL_ERROR; } entryPtr = Tcl_FindHashEntry(STASH_TABLE(objv[2]), Tcl_GetString(objv[3])); if (entryPtr == NULL) { Tcl_AppendResult(interp, "No entry for key: ", Tcl_GetString(objv[3]), NULL); Tcl_SetErrorCode(interp, "STASHER", "entry missing", NULL); return TCL_ERROR; } objPtr = (Tcl_Obj*) Tcl_GetHashValue(entryPtr); if (objPtr == NULL) { Tcl_AppendResult(interp, "No value for key: ", Tcl_GetString(objv[3]), NULL); Tcl_SetErrorCode(interp, "STASHER", "entry reserved", NULL); return TCL_ERROR; } Tcl_SetObjResult(interp, objPtr); return TCL_OK; case SCMD_POKE: if (objc != 5) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj key value"); return TCL_ERROR; } /* * First release old value, if there is one. */ entryPtr = Tcl_CreateHashEntry(STASH_TABLE(objv[2]), Tcl_GetString(objv[3]), &new); if (!new) { objPtr = (Tcl_Obj*) Tcl_GetHashValue(entryPtr); if (objPtr != NULL) { Tcl_DecrRefCount(objPtr); } } /* * Then create a /copy/ (pure string) of the new value, and put * /that/ in the stash. * * The reason not to use the original Tcl_Obj, and not make use * of any internal representation it may have either, is that * this could potentionally end up with the stash containing * a reference to itself. */ objPtr = Tcl_NewStringObj(Tcl_GetString(objv[4]), -1); Tcl_SetHashValue(entryPtr, (ClientData)objPtr); Tcl_IncrRefCount(objPtr); return TCL_OK; case SCMD_KEYS: if (objc != 3) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj"); return TCL_ERROR; } else { Tcl_HashSearch search; /* * The by far easiest way to construct the necessary result * seems to be to use Tcl_AppendElement, even if that is * deprecated. */ entryPtr = Tcl_FirstHashEntry(STASH_TABLE(objv[2]),&search); while (entryPtr != NULL) { Tcl_AppendElement(interp, Tcl_GetHashKey(STASH_TABLE(objv[2]), entryPtr)); entryPtr = Tcl_NextHashEntry(&search); } return TCL_OK; } case SCMD_STASHER: if (objc > 4) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj ?id?"); return TCL_ERROR; } if (objc == 3) { Tcl_SetObjResult(interp, Tcl_NewLongObj(STASHER_ID(objv[2]))); return TCL_OK; } return Tcl_GetLongFromObj(interp, objv[3], &STASHER_ID(objv[2])); case SCMD_EPOCH: if (objc > 4) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj ?epoch?"); return TCL_ERROR; } if (objc == 3) { Tcl_SetObjResult(interp, Tcl_NewLongObj(STASH_EPOCH(objv[2]))); return TCL_OK; } return Tcl_GetLongFromObj(interp, objv[3], &STASH_EPOCH(objv[2])); case SCMD_WIPE: if (objc != 3) { Tcl_WrongNumArgs(interp, 2, objv, "stashobj"); return TCL_ERROR; } WipeStash(STASH(objv[2])); return TCL_OK; default: ; /* Silence silly warning. */ } Tcl_SetResult(interp, "This can't happen", TCL_STATIC); return TCL_ERROR; } /* *---------------------------------------------------------------------- * * Stasher_Init -- * * Initialize the new package. * * Results: * A standard Tcl result * * Side effects: * The stasher package is created. * One new command "sesame" is added to the Tcl interpreter. * *---------------------------------------------------------------------- */ int Stasher_Init(Tcl_Interp *interp) { /* * This may work with 8.0, but we are using strictly stubs here, * which requires 8.1. */ if (Tcl_InitStubs(interp, "8.1", 0) == NULL) { return TCL_ERROR; } if (Tcl_PkgRequire(interp, "Tcl", "8.1", 0) == NULL) { return TCL_ERROR; } if (Tcl_PkgProvide(interp, PACKAGE_NAME "::sesame", PACKAGE_VERSION) != TCL_OK) { return TCL_ERROR; } Tcl_CreateObjCommand(interp, "sesame", (Tcl_ObjCmdProc *) Sesame_Cmd, (ClientData)NULL, (Tcl_CmdDeleteProc *)NULL); return TCL_OK; }
# tcl-stasher.tcl # # Tcl implementation of stasher API on top of sesame command. package require TclOO package require stasher::sesame 0.2 package provide stasher 0.2 oo::class create stasher_properties { unexport destroy # Want to start out with an empty list of properties # constructor {args} { # oo::objdefine [self object] {*}$args # } } namespace eval stasher { proc nsset {ns var args} { set ${ns}::${var} {*}$args } namespace eval define {} proc setup_define {obj} { namespace delete define namespace eval define {} foreach cmd [info commands ::oo::objdefine::*] { set tail [namespace tail $cmd] interp alias {} ::stasher::define::$tail {} \ ::oo::objdefine $obj $tail } interp alias {} ::stasher::define::vset {} \ ::stasher::nsset [info object namespace $obj] } } oo::class create stasher_class { variable propobj epoch id self variable last_id self method newid {} { if {[info exists last_id]} then { return [incr last_id] } else { return [set last_id 0] } } constructor {args} { set propobj [stasher_properties new] set id [stasher_class newid] set epoch 0 my define {*}$args oo::objdefine [self object] forward property $propobj # oo::objdefine [self object] forward define oo::objdefine $propobj } method define {args} { if {[llength $args] == 1} then { ::stasher::setup_define $propobj tailcall namespace eval ::stasher::define [lindex $args 0] } elseif {[lindex $args 0] eq "vset"} then { tailcall ::stasher::nsset [info object namespace $propobj]\ {*}[lrange $args 1 end] } else { tailcall ::oo::objdefine $propobj {*}$args } } destructor { rename $propobj "" } method get {value property} { if {![sesame isstash $value]} then { sesame stashify $value $id $epoch } elseif {[sesame stasher $value] != $id || [sesame epoch $value] != $epoch} then { sesame wipe $value sesame stasher $value $id sesame epoch $value $epoch } switch -- [sesame status $value $property] "1" { return [sesame peek $value $property] } "pending" { return -code return -errorCode {STASHER "circular definition"}\ "Circular definition of property: $property" } sesame reserve $value $property set code [catch {$propobj $property $value} res opts] if {$code} then { sesame clear $value $property return -options $opts $res } else { sesame poke $value $property $res return [sesame peek $value $property] } } method eval {property value} { tailcall if 1 then [my get $value $property] # Or not so tricky: # uplevel 1 [my get $value $property] } method bump {} { incr epoch } } interp alias {} stasher {} stasher_class create
arjen - 2011-06-29 02:22:18
Glancing at the C source code, I would say that most of these functions should be declared static, but I have merely glanced at them ;). Given this code, we can use it as a separate package, right?
Lars H, 2011-07-20: Yes, most of them should be static. I primarily didn't think about that, and secondarily was under the impression that static is the default, but now upon checking the manual I see that the default is actually extern. Will fix in version 0.3.1. And yes, it is a separate package.