This page is about (the features that should be produced within) an ongoing project of the [Google Summer Of Code] 2011. ---- The idea (abstractly): Suppose you've got some value ''X'', and you know you'll need to know f(''X'') quite often, where f is some function. Then a [stasher] can let you compute f(''X'') '''once''' and then "stash" the result within the [Tcl_Obj] that is ''X'', so that when subsequently you need to know f(''X''), computing it is "free". The Tcl core uses this idea in a number of cases, usually when ''X'' is a piece of code written by a programmer and ''f'' is the operation of compiling that code: Tcl scripts can be bytecompiled, [expr]essions can be bytecompiled too, and [regexp]s compile to a finite automaton. But traditionally, you could only do that if you had implemented ''f'' in [C]. A [stasher] makes it possible to use any Tcl command as an ''f'' here, and additionally it lets you cache g(''X''), h(''X''), etc. for as many functions ''g'', ''h'', … as you bother to define. **Interface** (The details below are preliminary.) Small glossary: stash: Where cached values are stored. Technically this is going to reside within the internal representation of a [Tcl_Obj], so that when the original value is forgotten, everything that was cached for it goes away too. ''stasher'': A command used to access a stash. '''stasher''': The command used to create ''stasher''s. property: A computed value that can be cached within a stash. Stashers are created using the '''stasher''' command, which has the syntax : '''stasher''' ''name'' ''setup-script'' A ''stasher'' has a number of subcommands, the most important of which are : ''stasher'' '''get''' ''value'' ''property'' that returns the ''property'' of the give ''value'', e.g. strange get {some stupid() #computerusing*strange@programming} bytecode that would return the bytecode for `some stupid() #computerusing*strange@programming` in the `strange` programming language. When called, the '''get''' subcommand does one of two things: 1. It passes the ''value'' on to the '''property''' ''property'' subsubcommand, and uses whatever it returns. 2. It may return some result obtained previously when doing 1. Doing 1 may be expensive, but doing 2 is essentially free (time-wise). ''(To be continued)'' ***Example*** First, an example to illustrate the idea of "properties" of a value. stasher complexnumber { # Lots of code defining properties of complex numbers in various formats } complexnumber get 3+4i real ; # => 3 complexnumber get 3+4i imag ; # => 4 complexnumber get 3+4i abs ; # => 5 complexnumber get 3+4i arg ; # => 0.9272952180016122 complexnumber get 3+4i parsed ; # => rect 3 4 complexnumber get 5cis0.92 parsed ; # => polar 5 0.92 complexnumber get 5cis0.9272952180016122 real ; # => 3.0000000000000004 complexnumber get 5cis0.9272952180016122 imag ; # => 3.9999999999999996 complexnumber get 3 real ; # => 3 complexnumber get 3 imag ; # => 0 complexnumber get 3 parsed ; # => rect 3 0 Next, an example to illustrate how stashing can make a difference for speed stasher book { # Lots of code defining properties of "book"s and how to compute or look them up } set pptt isbn:0-13-038560-3 book get $pptt title Some delay when '''book''' retrieves a library catalog entry for the book with ISBN 0-13-038560-3, and then returns: `Practical programming in Tcl/Tk`. book get $pptt authors No delay before returning: `{Welch, Brent B.} {Hobbs, Jeffrey} {Jones, Ken}`, as the full catalog entry is stashed within the $pptt value. set pptt Returns: `isbn:0-13-038560-3`. We're not automagically mutating values, we're doing things to the internal representation that supplements that value. book get isbn:0-13-038560-3 authors Returns `{Welch, Brent B.} {Hobbs, Jeffrey} {Jones, Ken}` as above, but after a delay. Though the two strings are equal, they are not stored in the same [Tcl_Obj], so when '''book''' is handed one it cannot access the data stashed within the other. ---- **Status: 28-Jun-2011** ''(What follows is the core of Lars's report; it represents a work in progress)'' Attached are the two main source files. They belong within a TEA set-up however, so I suppose it is really overdue that we sort out a better way of sharing the code. In this round I did as Donal suggested, and implemented a command that gives full script-level control of the data stored: sesame (what else could one call a command for accessing magic stashes?). I do not intend to keep this, but (as I might have written previously) some of its subcommands /could/ be made available as unsupported subsubcommands of a stasher, if the extension is compiled with a switch to provide them. Q: Is there some idiom, on the C side, for letting the same block of code handle both of the following: ====== foo bar subcommand ?arg ...? baz subcommand ?arg ...? ====== (`[for bar {*}$args]` and `[baz {*}$args]` are supposed to be equivalent, except that error messages should reflect how each was called.) [Tcl_WrongNumArgs] presumes knowledge of how many arguments to keep as prefix. But I suppose one could put the code in a helper function that receives "the number of words already parsed" in an extra argument. When I say "it's working", it means I can load the package, source fibonacci.tcl, and then do ====== % set seed {0 1} 0 1 % Fibonacci get $seed 400 176023680645013966468226945392411250770384383304492191886725992896575345044216019675 % sesame keys $seed 35 286 36 287 37 288 38 289 290 300 39 40 291 301 41 292 302 42 293 303 43 294 304 44 295 305 45 296 306 46 297 307 47 298 308 48 299 309 310 49 50 311 51 312 52 313 53 314 54 315 55 316 56 317 57 318 58 319 320 59 60 321 61 322 62 323 63 324 64 325 65 326 66 327 67 328 68 329 330 69 70 331 71 332 72 333 73 334 74 335 75 336 76 337 77 338 78 339 340 79 80 341 81 342 82 343 83 344 84 345 85 346 86 347 87 348 0 88 349 350 1 89 90 351 2 91 352 3 92 353 4 93 354 5 94 355 6 95 356 7 96 357 8 97 358 9 98 359 360 99 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 400 391 392 393 394 395 396 397 398 399 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 aslist 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 200 191 201 192 202 193 203 194 204 195 205 196 206 197 207 198 208 199 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 10 261 11 262 12 263 13 264 14 265 15 266 16 267 17 268 18 269 270 19 20 271 21 272 22 273 23 274 24 275 25 276 26 277 27 278 28 279 280 29 30 281 31 282 32 283 33 284 34 285 % Fibonacci get $seed 800 69283081864224717136290077681328518273399124385204820718966040597691435587278383112277161967532530675374170857404743017623467220361778016172106855838975759985190398725 % Fibonacci get $seed 1200 27269884455406270157991615313642198705000779992917725821180502894974726476373026809482509284562310031170172380127627214493597616743856443016039972205847405917634660750474914561879656763268658528092195715626073248224067794253809132219056382939163918400 ====== without much delay. However if I restart from scratch with (the [sesame] subcommands are documented in stasher.c) ====== % sesame unstashify $seed 0 1 ====== I then get ====== % Fibonacci get $seed 900 too many nested evaluations (infinite loop?) ====== which seems entirely reasonable. What turn out to not work with this version is the grade.tcl example file, since the score subcommand contains an [expr] which causes the stash internal representation to shimmer away. This possibility was foreseen, and can be handled quite easily in a pure-C implementation. (One could also make grade.tcl work by adding a "dup" subcommand to sesame and not passing the original Tcl_Obj to the property commmands, but this is not completely robust, and anyway I think time would be better spent continuing to the pure-C implementation.) What I did not foresee, but which occurred to me while coding, was that putting arbitrary Tcl_Objs in a stash would risk creating reference loops, leading to memory leaks. An example of this could be ====== stasher foo { method quine {value} {list contains $value} } ====== since after `[foo get whatever quine]`, a naive stasher implementation would have created the following data structure: ======none Tcl_Obj: A bytes -> "whatever" | internalRep.ptrAndLongRep.ptr | -> | Stash: | refCount = 1 | propertyTable -> ... -> ("quine" entry) | -> | Tcl_Obj: | bytes -> "contains whatever" | refCount = 1 | internalRep.twoPtrValue.ptr1 | -> | List: | refCount = 1 | elemCount = 2 | elements[0] -> Tcl_Obj ("contains") | elements[1] | | +--------------+ ====== What I in the end did to prevent this was to have the [sesame poke] command, which handles all storing of data in stashes, never keep a reference to the Tcl_Obj it is handed, but rather make a pure-string duplicate of it, like so: ======c objPtr = Tcl_NewStringObj(Tcl_GetString(objv[4]), -1); ====== The new object goes into the stash, where it can later shimmer over to whatever internal representation it is practical for it to have. This has the advantage that the likes of the "aslist" property from fibonacci.tcl don't need to fear accidental reuse of the stash Tcl_Obj; the code can be simplified from ====== stasher Fibonacci { method aslist seed {concat "" $seed} ... } ====== to the not quite so mysterious ====== stasher Fibonacci { method aslist seed {return $seed} ... } ====== without ill effects. It's still [eval [list lappend L1] $L2]-hard to motivate, but probably within bounds. Is this the best one can do, though? I considered two alternatives: * Don't pass the actual stash [Tcl_Obj] to the property methods, but rather some kind of "alias" that contains information to the effect of "by the way, I represent the stash passed to this stasher on its Nth recursive invokation". Since this is not a full reference, there would be no harm in putting one of these in a stash, but they could still be used to locate stashed data (even if not as quickly as the stash itself) while a property is being computed. : The problem with these is that they only protect against reference loops if the property methods are pure functions, which one cannot easily enforce. It would be very easy to go ====== set stash whatever stasher foo { method thestash {dummy} {return $::stash} } foo get $stash thestash ====== : to create a reference loop. So I'd say this is no good. * An alternative is to check the value computed for a property for having an "unsafe" Tcl_ObjType, and only make a clean copy when necessary to avoid creating a loop. This may seem more attractive, as it would usually preserve the native (i.e., internal) representations of Tcl_Objs, even if it would come at the price of having to `#include ` to examine the elements of lists and dicts. I considered for a while whether such "cleaning" of Tcl_Objs could be split off as a separate package. : However, this would still not be sufficient. That a Tcl_Obj is not a stash when it is returned by a property method is no guarantee that it will not later shimmer over to such. While this is closer to being safe, I do not think it would be foolproof. With such considerations, one might of course also question whether the approach I picked can be proved foolproof. My best argument for claiming that it is is that every Tcl_Obj in a stash (whether on the surface level or referenced in some number of steps) must be strictly younger than the stash Tcl_Obj, and therefore cannot be equal to it. This follows from a postulate that any Tcl_Obj that some setFromAnyProc make use of in creating the internal representation for another Tcl_Obj are younger than the full Tcl_Obj. I think this holds, but I suppose it could be violated if some setFromAnyProc were to apply aggressive reuse of Tcl_Objs. (Possibly the bytecode compiler could be troublesome here.) What else? I declared the STASHER_ID and epoch to be unsigned long since they're both just supposed to count up from a starting point. The Tcl APIs only seem to provide for getting signed values out of Tcl_Objs, however. Since the numeric values really aren't important, I'm simply ignoring the signedness mismatch, but is there some idiom for handling it robustly? <>Internals