unshared value idiom

The unshared value idiom is a technique for unsharing a Tcl_Obj so that Tcl doesn't have to copy it in order to modify it.

Description

Tcl stores a value in a Tcl_Obj, and when a value is passed as an argument to a command, that Tcl_Obj is bound to the corresponding variable in the scope of the command. If the Tcl_Obj is also bound to a variable in the caller, then it is shared, and unavailable to hold any new value the command may produce. On the other hand, if the Tcl_Obj is unshared, the command might be able to reuse that Tcl_Obj instead of creating a new one. For example, if the value of the Tcl_Obj is a large list and is unshared, a command like lreplace might save a lot of time and memory by reusing that object.

So how does one go about making sure the Tcl_Obj that is passed to the command is unshared? The answer is to rebind the variable holding the Tcl_Obj in the caller to some other value. One trick to accomplish this in one line is to set the variable to the empty string in a command substitution at the end of the value being passed:

set myvalue [somecommand $myvalue[set myvalue {}]]

Since substituting the empty string is a no-op as far as the word is concerned, the side effect of rebinding the variable to a new value is efficiently achieved.

Here's an lpop command which rebinds the variable to a useful string instead of binding to the empty string:

set item [lindex $list end]
set list [lreplace $list [set list end] end]

Commentary

AMG: The above examples miss the mark.

[lappend] has no need for this idiom because its argument is the name of a variable, not the value, so the act of invoking [lappend] does not increment the Tcl_Obj's refcount. If [lappend] is called on a variable whose value is unshared, [lappend] is able to modify it in place, no idiom required. Yet if [lappend] is called on a variable whose value is shared (i.e. refcount>1), a copy must be done, and that is true even in situations where this idiom is used correctly.

The [somecommand] example also does not work since [somecommand] is unable to modify the caller variable named "myvalue" on account of not knowing its name. The issue is not with commands being able to modify variables whose values are passed to them. Rather, the issue is with commands that operate on values and return new values which the caller wishes to store back into the original variable.

proc somecommand {value} {lappend value xxx}        ;# copy happens only if $value is shared
set myvalue [somecommand $myvalue]                  ;# copy guaranteed to happen
set myvalue [somecommand $myvalue[set myvalue {}]]  ;# copy may be inhibited

The last line of the above example avoids copying if and only if the Tcl_Obj stored in myvalue is initially unshared. Given the above sequence, it will be, since the second line makes a copy.

It may be useful to explain how this idiom actually works. The locution $myvalue[set myvalue {}] evaluates to the current value of the variable "myvalue" concatenated with empty string (a no-op), where empty string is the value returned by [set myvalue {}], since two-argument [set] returns its second argument (*). Of course [set] has the side effect of changing the variable's value. In effect, the value is taken from the variable "myvalue" when it is given to [somecommand] via the argument list.

(*) unless a [trace] overwrites the variable's value, in which case [set] returns that new value
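
A minimal sketch of that footnote, for illustration only. The callback name overwrite and the values original and replaced are hypothetical, chosen here and not taken from the discussion above. With a write trace that overwrites the variable, [set] returns the trace's value instead of the empty string:

```tcl
# Hypothetical write-trace callback: overwrites whatever was just written.
proc overwrite {name1 name2 op} {
    upvar 1 $name1 var
    set var replaced
}

set myvalue original
trace add variable myvalue write overwrite

# [set myvalue {}] fires the trace, so it returns the trace's value rather
# than the empty string, and that value is concatenated onto the old one.
set word $myvalue[set myvalue {}]

trace remove variable myvalue write overwrite
```

Here word ends up holding the old value with the trace's value appended, which is why the idiom quietly assumes no such trace is active on the variable.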

Thus I prefer to call this idiom "take":

proc take {varName} {
    upvar 1 $varName var
    return $var[set var {}]
}
set myvalue [somecommand [take myvalue]]
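
As a small usage sketch (the variable name myvar and the list contents are assumptions made for this example), [take] can feed a value-based command such as lreplace and the result can be stored straight back into the same variable:

```tcl
proc take {varName} {
    upvar 1 $varName var
    return $var[set var {}]
}

set myvar [list a b c]

# While lreplace runs, myvar holds the empty string, so the variable no
# longer contributes a reference to the list being modified.
set myvar [lreplace [take myvar] 1 1 B]
```

Whether a copy is actually avoided depends on the value having no other references; the sketch only shows the shape of the call.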

Proof that it works (notice the refcount):

% set myvalue [list [list this is] [list my value]]; list
% tcl::unsupported::representation $myvalue
value is a list with a refcount of 2, object pointer at 00000000029A9B90, internal representation 0000000002994C90:0000000000000000, no string representation
% tcl::unsupported::representation [take myvalue]
value is a list with a refcount of 1, object pointer at 00000000029A9B90, internal representation 0000000002994C90:0000000000000000, no string representation

Since this was interactively typed, the funny business with [list] was to avoid the value also being stored in the history.

A more featureful [take] could accept a second argument defaulted to empty string. However, I don't see the utility of this because the whole point of the exercise is to liberate the Tcl_Obj from its variable in anticipation of said variable being assigned to immediately after the return of the command being passed said Tcl_Obj to maximize the odds of the Tcl_Obj being unshared, so the command can work as efficiently as possible.

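proc take {varName {val {}}} {
    upvar 1 $varName var
    # [set var $val; list] rebinds var but substitutes the empty string,
    # so a non-empty $val is not appended to the returned value.
    return $var[set var $val; list]
}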

That earlier sentence is really long and really important, so I'll repeat it clause by clause:

  • The whole point of the exercise is to
  • Liberate the Tcl_Obj from its variable
  • In anticipation of said variable being assigned to
  • Immediately after the return of
  • The command being passed said Tcl_Obj
  • To maximize the odds of the Tcl_Obj being unshared,
  • So the command can work as efficiently as possible

Regarding your lpop example:

set item [lindex $list end]
set list [lreplace $list [set list end] end]

There's no benefit to writing the idiom differently, though it is a valid example of the underlying concept. Just say:

set item [lindex $list end]
set list [lreplace $list[set list {}] end end]
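
The two lines above could be wrapped in a procedure along these lines. This is a sketch under assumptions: the name lpop_last is hypothetical (newer Tcl releases provide a built-in lpop, so that name is avoided here), and the procedure only handles removing the last element of a non-empty list.

```tcl
# Hypothetical helper: remove and return the last element of the list
# stored in the caller's variable named by listVar.
proc lpop_last {listVar} {
    upvar 1 $listVar list
    set item [lindex $list end]
    # Rebind the variable to {} while lreplace works on the old value.
    set list [lreplace $list[set list {}] end end]
    return $item
}

set stack [list a b c]
set top [lpop_last stack]
```

After this runs, top holds the former last element and stack holds the remaining elements.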

I'm confused as to why you choose to name the variable "myvalue" rather than "myvariable" or "myvar" or similar. In this particular discussion, it's paramount that the distinction between variables and values be highlighted, not obfuscated.

PYK 2015-03-14: AMG, thanks for the great feedback and explanations. I've modified the description based on it. If I'd been thinking more clearly this morning, I wouldn't have used lappend as the example. The only benefit to writing the lpop differently is that it's a little more concise, and illustrates that there's nothing special about using the empty string. I used $myvalue instead of $myvar to attempt to highlight that semantically, it's the value of the variable that's passed, not the variable itself, though I can see where you're coming from by questioning that. I don't have any strong feeling about the usefulness of naming it the way I did. take has list processing connotations, so I'd probably name the command unshare, unvar, dropvar, or something like that.

AMG: [unshare] is misleading; I would expect it to return an unshared copy of the value passed to it, basically Tcl_DuplicateObj(). Of course that's not what's happening, and this specific behavior ought not to be exposed as a script command. [unvar] and [dropvar] suggest that the variable is to be unset, but that's not the case either. I think [take] works well because the value is taken away from the variable, without destroying the variable. If I understand your comment about list processing connotations, I think it's a valid analogy, if "take" on a list is like a generalized "pop" that operates on any index.

You're correct that [lpop] demonstrates this particular spelling of the idiom is not the only way to achieve the goal.

See Also

K
Presents various alternate wordings of the unshared value idiom