Copy-on-write is a mechanism used in the C implementation of Tcl. If you're not writing a Tcl extension in C, you really don't need to know any more than that. If you are, a good place to start is the documentation for Tcl_Obj. (The name is misleading: Tcl_Value would better describe its purpose, but that name was already taken by Tcl_CreateMathFunc().)
A fundamental principle of Tcl is that everything is a string. This has both advantages and disadvantages. One potential disadvantage is that there are inevitably a lot of large strings, and a naive implementation that copied them frequently would perform badly. Copy-on-write is the mechanism by which the Tcl C implementation avoids unnecessary copies. Each value (Tcl_Obj) has a reference count. Whenever the value is passed to a command or assigned to a variable, the reference count is incremented and no copy is made. When a value is about to be changed, the implementation first checks the reference count. If the count is 1, there is no other reference to the value and it can be changed in place. If the count is greater than 1, other references to the value exist, and changing it in place would change it for them as well. To prevent this, a copy of the value is made (with a new reference count of 1) and the copy is changed instead.
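The following minimal, self-contained C sketch makes the reference-counting rule concrete. It does not use the Tcl library. The `Value` struct and the `value_new`, `value_incr`, `value_decr`, `value_unshare` and `value_append` functions are hypothetical stand-ins invented for illustration, not part of the Tcl API. In a real extension, the rough equivalent of `value_unshare` is to check the object with `Tcl_IsShared()` and, if it is shared, work on a copy obtained from `Tcl_DuplicateObj()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a reference-counted value. This is NOT the real
 * Tcl_Obj layout; it only mimics the copy-on-write idea. */
typedef struct Value {
    int refCount;   /* number of holders (variables, arguments, ...) */
    size_t length;  /* bytes in 'bytes', excluding the terminating NUL */
    char *bytes;    /* NUL-terminated string */
} Value;

/* Returns a value with reference count 0; the caller takes the first
 * reference with value_incr() (Tcl_NewObj() also starts at 0). */
static Value *value_new(const char *s)
{
    Value *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->refCount = 0;
    v->length = strlen(s);
    v->bytes = malloc(v->length + 1);
    if (v->bytes == NULL) abort();
    memcpy(v->bytes, s, v->length + 1);
    return v;
}

static void value_incr(Value *v) { v->refCount++; }

/* Drop one reference; free the value when nobody holds it any more. */
static void value_decr(Value *v)
{
    assert(v->refCount > 0);
    if (--v->refCount == 0) {
        free(v->bytes);
        free(v);
    }
}

/* Copy on write: the caller holds one reference to v and wants a value
 * it alone owns. If v is shared, copy it and give up the reference. */
static Value *value_unshare(Value *v)
{
    if (v->refCount <= 1) {
        return v;                      /* sole owner: change in place */
    }
    Value *copy = value_new(v->bytes); /* shared: copy first */
    value_incr(copy);                  /* caller's reference moves to copy */
    value_decr(v);                     /* other holders keep the original */
    return copy;
}

/* Append s to the caller's value, copying only if it is shared.
 * Returns the value the caller now holds (possibly a new copy). */
static Value *value_append(Value *v, const char *s)
{
    v = value_unshare(v);
    size_t extra = strlen(s);
    char *grown = realloc(v->bytes, v->length + extra + 1);
    if (grown == NULL) abort();
    memcpy(grown + v->length, s, extra + 1);
    v->bytes = grown;
    v->length += extra;
    return v;
}
```

Two holders share one buffer until one of them writes. Only then is a copy made, and the remaining sole holder can later append in place. If `value_append` skipped the `value_unshare` step, appending through one holder would also change the string seen through the other. That is the kind of extension bug that lets copy-on-write leak through to the script level.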
Some people have confused copy-on-write with the question of call-by-value versus call-by-reference and/or the use of upvar. This is a mistake. Tcl command arguments are always passed by value at the Tcl script level and by reference at the C implementation level. Sometimes the value might be the name of a variable, but that is not the same thing as passing by reference in C (and even if it were, it would still have nothing to do with copy-on-write).
The only time copy-on-write is visible at the script level is when an extension has been coded incorrectly and a copy that should have been made was not. It can also be observed with the [representation] command, which does not conform to Tcl value semantics.
Should anyone still be confused by values, references, variables and names, I refer you to Lewis Carroll's "Through the Looking-Glass":
"The name of the song is called 'Haddocks' Eyes.'"
"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.
"No, you don't understand," the Knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged Aged Man.'"
"Then I ought to have said 'That's what the song is called'?" Alice corrected herself.
"No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it's called, you know!"
"Well, what is the song, then?" said Alice, who was by this time completely bewildered.
"I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention."
C/C++ programmers discovering copy-on-write initially tend to see it as a guaranteed performance improvement, and beginners' C++ books are full of examples of string classes that use it. Sooner or later they use their copy-on-write implementations in a multi-threaded application, and things start to go wrong. The problem is that checking the reference count and copying must be done in a thread-safe manner, which typically requires a mutex. Since the mutex must be locked for every access, not just those that turn out to need a copy, performance tends to be worse than that of a straightforward implementation that always copies. Threaded Tcl avoids this problem by never sharing values between threads.
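The following is a hypothetical, minimal C model of a copy-on-write buffer whose handles may be used from different threads, assuming POSIX threads. `Buffer`, `cowLock`, `buffer_new`, `buffer_share`, `buffer_release`, `buffer_get` and `buffer_set` are illustrative names, not Tcl or libc APIs. It follows the mutex approach described above. The reference-count check and the copy happen under one lock, and for simplicity every operation, including plain reads, takes that lock too.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared string buffer (not Tcl_Obj). */
typedef struct Buffer {
    int refCount;
    char *bytes;
} Buffer;

/* One global lock guarding every Buffer's refCount and bytes. */
static pthread_mutex_t cowLock = PTHREAD_MUTEX_INITIALIZER;

/* New buffer owned by exactly one handle. */
static Buffer *buffer_new(const char *s)
{
    Buffer *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->refCount = 1;
    b->bytes = malloc(strlen(s) + 1);
    if (b->bytes == NULL) abort();
    strcpy(b->bytes, s);
    return b;
}

/* Share an existing buffer: even this cheap "copy" takes the lock. */
static Buffer *buffer_share(Buffer *b)
{
    pthread_mutex_lock(&cowLock);
    b->refCount++;
    pthread_mutex_unlock(&cowLock);
    return b;
}

/* Drop one handle; the last one out frees the buffer. */
static void buffer_release(Buffer *b)
{
    pthread_mutex_lock(&cowLock);
    assert(b->refCount > 0);
    int remaining = --b->refCount;
    pthread_mutex_unlock(&cowLock);
    if (remaining == 0) {      /* no other handle can reach b now */
        free(b->bytes);
        free(b);
    }
}

/* Read one character; in this sketch reads pay for the lock as well. */
static char buffer_get(Buffer *b, size_t i)
{
    pthread_mutex_lock(&cowLock);
    char c = b->bytes[i];
    pthread_mutex_unlock(&cowLock);
    return c;
}

/* Write one character through the caller's handle. The refCount check,
 * the copy and the write all happen under the lock, so no other thread
 * can start or stop sharing the buffer in between. Returns the buffer
 * the caller now owns (possibly a fresh copy). */
static Buffer *buffer_set(Buffer *b, size_t i, char c)
{
    pthread_mutex_lock(&cowLock);
    if (b->refCount > 1) {
        Buffer *copy = buffer_new(b->bytes); /* refCount 1, ours alone */
        b->refCount--;                        /* others keep the original */
        b = copy;
    }
    b->bytes[i] = c;
    pthread_mutex_unlock(&cowLock);
    return b;
}
```

Every call pays for a lock and an unlock, even a `buffer_get` on a buffer that is never copied. That is the overhead the paragraph above refers to. Tcl sidesteps it by not sharing `Tcl_Obj` values between threads, so its reference counts are updated without any locking.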
NickHounsome - 2009-07-08 03:30:04
None of the Tcl examples here have anything to do with copy-on-write since copy-on-write only has meaning in the C API.