Version 21 of copy-on-write

Updated 2010-12-27 21:40:49 by AMG

Copy-on-write is a mechanism used in the C implementation of Tcl. If you're not writing an extension for Tcl in C then you really don't need to know any more. If you are, then a good place to start is the documentation for Tcl_Obj. (The name is misleading. Tcl_Value would better describe its purpose, but that term was already taken by Tcl_CreateMathFunc().)

A fundamental principle of Tcl is that everything is a string. This has both advantages and disadvantages. One potential disadvantage is that there are inevitably a lot of large strings, and a naive implementation that copied these strings freely would perform badly. Copy-on-write is the mechanism by which the Tcl C implementation avoids unnecessary copies.

Each value (Tcl_Obj) has a reference count. Whenever the value is passed to a command or assigned to a variable, the reference count is incremented and no copy is made. When a value is to be changed, the implementation first checks the reference count. If the count is 1 then there is no other reference to the value and it can be changed in place. If the count is greater than 1 then there are other references to this value, and changing it in place would change what those references see as well. To prevent this, a copy of the value is made (with a new reference count of 1) and the copy is changed in place.
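The check-then-copy rule described above can be sketched in plain C. This is only an illustration under stated assumptions: the Value struct and the ValueNew, ValueShare, ValueRelease, ValueMakeWritable and ValueAppend functions are hypothetical names invented for this sketch, not part of the Tcl API, and the struct is not the real Tcl_Obj layout. In a real extension the corresponding calls are Tcl_IncrRefCount(), Tcl_DecrRefCount(), Tcl_IsShared() and Tcl_DuplicateObj().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Tcl value; NOT the real Tcl_Obj layout. */
typedef struct Value {
    int refCount;   /* number of live references to this value */
    size_t length;  /* bytes in 'bytes', excluding the trailing NUL */
    char *bytes;    /* heap-allocated, NUL-terminated string */
} Value;

/* Create a value holding a copy of s. In this toy model the new value
 * starts with one reference, owned by the caller. */
Value *ValueNew(const char *s) {
    Value *v = malloc(sizeof *v);
    if (v == NULL) abort();
    v->refCount = 1;
    v->length = strlen(s);
    v->bytes = malloc(v->length + 1);
    if (v->bytes == NULL) abort();
    memcpy(v->bytes, s, v->length + 1);
    return v;
}

/* Take another reference: the count goes up and nothing is copied. */
Value *ValueShare(Value *v) {
    v->refCount++;
    return v;
}

/* Drop one reference; free the value when the last one goes away. */
void ValueRelease(Value *v) {
    assert(v->refCount > 0);
    if (--v->refCount == 0) {
        free(v->bytes);
        free(v);
    }
}

/* Return a value that is safe to change in place. An unshared value is
 * returned as-is. A shared value is copied, and the caller's reference
 * moves from the original to the private copy. */
Value *ValueMakeWritable(Value *v) {
    if (v->refCount == 1) {
        return v;
    }
    Value *copy = ValueNew(v->bytes);
    v->refCount--;  /* still at least 1: the other holders keep it alive */
    return copy;
}

/* Append suffix using copy-on-write. The caller must use the returned
 * pointer in place of v from now on. */
Value *ValueAppend(Value *v, const char *suffix) {
    size_t extra = strlen(suffix);
    v = ValueMakeWritable(v);
    char *grown = realloc(v->bytes, v->length + extra + 1);
    if (grown == NULL) abort();
    memcpy(grown + v->length, suffix, extra + 1);
    v->bytes = grown;
    v->length += extra;
    return v;
}
```

With two references to the same value, ValueAppend() through one of them returns a fresh copy and leaves the other holder's string untouched. With a single reference it modifies the value in place and returns the same pointer.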

Misconceptions

Some people have confused copy-on-write with issues of call-by-value versus call-by-reference and/or the use of upvar. This is wrong. Tcl command arguments are always passed by value at the Tcl script level and by reference at the C implementation level. Sometimes the value might be the name of a variable but this is not the same thing as passing by reference in C (and even if it were, it would still have nothing to do with copy-on-write).

Copy-on-write is normally visible at the script level only when an extension has been coded wrongly and a copy hasn't been made when it should have been. The one other way to observe it is the [representation] command, which does not conform to Tcl value semantics.

Should anyone still be confused by values, references, variables and names, I refer you to Lewis Carroll's "Through the Looking Glass":

"The name of the song is called 'Haddocks' Eyes.'"
"Oh, that's the name of the song, is it?" Alice said, trying to feel interested.
"No, you don't understand," the Knight said, looking a little vexed. "That's what the name is called. The name really is 'The Aged Aged Man.'"
"Then I ought to have said 'That's what the song is called'?" Alice corrected herself.
"No, you oughtn't: that's quite another thing! The song is called 'Ways and Means': but that's only what it's called, you know!"
"Well, what is the song, then?" said Alice, who was by this time completely bewildered.
"I was coming to that," the Knight said. "The song really is 'A-sitting On A Gate': and the tune's my own invention."

Threads

C/C++ programmers discovering copy-on-write initially tend to see it as a guaranteed performance improvement, and beginners' C++ books are full of examples of string classes using it. After a while they use their copy-on-write implementations in a multi-threaded application and then things start to go wrong. The problem is that checking the reference count and copying need to be performed in a thread-safe manner, which typically requires the use of mutexes. Since the mutex must be locked for every access, not just the ones that turn out to need a copy, the performance tends to be worse than that of a straightforward implementation that always copies. Threaded Tcl avoids this problem by never sharing values between threads.
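To make the locking cost concrete, here is a minimal sketch of a mutex-guarded copy-on-write value using POSIX threads. It assumes a POSIX system. The LockedValue struct and the LockedNew, LockedShare, LockedRelease and LockedMakeWritable functions are hypothetical names made up for this illustration. It is not how Tcl works, since Tcl sidesteps the issue as described above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical thread-aware copy-on-write string; not a Tcl structure. */
typedef struct LockedValue {
    pthread_mutex_t lock;  /* guards refCount */
    int refCount;          /* number of live references */
    char *bytes;           /* heap-allocated, NUL-terminated string */
} LockedValue;

/* Create a value holding a copy of s, with one reference owned by the caller. */
LockedValue *LockedNew(const char *s) {
    LockedValue *v = malloc(sizeof *v);
    if (v == NULL) abort();
    if (pthread_mutex_init(&v->lock, NULL) != 0) abort();
    v->refCount = 1;
    size_t n = strlen(s) + 1;
    v->bytes = malloc(n);
    if (v->bytes == NULL) abort();
    memcpy(v->bytes, s, n);
    return v;
}

/* Taking a reference never copies, but it still has to lock. */
LockedValue *LockedShare(LockedValue *v) {
    pthread_mutex_lock(&v->lock);
    v->refCount++;
    pthread_mutex_unlock(&v->lock);
    return v;
}

/* Drop one reference; the holder of the last reference frees the value. */
void LockedRelease(LockedValue *v) {
    pthread_mutex_lock(&v->lock);
    assert(v->refCount > 0);
    int remaining = --v->refCount;
    pthread_mutex_unlock(&v->lock);
    if (remaining == 0) {
        pthread_mutex_destroy(&v->lock);
        free(v->bytes);
        free(v);
    }
}

/* Return a value that is safe to change in place. The mutex is taken on
 * every call, including the common case where the value turns out to be
 * unshared and no copy is needed. */
LockedValue *LockedMakeWritable(LockedValue *v) {
    LockedValue *result = v;
    pthread_mutex_lock(&v->lock);
    if (v->refCount > 1) {
        /* Shared: the bytes cannot change while shared, so copy them
         * and move the caller's reference to the copy. */
        result = LockedNew(v->bytes);
        v->refCount--;
    }
    pthread_mutex_unlock(&v->lock);
    return result;
}
```

Every call to LockedMakeWritable() pays for a lock and an unlock even when it returns the value it was given, which is the overhead that can make this slower than simply copying every time.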


NickHounsome - 2009-07-08 03:30:04

None of the Tcl examples here have anything to do with copy-on-write since copy-on-write only has meaning in the C API.