Version 20 of copy-on-write

Updated 2009-07-08 13:06:56 by dkf

copy-on-write is a mechanism used in the C implementation of Tcl. If you're not writing an extension for Tcl in C then you really don't need to know any more. If you are, then a good place to start is the documentation for Tcl_Obj. (The name is misleading; Tcl_Value would better describe its purpose.)

A fundamental of Tcl is that everything is a string. This has both advantages and disadvantages. One potential disadvantage is that there are inevitably a lot of large strings, and a naive implementation that copied them freely would perform badly. copy-on-write is the mechanism by which the Tcl C implementation avoids unnecessary copies.

Each value (Tcl_Obj) has a reference count. Whenever the value is passed to a command or assigned to a variable the reference count is incremented and no copy is made. When a value is to be changed the implementation first checks the reference count. If the count is 1 then there is no other reference to the value and it can be changed in place. If the count is greater than 1 then there are other references to this value, and changing it in place would change what those references see as well. To prevent this a copy of the value is made (with a new reference count of 1) and the copy is changed in place.
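The check-then-copy rule described above can be sketched in a few lines of C. This is only a self-contained model, not Tcl's own code: the Value struct and the value_new, value_share, value_release and value_append functions are hypothetical names invented for the illustration, and the model simplifies by handing out new values with a count of 1. In the real C API the corresponding check is Tcl_IsShared and the copy is made with Tcl_DuplicateObj.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Tcl_Obj: a string plus a reference count. */
typedef struct Value {
    int refCount;
    char *bytes;
} Value;

/* Create a value holding a private copy of s; the caller owns one reference. */
static Value *value_new(const char *s) {
    Value *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refCount = 1;
    v->bytes = malloc(strlen(s) + 1);
    assert(v->bytes != NULL);
    strcpy(v->bytes, s);
    return v;
}

/* "Pass" or "assign" the value: bump the count, copy nothing. */
static Value *value_share(Value *v) {
    v->refCount++;
    return v;
}

/* Drop one reference; free the value when the last one goes. */
static void value_release(Value *v) {
    if (--v->refCount == 0) {
        free(v->bytes);
        free(v);
    }
}

/* Append suffix to the value held in *slot, copying first if it is shared. */
static void value_append(Value **slot, const char *suffix) {
    Value *v = *slot;
    if (v->refCount > 1) {
        /* Other references exist: work on a private copy instead. */
        Value *copy = value_new(v->bytes);
        value_release(v);      /* this slot no longer refers to the original */
        v = *slot = copy;
    }
    /* Here refCount is 1, so changing the bytes in place is safe. */
    size_t oldLen = strlen(v->bytes);
    char *grown = realloc(v->bytes, oldLen + strlen(suffix) + 1);
    assert(grown != NULL);
    strcpy(grown + oldLen, suffix);
    v->bytes = grown;
}
```

Appending through one of two references to the same value leaves the other reference untouched and gives the writer its own copy; appending through the only reference modifies the value in place without any copy.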

Misconceptions

Some people have confused copy-on-write with issues of call-by-value versus call-by-reference and/or the use of upvar. This is wrong. Tcl command arguments are always passed by value at the Tcl script level and by reference at the C implementation level. Sometimes the value might be the name of a variable but this is not the same thing as passing by reference in C (and even if it were, it would still have nothing to do with copy-on-write).

The only time that copy-on-write is visible at the script level is when an extension has been coded wrongly and copy-on-write hasn't happened when it should have.

Should anyone still be confused by values, references, variables and names, I refer you to Lewis Carroll's "Through the Looking-Glass":


`The name of the song is called "Haddocks' Eyes."'
`Oh, that's the name of the song, is it?' Alice said, trying to feel interested.
`No, you don't understand,' the Knight said, looking a little vexed. `That's what the name is called. The name really is "The Aged Aged Man."'
`Then I ought to have said "That's what the song is called"?' Alice corrected herself.
`No, you oughtn't: that's quite another thing! The song is called "Ways and Means": but that's only what it's called, you know!'
`Well, what is the song, then?' said Alice, who was by this time completely bewildered.
`I was coming to that,' the Knight said. `The song really is "A-sitting On A Gate": and the tune's my own invention.' 

Threads

C/C++ programmers discovering copy-on-write initially tend to see it as a guaranteed performance improvement, and beginners' C++ books are full of examples of string classes using it. After a while they use their copy-on-write implementations in a multi-threaded application and then things start to go wrong. The problem is that checking the reference count and copying must be performed in a thread-safe manner, which typically requires the use of mutexes. Since the mutex must be locked for every access, not just the ones that turn out to need a copy, the performance tends to be worse than for a straightforward implementation that always copies. Threaded Tcl avoids this problem by never sharing values between threads.
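The cost described above can be made visible with a small sketch. It is a hypothetical illustration only (SharedStr, shared_new, shared_ref, shared_unref and shared_writable are invented names, and this is not how Tcl works, since Tcl does not share values between threads). It assumes POSIX threads and adds a lockCount field purely so that the number of lock acquisitions can be observed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical thread-safe copy-on-write string. */
typedef struct SharedStr {
    pthread_mutex_t lock;
    int refCount;
    long lockCount;   /* instrumentation only: how often the lock was taken */
    char *bytes;
} SharedStr;

static SharedStr *shared_new(const char *s) {
    SharedStr *v = malloc(sizeof *v);
    assert(v != NULL);
    pthread_mutex_init(&v->lock, NULL);
    v->refCount = 1;
    v->lockCount = 0;
    v->bytes = malloc(strlen(s) + 1);
    assert(v->bytes != NULL);
    strcpy(v->bytes, s);
    return v;
}

/* Every operation on the count has to go through the mutex. */
static void shared_lock(SharedStr *v) {
    pthread_mutex_lock(&v->lock);
    v->lockCount++;
}

static SharedStr *shared_ref(SharedStr *v) {
    shared_lock(v);
    v->refCount++;
    pthread_mutex_unlock(&v->lock);
    return v;
}

static void shared_unref(SharedStr *v) {
    shared_lock(v);
    int remaining = --v->refCount;
    pthread_mutex_unlock(&v->lock);
    if (remaining == 0) {
        pthread_mutex_destroy(&v->lock);
        free(v->bytes);
        free(v);
    }
}

/* Return a value the caller may modify in place. The lock is taken even
 * when the value turns out to be unshared and no copy is needed. */
static SharedStr *shared_writable(SharedStr *v) {
    shared_lock(v);
    if (v->refCount == 1) {
        pthread_mutex_unlock(&v->lock);
        return v;                 /* sole owner: no copy, but the lock was still paid for */
    }
    SharedStr *copy = shared_new(v->bytes);
    v->refCount--;                /* was greater than 1, so it cannot reach 0 here */
    pthread_mutex_unlock(&v->lock);
    return copy;
}
```

Note that shared_writable pays for a lock and unlock on the unshared path too, which is exactly the overhead that can make this design slower than one that simply copies every time.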


NickHounsome - 2009-07-08 03:30:04

None of the Tcl examples here have anything to do with copy-on-write since copy-on-write only has meaning in the C API.