Version 18 of copy-on-write

Updated 2009-07-08 09:15:12 by NickHounsome

copy-on-write is a mechanism used in the C implementation of Tcl. If you're not writing an extension for Tcl in C then you really don't need to know any more. If you are, then a good place to start is the documentation for Tcl_Obj. (The name is misleading; Tcl_Value would better describe its purpose.)

A fundamental principle of Tcl is that everything is a string. This has both advantages and disadvantages. One potential disadvantage is that there are inevitably a lot of large strings, and a naive implementation that copied them frequently would perform badly. copy-on-write is the mechanism by which the Tcl C implementation avoids unnecessary copies.

Each value (Tcl_Obj) has a reference count. Whenever the value is passed to a command or assigned to a variable, the reference count is incremented and no copy is made. When a value is to be changed, the implementation first checks the reference count. If the count is 1 then there is no other reference to the value and it can be changed in place. If the count is greater than 1 then there are other references to this value, and changing it in place would change what those references see as well. To prevent this, a copy of the value is made (with a new reference count of 1) and the copy is changed in place.
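The check-then-copy rule described above can be sketched in plain C. This is only a minimal model, not Tcl's source: the Value struct and the value_* functions are hypothetical stand-ins for Tcl_Obj and its API, and a new value here starts with a count of 1 to match the description above. In the real C API the corresponding check and copy are done with Tcl_IsShared and Tcl_DuplicateObj.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Tcl_Obj: a string plus a reference count. */
typedef struct Value {
    int refCount;
    char *bytes;
} Value;

/* Create a value holding a private copy of s; the creator holds 1 reference. */
static Value *value_new(const char *s) {
    Value *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refCount = 1;
    v->bytes = malloc(strlen(s) + 1);
    assert(v->bytes != NULL);
    strcpy(v->bytes, s);
    return v;
}

/* Taking another reference costs an increment, never a copy. */
static void value_retain(Value *v) {
    v->refCount++;
}

/* Drop one reference; free the value when the last one goes. */
static void value_release(Value *v) {
    assert(v->refCount > 0);
    if (--v->refCount == 0) {
        free(v->bytes);
        free(v);
    }
}

/* Return a value that is safe to change in place.
 * Count of 1: nobody else can see it, so return it unchanged.
 * Count > 1: give up our reference and return a private copy. */
static Value *value_make_writable(Value *v) {
    if (v->refCount > 1) {
        Value *copy = value_new(v->bytes);  /* copy starts at count 1 */
        value_release(v);
        return copy;
    }
    return v;
}

/* Append suffix, copying first only if the value is shared. */
static Value *value_append(Value *v, const char *suffix) {
    v = value_make_writable(v);
    size_t oldLen = strlen(v->bytes);
    char *grown = realloc(v->bytes, oldLen + strlen(suffix) + 1);
    assert(grown != NULL);
    strcpy(grown + oldLen, suffix);
    v->bytes = grown;
    return v;
}
```

The caller must always use the pointer returned by value_append, since it may or may not be the one passed in. Appending to an unshared value returns the same pointer; appending to a shared one returns a new value and leaves the original untouched.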

Misconceptions

Some people have confused copy-on-write with issues of call-by-value versus call-by-reference and/or the use of upvar. This is wrong. Tcl command arguments are always passed by value at the Tcl script level and by reference at the C implementation level. Sometimes the value might be the name of a variable but this is not the same thing as passing by reference in C (and even if it were, it would still have nothing to do with copy-on-write).

The only time that copy-on-write is visible at the script level is when an extension has been coded wrongly and copy-on-write hasn't happened when it should have.
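To illustrate what "coded wrongly" means here, the following sketch contrasts a command implementation that changes a shared value in place with one that copies first. It is a self-contained model under stated assumptions: Val, val_new, upper_wrong and upper_right are hypothetical names, not part of the Tcl API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Tcl_Obj. */
typedef struct Val {
    int refCount;
    char *bytes;
} Val;

/* Create a value with a private copy of s and a count of 1. */
static Val *val_new(const char *s) {
    Val *v = malloc(sizeof *v);
    assert(v != NULL);
    v->refCount = 1;
    v->bytes = malloc(strlen(s) + 1);
    assert(v->bytes != NULL);
    strcpy(v->bytes, s);
    return v;
}

/* Upper-case the bytes of v in place. */
static void upper_in_place(Val *v) {
    for (char *p = v->bytes; *p != '\0'; p++) {
        *p = (char)toupper((unsigned char)*p);
    }
}

/* WRONG: never looks at refCount, so every other holder of
 * the same value sees its contents change underneath it. */
static Val *upper_wrong(Val *v) {
    upper_in_place(v);
    return v;
}

/* RIGHT: if the value is shared, work on a private copy instead. */
static Val *upper_right(Val *v) {
    if (v->refCount > 1) {
        Val *copy = val_new(v->bytes);
        v->refCount--;          /* we no longer reference the original */
        v = copy;
    }
    upper_in_place(v);
    return v;
}
```

If two script variables hold the same value, calling upper_wrong through one of them changes what the other sees as well, which is exactly the kind of behaviour that becomes visible at the script level. upper_right leaves the original untouched.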

Threads

C/C++ programmers discovering copy-on-write initially tend to see it as a guaranteed performance improvement, and beginners' C++ books are full of examples of string classes using it. After a while they use their copy-on-write implementations in a multi-threaded application and things start to go wrong. The problem is that checking the reference count and copying need to be performed in a thread-safe manner, which typically requires the use of mutexes. Since the mutex must be locked for every access, not just the ones that turn out to need a copy, the performance tends to actually be worse than for a straightforward implementation that always copies. Threaded Tcl avoids this problem by never sharing values between threads.
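The cost described above can be seen in a small sketch that uses a POSIX mutex. Everything here is an illustrative assumption (the SharedStr struct, the shared_* functions and the lockOps counter are hypothetical), and it is not how Tcl works, since Tcl does not share values between threads at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-on-write string that may be shared between threads. */
typedef struct SharedStr {
    pthread_mutex_t lock;
    int refCount;
    long lockOps;   /* instrumentation: how often the mutex was taken */
    char *bytes;
} SharedStr;

/* Create a string with a private copy of s and a count of 1. */
static SharedStr *shared_new(const char *s) {
    SharedStr *v = malloc(sizeof *v);
    assert(v != NULL);
    pthread_mutex_init(&v->lock, NULL);
    v->refCount = 1;
    v->lockOps = 0;
    v->bytes = malloc(strlen(s) + 1);
    assert(v->bytes != NULL);
    strcpy(v->bytes, s);
    return v;
}

/* Even taking a reference needs the lock. */
static void shared_retain(SharedStr *v) {
    pthread_mutex_lock(&v->lock);
    v->lockOps++;
    v->refCount++;
    pthread_mutex_unlock(&v->lock);
}

/* The count must be checked under the lock, so the mutex is
 * taken whether or not a copy turns out to be necessary. */
static SharedStr *shared_make_writable(SharedStr *v) {
    SharedStr *result = v;
    pthread_mutex_lock(&v->lock);
    v->lockOps++;
    if (v->refCount > 1) {
        result = shared_new(v->bytes);  /* private copy, count 1 */
        v->refCount--;
    }
    pthread_mutex_unlock(&v->lock);
    return result;
}
```

Note that shared_make_writable pays for a lock and unlock even when the value is unshared and no copy is made. That per-access overhead is what can make this scheme slower than simply copying every time.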


Category Glossary | Category Internals


NickHounsome - 2009-07-08 03:30:04

None of the Tcl examples here have anything to do with copy-on-write since copy-on-write only has meaning in the C API.