Version 57 of How do I manage lock files in a cross platform manner in Tcl

Updated 2007-05-25 13:17:22 by AW

Purpose: to conduct conversations as well as provide code samples of Tcl scripted file locking.

  1. Why are lock files needed?
  2. What are some of the techniques for locking files?
  3. How do I lock a file so that other programs can't update it at the same time I am, if I don't have access to the code of those other programs?
  4. What if the file is being shared between several different computers over a network?
  5. What if the file is being shared between machines running different operating systems?
  6. What if the program crashes, or the computer crashes, while the file is locked?

ulis, 2005-12-23. For a portable (and reliable) manner see : Serializing things via file locks


Why are lock files needed?

To stop a file from being read in an inconsistent state. This means that the file should only ever be written to by one process/thread at a time (to stop the interleaving of chunks of data in a non-helpful fashion) and that the file should not be read from during a write (or you would end up with damaged data.) Multiple concurrent reads are typically not a problem. Simple locking techniques are usually enough to solve all of these problems.

Sometimes (not often), you need to permit concurrent read/write or write/write access to a file (typically a binary/record-formatted file - databases are the canonical example) and then you will need to use a more sophisticated technique based on locking regions of a file.

Note that this does not apply to the situation where you are writing to log files. There, as long as you open for update in append-mode and only write consistent chunks, a typical operating system can enforce the correctness of your datastream for you. Collisions don't happen, interleavings don't matter, and only consistent chunks can be read. The only data that might be inconsistent even if your O/S is failing miserably is the final record if it is currently being added.
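The append-mode idea above can be sketched in a few lines of Tcl (the log path and message are placeholders for illustration):

```tcl
# Sketch: append-mode logging, assuming each writer opens with APPEND
# and writes whole lines; no lock file is needed for this pattern.
set logpath [file join [pwd] demo.log]   ;# placeholder path
set log [open $logpath {WRONLY CREAT APPEND}]
chan configure $log -buffering line      ;# flush one consistent chunk per line
puts $log "worker [pid] started at [clock seconds]"
close $log
```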

Exception: If the log file is on an NFS-mounted volume and you append simultaneously from more than one node, the log file will become corrupted (this applies to AIX and probably other systems). To avoid this, append to the log file from only one node.

Sometimes you want to control a device, such as a printer, modem, etc. and lock files are one way that is sometimes considered.

Sometimes you want to ensure that a particular process can only run one instance at a time.

What are some of the techniques for locking files?

  1. creat() a lock file, build the new file using a separate filename (though in the same directory), rename() it into place, and remove the lock file.
  2. lockf()/flock()/fcntl()/ioctl() - I can't remember the details of setting up advisory locks.
  3. Mandatory locks are like advisory locks on Unix, but the file is specially flagged.
  4. Use a socket as a locking indicator.
  5. Use a system semaphore as a locking indicator (see also: semaphores).
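Technique 1 hinges on an atomic check-and-create. In Tcl that can be sketched with open's CREAT and EXCL flags, which make the existence check and the creation a single operation on a local filesystem (the proc names here are made up for illustration):

```tcl
# Sketch of technique 1: an advisory lock via atomic lock-file creation.
# "open ... {WRONLY CREAT EXCL}" fails if the file already exists.
proc try_lock {lockfile} {
    if {[catch {open $lockfile {WRONLY CREAT EXCL}} chan]} {
        return 0            ;# someone else holds the lock
    }
    puts $chan [pid]        ;# record the owner, useful for stale-lock checks
    close $chan
    return 1
}

proc unlock {lockfile} {
    file delete -- $lockfile
}
```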

How do I lock a file so that other programs cannot update it at the same time I am, if I don't have access to the code of those other programs?

If those programs already do locking, then do the locking their way. Otherwise, you are SOL.

What if the file is being shared between several different computers over a network?

On Unix over NFS, pray that the server has a working implementation of lockd. Otherwise, unless you have write access to the directory you are accessing the file from, as well as access to the source code of every program that might access that file, you are SOL. If you do have write access to the directory containing the file you want to lock, and you have access to the source code of all the programs that will access it, then implementing the locking scheme mentioned above should still work in most cases.

What if the file is being shared between machines running different operating systems?

If Tcl can read and write files to and from that O/S's file format and directory structure you should still be able to implement file locking like that discussed above.

What if the program crashes, or the computer crashes, while the file is locked?

Any intelligent locking routine should check the time and date of the lock file. If the lock file is extremely old, it can usually be deleted safely. Be careful, though: if the program that created the lock file is still running, you might end up with corrupted data. Also, if other locking routines have been waiting for that lock file to be removed, you might have to fight for the right to lock that file.
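The age check described above might be sketched like this (the 600-second threshold is an arbitrary assumption, and note the race between checking and deleting):

```tcl
# Sketch of a stale-lock check: delete the lock file if it is older than
# $maxage seconds. Racy: the owner could still be alive, or another
# waiter could delete and recreate the file between our check and delete.
proc break_stale_lock {lockfile {maxage 600}} {
    if {![file exists $lockfile]} {
        return
    }
    if {[clock seconds] - [file mtime $lockfile] > $maxage} {
        file delete -- $lockfile
    }
}
```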

Now you're beginning to see why many people don't like locks...but no matter how much they dislike them they are a necessary part of most realistic programs.


Falco Paul: Here is a nice lock implementation that works on both UNIX and WIN32 platforms. It's done using sockets (IPC). Basically, the idea is to open a server socket on a particular port, which will fail if the port is already in use. This little trick lets us manage locks without having to resort to lock files. In this sample implementation you can even supply lock IDs that are mapped to 'unique' port numbers (done here with a simple hash function). The nice thing about this method is that it's 100% fool-proof: the O/S guarantees that only one process can actually open a given port. As a nice bonus, the O/S will also auto-release the lock if your process dies. This works on both UNIX and WIN32 platforms. There is one small caveat: if you open a socket (i.e. acquire the lock) and then use 'exec' to spawn a process, your new child process will inherit the open socket. If that process (or one of its children) keeps running after you have long since closed 'your' socket, the socket will still be considered open; in fact, you wouldn't be able to reopen that socket while the child processes hold it open. Something to be aware of... see also socket

Acquire a lock:

  proc aquire_lock {LOCKID} {
    global LOCK_SOCKET

    set PORT [port_map "$LOCKID"]

    # a 'socket already in use' error will be our lock detection mechanism
    if { [catch {socket -server dummy_accept $PORT} SOCKET] } {
      puts "Could not acquire lock"
      return 0
    }

    set LOCK_SOCKET("$LOCKID") "$SOCKET"
    return 1
  }

Release a lock (assumes you actually hold the lock):

  proc release_lock {LOCKID} {
    global LOCK_SOCKET

    if { [catch {close $LOCK_SOCKET("$LOCKID")} ERRORMSG] } {
      puts "Error '$ERRORMSG' on closing socket for lock '$LOCKID'"
      return 0
    }

    unset LOCK_SOCKET("$LOCKID")
    return 1
  }

Map LOCKID to an O/S socket:

  proc port_map {LOCKID} {
    # calculate our 'unique' port number using a hash function.
    # this mapping function comes from Knuth's The Art of Computer
    # Programming, Volume 3.

    set LEN [string length $LOCKID]
    set HASH $LEN

    for {set IDX 0} {$IDX < $LEN} {incr IDX} {
      scan [string index "$LOCKID" $IDX] "%c" ASC
      set HASH [expr {(($HASH << 5) ^ ($HASH >> 27)) ^ $ASC}]
    }

    # always use a prime for the remainder.
    # note that the prime number used here basically determines the
    # maximum number of simultaneous locks.
    return [expr {65535 - ($HASH % 101)}]
  }

Our server call-back function (that does nothing at all):

  proc dummy_accept {newsock addr port} {
  }

wcf3 - I modified the previous code a bit to wait a period of time for the lock before returning failure.

  proc sleep { {milliseconds 200} } {
    set sleepvar "[namespace current]::___sleep[clock seconds][expr int(rand() * 1000)]___"
    after $milliseconds [list set $sleepvar 0]
    vwait $sleepvar
    unset $sleepvar
  }

  proc aquire_lock { LOCKID {timeout 5} } {
    global LOCK_SOCKET
    set PORT [port_map "$LOCKID"]

    # 'socket already in use' error will be our lock detection mechanism
    incr timeout [clock seconds]
    while { [catch {socket -server [namespace current]::dummy_accept $PORT} SOCKET] } {
      if { [clock seconds] > $timeout } {
        puts "Could not acquire lock"
        return 0
      }
      sleep
    }

    set LOCK_SOCKET("$LOCKID") "$SOCKET"
    return 1
  }

I was then able to...

  proc transaction { args } {
    package require cmdline   ;# from tcllib, provides cmdline::getoptions
    foreach {opt val} [cmdline::getoptions args {
                {timeout.arg 5 "Seconds to try before timing out"}
                }] {
      set $opt $val
    }
    if { [aquire_lock "transaction" $timeout] } {
      eval uplevel 1 $args
      release_lock "transaction"
      return 1
    }
    return 0
  }

...which I can use like this

  transaction -timeout 3 {
    puts "starting locked process"
    sleep 1000
    puts "ending locked process"
  }

to wait approximately 3 seconds for the lock, run the script (or transaction), and then release the lock. The best part is that sleep allows the Tcl 'engine' to continue to run while sleeping. I will build a package out of all this after I stress it a bit more across multiple platforms.


A technique I prefer to use is to do away with locking altogether and instead have a daemon to handle all reads from and updates to the shared resource. This also allows the use of schemes that are not based on a single file, since it provides an abstract interface to the functionality required. - DKF

Unfortunately that implies that all of the writes are from programs that know about your daemon and means that interfacing with the files we're locking access to is a pain in the rear from many programming languages other than the one you used. - ENE

I didn't say it was perfect... - DKF


CL wants to return to this topic in terms of "singleton processes" and ... well, here's a grab bag, which I'll eventually re-organize:

Cameron: If it helps, perhaps I should label this conversation in terms of "locking"--file locking, process locking, ...

lvirden: Interesting. I guess I'm still not clear if what they REALLY want is one process at a time, or if they really want to ensure that only one process at any one time is updating or using some resource.

lvirden: In any case, Unix at least doesn't reliably make locking resources (files, processes, devices) trivial; in most cases, I find that one has to layer lots and lots of caveats or assumptions before one nears nirvana.

Cameron: You are so right to observe that. It's a natural place to start any discussion.

lvirden: I mean, one has to take into account multiple users, multiple processors, NFS/Samba/NFS-WWW (or whatever it is called), multiple processes, remote executions, etc.

Cameron: Right: locking is difficult; that's why it deserves our attention.

lvirden: I know that JCW and I hashed it about for quite some time and he came to the conclusion that it might be too big a problem to solve.

lvirden: I agree though that something needs to be written. And perhaps what we can do is load the page with a variety of keywords to ensure that searching finds the info.


Some people recommend lockf as the only reliable locking mechanism. However, here is Solaris's warning about lockf:

     Record-locking should not be used in combination with the
     fopen(3C), fread(3C), fwrite(3C) and other stdio functions.
     Instead, the more primitive, non-buffered functions (such as
     open(2)) should be used.  Unexpected results may occur in
     processes that do buffering in the user address space.  The pro-
     cess may later read/write data which is/was locked.  The stdio
     functions are the most common source of unexpected buffering.

In other words, if you have no control over how I/O to the file occurs (because of interaction with user-provided code, or even with operating system code), and so cannot guarantee that only low-level access to the file is occurring, then you should not use lockf.


The Python Cookbook has a recipe [L1 ] that might interest some Tcl file-lockers.


On the other hand, that Python recipe appears to me to leave a lot of questions open - on POSIX systems it depends on fcntl, which as far as I am aware doesn't work on NFS-mounted files, and on Windows it depends on a Python module that, as far as I know, doesn't yet have a Tcl equivalent.

DKF - As I mentioned above, locking on NFS depends on the NFS server also hosting a working rpc.lockd (i.e. a lock management daemon.) This is probably true on reasonably modern Solaris, but who knows with other UNIX systems? And then there's other networked filesystems to worry about (e.g. AFS...)

LV I'm trying to recall all the ways that the NFS locking went wrong on me when I tried it. Of course, things might be better now.


fsync()


[Explain pertinence of [L2 ].]


18may04 jcw - There's a very interesting lock-file implementation in Ruby, which presumably also works on NFS. It's Unix-specific (links, inodes), but if it is combined with a good solution for Windows, it might just be the way to achieve solid x-platform locking.

The mechanism is clearly explained on the RAA site [L3 ], along with complete Ruby source code. I couldn't find license info, but Ruby appears to be BSD-like, at least in spirit.

Does anyone care to comment on the value and robustness of this approach?

TP - I think I first read about this approach in Marc Rochkind's book, Advanced Unix Programming [L4 ] (though I don't have my first-edition copy around right now to confirm). I wouldn't be so certain about its use on an NFS-based filesystem; someone else may want to comment. Basically, this approach is fairly certain to avoid race conditions, as the link(2) system call is (or should be!) atomic. The downside is that if the locking process dies, the resource remains locked until someone else cleans it up. For one of my cross-platform programs (Windows & Unix), where I wanted process locking (only one instance running at a time), I used the server socket code shown above with good results.


RJ: I have had good experience simply using a .lock file to store/check/delete filelock entries, one line per file lock. Is there anything bad about this concept I am unaware of?

LV depends on your environment. That might work well in a simple, single-user, single-application environment. However, if you have multiple users, using either multiple instances or, even worse, multiple applications, accessing the file via NFS, this strategy falls down due to, at the very least, buffering issues.


Brian Griffin: In my experience with heterogeneous networks, the only file locking mechanism that works is the client/server model mentioned by DKF above. As proof of this, please list all real database engines that don't use a client/server model:


Todd Coram: But how would such a lock server work? Here is a simple Generic Lock Server that could be used to maintain locks on resources.


MAKR 2005-08-25: I am still looking for a way using only Tcl. My requirements are simple:

  • filesystem locking (the filesystem may be shared (NFS) between hosts, or multiple applications on one host may simultaneously try to access it),
  • only UNIX systems and
  • using only Tcl or POSIX shell tools.

Thus I came to the conclusion that the Mozilla way is the one to go: creating a symbolic link pointing to the lock identifier. E.g., in a shell you would do:

  $ ln -s "$lockid" /path/to/lockfile
  $ ls -l /path/to/lockfile
  lrwxrwxrwx  1 makr users 18 2005-08-25 08:34 /path/to/lockfile -> [email protected]:8896

Which is the right thing to do, because POSIX specifies for the source file: If the -s option is specified, no restrictions on the type of file or on its existence shall be made.

One cannot follow this approach in Tcl, however. If you try it you will get:

  % file link -sym /path/to/lockfile [format "%s@%s:%ld" $user $host [pid]]
  could not create new link "/path/to/lockfile" since target "[email protected]:17028" doesn't exist

Thus either use exec with ln -s - which is what I do currently - or write your own handler for symlink(2). POSIX specifies for symlink(2) the target shall be treated only as a character string and shall not be validated as a pathname. It also specifies EEXIST as an error, implying an atomic operation.
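The exec route can be sketched as follows (Unix only; assumes ln is on the PATH, and the proc name is made up for illustration):

```tcl
# Sketch of the symlink approach: the link target encodes user@host:pid,
# and "ln -s" fails if the link already exists, making creation atomic.
proc symlink_lock {lockfile} {
    set lockid [format "%s@%s:%ld" \
        $::tcl_platform(user) [info hostname] [pid]]
    if {[catch {exec ln -s $lockid $lockfile} msg]} {
        # already locked; [file readlink $lockfile] names the holder
        return 0
    }
    return 1
}
```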

LV so tell me, with regards to this technique, what error handling should be performed when the code attempts to create the link but the file already exists by the time the tcl interpreter attempts to create it?

MAKR well, I don't do very sophisticated error handling. If the exec fails, I look into errorCode if it contains a POSIX EEXIST error. In this case I read the link:

  • is this a lock I created already (user, host and pid match)? yes -> return success, otherwise
  • does this lock belong to another application on the same host? yes -> is this a stale lock? yes -> try to break and return success, otherwise
  • raise an error reporting a locked filesystem (incl. user, host and pid)

If errorCode contains anything else, just hand over errorCode and errorInfo.

MAKR argh - well, you got me. The error handling applies to the symlink() wrapper, of course. That is what I've done recently; the old code using exec just reported that a lock could not be obtained.


EF The TIL [L5 ] contains an implementation of the UNIX lockfile command. I suspect that the implementation is not safe on NFS systems, but it is perhaps worth a look. [L6 ]


lockfiles


[ Category File | Category Platform Issues ]