Version 9 of Wiki admin notice

Updated 2005-02-13 14:37:33 by jcw

The wiki had some weird trouble with dates and references on Sat 12 Feb, and again on the 13th.

I've made the wiki read-only until the cause is resolved. I will look into this later today; my apologies for the inconvenience.

-jcw, your friendly wiki admin

VK: you probably mean 13 Feb. Thanks for repairing!


2005-02-13 12:00 GMT - I've re-enabled the wiki with more patient locking. It looks like some edits are now taking longer than the previous lock acquire/break rules were prepared to wait for. There may be a race condition in the lockfile logic. If things break again, I'll revert to read-only mode. Please let me know by email. -jcw


Aha, I think I found it. There is indeed a race condition when wikit cannot acquire the lock. In wikit/lock.tcl, it does:

  1. check if pid stored in lockfile is a running process
  2. if not, remove the lock file as being stale

But that fails when the old process quits and removes its lock, and a new process creates its own lock, all between steps 1) and 2). Step 2) then deletes a lock that is no longer stale: it belongs to the new process.
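To make the interleaving concrete, here is a minimal sketch that replays it deterministically inside a single process. It is not the actual old wikit code: the helper names `writeLock` and `readLock`, the file name, and the pids 11111 and 22222 are all made up for illustration.

```tcl
# Deterministic replay of the race; helper names and pids are hypothetical.
set lockFile [file join [pwd] race-demo-[pid].lock]

proc writeLock {path owner} {
  # Exclusive create: fails if the file already exists.
  set fd [open $path {CREAT EXCL WRONLY}]
  puts $fd $owner
  close $fd
}

proc readLock {path} {
  set fd [open $path]
  set owner [gets $fd]
  close $fd
  return $owner
}

file delete $lockFile
writeLock $lockFile 11111      ;# the old owner holds the lock

# Step 1 (checker): read the owner and conclude it is no longer running.
set seen [readLock $lockFile]

# Meanwhile: the old owner exits and removes its lock, and a new
# process immediately creates a fresh one.
file delete $lockFile
writeLock $lockFile 22222

# Step 2 (checker): delete the "stale" lock, which by now belongs to 22222.
file delete $lockFile

set lockSurvived [file exists $lockFile]
puts "checker saw $seen, lock still present: $lockSurvived"
```

At the end the lock file is gone even though process 22222 still believes it holds the lock, so the checker can go on to create its own lock and both end up with the db open in r/w mode.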

This is quite probable when the cache has been cleared and a search engine is hitting the wiki, generating tons of CGI calls to rebuild each cache page it requests. As is the case this very moment, btw...

The good news is that no page edits get lost; I can easily rebuild/recover. The problem is just two processes opening the db in r/w mode, which is a no-no with Metakit. Due to some substantial column-wise caching, the effects can be very odd, as the two recent failures have shown.

Yuck. I'll tweak the lock logic a bit to prevent, or at least greatly reduce, this race condition. -jcw

Here's the new locking logic in wikit/lock.tcl:

  proc AcquireLock {lockFile {maxAge 3600}} {
    for {set i 0} {$i < 300} {incr i} {
      catch {
        set t [file mtime $lockFile]
        set fd [open $lockFile]
        set opid [gets $fd]
        close $fd

        if {[clock seconds] > $t + $maxAge ||
            ($opid != "" && [file isdir /proc] && ![file exists /proc/$opid])} {

          # If the lock looks stale, wait a bit to see if it is about to go away
          # and be reclaimed by another process - if so, avoid a file delete race.
          # This caused a damaged db twice in mid-Feb 2005, the new logic should
          # make most wikit instances back off if they see a lock.

          after 2500

          if {[file mtime $lockFile] == $t} {
            file delete $lockFile
            set fd [open savelog.txt a]
            set now [clock format [clock seconds]]
            puts $fd "# $now drop lock $opid -> [pid]"
            close $fd
          }
        }
      }
      catch {close $fd}

      if {![catch {open $lockFile {CREAT EXCL WRONLY}} fd]} {
        puts $fd [pid]
        close $fd
        return 1
      }
      after 1100
    }
    return 0
  }
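
The step that actually takes the lock is the exclusive create (`CREAT EXCL`), which succeeds for only one process at a time. A minimal sketch of just that step, assuming hypothetical helpers `tryLock` and `dropLock` (these names are not part of wikit):

```tcl
# Sketch of exclusive create as the "take the lock" step.
# tryLock and dropLock are hypothetical names, not wikit procs.
proc tryLock {path} {
  # open with CREAT EXCL fails if the file already exists (it can also
  # fail for other reasons, e.g. an unwritable directory).
  if {[catch {open $path {CREAT EXCL WRONLY}} fd]} {
    return 0
  }
  puts $fd [pid]
  close $fd
  return 1
}

proc dropLock {path} {
  file delete $path
}

set demoLock [file join [pwd] excl-demo-[pid].lock]
file delete $demoLock

set first  [tryLock $demoLock]   ;# creates the lock file
set second [tryLock $demoLock]   ;# refused: the file already exists
dropLock $demoLock
set third  [tryLock $demoLock]   ;# free again after the release
dropLock $demoLock

puts "first=$first second=$second third=$third"
```

Because creation is all-or-nothing, the only risky operation left is deleting a lock that merely looks stale, which is what the 2.5 second re-check of the mtime in AcquireLock is meant to guard.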