Version 6 of Persistent arrays

Updated 2004-03-08 15:12:45

Richard Suchenwirth 2002-05-30 - Large data (e.g. a simple database) are in Tcl often stored in arrays. You can load array contents from files, and you can store them to files, so the data persists between sessions. Here is some code that "ties" an array to a file, so initially the array is "loaded" from the file, and changes to the array are reflected in the file too. This is done by opening it for appending, writing the changed data, and closing it - possibly slow, but very robust. If you don't give a filename, it is made up of the array name and the extension ".txt".

A special application for this is event logging: register a global persistent array once and assign to it, for instance with a telling key like clock seconds for each event:

 persistentArray log mylogfile
 ...
 set log([clock seconds]) "$this happened."
 ...
 set log([clock seconds]) "$that happened."

When later examining the logfile, you can reconstruct the date and time of the events with clock format. The disadvantage against pure file logging is that all event messages remain in the array.

 proc persistentArray {arrName {filename {}}} {
    upvar 1 $arrName arr
    array set arr {} ;# to make sure it exists, and is an array
    if {$filename==""} {set filename $arrName.txt}
    set filename [file join [pwd] $filename]
    if [file exists $filename] {
        set fp [open $filename]
        array set arr [read $fp]
        close $fp
    }
    uplevel 1 [list trace var $arrName wu [list persist'save $filename]]
 }
 proc persist'save {filename arrName el op} {
    upvar 1 $arrName arr
    switch -- $op {
        w {set value $arr($el)}
        u {set value {}}
    }
    set    fp [open $filename a]
    puts  $fp [list $el $value]
    close $fp
 }

Here's another approach to persistent/tied arrays that is similar to what perl does with dbmopen and whatnot. It uses BerkeleyDB.

 # package require Db_tcl
 load /usr/lib/libdb_tcl-3.2.so ;# already installed on some linux systems

 proc tieArray {aname file db} {
   upvar $aname a
   array set a {}
   set fname [file nativename $file]
   set dbh [berkdb open -hash -create $fname $db]
   set tie_db::afiles($aname) $dbh
   trace variable a r [list tie_db::read $aname]
   trace variable a w [list tie_db::write $aname]
   trace variable a u [list tie_db::del $aname]
 }

 proc syncArray {aname} {
   $tie_db::afiles($aname) sync
 }

 proc syncAll {} {
   foreach a [array names tie_db::afiles] {
     syncArray $a
   }
 }

 proc untieArray {aname} {
   tie_db::afiles($aname) close
   trace vdelete a r [list tie_db::read $aname]
   trace vdelete a w [list tie_db::write $aname]
   trace vdelete a u [list tie_db::del $aname]
 }


 namespace eval ::tie_db {
   variable afiles

   proc read {rname aname ename op} {
     variable afiles
        upvar $aname a
        foreach {k v} [lindex [$afiles($rname) get $ename] 0] {null} {null} {}
        if {$k == ""} {
          error "no such element in db"
        }
        set a($ename) $v
   }

   proc write {rname aname ename op} {
     variable afiles
        upvar $aname a
        $afiles($rname) put $ename $a($ename)
        # $afiles($rname) sync
   }

   proc del {rname aname ename op} {
     variable afiles
        upvar $aname a
        $afiles($rname) del $ename 
        # $afiles($rname) sync
   }

   proc array {cmd aname args} {
     variable afiles
        if {[catch {set afiles($aname)}]} {
          uplevel 2 _array $cmd $aname $args
        } else {
          switch $cmd {
            names -
                get   -
                names/get {
                  set cur [$afiles($aname) cursor]
                  _array set tmp_array {}
                  while 1 {
                    set kvp [lindex [$cur get -next] 0]
                        if {$kvp == {}} {break}
                        puts -nonewline [lindex $kvp 0]..
                        set tmp_array([lindex $kvp 0]) [lindex $kvp 1]
                  }
                  $cur close
                  if {$cmd == "names"} {
                    _array names tmp_array
                  } elseif {$cmd == "get"} {
                    _array get tmp_array
                  }
                }
                default {error "$cmd not implemented for tied arrays"}
          }
        }
   }
 }

 rename array _array
 proc array {args} {uplevel ::tie_db::array $args}

The write and del procedures can be changed to sync after each operation at a cost in performance. Some datasets seem to make this upset (I wrote it trying to make the Bayesan spam filtering code more elegant) but I suspect its some bugs in the version of db that I have.

Along with this, I noticed an anamoly with trace and upvar. You can use upvar to create an alias for a variable, even is the real variable is an element of an array.

 % set c(x) 0
 % upvar #0 c(x) x
 % set x 1
 % set c(x)
 => 1

However, that kind of aliased variable will not trigger traces set on the entire array, although it will trigger a trace on that element of the array. This seems like a bug; if a variable is traced then any way of accessing it should fire the same traces, right?


Stefan Vogel 8-Mar-2004: You can also use the new version of Tgdbm (A Tcl-Wrapper for gdbm-(GNU-dbm)-API). Version 0.5 allows to "attach" an array to a gdbm-file (which stores the equivalent to the tcl-array, namely hash-key/value-pairs).

It is possible to do this:

 gdbm_open -writer -sync -array airports test.gdbm
 set airports(PAR) Paris ;# will store or update the key/value to test.gdbm

 # add/update some more data
 array set airports {
  ADD "Addis Abeba"
  FFM "Frankfurt"
 }

 # print value (gdbm-file and array is synchronized)
 puts "FFM: [airports fetch FFM] / $airports(FFM)"
 unset airports          ;# this will close test.gdbm
                         ;# this could have been done with unset airportArray

See http://www.vogel-nest.de/tcl for more details on Tgdbm.