Version 12 of SYLK

Updated 2007-12-04 10:42:23 by dkf

This page was started by HZe, comments are welcome.

To write out simple, table oriented data to interface with other systems, sometimes csv formatted files are used. But if you have to deal with different localization, soon you get screwed up by different configurations of decimal points (dot or comma) and list separators (comma, semicolon). Even worse, on Windows each user may reconfigure the locale settings.

So I tried to use a different format. SYLK is the file format of old Microsoft Multiplan and is still supported by almost all modern spreadsheet programs.

The benefits are

  • ASCII format
  • decimal point is defined as dot (.)
  • some basic formatting is possible
  • cells are addressed by x/y values, so no separator

unfortunately, there is also a drawback

  • format is not documented well (see e.g. [L1 ] which is almost everything you can find)

GWM also see [L2 ].

The package below is able to read in SYLK files and store it into an array. While doing this, some parts are parsed (like the content, the format and the formulas), some parts are just stored (like the header information). This enables the user to read a given SYLK file (e.g. exported by Excel) and access the values with $SYLK(<x>,<y>,value). Also, modifications can be made and the whole structure can be saved back into a SYLK file.

Since there is no access layer (yet), you will have to access the array directly. So, some caution is necessary. Here are some remarks when using the package:

  • if you overwrite a value with a formula, make sure the formula is deleted (unset SYLK(<x>,<y>,formula))
  • format and values are exactly stored as defined in SYLK
  • the code makes heavily use of regexp, so performance can be improved
  • SYLK(header) is used to store all header lines as a TCL list

Although the code is perhaps not complete, it already may be helpful for some others...

    #####################################################################
    # Access procedures for SYLK files reading/writing
    #
    # (c) Copyright 2003 LMS Deutschland GmbH
    #                    Holger Zeinert
    #####################################################################
    # feel free to use without any warrenty
    #####################################################################
    #
    # internal data structure of a SYLK file is stored in an array:
    #
    #   SYLK(id)
    #       identification of the SYLK file
    #   SYLK(<x>,<y>,value)
    #       value of cell
    #   SYLK(<x>,<y>,formula)
    #       formula to calculate value
    #       (a cell needs a value; if a formula is given, but no value, the cell
    #       will not be stored into the SYLK file)
    #   SYLK(<x>,<y>,format)
    #       formatstring of cell
    #   SYLK(xMax), SYLK(yMax)
    #       Boundary X and Y, i.e. maximum column and row index
    #   SYLK(header)
    #       part of the header, which is not handled; this
    #       is used to write it unmodified (uninterpreted)
    #       back to the file
    #
    #####################################################################

    package provide Sylk 0.1


    #--------------------------------------------------------------------
    # parse X and Y values from a SYLK record
    #
    # Both values are optional. If a value is missing, the corresponding
    # value is not modified
    #--------------------------------------------------------------------
    proc parsePosition {record xName yName} {
        upvar $xName x
        upvar $yName y

        if {[regexp {;Y([0-9]+)(;|$)} $record dummy value]} {
            set y $value
        }
        if {[regexp {;X([0-9]+)(;|$)} $record dummy value]} {
            set x $value
        }
    }

    #--------------------------------------------------------------------
    # read a SYLK file into the internal structure
    #--------------------------------------------------------------------
    proc readSYLK {filename arrayName} {
        upvar $arrayName SYLK

        if {[catch {set fp [open $filename r]}]} {
            return 0
        }

        # store table boundaries
        set xMax 0
        set yMax 0

        # hold current postition
        set xAct 0
        set yAct 0

        while {![eof $fp]} {
            gets $fp line
            switch -regexp $line {
                ^ID {
                    # e.g. ID;PWXL;N;E
                    regexp {^ID;(.*)$} $line dummy value
                    set SYLK(id) $value
                }
                #^P -
                ^O {
                    # ignore some default formatting information,
                    # which I did not find documented
                }
                ^B {
                    # e.g. B;Y107;X105;D0 0 106 104
                    regexp {^B;Y([0-9]+);X([0-9]+)} $line dummy yMax xMax
                }
                ^F {
                    parsePosition $line xAct yAct
                    if {$xAct == 0 && $yAct == 0} {
                        # still header formats
                        lappend SYLK(header) $line
                    } else {
                        # cell formats
                        if {[regexp {;S([^;]*)(;|$)} $line dummy value]} {
                            set SYLK($xAct,$yAct,format) $value
                        }
                    }
                }
                ^C {
                    # e.g. C;Y1;X1;K"Name"
                    parsePosition $line xAct yAct
                    if {[regexp {;K([^;]*)(;E|$)} $line dummy value]} {
                        regsub {"(.*)"} $value {\1} value
                        set SYLK($xAct,$yAct,value) $value
                    }

                    # with formula? E.g. C;K-1188;E(R[+1]C-0.5)*(R3C2-R2C2)/R4C2+R2C2
                    if {[regexp {;E(.*)(;|$)} $line dummy value]} {
                        set SYLK($xAct,$yAct,formula) $value
                    }
                }
                ^E$ {
                    # SYLK file ends here
                    break
                }
                default {
                    # add to header (uninterpreted)
                    lappend SYLK(header) $line
                }
            }
        }
        close $fp

        set SYLK(xMax) $xMax
        set SYLK(yMax) $yMax

        return 1
    }

    #--------------------------------------------------------------------
    # write a SYLK file from the internal structure
    #--------------------------------------------------------------------
    proc writeSYLK {filename arrayName} {
        upvar $arrayName SYLK

        set fp [open $filename w]

        # write
        if {[info exists SYLK(id)]} {
            puts $fp "ID;$SYLK(id)"
        } else {
            puts $fp "ID;PWXL;N;E"
        }

        # boundary?
        if {[info exists SYLK(yMax)] && [info exists SYLK(yMax)]} {
            puts $fp "B;Y$SYLK(yMax);X$SYLK(xMax)"
        }

        # first, write header if existing
        if {[info exists SYLK(header)]} {
            foreach line $SYLK(header) {
                puts $fp $line
            }
        }

        # write data
        foreach idx [lsort -dictionary [array names SYLK *,*,value]] {
            regexp {([0-9]+),([0-9]+),} $idx dummy x y

            if {[info exists SYLK($x,$y,format)]} {
                set line "F;S$SYLK($x,$y,format);Y$y;X$x"
                puts $fp $line
            }

            if {[string is double $SYLK($x,$y,value)]} {
                set value $SYLK($x,$y,value)
            } else {
                set value "\"$SYLK($x,$y,value)\""
            }
            set line "C;Y$y;X$x;K$value"
            if {[info exists SYLK($x,$y,formula)]} {
                append line ";E$SYLK($x,$y,formula)"
            }
            puts $fp $line
        }

        # end-of-file marker
        puts $fp "E"

        close $fp
    }