Excel/BIFF2 writer in pure Tcl

Here is a program in pure Tcl that creates an Excel spreadseet (in BIFF2 binary format) containing some text and numbers. The resulting document can be viewed by any Excel viewer/editor: MS Office Excel, OpenOffice Calc, etc.


Unfortunately, the file format written by this code is only suitable for very old versions of Excel. The current (>2010) versions of Excel can open the file, but refuse to edit them.

# A simple Excel/BIFF2 writer in pure Tcl.
# This code is in the public domain.

# integer to BIFF2
proc store_int {row col int {bold 0}} {
    if {$row > 65535 || $col > 255} {return}
    if {abs($int) > 65535} {
        set float [expr {double($int)}]
        return [store_float $row $col $float $bold]
    }
    # INTEGER record: record_type(2) + record_size(2) + row_number(2)
    #     + col_number(2) + format(3) + integer_number(2)
    return [binary format s*c*s "0x0002 9 $row $col" "0 [expr {$bold<<6}] 0" $int]
}
# double to BIFF2
proc store_float {row col float {bold 0}} {
    # NUMBER record: record_type(2) + record_size(2) + row_number(2)
    #     + col_number(2) + format(3) + float_number(8)
    return [binary format s*c*d "0x0003 15 $row $col" "0 [expr {$bold<<6}] 0" $float]
}
# text to BIFF2
proc store_text {row col text {bold 0}} {
    set len [string bytelength $text]
    if {$len > 255} {
        # maximum label length is 255
        set text [string range $text 0 254]
        set len 255
    }
    set sz [expr {8+$len}]
    # LABEL record: record_type(2) + record_size(2) + row_number(2)
    #     + col_number(2) + format(3) + text_length(1) + text($text_length)
    return [binary format s*c*ca* "0x0004 $sz $row $col" "0 [expr {$bold<<6}] 0" $len $text]
}
# beginning of file
proc store_bof {} {
    # BOF record
    set head [binary format s* "0x0009 4 0x0000 0x0010"]
    # CODEPAGE record
    append head [binary format s* "0x0042 2 1251"]
    # FONT records: plain and bold
    append head [binary format s*ca* "0x0031 10 200 0x0000" 5 "Arial"]
    append head [binary format s*ca* "0x0031 10 200 0x0001" 5 "Arial"]
    return $head
}
# test formula "=B1+D1", in RPN: "B1 D1 +"
proc store_formula {} {
    # FORMULA record
    return [binary format s*c*dc*sc*sc* "0x0006 26 1 0" "0 0 0" 0 "1 9 0x44" 0 "1 0x44" 0 "3 0x03"]
}
# end of file
proc store_eof {} {
    # EOF record
    return [binary format s 0x000A]
}
# choose appropriate function based on value type
proc store_value {row col val {bold 0}} {
    if {[string is integer -strict $val]} {
        return [store_int $row $col $val $bold]
    }
    if {[string is double -strict $val]} {
        return [store_float $row $col $val $bold]
    }
    return [store_text $row $col $val $bold]
}

append xls [store_bof] \
        [store_value 0 0 "bold text" 1] \
        [store_value 0 2 "plain text"] \
        [store_value 0 1 168] \
        [store_value 0 3 32] \
        [store_value 2 0 3.1415 1] \
        [store_formula] \
        [store_eof]

set fd [open testxl.xls w]
fconfigure $fd -translation binary
puts -nonewline $fd $xls
close $fd
exit

arjen - 2010-03-09 04:41:41

It works nicely, but how is the formula encoded? Can you explain it?


AAK - 2010-03-09 13:00

Formula is stored in Reverse Polish Notation token array. In my example formula "=B1+D1" is stored as "B1 D1 +" tokens. These bytes:

   0x0006 26 1 0 "0 0 0" 0 1 9 0x44 0 1 0x44 0 3 0x03

have the following meaning:

   record_type record_size row_index column_index cell_attributes
   formula_result recalculate_or_not token_array_size
   B1_reference_token_type B1_row_index B1_column_index
   D1_reference_token_type D1_row_index D1_column_index plus_token_type

You can find detailed description of BIFF binary format here: [1 ].


Kieran - 2014-07-02 15:26:01

Neat! Just one comment about character encodings ...

I notice that the store_bof proc generates a "Windows-1251" (Cyrillic) codepage record which probably isn't ideal for everyone ...

proc store_bof {} {
    ...
    # CODEPAGE record
    append head [binary format s* "0x0042 2 1251"]

Perhaps "Windows-1252" would be a more common choice for English speakers:

    # CODEPAGE record (Windows Codepage 1252 - Latin 1)
    append head [binary format s* "0x0042 2 0x8001"]

If non-ASCII text is important, then one could also be a bit more paranoid about encodings in the store_text proc:

proc store_text {row col text {bold 0}} {
    # needs to be same as used in the CODEPAGE record from store_bof:
    set codepage "cp1252"
    set bytes [encoding convertto $codepage $text]
    set len [string length $bytes]
    if {$len > 255} {
                error "text too long"
    }
    set roundtrip_text [encoding convertfrom $codepage $bytes]
    if {![string equal $text $roundtrip_text]} {
        error "text contains characters that won't convert safely to codepage $codepage"
    }
    set sz [expr {8+$len}]
    # LABEL record: record_type(2) + record_size(2) + row_number(2)
    #     + col_number(2) + format(3) + text_length(1) + text($text_length)
    return [binary format s*c*ca* "0x0004 $sz $row $col" "0 [expr {$bold<<6}] 0" $len $bytes]
}