LZW

LZW (named after the inventors Lempel , Ziv and Welch ) is a compression algorithm used among other things in GIF, TIFF (when used with the corresponding options) and Unix' compress.

The algorithm is patented by the company Unisys. In the mid-nineties, when the Internet took off and GIF became popular with web browser support, Unisys began enforcing their patent, see http://www.unisys.com/about__unisys/lzw/ .

According to that Unisys page the patents have by now expired: "Unisys U.S. LZW Patent No. 4,558,302 expired on June 20, 2003, the counterpart patents in the United Kingdom, France, Germany and Italy expired on June 18, 2004, the Japanese counterpart patents expired on June 20, 2004 and the counterpart Canadian patent expired on July 7, 2004."

BR


LV According to Slashdot , the Unisys patent expired July 7, 2004.

'HOWEVER, LZW was a modification of a patented algorithm by Lempel and Ziv, which according to this just referenced slashdot discussion, is still patented (???) by IBM. So the state of LZW is still unknown.

FPX IIRC, the slashdot article said that the IBM patent covers the same algorithm. Thus, while the LZW algorithm remains "patented," the IBM patent could be invalidated if challenged. As long as IBM does not try to enforce the patent, that is unlikely to happen, given the time and effort involved. The possibility for IBM to bully users of the LZW algorithm exists, but is largely theoretical. Until then, the IBM patent remains as a symbol of an inefficient system that only loosely checks for prior art. (While most patent offices claim that they do, it seems that the Australians are at least being honest about having given up: patents are rubber-stamped, and prior art search is delegated to challengers in court, so that patents remain until invalidated, see [L1 ].)


glennj this version is a translation of the algorithm as shown at wikipedia .

    namespace eval LZW {
        variable char2int
        variable chars
        for {set i 0} {$i < 256} {incr i} {
            set char [binary format c $i]
            set char2int($char) $i
            lappend chars $char
        }
    }

    proc LZW::encode {data} {
        variable char2int
        array set dict [array get char2int]

        set w ""
        set result [list]

        foreach c [split $data ""] {
            set wc $w$c
            if {[info exists dict($wc)]} {
                set w $wc
            } else {
                lappend result $dict($w)
                set dict($wc) [array size dict]
                set w $c
            }
        }
        lappend result $dict($w)
    }

    proc LZW::decode {cdata} {
        variable chars
        set dict $chars

        set k [lindex $cdata 0]
        set w [lindex $dict $k]
        set result $w

        foreach k [lrange $cdata 1 end] {
            set currSizeDict [llength $dict]
            if {$k < $currSizeDict} {
                set entry [lindex $dict $k]
            } elseif {$k == $currSizeDict} {
                set entry $w[string index $w 0]
            } else {
                error "invalid code ($k) in ($cdata)"
            }
            append result $entry
            lappend dict $w[string index $entry 0]
            set w $entry
        }
        return $result
    }

    set s TOBEORNOTTOBEORTOBEORNOT#
    set e [LZW::encode $s] ;# ==> 84 79 66 69 79 82 78 79 84 256 258 260 265 259 261 263 35
    set d [LZW::decode $e] ;# ==> TOBEORNOTTOBEORTOBEORNOT#

    # or
    expr {$s eq [LZW::decode [LZW::encode $s]]} ;# ==> 1

PT 13-Jun-2003: Is this code compatible with anything? compress(1), gzip(1) etc? Any references relevant to this code?


13-Jun-2003: As it stands, this code is not terribly useful other than as an exercise because

  • its not really compatible with anything other than itself
  • it returns a list of integers (the indices into the dictionary) rather than a binary compressed string
  • the dictionary grows without bound, rather than being adaptive

My original goal was to write some code that could translate a GIF into r/g/b values in pure tcl; that will still require parsing of the gif file format and binary variable word length reading.

This is a straightforward implementation of LZW following the pseudo code in http://www.cis.udel.edu/~amer/CISC651/lzw.and.gif.explained.html

JR


SEH The code below makes some changes for speed and expresses the results in a simple encoded format rather than a list of integers.

 namespace eval ::lzw {

 proc Compress {data} {
      if {$data == {}} {return {}}
      set cpre {}
      for {set x 0} {$x < 256} {incr x} {set dict([binary format c $x]) $x}
      set pos 0
      set rval {}
        set string_length_data [string length $data]
      while {$pos < $string_length_data} {
        set ch [string index $data $pos]
        incr pos
        set ci [array names dict -exact $cpre$ch]


        if {$ci != {}} {
          # string in dictionary
          append cpre $ch
        } else {
          set dict($cpre$ch) [array size dict]
          lappend rval $dict($cpre)
          set cpre $ch
        }
      }
      lappend rval $dict($cpre)
        foreach rv $rval {
                if {$rv == 38} {
                        append rvalEncode "&0;"
                } elseif {$rv == 59} {
                        append rvalEncode "&1;"
                } elseif {$rv > 255} {
                        set rv [expr $rv - 254]
                        append rvalEncode "&$rv;"
                } else {
                        append rvalEncode [binary format c $rv]
                }
        }
        set rvalEncode [string map {;& { }} $rvalEncode]
      puts "compressed from [string length $data] to [string length $rvalEncode]"
      return $rvalEncode
 }

 proc Decompress {cdataEncode} {
        if {$cdataEncode == {}} {return {}}
        set string_length_cdataEncode [string length $cdataEncode]
        set pos 0
        while {$pos < $string_length_cdataEncode} {
                set strIndex [string index $cdataEncode $pos]
                if {$strIndex == "&"} {
                        while {[set strIndex [string index $cdataEncode [incr pos]]] != "\;"} {
                                if {$strIndex == { }} {
                                        if {$cDatum == 0} {
                                                set cDatum 38
                                        } elseif {$cDatum == 1} {
                                                set cDatum 59
                                        } else {
                                                set cDatum [expr $cDatum + 254]
                                        }
                                        lappend cdata $cDatum
                                        unset cDatum
                                }
                                append cDatum $strIndex
                        }
                        if {$cDatum == 0} {
                                set cDatum 38
                        } elseif {$cDatum == 1} {
                                set cDatum 59
                        } else {
                                set cDatum [expr $cDatum + 254]
                        }
                        lappend cdata $cDatum
                        unset cDatum
                } else {
                        binary scan $strIndex c strIndex
                        lappend cdata $strIndex
                }
                incr pos
        }

      set cpre {}
      set dict {}
      for {set x 0} {$x < 256} {incr x} {lappend dict [binary format c $x]}
      set pos 0
      set rval {}
        set llength_cdata [llength $cdata]
      while {$pos < $llength_cdata} {
        set co [lindex $cdata $pos]
        incr pos
        if {$co >= [llength $dict]} {
          lappend dict $cpre[string index $cpre 0]
          set cpre [lindex $dict $co]
        } else {
          append cpre [string index [lindex $dict $co] 0]
          # this only won't apply for the very first character
          if {[string length $cpre] > 1} {
            lappend dict $cpre
          }
          set cpre [lindex $dict $co]
        }
        append rval [lindex $dict $co]
      }
      puts "uncompressed from [llength $cdata] to [string length $rval]"
      return $rval
 }

 }
 # end namespace eval ::lzw