[Richard Suchenwirth] 2004-09-02 - Discussion in the [Tcl chatroom] brought me to propose this challenge for a Tcl contest:
   * Given a set of moderate-sized black&white GIF images,
   * write a pair of encoder/decoder between photo image and a binary string
   * so that the decoding of the encoded image is equal to the original (lossless)
   * and the average length of the encoded strings is minimal

I prefer GIF here because it is lossless in itself, and can be read in by plain Tk. Bitmap images cannot be processed pixel-by-pixel, so they're excluded, although they're binary.

Test images (feel free to add more. The GIF file size is the initial target to beat):

courier12.gif - 1923 bytes:

[http://mini.net/files/courier12.gif]

times12i.gif - 1947 bytes:

[http://mini.net/files/times12i.gif]

castle.gif - 11598 bytes

[http://www.100dollarhamburger.com/castle.gif]

ouster.gif - 2775 bytes

[http://tcl.sf.net/tct/kennykb/ouster.gif]

----
Links to solutions to this challenge:

[PS] My you can find my attempt on [Binary image compression challenge - Pascal's Entry]. It compresses courier12.gif to 720 bytes and times12.gif to 834 bytes. Castle.gif is actually a compressed gif file! Mine 'compressed' to 13672...


[PS] Update 4sept2004: Attempt number two (second code block on my entry page): now with Huffman(?) encoding for runlengths which occur 5 or more times. This actually compresses castle.gif! My new size is 11442 bytes, which is 156 bytes less than the original file! courier12 compresses to 671 bytes, times12i compresses to 718...

----
Little helpers to compare and tally photo images:
 proc photo'eq {im1 im2} {
    #-- returns 1 if both images are exactly equal, else 0
    set h [image height $im1]
    if {[image height $im2] != $h} {return 0}
    set w [image width $im1]
    if {[image width $im2] != $w} {return 0}
    for {set y 0} {$y<$h} {incr y} {
        for {set x 0} {$x<$w} {incr x} {
            if {[$im1 get $x $y] ne [$im2 get $x $y]} {return 0}
        }
    }
    return 1
 }
 proc photo'colors img {
    #-- return a list of {{r g b} n} tallies of pixel colors,
    #-- sorted decreasing by n (number of pixels of that color)
    set h [image height $img]
    set w [image width  $img]
    for {set y 0} {$y<$h} {incr y} {
        for {set x 0} {$x<$w} {incr x} {
            set color [$img get $x $y]
            if {![info exists a($color)]} {set a($color) 0}
            incr a($color)
        }
    }
    foreach {color n} [array get a] {lappend tally [list $color $n]}
    lsort -decreasing -index 1 -integer $tally
 }

----

[JH]: Here is my solution, which doesn't compression at all (doesn't beat the GIF format, but is close, the 1923b becomes 2163b).  It just represents the data in a small, raw form, but not compressed.  It will hopefully provide someone with a starting point to apply compression.
''[MS]: castle.gif is 'compressed' to 15764b by this; gzip reduces that file to 10305b.''

 proc binimg'encode img {
     set clrs [photo'colors $img]
     if {[llength $clrs] != 2} { return -code error "not a 2 color image" }
     set clr0 [lindex $clrs 0 0]
     set clr1 [lindex $clrs 1 0]
     set h [image height $img]
     set w [image width  $img]
     set str ""
     for {set y 0} {$y<$h} {incr y} {
         for {set x 0} {$x<$w} {incr x} {
             set color [$img get $x $y]
             append str [string equal $color $clr1]
         }
     }
     foreach {r g b} $clr0 { set color0 [expr {$r<<16 | $g <<8 | $b}]; break }
     foreach {r g b} $clr1 { set color1 [expr {$r<<16 | $g <<8 | $b}]; break }
     # store image as <w><h><clr0><clr1><binimgdata> where w and h are shorts
     set binstr [binary format ssiib* $w $h $color0 $color1 $str]
     return $binstr
 }
 proc binimg'decode data {
     binary scan $data ssiib* w h color0 color1 clrs
     set img [image create photo -width $w -height $h]
     set clr(0) \#[format %.6x $color0]
     set clr(1) \#[format %.6x $color1]
     set i 0
     set data ""
     set line ""
     foreach c [split $clrs {}] {
         lappend line $clr($c)
         if {[incr i] eq $w} {
             set i 0
             lappend data $line
             set line ""
         }
     }
     $img put $data -to 0 0
     return $img
 }

See [Binary image compression challenge - mig's Entry] for a slight variant of this, which ''should'' compress better in general.
----
[DKF]: The simplest cheat way is to use ''gzip'' to compress the GIF files themselves!  Both images compress to somewhere in the region of 1600-1650 bytes.  Can you do better?

[MS]: Yes; gzip'ing Jeff's coded files reduces the thing to 640-654 bytes.
----
[kroc]: My attempt, based on RLE is here: [Binary image compression challenge - Kroc's Entry] and my results are:
   * 923 bytes for courier12.gif
   * 904 bytes for times12i.gif
   * 15421 bytes for castle.gif (this one is very hard to beat).
----
[KBK]: I'm putting another attempt over at [Binary image compression challenge - KBK's entry]. It
outperforms [PS]'s entry on two of the three test cases, and does only slightly worse on the third:

   * 685 bytes for courier12.gif (vs. 671)
   * 682 bytes for times12i.gif (vs. 718)
   * 9260 bytes for castle.gif (vs. 11442)
   * 1856 bytes for ouster.gif

The way it works is that it marches through the image along the [Hilbert curve].  The advantage to the Hilbert curve as opposed to a raster pattern is that areas of a solid color are likely to result in runs that are much longer than a scan line, even though the other color is present in the same row and column.  Once the data are ordered in Hilbert-curve sequence, they are converted to a set of run lengths; note that since runs of 0 and 1 will alternate, the run lengths are all that's needed; the colors are not stored explicitly.  The run lengths are arbitrarily limited to 127; longer runs are broken into pieces with dummy 0-length runs in between.  Finally, [Huffman coding] is used to make the run lengths into a string of bits.  

Decoding is the same process, only in reverse; the Huffman bit sequences are expanded into run lengths and then to a bit string, and the bit string is folded along the Hilbert curve to yield the image again.

Compression is, of course, lossless.
----


[Arts and crafts of Tcl-Tk programming]