Version 0 of Elias coding

Updated 2004-09-05 20:58:27

if 0 {Richard Suchenwirth 2004-09-05 - In Tcl chatroom discussions on the binary image compression challenge, KBK mentioned the Elias way of representing positive integers as compact bit sequences.

Elias, P. Universal Codeword Sets and Representations of the Integers. IEEE Trans. Inform. Theory 21, 2 (March 1975), 194-203

In the gamma way, the bit sequence of the number (without leading zeroes) is prefixed with a length marker: one less zeroes than its length, so that

 1 = 1
 2 = 010
 3 = 011
 4 = 00100

and so on. The bits sequence of a positive integer can be obtained like this:}

 proc int2bits int {
    set res ""
    while {$int>0} {
       set res [expr {$int%2}]$res ;# prepend - we have no [string insert]
       set int [expr {$int/2}]
    }
    set res
 }

# and so the Elias gamma function is just

 proc Elias'gamma int {
    set bits [int2bits $int]
    return [string repeat 0 [expr {[string length $bits]-1}]]$bits
 }

# and for a list of integers, which is the most useful application:

 proc Elias'gammas ints {
    set res ""
    foreach int $ints {append res [Elias'gamma $int]}
    set res
 }

# Now to decode such a bit string:

 proc Elias'decode'gammas bits {
    set res {}
    while {$bits ne ""} {
        regexp ^(0*) $bits -> zeroes
        set length [expr {[string length $zeroes]*2}]
        lappend res [bits2int [string range $bits 0 $length]]
        set bits [string range $bits [incr length] end]
    }
    set res
 }

# with this little helper, which doesn't bother for the leading zeroes:

 proc bits2int bits {
    set res 0
    foreach bit [split $bits ""] {
        set res [expr {$res*2+$bit}]
    }
    set res
 }

if 0 {Testing:

 % Elias'gammas {1 2 3 4 5}
 10100110010000101
 % Elias'decode'gammas [Elias'gammas {1 2 3 4 5}]
 1 2 3 4 5

So we managed to cram 5 integers into 17 bits, and extract them correctly, too. This code works very well for lists of small integers, like runlengths - the worst case there, the 010101... checkerboard, just takes up one bit per 1-length run. For bigger integers, the Elias delta encoding may be more compact:}

 proc Elias'delta int {
    set bits [int2bits $int]
    return [Elias'gamma [string length $bits]][string range $bits 1 end]
 }
 proc Elias'deltas ints {
    set res ""
    foreach int $ints {append res [Elias'delta $int]}
    set res
 }
 proc Elias'decode'deltas bits {
    set res {}
    while {$bits ne ""} {
        regexp ^(0*) $bits -> zeroes
        set length [expr {[string length $zeroes]*2}]
        set l2 [bits2int [string range $bits 0 $length]]
        set l3 [expr {$length+$l2-1}]
        lappend res [bits2int 1[string range $bits [expr {$length+1}] $l3]]
        set bits [string range $bits [incr l3] end]
    }
    set res
 }

if 0 {The delta encoding is more compact if many numbers are bigger than 32 - here's an experimental table:

 % foreach i {1 2 4 8 16 32 64 128 256 512 1024 2048 4096} {
    puts $i:[string length [Elias'gamma $i]],[string length [Elias'delta $i]]}
 1:1,1
 2:3,4
 4:5,5
 8:7,8
 16:9,9
 32:11,10
 64:13,11
 128:15,14
 256:17,15
 512:19,16
 1024:21,17
 2048:23,18
 4096:25,19

Arts and crafts of Tcl-Tk programming }