Elias coding

if 0 {Richard Suchenwirth 2004-09-05 - In Tcl chatroom discussions on the binary image compression challenge, KBK mentioned the Elias way of representing positive integers as compact bit sequences.

Elias, P. Universal Codeword Sets and Representations of the Integers. IEEE Trans. Inform. Theory 21, 2 (March 1975), 194-203

Note that none of the Elias codes is able to represent the number zero. The smallest representable number is one.

In the gamma way [L1 ], the bit sequence of the number (without leading zeroes) is prefixed with a length marker: one less zeroes than its length, so that

 1 = 1
 2 = 010
 3 = 011
 4 = 00100

and so on. The bits sequence of a positive integer can be obtained like this:}

 proc int2bits int {
    set res ""
    while {$int>0} {
       set res [expr {$int%2}]$res ;# prepend - we have no [string insert]
       set int [expr {$int/2}]
    }
    set res
 }

# and so the Elias gamma function is just

 proc Elias'gamma int {
    set bits [int2bits $int]
    return [string repeat 0 [expr {[string length $bits]-1}]]$bits
 }

# and for a list of integers, which is the most useful application:

 proc Elias'gammas ints {
    set res ""
    foreach int $ints {append res [Elias'gamma $int]}
    set res
 }

# Now to decode such a bit string:

 proc Elias'decode'gammas bits {
    set res {}
    while {$bits ne ""} {
        regexp ^(0*) $bits -> zeroes
        set length [expr {[string length $zeroes]*2}]
        lappend res [bits2int [string range $bits 0 $length]]
        set bits [string range $bits [incr length] end]
    }
    set res
 }

# with this little helper, which doesn't bother for the leading zeroes:

 proc bits2int bits {
    set res 0
    foreach bit [split $bits ""] {
        set res [expr {$res*2+$bit}]
    }
    set res
 }

if 0 {Testing:

 % Elias'gammas {1 2 3 4 5}
 10100110010000101
 % Elias'decode'gammas [Elias'gammas {1 2 3 4 5}]
 1 2 3 4 5

So we managed to cram 5 integers into 17 bits, and extract them correctly, too. This code works very well for lists of small integers, like runlengths - the worst case there, the 010101... checkerboard, just takes up one bit per 1-length run. For bigger integers, the Elias delta encoding [L2 ] may be more compact. In the delta encoding the length information for the integer we are encoding is encoded with the Elias gamma code.}

 proc Elias'delta int {
    set bits [int2bits $int]
    return [Elias'gamma [string length $bits]][string range $bits 1 end]
 }
 proc Elias'deltas ints {
    set res ""
    foreach int $ints {append res [Elias'delta $int]}
    set res
 }
 proc Elias'decode'deltas bits {
    set res {}
    while {$bits ne ""} {
        regexp ^(0*) $bits -> zeroes
        set length [expr {[string length $zeroes]*2}]
        set l2 [bits2int [string range $bits 0 $length]]
        set l3 [expr {$length+$l2-1}]
        lappend res [bits2int 1[string range $bits [expr {$length+1}] $l3]]
        set bits [string range $bits [incr l3] end]
    }
    set res
 }

if 0 {Testing again:

 % Elias'decode'deltas [Elias'deltas {1 10 100 1000}]
 1 10 100 1000

The delta encoding is more compact if many numbers are bigger than 32 - here's an experimental table:

 % foreach i {1 2 4 8 16 32 64 128 256 512 1024 2048 4096} {
    puts $i:[string length [Elias'gamma $i]],[string length [Elias'delta $i]]}
 1:1,1
 2:3,4
 4:5,5
 8:7,8
 16:9,9
 32:11,10
 64:13,11
 128:15,14
 256:17,15
 512:19,16
 1024:21,17
 2048:23,18
 4096:25,19

if 0 {Another Elias code is omega coding, also called recursive Elias coding [L3 ]. It generalizes on the gamma and delta codes by putting a series of length prefixes in front of the encoded number, each prefix telling the length of the next in the sequence. The process stops when encountering a '0' bit after a prefix (Each length prefix starts with an '1' bit).}


[ Arts and crafts of Tcl-Tk programming | Category Compression ]