Erasure Tolerant Codes

11.02.2008 Jaf

During a Discussion at a LUG meeting, a few had the idea to use the free data repositories that Google, Yahoo, gmx, etc. provide to store a kind of off-site backup.

The opinion was that the data had to be somewhat securely stored, that meant that one should be able to access the data even when one or more of the servers were down.

A bit of 'Googling' showed that there are such things: erasure-tolerant codes that don't need 100% replication of the data, actually like 'RAID', but on a file-by-file basis.

The idea used for erasure tolerance is quite simple, an example:

  • There are 4 bytes to store
  • The storage can be made on 6 different locations
  • This means 2 erasures are possible without compromising the data integrity
  • Make a vector out of the 4 bytes, multiply a 4(columns)X6(rows) matrix with the vector
  • The result is a vector with 6 elements
  • The matrix is chosen so that any 4 of its rows result in an invertible matrix
  • Upon losing 2 of the result bytes, choose the rows that generated the remaining bytes
  • Build a matrix out of the rows and invert it
  • Multiply the inverted matrix with the vector made out of the remaining data
  • Et voila, your original vector is back.

A big BUT remains there: But you can only get you data back if the multiplication can be made without loss of precision, e.g., no floats.

That's why one uses Galois fields for all the operations needed to encode or to decode the data. In a Galois field the inverse of a number is also from the field, if you start with an integer field, the result of any operation will still be in the field.


Here is my implementation of the above, actually a proof of concept, that it can be done in Tcl and then you get the GUI for free.

In conjunction with fuse I think it would even be possible to implement a Web-Based-RAID


Galois Field Part:

  • Matrix and vector operations
  • Operations using look-up tables
  • Creation of look-up tables
  • Basic functions needed until look 0p tables are initialized
  • Helpers
 ################################################################################
 # vector and matrix operations                                                 #
 ################################################################################
 proc mat_invert_dec {mat} {
 ###############################
 # inverts a matrix in GF(2^n) #
 ###############################
     set rows [llength $mat]
     # make sure each row has same length, insuring we got a square matrix
     foreach row $mat {
         if {[llength $row] != $rows} {return}
       }
     # initialize the inverse with E, inversion like at school:
     # transform mat into E, apply needed operations to inv too, at the 
     # end inv will contain the inverse of mat.
     set inv [mat_unity_dec $rows]
     for {set row 0} {$row < $rows} {incr row} {
         # if the pivot is zero, then seek other row to add until it is nonzero
         if {[lindex [lindex $mat $row] $row] == 0} {
             for {set other $row} {$other < $rows} {incr other} {
                 # don't add this row onto itself, would set the row to zero
                 if {$other == $row} {continue}
                 # get the appropriate element
                 if {[lindex [lindex $mat $other] $row] != 0} {break}
               }
             # at this point either we have a row we can add or no other row
             # is possible. 
             # The latter case makes the matrice non invertible
             if {$other == $rows} {return}
             # a possible row was found
             set new_row [vect_vect_add_dec [lindex $mat $row]\
                                            [lindex $mat $other]]
             set mat [lreplace $mat $row $row $new_row]
             # reflect to E
             set new_inv [vect_vect_add_dec [lindex $inv $row]\
                                            [lindex $inv $other]]
             set inv [lreplace $inv $row $row $new_inv]
           }; # if pivot == 0
         # set the pivot to 1
         set piv_inv [inv_dec [lindex [lindex $mat $row] $row]]
         set new_row [scal_vect_mult_dec $piv_inv [lindex $mat $row]]
         set mat [lreplace $mat $row $row $new_row]
         # reflect to E
         set new_inv [scal_vect_mult_dec $piv_inv [lindex $inv $row]]
         set inv [lreplace $inv $row $row $new_inv]
         # now pivot  is 1
         for {set other 0} {$other < $rows} {incr other} {
             # don't apply to self
             if {$other == $row} {continue}
             # make the entries in the column where the pivot resides zero
             set this_elem [lindex [lindex $mat $other] $row]
             set new_mat [scal_vect_mult_dec $this_elem [lindex $mat $row]]
             set new_mat [vect_vect_add_dec $new_mat [lindex $mat $other]]
             set mat [lreplace $mat $other $other $new_mat]
             # reflect to E
             set new_inv [scal_vect_mult_dec $this_elem [lindex $inv $row]]
             set new_inv [vect_vect_add_dec $new_inv [lindex $inv $other]]
             set inv [lreplace $inv $other $other $new_inv]
           }
       }; # end foreach row of the matrix
     # check that row is not only zeros, matrice would have been not invertible
     foreach row $mat {
         set sorted [lsort -unique $row]
         if {([llength $sorted] == 1) && ([lindex $sorted 0] == 0)} { return }
       }
     return $inv
   }
 proc scal_vect_mult_dec {scal vect} {
 #####################################
 # multiplies a vector with a scalar #
 #####################################
     set result {}
     foreach elem $vect {lappend result [mult_dec $scal $elem] }
     return $result
   }
 proc vect_vect_add_dec {vecta vectb} {
 ###################
 # Adds two vectors#
 ###################
     if {[llength $vecta] != [llength $vectb]} {return}
     set result {}
     foreach elema $vecta elemb $vectb {lappend result [add_dec $elema $elemb]}
     return $result
   }
 proc mat_unity_dec {ord} {
 ####################################
 # makes an (ord x ord) unit matrix #
 ####################################
     for {set row 0} {$row < $ord} {incr row} {
         set row_str [string replace [string repeat 0 $ord] $row $row 1]
         lappend unity [split $row_str ""]
       }
     return $unity
   }
 proc mat_vect_mult_dec {mat vect} {
 #########################
 # matrix vector product #
 #########################
     set retvect {}
     foreach row $mat {
         set prod [vect_vect_dot_dec $row $vect]
         if {$prod eq ""} {return}
         lappend retvect $prod
       }
     return $retvect
   }
 proc vect_vect_dot_dec {vecta vectb} {
 ######################################
 # compute the dot product of vectors #
 ######################################
     if {[llength $vecta] != [llength $vectb]} {return}
 
     set result 0
     foreach elema $vecta elemb $vectb {
         set prod [mult_dec $elema $elemb]
         if {$prod == ""} {return}
         set result [add_dec $result $prod]
       }
     return $result
   }
 ################################################################################
 # operations using the gal_state look up tables will work after initialisation #
 # with an irreducible polynomial and after choice of a generator element.      #
 # All operations use decimals                                                  #
 ################################################################################
 proc add_dec {a b} {
 #####################
 # performs addition #
 #####################
     return [gal_add_dec $a $b]
   }
 proc mult_dec {a b} {
 #######################################################
 # multiply using the logarithm and exp look up tables #
 #######################################################
     global gal_state
 
     if {($a == 0)||($b == 0)} {return 0}
     set loga [log $a]
     set logb [log $b]
     if {($loga == "")||($logb == "")} {return}
     if {$a == 1} {return $b}
     if {$b == 1} {return $a}
     set logres [expr ($loga+$logb)%$gal_state(max_elem)]
     return [exp $logres]
   }
 proc inv_dec {num} {
 ###############################
 # returns  the inverse of num #
 ###############################
     global gal_state
 
     if {![info exists gal_state]} {return}
     return [lindex $gal_state(inv)   $num]
   }
 proc exp {num} {
 ###########################
 # returns $primitive^$num #
 ###########################
     global gal_state
 
     if {![info exists gal_state]} {return}
     return [lindex $gal_state(exp) $num]
   }
 proc log {num} {
 ################################
 # returns log($num)|$primitive #
 ################################
     global gal_state
 
     if {![info exists gal_state]} {return}
     if {$num == 0} {return}
     return [lindex $gal_state(log)   $num]
   }
 
 
 
 ################################################################################
 # Creation of the look up tables to speed up operations                        #
 ################################################################################
 proc init_gal_dec {poly primitive} {
 #########################################################################
 # create logarithm table in GF(2^ord), ord is the order of the polynom  #
 # create a exponetial table in GF(2^ord)                                #
 # records the polynomial used for the field                             #
 # records the primitive element                                         #
 # records the maximum of the field                                      #
 #########################################################################
     global gal_state
     catch {unset gal_state}
 
     if {![gal_is_primitive_elem_dec $primitive $poly]} {return 0}
     
     set num_ent [get_two_power_dec $poly]
     set gal_state(max_elem) [expr {$num_ent-1}]
 
     set num_exps [expr $num_ent - 1]
 
     set gal_state(poly) $poly
     set gal_state(primitive) $primitive
     # make exponential table
     set gal_state(exp) [list 1 $primitive]
     incr num_exps -1
     while {$num_exps} {
         lappend gal_state(exp) \
           [gal_mult_dec $poly [lindex $gal_state(exp) end] $primitive]
           incr num_exps -1
       }
     # make logarithm table
     lappend gal_state(log) {}
     set num_logs [expr $num_ent - 1]
     for {set num 1} {$num <= $num_logs} {incr num} {
         lappend gal_state(log) [lsearch -exact $gal_state(exp) $num]
       }
     # make inverse table  
     set log1 [log 1]
     set gal_state(inv)  [list {} "1"]
     for {set num 2} {$num <= $num_logs} {incr num} {
         set logn [log $num]
         lappend gal_state(inv) [exp [expr {$gal_state(max_elem) - $logn}]]
       }
     return 1
   }
 ################################################################################
 # Basic operations in GF(2^n), these are to be used in order to create the     #
 # look up tables for exp,log & inv. The look up tables are supposed to speed   #
 # up calculations                                                              #
 ################################################################################
 proc gal_add_dec {a b} {
 ##################################
 # performs addition in GF(2^n)   #
 # Addition is done bitwise mod 2 #
 ##################################
     if {![string is integer $a]} {return ""}
     if {![string is integer $b]} {return ""}
     return [expr {$a^$b}]
   }
 proc gal_poly_mult_dec {a b} {
 ##############################################
 # performs polynom multiplication in GF(2^n) #
 # Amounts to long multiplication with the    #
 # addition being done bitwise mod 2          #
 ##############################################
     set multiplicator $a
     set multiplicand  $b
     if {$a < $b} {
         set multiplicator $b
         set multiplicand  $a
       }
     # peasants multiplication
     set result 0
     while {$multiplicator != 0} {
         if {[expr {$multiplicator&1}]} {
             set result [gal_add_dec $result $multiplicand]
           }
         set multiplicand  [expr {$multiplicand  << 1}]
         set multiplicator [expr {$multiplicator >> 1}]
       } 
     set result
   }
 proc gal_remainder_dec {a poly} {
 #######################
 # performs a mod poly #
 #######################
     set poly_power [get_two_power_dec $poly]
     set a_power    [get_two_power_dec $a]
     while {$a_power >= $poly_power} {
         set pow_two [get_two_power_dec $a]
         set trial $poly
         while {$trial < $pow_two} {set trial [expr {$trial << 1}]}
         set a [gal_add_dec $a $trial]
         set a_power [get_two_power_dec $a]
       }
     return $a
   }
 proc gal_mult_dec {poly a b} {
 #####################################################
 # performs multiplication of two numbers in gf(2^n) #
 # a x b = polymult(a,b) mod poly                    #
 #####################################################
     return [gal_remainder_dec [gal_poly_mult_dec $a $b] $poly]
   }
 proc gal_find_gen_poly_dec {ord} {
 ##########################################################
 # finds generator polys in GF(2^n), generators are prime #
 # polynoms e.g. irreducible in GF(2^n)                   #
 ##########################################################
     set max_find   [expr int(pow(2,$ord+1))]
     set start_find [expr int(pow(2,$ord))]
     set poly_list {}
     set ret_list {}
     # init with  3
     lappend poly_list 3
     set counter 4
     while {$counter < $max_find} {
         # only odd numbers
         if {[expr {$counter&1}]} {
             set prime 1
             foreach elem $poly_list {
                 if {[gal_remainder_dec $counter $elem] == 0} {
                     set prime 0
                     break
                   }
               }
             if {$prime} {
                 lappend poly_list $counter
                 if {$counter >= $start_find} {
                     lappend ret_list $counter
                   }
               }
           }
         incr counter  
       }
     set ret_list  
   }
 proc gal_is_primitive_elem_dec {num poly} {
 ############################################################
 # checks whether num is a primitive element in the GF(2^n) #
 # defined by poly. Aprimitive element generates all field  #
 # values with its powers (excluding 0)                     #
 ############################################################
     set max_num [get_two_power_dec $poly]
     if {$num >= $max_num} {return 0}
     if {$num == 1} {return 0}
     incr max_num -1
     if {$max_num <= 0} {return 0}
     set power_list {}
     set power $num
     lappend power_list $power
     while {$power != 1} {   
         set power [gal_mult_dec $poly $power $num]
         lappend power_list $power
       }
     if {[llength [lsort -integer -unique $power_list]] == $max_num} {return 1}
     return 0
 
   }
 proc gal_is_primitive_poly_dec {poly} {
 ######################################################################
 # a polynom is primitive in GF(2^n)|poly if 2 is a primitive element #
 ######################################################################
     return [gal_is_primitive_elem_dec 2 $poly]
   }
 proc gal_get_generators_dec {poly} {
 #####################################################
 # finds all generating elements of the GF(2^n)|poly #
 #####################################################
     set max_num [get_two_power_dec $poly]
     incr max_num -1
     if {$max_num <= 0} {return}
     set gen_list {}
     for {set num 2} {$num < $max_num} {incr num} {
         if {[gal_is_primitive_elem_dec $num $poly]} {
             lappend gen_list $num
           }
       }
     return $gen_list  
   }
 ################################################################################
 # HELPER                                                                       #
 ################################################################################
 proc get_two_power_dec {num} {
 ##################################
 # find power of 2 lower than num #
 ##################################
     set result $num
     set shifted $num
     while {$shifted} {
         set shifted [expr {$shifted >> 1}]
         set result [expr {$result | $shifted}]
       }
     return [expr {$result ^ ($result >>1)}]
   }

Anyone wanting to explore GF(2^n) can use this part, it is self-contained.

Why look-up tables?

In GF(whatever) the numbers are actually symbols representing polynomials, a multiplication in GF(whatever) is actually the result of : polymult(poly1, poly2) Modulo (generator polynom), that costs! (even on today's GHz machines)

That's why after finding one generator polynom and one generator element you build up look-up tables for exponentiation, logs and inverses, and use only these for heavy-duty calculations.

I would advise against using gal_find_gen_poly_dec for a polynom order >= 16; it takes ages to complete.

Lars H, 2008-02-14: What is that algorithm, anyway? Is it tabulating all irreducible polynomials, or what?

Jaf, 2008-02-14: Yes, it tabulates all irreducible polynomials up to the order given. It does this by checking the remainder of the polynomial being checked against all already-found irreducible polynomials. It is O(n^2). The remainder check does not use multiplications.

Lars H,2008-02-14:I can't help mentioning that there are tricks one can use to speed up the polymult operation. For very large n one should do some kind of fast Fourier transform to approach O(nlogn) complexity, but more interesting here are probably tricks that work below a given bound on n.

If the data bits are not packed tight in the integer used to encode the polynomial, then one can even use integer multiplication to do most of the work. For example, if we know that n<16 then it suffices to put the data at every fourth bit — this corresponds to using the binary digit expansion as a hex digit expansion. The following (rather corny) code works for n<=8 (main culprit is the binary conversions, but one also needs a 32-bit*32-bit product, hence the wide):

 proc poly_mult8 {a b} {
    binary scan [binary format cc $a $b] B8B8 a_spaced b_spaced
    set prod [expr ( wide(0x$a_spaced) * 0x$b_spaced ) & 0x111111111111111]
    binary scan [binary format B16 [format %016lx $prod]] S res
    return $res
 }

Without conversions, it would of course be much more efficient (less messy to brace the expression, for one) ...

Between the two extremes, one can split the polynomial into blocks to speed things up. Keeping a table of all products of two polynomials of degree <=3 is for example quite feasible — then one can do 4 bits in each step rather than 1 bit.


The encoder part:

  • Relies on the existence of an initialized GF(2^8)
  • Will not overwrite existing files
  • Output is OriginalFileName_coded_n (where n is the chunk number)
 proc init_coder {filename chunks erasures matrix_type} {
 #######################################################################
 # this function inits the coder, gal_state  has to exist upon calling #
 # the galois field MUST be 2^8 at the moment. The read/write  routines#
 # with arbitrary number of  bits don't  exist  yet. The 2^16 case has #
 # been left out  as the time to generate  the irreducible polynomials #
 # is very unsatisfying,  at least in this implementation of galois.tcl#
 #######################################################################
     global coder_state
     global gal_state
 
     ## encoding stuff
     # check that galois field has been initialised
     if {![info  exists gal_state]} { return 0 }
     # check that the field is 2^8
     if {$gal_state(max_elem) != 255} { return 0 }
     # check that the number of chunks permits a E matrix
     # TODO Check above
     if {$chunks   >= $gal_state(max_elem)} { return 0 }
     # check that number of erasures is not higher than n=max_elem-1
     # a vandermonde is invertible if it is a (n-1)x(n-1) 
     # TODO Check above
     # if you REALLY want so many erasures: 100% more generated data , replicate
     # the filei, you'll be better off
     if {$erasures >= $chunks} { return 0 }
     # check matrix type for computed extra erasure chunks
     switch -exact -- $matrix_type {
         "vandermonde" {}
         default {
             puts "Sorry this matrix type ($matrix_type) is not implemented yet"
             return 0
           }
       }          
     ## file stuff          
     # input file exists          
     if {![file exists $filename]} {return 0}
     # and is readeable for user
     if {[catch {open $filename r} fileptr]} {return 0}
     # file is open, configure for binary access
     fconfigure $fileptr -translation binary
 
     # fill state array, bookkeeping
     set coder_state(chunks) $chunks
     set coder_state(erasures) $erasures
     set coder_state(mat_type) $matrix_type
     # the encoding matrix
       # unit matrix part
     set coder_state(coder_matrix) [mat_unity_dec $chunks]
       # vandermonde part, TODO: rework this into a proc to allow at least
       # for a cauchy matrice
     for {set counter 1} {$counter <= $erasures} {incr counter} {
         set this_row {}
         set this_elem 1
         lappend this_row $this_elem
         for {set col_count 1} {$col_count < $chunks} {incr col_count} {
             set this_mult [expr $counter]
             set this_elem [mult_dec $this_elem $this_mult]
             lappend this_row $this_elem
           }
         lappend coder_state(coder_matrix) $this_row
       }
     # input file name  
     set coder_state(filename) $filename
     # input file descriptor
     set coder_state(fileptr) $fileptr
     # file size 
     set coder_state(size) [file size $filename]
 
     ## the idea is to construct a string that can be evalled later to 
     ## convert the read bytes into the appropriate decimal values.
     # the value names to receive the bytes
     set vals {}
     for {set counter 0} {$counter < $chunks} {incr counter} {
         lappend vals val$counter
       }
     set coder_state(vals_names) $vals  
 
     set coder_state(vals_vals) \
       [string map {"val" "$val"} [join $coder_state(vals_names)]]
     
     # the variable to receive read bytes
     set coder_state(bytes_read) ""
 
     # conversion string
     set conv [concat  binary scan {$coder_state(bytes_read)}]
     set conv [concat $conv [string repeat c1 $coder_state(chunks)]]
     set conv [concat $conv $coder_state(vals_names)]
     set coder_state(conv) $conv
 
     return 1
   }
 proc read_chunks {} {
 ############################################################################
 # reads the number of chunks specified in the coder_state(chunks) variable #
 # returns a list  containing  the  values  of  the  chunks  with  the most #
 # significant chunk at list 0                                              #
 ############################################################################
     global coder_state
 
     # coder must have been intialised
     if {![info  exists coder_state]} { return }
     # no eof reached
     if {[eof $coder_state(fileptr)]} { return }
     # read the data into coder_state(bytes_read)
     set coder_state(bytes_read) \
       [read $coder_state(fileptr) $coder_state(chunks)]
     # clean up before conversion
     foreach name $coder_state(vals_names) {catch {unset $name}}
     # convert the data
     set conversions [eval $coder_state(conv)]
     # make a list, either full or partial
     set subst_str $coder_state(vals_vals)
     if {$conversions != $coder_state(chunks)} {
         # split the subst_lst to extract fields, join result to have string
         set subst_str [join [lrange [split $subst_str] 0 [incr conversions -1]]]
       }
     # substitute string, make proper list 
     set ret [split [subst $subst_str]]
     # transform to unsigned
     set num_elem [llength $ret]
     for {set idx 0} {$idx < $num_elem} {incr idx} {
         lset ret $idx [expr {[lindex $ret $idx] & 0Xff}] 
       }
 
     return $ret
   }
 proc write_coded {} {
 ###################################################################
 # creates the output files, if  files  by  same name exist aborts #
 # writes the coder_state into each file, writes the stream number #
 # switches to binary mode and streams out the encoding results    #
 ###################################################################
     global coder_state
     global gal_state
 
     # coder must have been intialised
     if {![info  exists coder_state]} {return 0}
     # generate names for the output files
     set num_streams [expr $coder_state(chunks) + $coder_state(erasures)]
     set outfiles {}
     for {set index 0}  {$index < $num_streams} {incr index} {
         lappend outfiles "$coder_state(filename)_coded_${index}"
       }
     puts "will encode in: $outfiles"  
     # check that files do not overwrite 
     foreach name $outfiles {if {[file exists $name]} {return 0}}
     # open file channels
     set channels {}
     foreach name $outfiles {
         if {[catch {open $name w} fileptr]} {
             foreach chan $channels {close $chan}
             return 0
           }
         lappend channels $fileptr  
       }
     # write ascii header, switch to binary for further writing
       # empty bytes_read just to be on the safe side
     set coder_state(bytes_read) ""
     #set state_list [array get coder_state]
     set state_list {}
     foreach idx {chunks erasures filename size} {
         lappend state_list $idx $coder_state($idx)
       }
     lappend state_list "poly" $gal_state(poly)
     lappend state_list "primitive" $gal_state(primitive)
     set counter 0
     foreach chan $channels {
         set this_list $state_list
         lappend this_list "code_row" 
         lappend this_list [lindex $coder_state(coder_matrix) $counter] 
         lappend this_list "stream" $counter
         puts $chan $this_list
         flush $chan
         fconfigure $chan -translation binary
         incr counter
       } 
     # encode
     set mat [lrange $coder_state(coder_matrix) $coder_state(chunks) end]
     set chunks $coder_state(chunks)
     while {![eof $coder_state(fileptr)]} {
         set in_vect [read_chunks]
         while {[llength $in_vect] < $chunks} {lappend in_vect 0}
         set out_vect [concat $in_vect [mat_vect_mult_dec $mat $in_vect]]
         foreach chan $channels elem $out_vect {
             puts -nonewline $chan [binary format c $elem]
           }
       }
     # clean up  
     foreach chan $channels {close $chan}
     return 1
   }
  • If you take a peek at the coded files you'll see that the first line is a plain text information line that gets gobbled by the decoder to initialize its own state.
  • Only the Vandermonde matrix type is implemented; I will implement a Cauchy variant so that we have more choices
  • You can pgp' your file first, then code it. As soon as I am satisfied with the codec, I'll implement an on the fly'' pgp in the encoder. I think there's a package somewhere....

The decoder part:

 proc init_decoder {file_list} {
 ####################################################################
 # initialises the decoder by reading the header lines of the files #
 ####################################################################
     global gal_state
     global decoder_state
     global tmp_arr
     global channel_list
 
     # file list is not empty
     if {![llength $file_list]} {return 0}
     # files exist
     foreach file_name $file_list {if {![file exists $file_name]} {return 0}}
     # files open
     set channel_list {}
     foreach file_name $file_list {
         if {[catch {open $file_name} chan]} {
             foreach chan $channel_list {close $chan}
             return 0
           }
         lappend channel_list $chan  
       }
 
     # read in ascii data
     catch {unset file_info}
     foreach chan $channel_list {gets $chan file_info($chan)}
     # ascii data is always one line at beginning, switch to binary mode now
     foreach chan $channel_list {fconfigure $chan -translation binary}
     # check consistency
     catch {unset tmp_arr}
     foreach chan $channel_list {
         foreach {name val} $file_info($chan) {
             set tmp_arr(${chan}_$name) $val
           }
       }
     # streams and  code_rows are different
     set chunk_list {}
     foreach idx [array names tmp_arr "*stream"] {
         lappend chunk_list $tmp_arr($idx)    
       }
     if {[llength $chunk_list] != [llength [lsort -unique $chunk_list]]} {
         foreach chan $channel_list {close $chan}
         puts "Encountered duplicate stream"
         return 0
       }
     set tmp_list {}
     foreach idx [array names tmp_arr "*code_row"] {
         lappend tmp_list [join $tmp_arr($idx) ""]    
       }
     if {[llength $tmp_list] != [llength [lsort -unique $tmp_list]]} {
         foreach chan $channel_list {close $chan}
         puts "Encountered duplicated code row"
         return 0
       }
     # chunks, erasures, filename, size, poly and primitive must be the same
     set check_list [list chunks erasures filename size poly primitive]
     foreach check $check_list {
         set tmp_list {}
         foreach idx [array names tmp_arr "*$check"] {
             lappend tmp_list [join $tmp_arr($idx) ""]    
           }
         if {[llength [lsort -unique $tmp_list]] != 1} {
             foreach chan $channel_list {close $chan}
             puts "Encountered inconsistency while checking: $check"
             return 0
           }
         set decoder_state($check) [lindex $tmp_list 0]  
       }
     # check that enough chunks are there
     if {[llength $file_list] < $decoder_state(chunks)} {
         puts "Not enough chunks to restore data"
         foreach chan $channel_list {close $chan}
         return 0
       }
     # check whether we received the chunks that were used in the unity matrix
     # part of the coding matrix
     set decode_list {}
     set list_end [expr $decoder_state(chunks)-1]
     set temp_list [lrange [lsort -integer  $chunk_list] 0 $list_end]
     if {([lindex $temp_list 0]         == 0        ) && \
         ([lindex $temp_list $list_end] == $list_end)} {
         puts "Chunks from unity matrix available, will use multiplex"
         foreach chan $channel_list {
             lappend decode_list [list $chan $tmp_arr(${chan}_stream)]  
           }
         set decode_list [lsort -integer -index 1 $decode_list]
         set decode_list [lrange $decode_list 0 $list_end]
         set decoder_list {}
         foreach  elem $decode_list {lappend decoder_list [lindex $elem 0]}
         unset decode_list
         mux $decoder_list
       } else {
         puts "At least one original chunk is missing, will use restore"
         set matrix {}
         set decoder_list {}
         foreach chan $channel_list {
             lappend decoder_list  $chan 
             lappend matrix $tmp_arr(${chan}_code_row)
           }
         set decoder_list [lrange $decoder_list 0 $list_end]
         set matrix       [lrange $matrix       0 $list_end]
         init_gal_dec $decoder_state(poly) $decoder_state(primitive)
         puts "matrix: $matrix"
         set matrix [mat_invert_dec $matrix]
         puts "inverse: $matrix"
         restore $decoder_list $matrix 
       }
 
     # files close
     foreach chan $channel_list {close $chan}
   }
 proc mux {chan_list} {
 ########################################################################
 # reads one byte at a time from the channels in chan_list sequentially #
 # and write them into the file given by decoder_state(filename)        #
 ########################################################################
     global decoder_state
 
     # prep output file 
     set out_name $decoder_state(filename)_decoded
     puts "will write to $out_name"
     if {[file exists $out_name]} {puts "out file exists";return 0}
     if {[catch {open $out_name w} out_chan]} {puts "could not open";return 0}
     fconfigure $out_chan -translation binary
     # stream out until num bytes of original file is reached
     set out_counter 0
     while {1} {
         foreach chan $chan_list {
             puts -nonewline $out_chan [read $chan 1]
             incr out_counter
             if {$out_counter >= $decoder_state(size)} {
                 close $out_chan
                 return 1
               }
           }
       }
   }
 proc restore {chan_list matrix} {
 ################################################
 # The actual decoding stuff happens here       #
 # Given the channels and the decoding matrix   #
 # The bytes are read from the different chunks #
 # Multiplied with the matrix and shipped out   #
 ################################################
     global decoder_state
 
     # prep output file 
     set out_name $decoder_state(filename)_decoded
     puts "will write to $out_name"
     if {[file exists $out_name]} {puts "out file exists";return 0}
     if {[catch {open $out_name w} out_chan]} {puts "could not open";return 0}
     fconfigure $out_chan -translation binary
 
 
 
     set out_counter 0
     while {1} {
         set vect {}
         foreach chan $chan_list {
             binary scan [read $chan 1] c1 byte
             lappend vect [expr {$byte & 0xff}]
           }
         set vect [mat_vect_mult_dec $matrix $vect]
         if {![llength $vect]} {return}
         foreach byte $vect {
             puts -nonewline $out_chan [binary format c $byte]
             incr out_counter 
             if {$out_counter >= $decoder_state(size)} {
                 close $out_chan
                 return 1
               }
           }
       }
 
   }

A crude GUI:

 #!/usr/bin/env tclsh
 # packages
 package require Tk
 package require BWidget
 # sources, TODO these will be transformed into packages later
 #source galois_dec.tcl
 #source encoding.tcl
 #source decoder.tcl
 # GUI state array
 set gui_state(#) anchor
 set gui_state(field_order) 8
 set gui_state(irr_polys) {}
 set gui_state(polylistbox) ""
 set gui_state(primlabeltext) ""
 set gui_state(generatorlistbox) ""
 set gui_state(generators) {}
 set gui_state(poly) ""
 set gui_state(generator) ""
 set gui_state(fieldready) ""
 set gui_state(filetoencode) ""
 set gui_state(chunks) ""
 set gui_state(erasures) ""
 set gui_state(matrixtype) "vandermonde"
 set gui_state(coderready) ""
 set gui_state(filestodecode) {}
 set gui_state(filestodecodelistbox) ""
 # GUI helpers
 proc decode_files {} {
     global gui_state
     init_decoder $gui_state(filestodecode)
   }
 proc add_file_to_decode {} {
     global gui_state
     set thisfile [tk_getOpenFile -title "Choose File to Decode"]
     lappend gui_state(filestodecode) $thisfile
   }
 proc remove_file_to_decode {} {
     global gui_state
     set thisindex [$gui_state(filestodecodelistbox) curselection]
     if {$thisindex == ""} {return}
     $gui_state(filestodecodelistbox) delete $thisindex $thisindex
   }
 proc gui_init_coder {} {
     global gui_state
     set gui_state(coderready) ""
     if {$gui_state(filetoencode) == ""} {return 0}
     if {$gui_state(chunks)       == ""} {return 0}
     if {$gui_state(erasures)     == ""} {return 0}
     if {$gui_state(matrixtype)   == ""} {return 0}
     if {![string is integer $gui_state(chunks)  ]} {return 0}
     if {![string is integer $gui_state(erasures)]} {return 0}
     if {![init_coder $gui_state(filetoencode)\
                      $gui_state(chunks)      \
                      $gui_state(erasures)    \
                      $gui_state(matrixtype)] } {
         return 0
       }
     set gui_state(coderready) "Coder Initialised"  
     return 1
   }
 proc get_file {} {
     global gui_state
     set gui_state(filetoencode) [tk_getOpenFile \
                                   -initialfile $gui_state(filetoencode) \
                                   -title "Choose File to Encode"]
     set gui_state(coderready) ""                              
   }
 proc init_field {} {
     global gui_state
     set gui_state(fieldready) ""
     set ready 0
     if {($gui_state(poly) != "") && ($gui_state(generator) != "")} {
         set ready [init_gal_dec $gui_state(poly) $gui_state(generator)]
       }
     if {$ready} {set gui_state(fieldready) "Field intialised"} 
   }
 proc use_poly {} {
     global gui_state
     set index [$gui_state(polylistbox) curselection]
     if {$index != {}} {
         set gui_state(poly) [lindex $gui_state(irr_polys) $index]
       }  
     set gui_state(fieldready) ""  
     set gui_state(generator) ""  
     set gui_state(generators) {}
   }
 proc use_gen {} {
     global gui_state
     set index [$gui_state(generatorlistbox) curselection]
     if {$index != {}} {
         set gui_state(generator) [lindex $gui_state(generators) $index]
       }  
     set gui_state(fieldready) ""  
   }
 proc get_irr_polys {} {
     global gui_state
     set gui_state(irr_polys) [gal_find_gen_poly_dec $gui_state(field_order)]
     $gui_state(polylistbox) selection clear 0 end
     $gui_state(polylistbox) selection set 0 0
     set gui_state(generators) {}
     set gui_state(generator) ""
     set gui_state(poly) ""
     set gui_state(primlabeltext) ""
     set gui_state(fieldready) ""
   }
 proc check_prim_poly {} {
     global gui_state
     set index [$gui_state(polylistbox) curselection]
     if {$index != {}} {
         set poly [lindex $gui_state(irr_polys) $index]
         set not ""
         if {![gal_is_primitive_poly_dec $poly]} { set not "not " }
         set gui_state(primlabeltext) ""
         append gui_state(primlabeltext) $poly " is " $not "primitive"
       }
   }
 proc get_gen_elems {} {
     global gui_state
     set index [$gui_state(polylistbox) curselection]
     if {$index != {}} {
         set poly [lindex $gui_state(irr_polys) $index]
         set gui_state(generators) [gal_get_generators_dec $poly]
       }
     $gui_state(generatorlistbox) selection clear 0 end
     $gui_state(generatorlistbox) selection set 0 0
     set gui_state(generator) ""
     set gui_state(fieldready) ""
   }
 #### prep notebook 
 set nb [NoteBook .nb -side top]
 $nb insert 0 galois -text "Galois"
 $nb insert 1 encode -text "Encode"
 $nb insert 2 decode -text "Decode"
 
 ### notebook page: galois
 set page [$nb getframe galois]
 # field order
 set thisframe [frame $page.galorderframe]
   set thislabel [label $thisframe.galorderlabel -text "field order"]
   set thisentry [entry $thisframe.galorderent]
   $thisentry configure -width 4 -textvariable gui_state(field_order)
   pack $thisentry -side right  
   pack $thislabel -side left
 pack $thisframe -side top -anchor w  
 # calculate generator polys
 set thisbutton [button $page.searchirred -text "Find Irreducible Polynoms"]
   $thisbutton configure -command get_irr_polys 
 pack $thisbutton -side top -anchor w
 # list generator polynoms
 set thisframe [frame $page.genpolyframe]
   set thislabel [label $thisframe.genpolylabel]
     $thislabel configure -text "Irreducible Polynoms" -anchor w 
   pack $thislabel -side top -fill x -expand 1
   set scrollerframe [ScrolledWindow $thisframe.irredpolys]
     set thislistbox [listbox $scrollerframe.polylistbox]
       $thislistbox configure -listvariable gui_state(irr_polys)
       $thislistbox configure  -height 5 -selectmode single
       $scrollerframe setwidget $thislistbox
       set gui_state(polylistbox) $thislistbox
   pack $scrollerframe  
 pack $thisframe -side top -anchor w
 # check if chosen polynom is primitive
 set thisbutton [button $page.primbutton -text "Check if Primitive"]
   $thisbutton configure -text "Check if Primitive"
   $thisbutton configure -command check_prim_poly 
 pack $thisbutton -side top -anchor w
 set thislabel [label $page.prim_or_not]
   $thislabel configure -textvariable gui_state(primlabeltext) -anchor w 
 pack $thislabel -side top -anchor w -fill x
 # select one polynom for use
 set thisbutton [button $page.usepoly -text "Use Selected Polynom"]
   $thisbutton configure -command use_poly
 pack $thisbutton -side top -anchor w
 set thislabel [label $page.polytouse -textvariable gui_state(poly)]
 pack $thislabel -side top -anchor w
 # get generator elements
 set thisbutton [button $page.getgenerators -text "Find Generator Elements"]
   $thisbutton configure -command get_gen_elems 
 pack $thisbutton -side top -anchor w
 # list field generators
 set thisframe [frame $page.genframe]
   set thislabel [label $thisframe.genlylabel]
     $thislabel configure -text "Generators" -anchor w 
   pack $thislabel -side top -fill x -expand 1
   set scrollerframe [ScrolledWindow $thisframe.generators]
     set thislistbox [listbox $scrollerframe.generatorlistbox]
       $thislistbox configure -listvariable gui_state(generators)
       $thislistbox configure  -height 5 -selectmode single
       $scrollerframe setwidget $thislistbox
       set gui_state(generatorlistbox) $thislistbox
   pack $scrollerframe  
 pack $thisframe -side top -anchor w
 # select one generator element for use
 set thisbutton [button $page.usegen -text "Use Selected Generator"]
   $thisbutton configure -command use_gen
 pack $thisbutton -side top -anchor w
 set thislabel [label $page.gentouse -textvariable gui_state(generator)]
 pack $thislabel -side top -anchor w
 # initialise galois field
 set thisbutton [button $page.initgal -text "Use Poly. & Gen. to Init"]
   $thisbutton configure -command init_field
 pack $thisbutton -side top -anchor w
 set thislabel [label $page.fieldready -textvariable gui_state(fieldready)]
 pack $thislabel -side top -anchor w
  
 ### notebook page: encode
 set page [$nb getframe encode]
 # give user the possibility to use own poly
 set thislabel [label $page.polyentrylabel -text "Genereating Polynom"]
 pack $thislabel -side top -anchor w
 set thisentry [entry $page.polyentry]
   $thisentry configure -width 10 -textvariable gui_state(poly)
 pack $thisentry -side top -anchor w
 # give user the possibility to use own generator
 set thislabel [label $page.generatorelemlabel -text "Generator Element"]
 pack $thislabel -side top -anchor w
 set thisentry [entry $page.generatorelementry]
   $thisentry configure -width 10 -textvariable gui_state(generator)
 pack $thisentry -side top -anchor w
 # init field
 set thisbutton [button $page.initgal -text "Use Poly. & Gen. to Init"]
   $thisbutton configure -command init_field
 pack $thisbutton -side top -anchor w
 set thislabel [label $page.fieldready -textvariable gui_state(fieldready)]
 pack $thislabel -side top -anchor w
 # get filename to encode
 set thislabel [label $page.filenamelabel -text "File to Encode"]
 pack $thislabel -side top -anchor w
 set thisframe [frame $page.fileframe]
   set thisentry [entry $thisframe.filenameentry]
     $thisentry configure -width 14 -textvariable gui_state(filetoencode)
   pack $thisentry -side left 
   set thisbutton [button $thisframe.filebrowse]
     $thisbutton configure -text "Browse" -command get_file
   pack $thisbutton -side right
 pack $thisframe -side top -fill x
 # init encoding
 set thisentry [LabelEntry $page.chunkentry]
   $thisentry configure -label "Chunks" -textvariable gui_state(chunks)
   $thisentry configure -width 5 -labelwidth 8
 pack $thisentry -side top -anchor w
 set thisentry [LabelEntry $page.erasureentry]
   $thisentry configure -label "Erasures" -textvariable gui_state(erasures)
   $thisentry configure -width 5 -labelwidth 8
 pack $thisentry -side top -anchor w
 set thisbutton [button $page.initcoder -text "Init Coder"]
   $thisbutton configure -command gui_init_coder
 pack $thisbutton -side top -anchor w
 set thislabel [label $page.coderready -textvariable gui_state(coderready)]
 pack $thislabel -side top -anchor w
 # write coded file
 set thisbutton [button $page.code -text "Write Coded"]
   $thisbutton configure -command write_coded
 pack $thisbutton -side top -anchor w
 
 ### notebook page: decode
 set page [$nb getframe decode]
 # get files to decode
 set thislabel [label $page.filelistlabel -text "Files to Decode"]
 pack $thislabel -side top -anchor w
 set scrollerframe [ScrolledWindow $page.filestodecodescroller]
   set thislistbox [listbox $scrollerframe.filestodecodelist]
     $thislistbox configure -listvariable gui_state(filestodecode)
     $thislistbox configure  -height 5 -selectmode single
     $scrollerframe setwidget $thislistbox
     set gui_state(filestodecodelistbox) $thislistbox
 pack $scrollerframe -side top -anchor w 
 set thisframe [frame $page.addremoveframe]
   set thisbutton [button $thisframe.addfile]
     $thisbutton configure -text "Add File"
     $thisbutton configure -command add_file_to_decode
   pack $thisbutton -side left -fill x
   set thisbutton [button $thisframe.removefile]
     $thisbutton configure -text "Remove File"
     $thisbutton configure -command remove_file_to_decode
   pack $thisbutton -side right -fill x
 pack $thisframe -side top -anchor w
 # decode
 set thisbutton [button $page.decode]
   $thisbutton configure -text "Decode" 
   $thisbutton configure -command "decode_files"
 pack $thisbutton -side top -anchor w
 
 
 
 # start show
 pack $nb -fill both -expand 1
 $nb compute_size
 $nb raise galois

That's it.


Usage:

  • Galois-tab: Galois playground, useful to find irreducible polynoms and to have a look at the generating elements of the field. Finding the generating elements can take a while.
  • Encode-tab: The impatient could use 285 as generating polynom, and 2 as generator element. The number of chunks means how many chunks you would like to generate from your original file, the number of erasures will generate one additional chunk per erasure permitted, see example above: 4 original chunks + 2 erasures = 6 chunks all in all
  • Decode-tab: Just choose the files you want to use to restore the original file and press decode.

PS:

1) The code is a bit lengthy, if you find this disturbing in the wiki, please tell me.

2) My commenting style comes from the fact that I never know when I'm going to get interrupted by my son who quite vehemently demands his baby-drinking-bottle, I write as many comments as possible, then fill the code in after the milk tour. I need the comments as extended memory of what I intended to do.

Cheers.


Other solutions for basis of comparison:

  • Parchive [L1 ] (Reed-Solomon)
  • Tkpar [L2 ]
  • zfec [L3 ] (Reed-Solomon via Vandermonde matrices) Claims to be much faster than parchive.
  • OceanStore [L4 ] (Byzantine-fault tolerant commit protocol)
  • Reed-Solomon erasure code in Python [L5 ]
  • Collection of links to code for various erasure tolerant strategies [L6 ]
  • Tornado codes [L7 ] (claims to be 100 - 10,000 times faster than R-S)
  • Low-density parity check (LDPC) [L8 ] is patent-free, contrary to other advanced schemes such as Tornado or Raptor. See [L9 ] for a LGPL library.
  • Pro-MPEG FEC is trivial to implement using binary arithmetics, as it consists in simple row and column parity checks.

FB: See also Luigi Rizzo's FEC page [L10 ]. He provides an implementation of a Reed-Solomon codec in C [L11 ] that is free to use in any context, including commercial. I've used this library in several professional projects (e.g. multicast datacasting in DVB-H), it is rock-solid and works like a charm.

Jaf: The implementation actually relies on the Papers found there, the pages are a real treasure trove for FEC stuff.

SEH 20080213 -- zfec's author claims it is an update of Rizzo's fec library.


enter categories here