## Erasure Tolerant Codes

11.02.2008 Jaf

During a Discussion at a LUG meeting, a few had the idea to use the free data repositories that Google, Yahoo, gmx, etc. provide to store a kind of off-site backup.

The opinion was that the data had to be somewhat securely stored, that meant that one should be able to access the data even when one or more of the servers were down.

A bit of 'Googling' showed that there are such things: erasure-tolerant codes that don't need 100% replication of the data, actually like 'RAID', but on a file-by-file basis.

The idea used for erasure tolerance is quite simple, an example:

• There are 4 bytes to store
• The storage can be made on 6 different locations
• This means 2 erasures are possible without compromising the data integrity
• Make a vector out of the 4 bytes, multiply a 4(columns)X6(rows) matrix with the vector
• The result is a vector with 6 elements
• The matrix is chosen so that any 4 of its rows result in an invertible matrix
• Upon losing 2 of the result bytes, choose the rows that generated the remaining bytes
• Build a matrix out of the rows and invert it
• Multiply the inverted matrix with the vector made out of the remaining data
• Et voila, your original vector is back.

A big BUT remains there: But you can only get you data back if the multiplication can be made without loss of precision, e.g., no floats.

That's why one uses Galois fields for all the operations needed to encode or to decode the data. In a Galois field the inverse of a number is also from the field, if you start with an integer field, the result of any operation will still be in the field.

Here is my implementation of the above, actually a proof of concept, that it can be done in Tcl and then you get the GUI for free.

In conjunction with fuse I think it would even be possible to implement a Web-Based-RAID

Galois Field Part:

• Matrix and vector operations
• Operations using look-up tables
• Creation of look-up tables
• Basic functions needed until look 0p tables are initialized
• Helpers
``` ################################################################################
# vector and matrix operations                                                 #
################################################################################
proc mat_invert_dec {mat} {
###############################
# inverts a matrix in GF(2^n) #
###############################
set rows [llength \$mat]
# make sure each row has same length, insuring we got a square matrix
foreach row \$mat {
if {[llength \$row] != \$rows} {return}
}
# initialize the inverse with E, inversion like at school:
# transform mat into E, apply needed operations to inv too, at the
# end inv will contain the inverse of mat.
set inv [mat_unity_dec \$rows]
for {set row 0} {\$row < \$rows} {incr row} {
# if the pivot is zero, then seek other row to add until it is nonzero
if {[lindex [lindex \$mat \$row] \$row] == 0} {
for {set other \$row} {\$other < \$rows} {incr other} {
# don't add this row onto itself, would set the row to zero
if {\$other == \$row} {continue}
# get the appropriate element
if {[lindex [lindex \$mat \$other] \$row] != 0} {break}
}
# at this point either we have a row we can add or no other row
# is possible.
# The latter case makes the matrice non invertible
if {\$other == \$rows} {return}
# a possible row was found
set new_row [vect_vect_add_dec [lindex \$mat \$row]\
[lindex \$mat \$other]]
set mat [lreplace \$mat \$row \$row \$new_row]
# reflect to E
set new_inv [vect_vect_add_dec [lindex \$inv \$row]\
[lindex \$inv \$other]]
set inv [lreplace \$inv \$row \$row \$new_inv]
}; # if pivot == 0
# set the pivot to 1
set piv_inv [inv_dec [lindex [lindex \$mat \$row] \$row]]
set new_row [scal_vect_mult_dec \$piv_inv [lindex \$mat \$row]]
set mat [lreplace \$mat \$row \$row \$new_row]
# reflect to E
set new_inv [scal_vect_mult_dec \$piv_inv [lindex \$inv \$row]]
set inv [lreplace \$inv \$row \$row \$new_inv]
# now pivot  is 1
for {set other 0} {\$other < \$rows} {incr other} {
# don't apply to self
if {\$other == \$row} {continue}
# make the entries in the column where the pivot resides zero
set this_elem [lindex [lindex \$mat \$other] \$row]
set new_mat [scal_vect_mult_dec \$this_elem [lindex \$mat \$row]]
set new_mat [vect_vect_add_dec \$new_mat [lindex \$mat \$other]]
set mat [lreplace \$mat \$other \$other \$new_mat]
# reflect to E
set new_inv [scal_vect_mult_dec \$this_elem [lindex \$inv \$row]]
set new_inv [vect_vect_add_dec \$new_inv [lindex \$inv \$other]]
set inv [lreplace \$inv \$other \$other \$new_inv]
}
}; # end foreach row of the matrix
# check that row is not only zeros, matrice would have been not invertible
foreach row \$mat {
set sorted [lsort -unique \$row]
if {([llength \$sorted] == 1) && ([lindex \$sorted 0] == 0)} { return }
}
return \$inv
}
proc scal_vect_mult_dec {scal vect} {
#####################################
# multiplies a vector with a scalar #
#####################################
set result {}
foreach elem \$vect {lappend result [mult_dec \$scal \$elem] }
return \$result
}
###################
###################
if {[llength \$vecta] != [llength \$vectb]} {return}
set result {}
foreach elema \$vecta elemb \$vectb {lappend result [add_dec \$elema \$elemb]}
return \$result
}
proc mat_unity_dec {ord} {
####################################
# makes an (ord x ord) unit matrix #
####################################
for {set row 0} {\$row < \$ord} {incr row} {
set row_str [string replace [string repeat 0 \$ord] \$row \$row 1]
lappend unity [split \$row_str ""]
}
return \$unity
}
proc mat_vect_mult_dec {mat vect} {
#########################
# matrix vector product #
#########################
set retvect {}
foreach row \$mat {
set prod [vect_vect_dot_dec \$row \$vect]
if {\$prod eq ""} {return}
lappend retvect \$prod
}
return \$retvect
}
proc vect_vect_dot_dec {vecta vectb} {
######################################
# compute the dot product of vectors #
######################################
if {[llength \$vecta] != [llength \$vectb]} {return}

set result 0
foreach elema \$vecta elemb \$vectb {
set prod [mult_dec \$elema \$elemb]
if {\$prod == ""} {return}
}
return \$result
}
################################################################################
# operations using the gal_state look up tables will work after initialisation #
# with an irreducible polynomial and after choice of a generator element.      #
# All operations use decimals                                                  #
################################################################################
#####################
#####################
}
proc mult_dec {a b} {
#######################################################
# multiply using the logarithm and exp look up tables #
#######################################################
global gal_state

if {(\$a == 0)||(\$b == 0)} {return 0}
set loga [log \$a]
set logb [log \$b]
if {(\$loga == "")||(\$logb == "")} {return}
if {\$a == 1} {return \$b}
if {\$b == 1} {return \$a}
set logres [expr (\$loga+\$logb)%\$gal_state(max_elem)]
return [exp \$logres]
}
proc inv_dec {num} {
###############################
# returns  the inverse of num #
###############################
global gal_state

if {![info exists gal_state]} {return}
return [lindex \$gal_state(inv)   \$num]
}
proc exp {num} {
###########################
# returns \$primitive^\$num #
###########################
global gal_state

if {![info exists gal_state]} {return}
return [lindex \$gal_state(exp) \$num]
}
proc log {num} {
################################
# returns log(\$num)|\$primitive #
################################
global gal_state

if {![info exists gal_state]} {return}
if {\$num == 0} {return}
return [lindex \$gal_state(log)   \$num]
}

################################################################################
# Creation of the look up tables to speed up operations                        #
################################################################################
proc init_gal_dec {poly primitive} {
#########################################################################
# create logarithm table in GF(2^ord), ord is the order of the polynom  #
# create a exponetial table in GF(2^ord)                                #
# records the polynomial used for the field                             #
# records the primitive element                                         #
# records the maximum of the field                                      #
#########################################################################
global gal_state
catch {unset gal_state}

if {![gal_is_primitive_elem_dec \$primitive \$poly]} {return 0}

set num_ent [get_two_power_dec \$poly]
set gal_state(max_elem) [expr {\$num_ent-1}]

set num_exps [expr \$num_ent - 1]

set gal_state(poly) \$poly
set gal_state(primitive) \$primitive
# make exponential table
set gal_state(exp) [list 1 \$primitive]
incr num_exps -1
while {\$num_exps} {
lappend gal_state(exp) \
[gal_mult_dec \$poly [lindex \$gal_state(exp) end] \$primitive]
incr num_exps -1
}
# make logarithm table
lappend gal_state(log) {}
set num_logs [expr \$num_ent - 1]
for {set num 1} {\$num <= \$num_logs} {incr num} {
lappend gal_state(log) [lsearch -exact \$gal_state(exp) \$num]
}
# make inverse table
set log1 [log 1]
set gal_state(inv)  [list {} "1"]
for {set num 2} {\$num <= \$num_logs} {incr num} {
set logn [log \$num]
lappend gal_state(inv) [exp [expr {\$gal_state(max_elem) - \$logn}]]
}
return 1
}
################################################################################
# Basic operations in GF(2^n), these are to be used in order to create the     #
# look up tables for exp,log & inv. The look up tables are supposed to speed   #
# up calculations                                                              #
################################################################################
##################################
# performs addition in GF(2^n)   #
# Addition is done bitwise mod 2 #
##################################
if {![string is integer \$a]} {return ""}
if {![string is integer \$b]} {return ""}
return [expr {\$a^\$b}]
}
proc gal_poly_mult_dec {a b} {
##############################################
# performs polynom multiplication in GF(2^n) #
# Amounts to long multiplication with the    #
# addition being done bitwise mod 2          #
##############################################
set multiplicator \$a
set multiplicand  \$b
if {\$a < \$b} {
set multiplicator \$b
set multiplicand  \$a
}
# peasants multiplication
set result 0
while {\$multiplicator != 0} {
if {[expr {\$multiplicator&1}]} {
}
set multiplicand  [expr {\$multiplicand  << 1}]
set multiplicator [expr {\$multiplicator >> 1}]
}
set result
}
proc gal_remainder_dec {a poly} {
#######################
# performs a mod poly #
#######################
set poly_power [get_two_power_dec \$poly]
set a_power    [get_two_power_dec \$a]
while {\$a_power >= \$poly_power} {
set pow_two [get_two_power_dec \$a]
set trial \$poly
while {\$trial < \$pow_two} {set trial [expr {\$trial << 1}]}
set a_power [get_two_power_dec \$a]
}
return \$a
}
proc gal_mult_dec {poly a b} {
#####################################################
# performs multiplication of two numbers in gf(2^n) #
# a x b = polymult(a,b) mod poly                    #
#####################################################
return [gal_remainder_dec [gal_poly_mult_dec \$a \$b] \$poly]
}
proc gal_find_gen_poly_dec {ord} {
##########################################################
# finds generator polys in GF(2^n), generators are prime #
# polynoms e.g. irreducible in GF(2^n)                   #
##########################################################
set max_find   [expr int(pow(2,\$ord+1))]
set start_find [expr int(pow(2,\$ord))]
set poly_list {}
set ret_list {}
# init with  3
lappend poly_list 3
set counter 4
while {\$counter < \$max_find} {
# only odd numbers
if {[expr {\$counter&1}]} {
set prime 1
foreach elem \$poly_list {
if {[gal_remainder_dec \$counter \$elem] == 0} {
set prime 0
break
}
}
if {\$prime} {
lappend poly_list \$counter
if {\$counter >= \$start_find} {
lappend ret_list \$counter
}
}
}
incr counter
}
set ret_list
}
proc gal_is_primitive_elem_dec {num poly} {
############################################################
# checks whether num is a primitive element in the GF(2^n) #
# defined by poly. Aprimitive element generates all field  #
# values with its powers (excluding 0)                     #
############################################################
set max_num [get_two_power_dec \$poly]
if {\$num >= \$max_num} {return 0}
if {\$num == 1} {return 0}
incr max_num -1
if {\$max_num <= 0} {return 0}
set power_list {}
set power \$num
lappend power_list \$power
while {\$power != 1} {
set power [gal_mult_dec \$poly \$power \$num]
lappend power_list \$power
}
if {[llength [lsort -integer -unique \$power_list]] == \$max_num} {return 1}
return 0

}
proc gal_is_primitive_poly_dec {poly} {
######################################################################
# a polynom is primitive in GF(2^n)|poly if 2 is a primitive element #
######################################################################
return [gal_is_primitive_elem_dec 2 \$poly]
}
proc gal_get_generators_dec {poly} {
#####################################################
# finds all generating elements of the GF(2^n)|poly #
#####################################################
set max_num [get_two_power_dec \$poly]
incr max_num -1
if {\$max_num <= 0} {return}
set gen_list {}
for {set num 2} {\$num < \$max_num} {incr num} {
if {[gal_is_primitive_elem_dec \$num \$poly]} {
lappend gen_list \$num
}
}
return \$gen_list
}
################################################################################
# HELPER                                                                       #
################################################################################
proc get_two_power_dec {num} {
##################################
# find power of 2 lower than num #
##################################
set result \$num
set shifted \$num
while {\$shifted} {
set shifted [expr {\$shifted >> 1}]
set result [expr {\$result | \$shifted}]
}
return [expr {\$result ^ (\$result >>1)}]
}```

Anyone wanting to explore GF(2^n) can use this part, it is self-contained.

Why look-up tables?

In GF(whatever) the numbers are actually symbols representing polynomials, a multiplication in GF(whatever) is actually the result of : polymult(poly1, poly2) Modulo (generator polynom), that costs! (even on today's GHz machines)

That's why after finding one generator polynom and one generator element you build up look-up tables for exponentiation, logs and inverses, and use only these for heavy-duty calculations.

I would advise against using gal_find_gen_poly_dec for a polynom order >= 16; it takes ages to complete.

Lars H, 2008-02-14: What is that algorithm, anyway? Is it tabulating all irreducible polynomials, or what?

Jaf, 2008-02-14: Yes, it tabulates all irreducible polynomials up to the order given. It does this by checking the remainder of the polynomial being checked against all already-found irreducible polynomials. It is O(n^2). The remainder check does not use multiplications.

Lars H,2008-02-14:I can't help mentioning that there are tricks one can use to speed up the polymult operation. For very large n one should do some kind of fast Fourier transform to approach O(nlogn) complexity, but more interesting here are probably tricks that work below a given bound on n.

If the data bits are not packed tight in the integer used to encode the polynomial, then one can even use integer multiplication to do most of the work. For example, if we know that n<16 then it suffices to put the data at every fourth bit — this corresponds to using the binary digit expansion as a hex digit expansion. The following (rather corny) code works for n<=8 (main culprit is the binary conversions, but one also needs a 32-bit*32-bit product, hence the wide):

``` proc poly_mult8 {a b} {
binary scan [binary format cc \$a \$b] B8B8 a_spaced b_spaced
set prod [expr ( wide(0x\$a_spaced) * 0x\$b_spaced ) & 0x111111111111111]
binary scan [binary format B16 [format %016lx \$prod]] S res
return \$res
}```

Without conversions, it would of course be much more efficient (less messy to brace the expression, for one) ...

Between the two extremes, one can split the polynomial into blocks to speed things up. Keeping a table of all products of two polynomials of degree <=3 is for example quite feasible — then one can do 4 bits in each step rather than 1 bit.

The encoder part:

• Relies on the existence of an initialized GF(2^8)
• Will not overwrite existing files
• Output is OriginalFileName_coded_n (where n is the chunk number)
``` proc init_coder {filename chunks erasures matrix_type} {
#######################################################################
# this function inits the coder, gal_state  has to exist upon calling #
# the galois field MUST be 2^8 at the moment. The read/write  routines#
# with arbitrary number of  bits don't  exist  yet. The 2^16 case has #
# been left out  as the time to generate  the irreducible polynomials #
# is very unsatisfying,  at least in this implementation of galois.tcl#
#######################################################################
global coder_state
global gal_state

## encoding stuff
# check that galois field has been initialised
if {![info  exists gal_state]} { return 0 }
# check that the field is 2^8
if {\$gal_state(max_elem) != 255} { return 0 }
# check that the number of chunks permits a E matrix
# TODO Check above
if {\$chunks   >= \$gal_state(max_elem)} { return 0 }
# check that number of erasures is not higher than n=max_elem-1
# a vandermonde is invertible if it is a (n-1)x(n-1)
# TODO Check above
# if you REALLY want so many erasures: 100% more generated data , replicate
# the filei, you'll be better off
if {\$erasures >= \$chunks} { return 0 }
# check matrix type for computed extra erasure chunks
switch -exact -- \$matrix_type {
"vandermonde" {}
default {
puts "Sorry this matrix type (\$matrix_type) is not implemented yet"
return 0
}
}
## file stuff
# input file exists
if {![file exists \$filename]} {return 0}
# and is readeable for user
if {[catch {open \$filename r} fileptr]} {return 0}
# file is open, configure for binary access
fconfigure \$fileptr -translation binary

# fill state array, bookkeeping
set coder_state(chunks) \$chunks
set coder_state(erasures) \$erasures
set coder_state(mat_type) \$matrix_type
# the encoding matrix
# unit matrix part
set coder_state(coder_matrix) [mat_unity_dec \$chunks]
# vandermonde part, TODO: rework this into a proc to allow at least
# for a cauchy matrice
for {set counter 1} {\$counter <= \$erasures} {incr counter} {
set this_row {}
set this_elem 1
lappend this_row \$this_elem
for {set col_count 1} {\$col_count < \$chunks} {incr col_count} {
set this_mult [expr \$counter]
set this_elem [mult_dec \$this_elem \$this_mult]
lappend this_row \$this_elem
}
lappend coder_state(coder_matrix) \$this_row
}
# input file name
set coder_state(filename) \$filename
# input file descriptor
set coder_state(fileptr) \$fileptr
# file size
set coder_state(size) [file size \$filename]

## the idea is to construct a string that can be evalled later to
## convert the read bytes into the appropriate decimal values.
# the value names to receive the bytes
set vals {}
for {set counter 0} {\$counter < \$chunks} {incr counter} {
lappend vals val\$counter
}
set coder_state(vals_names) \$vals

set coder_state(vals_vals) \
[string map {"val" "\$val"} [join \$coder_state(vals_names)]]

# conversion string
set conv [concat  binary scan {\$coder_state(bytes_read)}]
set conv [concat \$conv [string repeat c1 \$coder_state(chunks)]]
set conv [concat \$conv \$coder_state(vals_names)]
set coder_state(conv) \$conv

return 1
}
############################################################################
# reads the number of chunks specified in the coder_state(chunks) variable #
# returns a list  containing  the  values  of  the  chunks  with  the most #
# significant chunk at list 0                                              #
############################################################################
global coder_state

# coder must have been intialised
if {![info  exists coder_state]} { return }
# no eof reached
if {[eof \$coder_state(fileptr)]} { return }
# clean up before conversion
foreach name \$coder_state(vals_names) {catch {unset \$name}}
# convert the data
set conversions [eval \$coder_state(conv)]
# make a list, either full or partial
set subst_str \$coder_state(vals_vals)
if {\$conversions != \$coder_state(chunks)} {
# split the subst_lst to extract fields, join result to have string
set subst_str [join [lrange [split \$subst_str] 0 [incr conversions -1]]]
}
# substitute string, make proper list
set ret [split [subst \$subst_str]]
# transform to unsigned
set num_elem [llength \$ret]
for {set idx 0} {\$idx < \$num_elem} {incr idx} {
lset ret \$idx [expr {[lindex \$ret \$idx] & 0Xff}]
}

return \$ret
}
proc write_coded {} {
###################################################################
# creates the output files, if  files  by  same name exist aborts #
# writes the coder_state into each file, writes the stream number #
# switches to binary mode and streams out the encoding results    #
###################################################################
global coder_state
global gal_state

# coder must have been intialised
if {![info  exists coder_state]} {return 0}
# generate names for the output files
set num_streams [expr \$coder_state(chunks) + \$coder_state(erasures)]
set outfiles {}
for {set index 0}  {\$index < \$num_streams} {incr index} {
lappend outfiles "\$coder_state(filename)_coded_\${index}"
}
puts "will encode in: \$outfiles"
# check that files do not overwrite
foreach name \$outfiles {if {[file exists \$name]} {return 0}}
# open file channels
set channels {}
foreach name \$outfiles {
if {[catch {open \$name w} fileptr]} {
foreach chan \$channels {close \$chan}
return 0
}
lappend channels \$fileptr
}
# write ascii header, switch to binary for further writing
# empty bytes_read just to be on the safe side
#set state_list [array get coder_state]
set state_list {}
foreach idx {chunks erasures filename size} {
lappend state_list \$idx \$coder_state(\$idx)
}
lappend state_list "poly" \$gal_state(poly)
lappend state_list "primitive" \$gal_state(primitive)
set counter 0
foreach chan \$channels {
set this_list \$state_list
lappend this_list "code_row"
lappend this_list [lindex \$coder_state(coder_matrix) \$counter]
lappend this_list "stream" \$counter
puts \$chan \$this_list
flush \$chan
fconfigure \$chan -translation binary
incr counter
}
# encode
set mat [lrange \$coder_state(coder_matrix) \$coder_state(chunks) end]
set chunks \$coder_state(chunks)
while {![eof \$coder_state(fileptr)]} {
while {[llength \$in_vect] < \$chunks} {lappend in_vect 0}
set out_vect [concat \$in_vect [mat_vect_mult_dec \$mat \$in_vect]]
foreach chan \$channels elem \$out_vect {
puts -nonewline \$chan [binary format c \$elem]
}
}
# clean up
foreach chan \$channels {close \$chan}
return 1
}```
• If you take a peek at the coded files you'll see that the first line is a plain text information line that gets gobbled by the decoder to initialize its own state.
• Only the Vandermonde matrix type is implemented; I will implement a Cauchy variant so that we have more choices
• You can pgp' your file first, then code it. As soon as I am satisfied with the codec, I'll implement an on the fly'' pgp in the encoder. I think there's a package somewhere....

The decoder part:

``` proc init_decoder {file_list} {
####################################################################
# initialises the decoder by reading the header lines of the files #
####################################################################
global gal_state
global decoder_state
global tmp_arr
global channel_list

# file list is not empty
if {![llength \$file_list]} {return 0}
# files exist
foreach file_name \$file_list {if {![file exists \$file_name]} {return 0}}
# files open
set channel_list {}
foreach file_name \$file_list {
if {[catch {open \$file_name} chan]} {
foreach chan \$channel_list {close \$chan}
return 0
}
lappend channel_list \$chan
}

catch {unset file_info}
foreach chan \$channel_list {gets \$chan file_info(\$chan)}
# ascii data is always one line at beginning, switch to binary mode now
foreach chan \$channel_list {fconfigure \$chan -translation binary}
# check consistency
catch {unset tmp_arr}
foreach chan \$channel_list {
foreach {name val} \$file_info(\$chan) {
set tmp_arr(\${chan}_\$name) \$val
}
}
# streams and  code_rows are different
set chunk_list {}
foreach idx [array names tmp_arr "*stream"] {
lappend chunk_list \$tmp_arr(\$idx)
}
if {[llength \$chunk_list] != [llength [lsort -unique \$chunk_list]]} {
foreach chan \$channel_list {close \$chan}
puts "Encountered duplicate stream"
return 0
}
set tmp_list {}
foreach idx [array names tmp_arr "*code_row"] {
lappend tmp_list [join \$tmp_arr(\$idx) ""]
}
if {[llength \$tmp_list] != [llength [lsort -unique \$tmp_list]]} {
foreach chan \$channel_list {close \$chan}
puts "Encountered duplicated code row"
return 0
}
# chunks, erasures, filename, size, poly and primitive must be the same
set check_list [list chunks erasures filename size poly primitive]
foreach check \$check_list {
set tmp_list {}
foreach idx [array names tmp_arr "*\$check"] {
lappend tmp_list [join \$tmp_arr(\$idx) ""]
}
if {[llength [lsort -unique \$tmp_list]] != 1} {
foreach chan \$channel_list {close \$chan}
puts "Encountered inconsistency while checking: \$check"
return 0
}
set decoder_state(\$check) [lindex \$tmp_list 0]
}
# check that enough chunks are there
if {[llength \$file_list] < \$decoder_state(chunks)} {
puts "Not enough chunks to restore data"
foreach chan \$channel_list {close \$chan}
return 0
}
# check whether we received the chunks that were used in the unity matrix
# part of the coding matrix
set decode_list {}
set list_end [expr \$decoder_state(chunks)-1]
set temp_list [lrange [lsort -integer  \$chunk_list] 0 \$list_end]
if {([lindex \$temp_list 0]         == 0        ) && \
([lindex \$temp_list \$list_end] == \$list_end)} {
puts "Chunks from unity matrix available, will use multiplex"
foreach chan \$channel_list {
lappend decode_list [list \$chan \$tmp_arr(\${chan}_stream)]
}
set decode_list [lsort -integer -index 1 \$decode_list]
set decode_list [lrange \$decode_list 0 \$list_end]
set decoder_list {}
foreach  elem \$decode_list {lappend decoder_list [lindex \$elem 0]}
unset decode_list
mux \$decoder_list
} else {
puts "At least one original chunk is missing, will use restore"
set matrix {}
set decoder_list {}
foreach chan \$channel_list {
lappend decoder_list  \$chan
lappend matrix \$tmp_arr(\${chan}_code_row)
}
set decoder_list [lrange \$decoder_list 0 \$list_end]
set matrix       [lrange \$matrix       0 \$list_end]
init_gal_dec \$decoder_state(poly) \$decoder_state(primitive)
puts "matrix: \$matrix"
set matrix [mat_invert_dec \$matrix]
puts "inverse: \$matrix"
restore \$decoder_list \$matrix
}

# files close
foreach chan \$channel_list {close \$chan}
}
proc mux {chan_list} {
########################################################################
# reads one byte at a time from the channels in chan_list sequentially #
# and write them into the file given by decoder_state(filename)        #
########################################################################
global decoder_state

# prep output file
set out_name \$decoder_state(filename)_decoded
puts "will write to \$out_name"
if {[file exists \$out_name]} {puts "out file exists";return 0}
if {[catch {open \$out_name w} out_chan]} {puts "could not open";return 0}
fconfigure \$out_chan -translation binary
# stream out until num bytes of original file is reached
set out_counter 0
while {1} {
foreach chan \$chan_list {
puts -nonewline \$out_chan [read \$chan 1]
incr out_counter
if {\$out_counter >= \$decoder_state(size)} {
close \$out_chan
return 1
}
}
}
}
proc restore {chan_list matrix} {
################################################
# The actual decoding stuff happens here       #
# Given the channels and the decoding matrix   #
# The bytes are read from the different chunks #
# Multiplied with the matrix and shipped out   #
################################################
global decoder_state

# prep output file
set out_name \$decoder_state(filename)_decoded
puts "will write to \$out_name"
if {[file exists \$out_name]} {puts "out file exists";return 0}
if {[catch {open \$out_name w} out_chan]} {puts "could not open";return 0}
fconfigure \$out_chan -translation binary

set out_counter 0
while {1} {
set vect {}
foreach chan \$chan_list {
binary scan [read \$chan 1] c1 byte
lappend vect [expr {\$byte & 0xff}]
}
set vect [mat_vect_mult_dec \$matrix \$vect]
if {![llength \$vect]} {return}
foreach byte \$vect {
puts -nonewline \$out_chan [binary format c \$byte]
incr out_counter
if {\$out_counter >= \$decoder_state(size)} {
close \$out_chan
return 1
}
}
}

}```

A crude GUI:

``` #!/usr/bin/env tclsh
# packages
package require Tk
package require BWidget
# sources, TODO these will be transformed into packages later
#source galois_dec.tcl
#source encoding.tcl
#source decoder.tcl
# GUI state array
set gui_state(#) anchor
set gui_state(field_order) 8
set gui_state(irr_polys) {}
set gui_state(polylistbox) ""
set gui_state(primlabeltext) ""
set gui_state(generatorlistbox) ""
set gui_state(generators) {}
set gui_state(poly) ""
set gui_state(generator) ""
set gui_state(filetoencode) ""
set gui_state(chunks) ""
set gui_state(erasures) ""
set gui_state(matrixtype) "vandermonde"
set gui_state(filestodecode) {}
set gui_state(filestodecodelistbox) ""
# GUI helpers
proc decode_files {} {
global gui_state
init_decoder \$gui_state(filestodecode)
}
global gui_state
set thisfile [tk_getOpenFile -title "Choose File to Decode"]
lappend gui_state(filestodecode) \$thisfile
}
proc remove_file_to_decode {} {
global gui_state
set thisindex [\$gui_state(filestodecodelistbox) curselection]
if {\$thisindex == ""} {return}
\$gui_state(filestodecodelistbox) delete \$thisindex \$thisindex
}
proc gui_init_coder {} {
global gui_state
if {\$gui_state(filetoencode) == ""} {return 0}
if {\$gui_state(chunks)       == ""} {return 0}
if {\$gui_state(erasures)     == ""} {return 0}
if {\$gui_state(matrixtype)   == ""} {return 0}
if {![string is integer \$gui_state(chunks)  ]} {return 0}
if {![string is integer \$gui_state(erasures)]} {return 0}
if {![init_coder \$gui_state(filetoencode)\
\$gui_state(chunks)      \
\$gui_state(erasures)    \
\$gui_state(matrixtype)] } {
return 0
}
return 1
}
proc get_file {} {
global gui_state
set gui_state(filetoencode) [tk_getOpenFile \
-initialfile \$gui_state(filetoencode) \
-title "Choose File to Encode"]
}
proc init_field {} {
global gui_state
if {(\$gui_state(poly) != "") && (\$gui_state(generator) != "")} {
}
}
proc use_poly {} {
global gui_state
set index [\$gui_state(polylistbox) curselection]
if {\$index != {}} {
set gui_state(poly) [lindex \$gui_state(irr_polys) \$index]
}
set gui_state(generator) ""
set gui_state(generators) {}
}
proc use_gen {} {
global gui_state
set index [\$gui_state(generatorlistbox) curselection]
if {\$index != {}} {
set gui_state(generator) [lindex \$gui_state(generators) \$index]
}
}
proc get_irr_polys {} {
global gui_state
set gui_state(irr_polys) [gal_find_gen_poly_dec \$gui_state(field_order)]
\$gui_state(polylistbox) selection clear 0 end
\$gui_state(polylistbox) selection set 0 0
set gui_state(generators) {}
set gui_state(generator) ""
set gui_state(poly) ""
set gui_state(primlabeltext) ""
}
proc check_prim_poly {} {
global gui_state
set index [\$gui_state(polylistbox) curselection]
if {\$index != {}} {
set poly [lindex \$gui_state(irr_polys) \$index]
set not ""
if {![gal_is_primitive_poly_dec \$poly]} { set not "not " }
set gui_state(primlabeltext) ""
append gui_state(primlabeltext) \$poly " is " \$not "primitive"
}
}
proc get_gen_elems {} {
global gui_state
set index [\$gui_state(polylistbox) curselection]
if {\$index != {}} {
set poly [lindex \$gui_state(irr_polys) \$index]
set gui_state(generators) [gal_get_generators_dec \$poly]
}
\$gui_state(generatorlistbox) selection clear 0 end
\$gui_state(generatorlistbox) selection set 0 0
set gui_state(generator) ""
}
#### prep notebook
set nb [NoteBook .nb -side top]
\$nb insert 0 galois -text "Galois"
\$nb insert 1 encode -text "Encode"
\$nb insert 2 decode -text "Decode"

### notebook page: galois
set page [\$nb getframe galois]
# field order
set thisframe [frame \$page.galorderframe]
set thislabel [label \$thisframe.galorderlabel -text "field order"]
set thisentry [entry \$thisframe.galorderent]
\$thisentry configure -width 4 -textvariable gui_state(field_order)
pack \$thisentry -side right
pack \$thislabel -side left
pack \$thisframe -side top -anchor w
# calculate generator polys
set thisbutton [button \$page.searchirred -text "Find Irreducible Polynoms"]
\$thisbutton configure -command get_irr_polys
pack \$thisbutton -side top -anchor w
# list generator polynoms
set thisframe [frame \$page.genpolyframe]
set thislabel [label \$thisframe.genpolylabel]
\$thislabel configure -text "Irreducible Polynoms" -anchor w
pack \$thislabel -side top -fill x -expand 1
set scrollerframe [ScrolledWindow \$thisframe.irredpolys]
set thislistbox [listbox \$scrollerframe.polylistbox]
\$thislistbox configure -listvariable gui_state(irr_polys)
\$thislistbox configure  -height 5 -selectmode single
\$scrollerframe setwidget \$thislistbox
set gui_state(polylistbox) \$thislistbox
pack \$scrollerframe
pack \$thisframe -side top -anchor w
# check if chosen polynom is primitive
set thisbutton [button \$page.primbutton -text "Check if Primitive"]
\$thisbutton configure -text "Check if Primitive"
\$thisbutton configure -command check_prim_poly
pack \$thisbutton -side top -anchor w
set thislabel [label \$page.prim_or_not]
\$thislabel configure -textvariable gui_state(primlabeltext) -anchor w
pack \$thislabel -side top -anchor w -fill x
# select one polynom for use
set thisbutton [button \$page.usepoly -text "Use Selected Polynom"]
\$thisbutton configure -command use_poly
pack \$thisbutton -side top -anchor w
set thislabel [label \$page.polytouse -textvariable gui_state(poly)]
pack \$thislabel -side top -anchor w
# get generator elements
set thisbutton [button \$page.getgenerators -text "Find Generator Elements"]
\$thisbutton configure -command get_gen_elems
pack \$thisbutton -side top -anchor w
# list field generators
set thisframe [frame \$page.genframe]
set thislabel [label \$thisframe.genlylabel]
\$thislabel configure -text "Generators" -anchor w
pack \$thislabel -side top -fill x -expand 1
set scrollerframe [ScrolledWindow \$thisframe.generators]
set thislistbox [listbox \$scrollerframe.generatorlistbox]
\$thislistbox configure -listvariable gui_state(generators)
\$thislistbox configure  -height 5 -selectmode single
\$scrollerframe setwidget \$thislistbox
set gui_state(generatorlistbox) \$thislistbox
pack \$scrollerframe
pack \$thisframe -side top -anchor w
# select one generator element for use
set thisbutton [button \$page.usegen -text "Use Selected Generator"]
\$thisbutton configure -command use_gen
pack \$thisbutton -side top -anchor w
set thislabel [label \$page.gentouse -textvariable gui_state(generator)]
pack \$thislabel -side top -anchor w
# initialise galois field
set thisbutton [button \$page.initgal -text "Use Poly. & Gen. to Init"]
\$thisbutton configure -command init_field
pack \$thisbutton -side top -anchor w
pack \$thislabel -side top -anchor w

### notebook page: encode
set page [\$nb getframe encode]
# give user the possibility to use own poly
set thislabel [label \$page.polyentrylabel -text "Genereating Polynom"]
pack \$thislabel -side top -anchor w
set thisentry [entry \$page.polyentry]
\$thisentry configure -width 10 -textvariable gui_state(poly)
pack \$thisentry -side top -anchor w
# give user the possibility to use own generator
set thislabel [label \$page.generatorelemlabel -text "Generator Element"]
pack \$thislabel -side top -anchor w
set thisentry [entry \$page.generatorelementry]
\$thisentry configure -width 10 -textvariable gui_state(generator)
pack \$thisentry -side top -anchor w
# init field
set thisbutton [button \$page.initgal -text "Use Poly. & Gen. to Init"]
\$thisbutton configure -command init_field
pack \$thisbutton -side top -anchor w
pack \$thislabel -side top -anchor w
# get filename to encode
set thislabel [label \$page.filenamelabel -text "File to Encode"]
pack \$thislabel -side top -anchor w
set thisframe [frame \$page.fileframe]
set thisentry [entry \$thisframe.filenameentry]
\$thisentry configure -width 14 -textvariable gui_state(filetoencode)
pack \$thisentry -side left
set thisbutton [button \$thisframe.filebrowse]
\$thisbutton configure -text "Browse" -command get_file
pack \$thisbutton -side right
pack \$thisframe -side top -fill x
# init encoding
set thisentry [LabelEntry \$page.chunkentry]
\$thisentry configure -label "Chunks" -textvariable gui_state(chunks)
\$thisentry configure -width 5 -labelwidth 8
pack \$thisentry -side top -anchor w
set thisentry [LabelEntry \$page.erasureentry]
\$thisentry configure -label "Erasures" -textvariable gui_state(erasures)
\$thisentry configure -width 5 -labelwidth 8
pack \$thisentry -side top -anchor w
set thisbutton [button \$page.initcoder -text "Init Coder"]
\$thisbutton configure -command gui_init_coder
pack \$thisbutton -side top -anchor w
pack \$thislabel -side top -anchor w
# write coded file
set thisbutton [button \$page.code -text "Write Coded"]
\$thisbutton configure -command write_coded
pack \$thisbutton -side top -anchor w

### notebook page: decode
set page [\$nb getframe decode]
# get files to decode
set thislabel [label \$page.filelistlabel -text "Files to Decode"]
pack \$thislabel -side top -anchor w
set scrollerframe [ScrolledWindow \$page.filestodecodescroller]
set thislistbox [listbox \$scrollerframe.filestodecodelist]
\$thislistbox configure -listvariable gui_state(filestodecode)
\$thislistbox configure  -height 5 -selectmode single
\$scrollerframe setwidget \$thislistbox
set gui_state(filestodecodelistbox) \$thislistbox
pack \$scrollerframe -side top -anchor w
pack \$thisbutton -side left -fill x
set thisbutton [button \$thisframe.removefile]
\$thisbutton configure -text "Remove File"
\$thisbutton configure -command remove_file_to_decode
pack \$thisbutton -side right -fill x
pack \$thisframe -side top -anchor w
# decode
set thisbutton [button \$page.decode]
\$thisbutton configure -text "Decode"
\$thisbutton configure -command "decode_files"
pack \$thisbutton -side top -anchor w

# start show
pack \$nb -fill both -expand 1
\$nb compute_size
\$nb raise galois```

That's it.

Usage:

• Galois-tab: Galois playground, useful to find irreducible polynoms and to have a look at the generating elements of the field. Finding the generating elements can take a while.
• Encode-tab: The impatient could use 285 as generating polynom, and 2 as generator element. The number of chunks means how many chunks you would like to generate from your original file, the number of erasures will generate one additional chunk per erasure permitted, see example above: 4 original chunks + 2 erasures = 6 chunks all in all
• Decode-tab: Just choose the files you want to use to restore the original file and press decode.

PS:

1) The code is a bit lengthy, if you find this disturbing in the wiki, please tell me.

2) My commenting style comes from the fact that I never know when I'm going to get interrupted by my son who quite vehemently demands his baby-drinking-bottle, I write as many comments as possible, then fill the code in after the milk tour. I need the comments as extended memory of what I intended to do.

Cheers.

Other solutions for basis of comparison:

• Parchive [1 ] (Reed-Solomon)
• Tkpar [2 ]
• zfec [3 ] (Reed-Solomon via Vandermonde matrices) Claims to be much faster than parchive.
• OceanStore [4 ] (Byzantine-fault tolerant commit protocol)
• Reed-Solomon erasure code in Python [5 ]
• Collection of links to code for various erasure tolerant strategies [6 ]
• Tornado codes [7 ] (claims to be 100 - 10,000 times faster than R-S)
• Low-density parity check (LDPC) [9 ] is patent-free, contrary to other advanced schemes such as Tornado or Raptor. See [8 ] for a LGPL library.
• Pro-MPEG FEC is trivial to implement using binary arithmetics, as it consists in simple row and column parity checks.

FB: See also Luigi Rizzo's FEC page [11 ]. He provides an implementation of a Reed-Solomon codec in C [10 ] that is free to use in any context, including commercial. I've used this library in several professional projects (e.g. multicast datacasting in DVB-H), it is rock-solid and works like a charm.

Jaf: The implementation actually relies on the Papers found there, the pages are a real treasure trove for FEC stuff.

SEH 20080213 -- zfec's author claims it is an update of Rizzo's fec library.

enter categories here