Version 5 of Erasure Tolerant Codes

Updated 2008-02-11 20:57:21 by jaf

During a Discussion at a LUG meeting a few had the idea to use the free data repositories that Google, Yahoo,gmx etc, provide to store a kind of off site backup.

In our opinion the data had to be somewhat securely stored, that meant that one should be able to access the data even when one or more of the servers were down.

A bit of 'Googling' showed that there are such things like erasure tolerant codes that don't need 100% replication of the data, actually like 'RAID', but on a file by file basis.

The idea used for erasure tolerant codes is quite simple, an example:

  • There are 4 bytes to store
  • The storage can be made on 6 different locations
  • This means 2 erasures are possible without compromising the data integrity
  • Make a vector out of the 4 bytes, multiply a 4(columns)X6(rows) matrix with the vector
  • The result is a vector with 6 elements
  • The matrix is chosen so, that any 4 of its rows result in an invertible matrix
  • Upon loosing 2 of the result bytes, choose the rows that generated the remaining bytes
  • Build a matrix out of the rows and invert it
  • Multiply the inverted matrix with vector made out of the remaining data
  • Et voila your original vector is back.

A big BUT remains there: But you can only get you data back if the multiplication can be made without loss of precision, e.g. no floats.

That's why one uses Galois fields for all the operations needed to encode or to decode the data. In a Galois field the inverse of a number is also from the field, if you start with an integer field, the result of any operation will still be in the field.


Here is my implementation of the above, actually a proof of concept, that it can be done in Tcl 'and then you get the GUI for free'.


enter categories here