'''[[An introductory, narrative explanation of what your code is intended to perform here would be most helpful. Not everyone here will be familiar with the term "Hard Archiving".]]'''

In April of 2011 I read that the Getty valued Mapplethorpe's work archive (slides, etc.) at $30 million, 20 years after his death. Maybe a year earlier I read that somebody claimed to have a sheet of Ansel Adams's slides (a claim experts have since judged untrue); however, IF the slides were authentic they would have an estimated value of $200 million, about 30 years after his death. I realized that modern photographers who work in digital formats have a problem: CDs, DVDs, and hard drives might last ten years. Also, backups created by software applications may not be in readable formats in the future. For example, I read an article about the US Government "losing" years of engineering and research information due to 'rapid digital aging'.

My work over the past several months has been to develop a strategy for 'digital work preservation', or 'hard archiving'. In this project I am not concerned with "prints" or the finished product, but with the "work": the original files used or created by the artist.

My initial concept is based on the 1980s magazine A.N.A.L.O.G. (the Atari computer magazine, not the sci-fi one) and its method of distributing binary software on paper. Example: ([http://www.cyberroach.com/analog/an21/default.htm%|%magazine issue 21, 1984%|%]) ([http://www.cyberroach.com/analog/an21/avalanche.htm%|%"Avalanche" listing%|%])

My experiments show it is possible to store the image/work files as base64-encoded text, which in turn can be kept in a "durable" format: paper might last 50 years, microfiche might last 500. It is technically possible for a human to re-enter base64-encoded data, much like the software listings in the magazine. With 'untrained' open source OCR software we achieve approximately 70% accuracy in scanning this base64-encoded data; with 'training' the accuracy improves. A CRC24 checksum calculated on each line allows the programmer to validate input and 'brute force' the correct sequence when a line fails (this is the 'out.txt' file below; see the sketch after the argument list) {[http://nefyou.com/nt.pdf%|%example pdf%|%]}. However, using the Quick Response Code technique to encode the data may prove to increase accuracy and minimize storage requirements.

It is perhaps a task similar to the "[http://voyager.jpl.nasa.gov/%|%Voyager Golden Record%|%]" project, but we must communicate with humans in the future (who have unknown capabilities). We are concerned with defining the information in such a way that someone with intelligence and some sort of computer technology can make use of the data.

This is a work in progress, and of course there are numerous issues to consider. I believe most think it's a silly waste of time, but whatever, I'm still fascinated by the project. :-)

Arguments I've heard against:

*) the archive may be valuable to some collector in the future, but the expense to the artist will not be compensated - the artist will (probably) never see any return from the effort

*) why bother, just give it all up to the established data collectors (as we are doing anyway): dump your terabytes on the cloud

*) it would be "easier and cheaper" just to make archival prints of all files

----

(in progress)
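The "numbered base64 lines with a per-line CRC24" idea can be sketched in Tcl itself. This is only a minimal illustration: the line layout (number, base64 text, checksum) and the OpenPGP-style CRC-24 parameters are assumptions, and the modified b64.c used in Step 1 may format its lines differently; the sketch also relies on Tcl 8.6's [binary encode base64].

===
# Minimal sketch: hypothetical stand-in for a.exe. Assumes OpenPGP-style
# CRC-24 (poly 0x1864CFB, init 0xB704CE) and a "<lineno> <base64> <crc>"
# line layout; the modified b64.c may differ on both points.

proc crc24 {text} {
    set crc 0xB704CE
    foreach ch [split $text ""] {
        scan $ch %c byte
        set crc [expr {$crc ^ ($byte << 16)}]
        for {set i 0} {$i < 8} {incr i} {
            set crc [expr {$crc << 1}]
            if {$crc & 0x1000000} { set crc [expr {$crc ^ 0x1864CFB}] }
        }
    }
    return [expr {$crc & 0xFFFFFF}]
}

proc hard_encode {infile outfile} {
    set in [open $infile rb]
    set data [read $in]
    close $in
    # 60-character base64 lines, each numbered and checksummed so that
    # hand re-entry or OCR output can be validated line by line.
    set out [open $outfile w]
    set n 0
    foreach line [split [binary encode base64 -maxlen 60 $data] \n] {
        puts $out [format "%06d %s %06X" [incr n] $line [crc24 $line]]
    }
    close $out
}
===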
Step 1. Scan local path and select files for archival. Create cover sheet.

hard_archive.tcl

===
# Build a cover sheet describing the directory and encode each regular
# file with a.exe (the modified b64) into out.txt.
set master "\n Hard Archive \"[pwd]\"
Path Time: [file mtime [pwd]]
Time this script was executed: [clock seconds] [clock format [clock seconds]]
Epoch time ([clock format 0])
Legend: \"Filename\" \[ModifyTime\] \[FileSize\] \"File Type\"
Please refer to documentation for information regarding file types, times and sizes
"
set files [glob *]
foreach x $files {
    if { [file type $x] == "file" } {
        append master "\"$x\" \[[file mtime $x]\] \[[file size $x]\] \"[exec file -b $x]\"\n"
        exec ./a.exe -e $x out.txt
    }
}
puts $master
===

----

'''[[Unless I am missing something, your foreach x $files loop appears to overwrite out.txt with each new file, such that at the end of the loop, the only data in out.txt is the data from the final filename in $files.]]'''

[waitman] (2011-07-27) True. I need to make Step 2 a procedure that runs after out.txt is created, and I don't need out.txt anymore after it's run.

----

Note: a.exe is my hacked version of b64, (C) Copr Bob Trower 1986-01. (I added line numbering and a CRC24 checksum to each line.) Source code here: http://www.nefyou.com/b64-modified.c.txt

Step 2. Generate Quick Response Code images for out.txt

testout.tcl

===
exec rm -Rf work
exec mkdir work
set fp [open "out.txt" r]
fconfigure $fp -buffering line
set mc 10
set lc 1
set process ""
gets $fp data
while {$data != ""} {
    # Collect five lines of out.txt per QR code symbol.
    append process "$data\\n"
    set lc [incr lc]
    if { $lc > 5 } {
        exec ./qrencode-3.1.1/qrencode.exe -o work/$mc.png -s 3 -8 "$process"
        set mc [incr mc 10]
        set process ""
        set lc 1
    }
    gets $fp data
}
close $fp
# Flush whatever is left in the buffer into a final QR code.
exec ./qrencode-3.1.1/qrencode.exe -o work/$mc.png -s 3 -8 "$process"
===

----

'''[[I am curious as to why you did not use [file delete] for "exec rm -Rf work" (file delete -force work), and why you did not use [file mkdir] for "exec mkdir work", thereby avoiding two exec calls. Also, if you want your "$mc.png" files to be processed in order, you might consider [format "work/%08d.png" $mc] within your exec call to produce zero-filled names, which will sort in numeric order. At the moment, as soon as you hit 100.png your sort order of files will be wrong.]]'''

[waitman] (2011-07-27) Thank you, very good points. The PNGs are temporary and will be removed. Still working out the montage bit, but at the moment I don't believe the names need padding.

----

Note: qrencode is http://fukuchi.org/works/qrencode/index.en.html

Step 3: put the PNGs together in a PDF

working... need to merge Steps 1 & 2

7z data is possible; it is well-documented compression, but adds another level of complexity.

[RLE] (2011-07-29) Tcl 8.6 includes built-in zlib functionality, so you could "gzip" the data completely from within Tcl, reducing the complexity when compressing the data.

[waitman] (2011-07-27) Thanks, I'll check it out. Compression processing makes it more difficult to describe the data. I imagine that gzip is well documented :) In this project I must only assume the person receiving the information has some form of computer and is intelligent enough to put the puzzle together. But here's the problem: JPEG is a "worthless" format (8-bit color stores approx. 1% of the possible information, there are numerous versions and subformats of JPEG, and besides, JPEG is on its way to obsolescence); a serious photographer is going to shoot RAW. RAW is proprietary to the manufacturer, impossible to define, and changes with each new camera model. Therefore RAW images must be converted to TIFF at 16 bits per channel. TIFF is a static, stable, well-documented format. The problem is: large files. Compression of a TIFF file will get you down to around 2%-10% of the original, but we must be able to adequately define the reverse algorithm.
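Following RLE's note, here is a minimal sketch of how the TIFF data could be gzipped entirely from within Tcl 8.6 and still round-tripped. The filename and the base64 line width are placeholders, and whether the extra compression layer can be adequately described for a future reader is exactly the open question above.

===
# Minimal sketch (Tcl 8.6): gzip the TIFF data in-process before base64
# encoding, and show the reverse step so the decompression algorithm is
# documented alongside the archive. "photo.tif" is only a placeholder name.
set in   [open photo.tif rb]
set tiff [read $in]
close $in

set packed [zlib gzip $tiff -level 9]                 ;# RFC 1952 gzip stream
set b64    [binary encode base64 -maxlen 60 $packed]

# The path a future reader has to reproduce:
set restored [zlib gunzip [binary decode base64 $b64]]
===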
[QRSamp%|%Sample Image%|%]

<<categories>> Data Preservation | Archival Strategy