ulis, 2003-10-11.
If you have to access a big file repeatedly, you should optimize the I/O by building a file index. With a file index you can access the n-th record inside the big file with only a few I/O operations.
The idea is a simple one: you keep the offsets of the records in the big file handy, and then you can reach any record with a single seek.
To access the offsets efficiently, you store them in an index file in a fixed-width format.
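Because every entry has the same width, the position of the n-th offset is a simple multiplication, so the index file never has to be scanned. A minimal sketch of the layout assumed by the procs below (8 characters per entry; `n` and `idx` stand for a record number and an open index channel and are only illustrative):

 # index file layout, one 8-character field per record:
 #   entry 0 : offset of record 0
 #   entry 1 : offset of record 1
 #   ...
 #   entry N : end of the last record (the big file size)
 # so the offset of record n starts at byte n*8 of the index file
 set pos [expr {$n * 8}]
 seek $idx $pos start
 set offset [string trim [read $idx 8]]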
'''The proc to build the index''' file of a big file of newline-delimited records:

 # --------------------
 #  build_index
 #
 #  builds a file index for a big file
 # --------------------
 #  parm1: big file name
 #  parm2: index file name
 # --------------------
 #  return: count of records
 # --------------------
 proc build_index {bigfile indexfile} \
 {
   # index the file
   set count 0
   set fpi [open $bigfile r]
   set fpo [open $indexfile w]
   ### records are newline-delimited lines ###
   while {1} \
   {
     # remember where the next record starts
     set offset [tell $fpi]
     if {[gets $fpi line] < 0} break
     puts -nonewline $fpo [format %-8.8s $offset]
     incr count
   }
   # final entry: the end of the last record (= the big file size)
   puts -nonewline $fpo [format %-8.8s [tell $fpi]]
   close $fpo
   close $fpi
   return $count
 }
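A quick way to try it (the file names big.data and big.idx are only examples):

 # build the index once; the proc returns the number of records found
 set count [build_index big.data big.idx]
 puts "indexed $count records"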
'''The proc to access the n-th record''', given the handles of the opened big file and index file:
 # --------------------
 #  get_enreg
 #
 #  get the n-th record of a big file indexed by an index file
 # --------------------
 #  parm1: the index of the record inside the big file
 #  parm2: the handle of the opened big file
 #  parm3: the handle of the opened index file
 # --------------------
 #  return: the n-th record of the big file
 # --------------------
 proc get_enreg {n fileptr indexptr} \
 {
   # get the starting & ending offsets
   seek $indexptr [expr {$n * 8}] start
   set start [string trim [read $indexptr 8]]
   set end   [string trim [read $indexptr 8]]
   # seek inside the big file
   seek $fileptr $start start
   # return the record
   return [read $fileptr [expr {$end - $start}]]
 }
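Putting the two procs together, a small usage sketch (again with illustrative file names): the channels are opened once and can be reused for as many lookups as needed.

 # open both files once, then fetch records at random
 set fp  [open big.data r]
 set idx [open big.idx r]
 # fetch the 43rd record (records are numbered from 0)
 puts [get_enreg 42 $fp $idx]
 close $fp
 close $idx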
See also