loadf

JR: This is a simple critcl function to read an entire file a line at a time and return it as a list.

    package require critcl 0.33

    critcl::ccode {
      #include <stdio.h>
      #include <string.h>   /* strlen() */
    }

    critcl::cproc loadf {char* fn} Tcl_Obj* {
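        FILE *fd;
        Tcl_Obj *r, *s;
        char line[1024];

        /* Result list. The reference taken here is meant to be released
           when critcl hands the Tcl_Obj* result back to Tcl. */
        r = Tcl_NewListObj(0, NULL);
        Tcl_IncrRefCount(r);
        fd = fopen(fn, "r");
        if (!fd) {
            /* Unreadable file: return an empty list. */
            return r;
        }
        /* s accumulates the current line; a line longer than the buffer
           arrives in several fgets() chunks. */
        s = Tcl_NewStringObj("", 0);
        while (fgets(line, sizeof line, fd)) {
            size_t l = strlen(line);
            if (l > 0 && line[l-1] == '\n') {
                /* End of line: drop the newline and store the element. */
                Tcl_AppendToObj(s, line, (int)(l - 1));
                Tcl_ListObjAppendElement(NULL, r, s);
                s = Tcl_NewStringObj("", 0);
            } else {
                Tcl_AppendToObj(s, line, (int)l);
            }
        }
        fclose(fd);
        if (Tcl_GetCharLength(s) > 0) {
            /* Last line had no trailing newline. */
            Tcl_ListObjAppendElement(NULL, r, s);
        } else {
            /* Unused scratch object (refcount 0): free it. */
            Tcl_DecrRefCount(s);
        }
        return r;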
    }

    # tcl version for comparison
    proc loadf_tcl {file} {
        set f [open $file]
        fconfigure $f -translation binary
        set d [read -nonewline $f]
        close $f
        split $d \n
    }

    # usage
    set lines [loadf something]

The critcl version is as much as 3 times faster than the tcl version on a large file.


I just noticed that on files with lots of short lines (e.g., /usr/share/dict/words) the critcl version is actually slower than the tcl version. This bears looking into.

AMG: Maybe your fgets() is buggy. I suggest using strace to see if it's making unreasonable numbers of read syscalls.
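
One way to test this outside Tcl is a standalone C function that runs the same fgets() loop and can be watched under strace. The sketch below is only an illustration: the name count_lines and the idea of passing a buffer size to setvbuf() are assumptions for the experiment, not part of loadf above.

```c
#include <stdio.h>
#include <string.h>

/* Count the lines in a file using the same fgets() loop as loadf.
 * bufsize > 0 asks stdio for a fully buffered stream of that size;
 * bufsize == 0 keeps the C library's default buffering.
 * Returns the line count, or -1 if the file cannot be opened. */
long count_lines(const char *path, size_t bufsize)
{
    char line[1024];
    long n = 0;
    int pending = 0;    /* nonzero while inside a line with no newline yet */
    FILE *fd = fopen(path, "r");

    if (!fd) {
        return -1;
    }
    if (bufsize > 0) {
        /* Must be called before any other operation on the stream. */
        setvbuf(fd, NULL, _IOFBF, bufsize);
    }
    while (fgets(line, sizeof line, fd)) {
        size_t l = strlen(line);
        if (l > 0 && line[l - 1] == '\n') {
            n++;            /* complete line */
            pending = 0;
        } else {
            pending = 1;    /* long line, or final line without a newline */
        }
    }
    fclose(fd);
    return n + pending;
}
```

Calling it from a small main() with bufsize 0 and again with, say, 65536, and comparing the number of read syscalls strace reports, would show whether stdio buffering is the problem. If the counts are already low, the per-line cost is more likely in the Tcl object handling than in fgets().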


With the conventional wisdom that [read [open $file] [file size $file]] is faster than [read [open $file]], I wonder if the critcl function is still 3x faster.

Wow. I've never ever used critcl before. Now I'm convinced. I didn't even know I had critcl on my system, but the above cut'n'pasted and Just Worked. Nifty. And yes, the critcl version is still faster (after the initial compile).

AMG: Conventional wisdom? Why would it be faster to tell read how many bytes to get?

JMN: I recalled this tip about file size too. It's still mentioned on this wiki, on the read page: as is said in tcllib, it's better to "use the file size command to get the size, which preallocates memory, rather than trying to grow it as the read progresses." I don't know where it's mentioned in tcllib, who wrote that, or whether it's true, but, at the risk of admitting to being a cargo-cult programmer, I do find it a reasonably common idiom in my code.
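
For readers who want to see what "preallocate from the file size" means at the C level, here is a minimal sketch of the same idea. It is an analogue, not a description of what Tcl's read does internally; the name slurp is hypothetical, and it assumes a regular, seekable file whose size fits in a long.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into one malloc'ed buffer sized from the file length,
 * so the buffer never has to grow while reading.
 * On success returns the buffer (NUL-terminated for convenience) and stores
 * the byte count in *len; the caller frees it. Returns NULL on failure. */
char *slurp(const char *path, size_t *len)
{
    FILE *fd = fopen(path, "rb");
    long size;
    char *buf;
    size_t got;

    if (!fd) {
        return NULL;
    }
    /* Find the size by seeking to the end (assumes a seekable file). */
    if (fseek(fd, 0L, SEEK_END) != 0 || (size = ftell(fd)) < 0) {
        fclose(fd);
        return NULL;
    }
    rewind(fd);
    buf = malloc((size_t)size + 1);     /* single allocation, +1 for NUL */
    if (!buf) {
        fclose(fd);
        return NULL;
    }
    got = fread(buf, 1, (size_t)size, fd);
    fclose(fd);
    buf[got] = '\0';
    *len = got;
    return buf;
}
```

The alternative, reading in chunks and calling realloc() as the data arrives, copies the buffer each time it grows. Whether that cost is measurable for a given file is exactly the open question in this discussion.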


JMN 2023-07: With Tcl 8.6.13 and critcl 3.2 I was unable to find any speedup from the critcl version (on Windows). I only tested with text files up to about 458MB or so and up to around 1.9M lines. I don't know what constitutes a 'large' file. Perhaps it needs to be large with respect to available memory, or perhaps Tcl script performance has improved.