Version 10 of loadf

Updated 2023-07-17 04:06:23 by JMN

JR: This is a simple critcl function that reads an entire file a line at a time and returns its lines, without their trailing newlines, as a list.

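    package require critcl 0.33

    critcl::ccode {
      #include <stdio.h>
      #include <string.h>
    }

    # Return the lines of file fn, without their trailing newlines, as a
    # list. An unreadable file gives an empty list rather than an error.
    critcl::cproc loadf {char* fn} Tcl_Obj* {
        FILE *fd;
        Tcl_Obj *r, *s;
        char line[1024];
        size_t l;

        r = Tcl_NewListObj(0, NULL);
        /* critcl releases one reference to a returned Tcl_Obj*, so own one. */
        Tcl_IncrRefCount(r);
        fd = fopen(fn, "r");
        if (!fd) {
            return r;
        }
        /* s accumulates the current line, which may span several fgets
           calls when it is longer than the buffer. */
        s = Tcl_NewObj();
        Tcl_IncrRefCount(s);
        /* Test fgets itself: feof() only becomes true after a read has
           already hit end of file. */
        while (fgets(line, sizeof line, fd)) {
            l = strlen(line);
            /* l > 0 guards against a chunk that starts with a NUL byte. */
            if (l > 0 && line[l-1] == '\n') {
                /* End of line: strip the newline and append to the list. */
                line[--l] = '\0';
                Tcl_AppendToObj(s, line, (int) l);
                Tcl_ListObjAppendElement(NULL, r, s);
                Tcl_DecrRefCount(s);   /* the list now holds s */
                s = Tcl_NewObj();
                Tcl_IncrRefCount(s);
            } else {
                /* Partial line: keep accumulating. */
                Tcl_AppendToObj(s, line, (int) l);
            }
        }
        fclose(fd);
        /* Keep a last line that had no trailing newline. */
        if (Tcl_GetCharLength(s) > 0) {
            Tcl_ListObjAppendElement(NULL, r, s);
        }
        Tcl_DecrRefCount(s);   /* frees s if it was never appended */
        return r;
    }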

    # Pure Tcl version for comparison
    proc loadf_tcl {file} {
        set f [open $file]
        fconfigure $f -translation binary
        set d [read -nonewline $f]
        close $f
        split $d \n
    }

    # usage
    set lines [loadf something]

The critcl version was up to 3 times faster than the Tcl version on a large file.


I just noticed that on files with lots of short lines (e.g., /usr/share/dict/words) the critcl version is actually slower than the Tcl version. This bears looking into.

AMG: Maybe your fgets() is buggy. I suggest using strace to see if it's making unreasonable numbers of read syscalls.
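
If strace does show a large number of small read calls, one thing worth trying is giving the stdio stream a bigger buffer with the standard C setvbuf before the first read. The sketch below is not part of the original loadf: count_lines_buffered is a hypothetical helper that only counts newline-terminated lines, so the effect of the buffer size can be measured apart from the Tcl object work. It passes its own buffer because, with a NULL buffer, the C standard only says the size argument "may" be used.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of loadf: count newline-terminated lines
   in path with fgets, after giving the stream a bufsize-byte stdio buffer.
   Returns -1 if the file cannot be opened. */
long count_lines_buffered(const char *path, size_t bufsize)
{
    FILE *fd = fopen(path, "r");
    char line[1024];
    char *buf;
    long n = 0;

    if (!fd) {
        return -1;
    }
    /* Supply our own buffer so the requested size is actually used.
       setvbuf must be called before any other I/O on the stream. */
    buf = malloc(bufsize);
    if (buf) {
        setvbuf(fd, buf, _IOFBF, bufsize);
    }
    while (fgets(line, sizeof line, fd)) {
        size_t l = strlen(line);
        /* A line longer than the 1024-byte array arrives in several
           chunks; only the chunk ending in '\n' completes a line. */
        if (l > 0 && line[l - 1] == '\n') {
            n++;
        }
    }
    fclose(fd);   /* close the stream before freeing its buffer */
    free(buf);
    return n;
}
```

Running a small driver for this under strace -c with different bufsize values should show whether the number of read calls, and the run time, change as the buffer grows.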


With the conventional wisdom that [read [open $file] [file size $file]] is faster than [read [open $file]], I wonder whether the critcl function is still 3x faster.

Wow. I've never ever used critcl before. Now I'm convinced. I didn't even know I had critcl on my system, but the above cut'n'pasted and Just Worked. Nifty. And yes, the critcl version is still faster (after the initial compile).

AMG: Conventional wisdom? Why would it be faster to tell read how many bytes to get?
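
One plausible answer: when the size is known in advance, the whole file can be read into a single buffer allocated once, instead of one that may have to be grown and copied as data arrives. Reading everything in bulk and then splitting is also what loadf_tcl does with read -nonewline and split, which may help explain why the per-line critcl version can lose on files with many short lines. The C sketch below is not from this page; FileLines, load_lines and free_lines are hypothetical names. It sizes its buffer with fseek/ftell, the C analogue of [file size $file], reads everything with one fread, then splits on newlines with memchr.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type: one buffer holding the whole file, plus
   pointers to the start of each line inside it. */
typedef struct {
    char  *data;
    char **lines;
    size_t count;
} FileLines;

/* Read path in one go and split it on '\n', mimicking
   [split [read -nonewline $f] \n]. Returns 0 on success, -1 on error. */
int load_lines(const char *path, FileLines *out)
{
    FILE *fd;
    long size = 0;
    size_t got, n = 0, k = 0;
    char *p, *end, *nl;

    out->data = NULL;
    out->lines = NULL;
    out->count = 0;
    fd = fopen(path, "rb");   /* binary, like -translation binary */
    if (!fd) {
        return -1;
    }
    /* Find the size first: the C analogue of [file size $file]. */
    if (fseek(fd, 0, SEEK_END) != 0 || (size = ftell(fd)) < 0
            || fseek(fd, 0, SEEK_SET) != 0) {
        fclose(fd);
        return -1;
    }
    /* One allocation and one bulk read, with no buffer growth. */
    out->data = malloc((size_t) size + 1);
    if (!out->data) {
        fclose(fd);
        return -1;
    }
    got = fread(out->data, 1, (size_t) size, fd);
    fclose(fd);
    /* Like read -nonewline: drop a single trailing newline. */
    if (got > 0 && out->data[got - 1] == '\n') {
        got--;
    }
    out->data[got] = '\0';
    end = out->data + got;
    /* Like split: no lines for empty input, otherwise one more line
       than the number of newlines. */
    if (got > 0) {
        n = 1;
        p = out->data;
        while ((nl = memchr(p, '\n', (size_t) (end - p))) != NULL) {
            n++;
            p = nl + 1;
        }
    }
    out->lines = malloc((n ? n : 1) * sizeof *out->lines);
    if (!out->lines) {
        free(out->data);
        out->data = NULL;
        return -1;
    }
    /* Second pass: terminate each line in place and record its start. */
    if (n > 0) {
        p = out->data;
        out->lines[k++] = p;
        while ((nl = memchr(p, '\n', (size_t) (end - p))) != NULL) {
            *nl = '\0';
            p = nl + 1;
            out->lines[k++] = p;
        }
    }
    out->count = n;
    return 0;
}

/* Release everything load_lines allocated. */
void free_lines(FileLines *fl)
{
    free(fl->lines);
    free(fl->data);
    fl->data = NULL;
    fl->lines = NULL;
    fl->count = 0;
}
```

The line splitting follows the Tcl version: a single trailing newline is dropped, and an empty file yields no lines. Because ftell returns a long, this sketch is limited to files under LONG_MAX bytes, which is only about 2 GB on Windows.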

JMN 2023-07: With Tcl 8.6.13 on Windows I was unable to find any speedup from the critcl version. I only tested text files up to about 458 MB and around 1.9M lines, and I don't know what constitutes a 'large' file. Perhaps it needs to be large relative to available memory, or perhaps Tcl script performance has improved.