Tcl IO performance

MJ - 20071105, Sparked by an inquiry by Larry McVoy on the Tcl chatroom, the IO performance of Tcl and specifically compared to Perl has been benchmarked quite extensively and was shown to be slower than Perl. In order to get closer to the speed of Perl the IO implementation in Tcl is currently undergoing scrutiny to see where performance gains can be made.

The goal of the page is to summarize the discussion as it has been held up to this point, solutions that have been suggested and patches that have been tried. This should prevent circular discussions on the chat, because people can be directed to this page to get an update of the current state of affairs.

The benchmarks used to time the effect of changes can be downloaded from [L1 ]

JE - 20071105. In particular, it's important to note that the point of the exercise is not to make a "fast cat" in Tcl or to make a "fast grep" in Tcl: these are both easily doable. The point of the exercise is to find where the real bottlenecks are in real text processing filters, and to make those hotspots cooler.

KBK - 20071105:

Hypothesis: The variable access in the three-argument form of gets is a significant contributor to the observed poor performance.

Experiment: A disc file was created containing 1 million lines, each comprising the single letter 'x'. A modification was made to Tcl_GetsObjCmd as shown:

RCS file: /cvsroot/tcl/tcl/generic/tclIOCmd.c,v
retrieving revision 1.44
diff -u -r1.44 tclIOCmd.c
--- generic/tclIOCmd.c  15 Oct 2007 21:27:48 -0000      1.44
+++ generic/tclIOCmd.c  5 Nov 2007 21:02:36 -0000
@@ -287,9 +287,13 @@
        lineLen = -1;
     }
     if (objc == 3) {
-       if (Tcl_ObjSetVar2(interp, objv[2], NULL, linePtr,
-               TCL_LEAVE_ERR_MSG) == NULL) {
-           return TCL_ERROR;
+       if (*Tcl_GetString(objv[2]) != '\0') {
+           if (Tcl_ObjSetVar2(interp, objv[2], NULL, linePtr,
+                              TCL_LEAVE_ERR_MSG) == NULL) {
+               return TCL_ERROR;
+           }
+       } else {
+           Tcl_DecrRefCount(linePtr);
        }
        Tcl_SetObjResult(interp, Tcl_NewIntObj(lineLen));
        return TCL_OK;

The following script, roughly equivalent to 'wc -l' was used as benchmark.

proc read1 {} {
    set lc 0
    set f [open ~/tmp/1mlines r]
    while {[gets $f line] >= 0} {
        incr lc
    }
    close $f
    return $lc
} 

proc read2 {} {
    set lc 0
    set f [open ~/tmp/1mlines r]
    while {[gets $f {}] >= 0} {
        incr lc
    }
    close $f
    return $lc
} 

set methods {read1 read2}
foreach method $methods {
    set min($method) Inf    
}
for {set i 0} {$i < 5} {incr i} {
    foreach method $methods {
        set us [lindex [time $method] 0]
        puts [list $method: $us]
        if {$us < $min($method)} {
            set min($method) $us
        }
    }
}
foreach method $methods {
    puts [list $method $min($method)]
}

Results:

The minimum observed run time for the script where the line was returned into a variable was 2.46 seconds; the minimum observed time where the line was discarded was 2.08 seconds. (The maxima were 2.52 and 2.17 respectively, so system load was presumed not to be an issue).

Conclusion:

Storing the return string in a variable appears to consume roughly 0.28 microseconds per line, or about 12% of the total time spent in this micro-benchmark.

Further experiments by kbk as reported on the chat:

due to the fact that buf is not further used in the script, the variable is not compiled to be accessed by index; adding a line 'set buf {}' before the loop to read1 (yielding read1a) corrects this
insuring that buf is a compiled local variable (read1a) gives the times: read1: 2.44 s, read1a 2.20 s, read2 2.09 s
the same as above, on larry's larger data file: read1: 2.85 read1a 2.52 read2 2.57
with fakegets and readm as below: 1.91 s This means that the call/setVar/return is no more than 1.91s
finally, the experiment with readm2 below puts a tighter upper bound of 0.50s on call/setVar/return: it timed at 0.69 s, versus 0.19 s for an empty loop ([incr ct -1] instead of [$x ct -1], insuring that it is bytecompiled so that there is no call out of TEBC)

proc fakegets {chan v} {
    global ct
    if {![incr ct -1]} {return -1}
        return 1
    }
}

proc readm {} {
    set ::ct 1000000
    set f [open ~/tmp/DATA r]
    while {[fakegets $f {}] >= 0} {
        incr lc
    }
    close $f
    return $lc
}

proc readm2 {} {
    set x incr
    set ct 1000000
    set f [open ~/tmp/DATA r]
    while {[$x ct -1] > 0} {
        incr lc
    }
    close $f
    return $lc
}

Interim conclusion The time spent in the original benchmark seems to be decomposable in

8% in startup and loop overhead (0.19/2.5)
20% in command call and setting a variable in the called command (0.50/2.50) Further experimentation with a C-coded command that just returns TCL_OK shows that the cost of setting the variable is almost negligible
72% doing the file access, reading and conversion within gets itself
the cost of setting the variable in the called command seems to be below the measurable threshold for the larger dataset (read1a actually faster than read2 - the difference must be noise)

Note that the file is assumed to be in the system's encoding; it might be interesting to test the timings on a binary file.

MS: Moving parts of the discussion to Tcl invoke performance

LV 2007 Nov 06 Platform: SPARC Solaris 9, using Sun's C compiler, built with 64 bit support and threads, and tcl 8.5 CVS head.

I found that the sort.tcl ran about 6 times slower than Perl, 4 times slower than Python, and almost twice as slow as Ruby. The grep.tcl ran about 3 times slower than Ruby's, almost 7 times slower than Python's and about 29 times slower than Perl's... The hash.tcl ran about 40% or so slower than Ruby, about 9 times slower than Python and about 10 times slower than Perl.

Category Internals	Category Performance