Version 17 of Tcl IO performance

Updated 2007-11-06 13:22:53 by miguel

MJ - 20071105, Sparked by an inquiry by Larry McVoy on the Tcl chatroom, the IO performance of Tcl and specifically compared to Perl has been benchmarked quite extensively and was shown to be slower than Perl. In order to get closer to the speed of Perl the IO implementation in Tcl is currently undergoing scrutiny to see where performance gains can be made.

The goal of the page is to summarize the discussion as it has been held up to this point, solutions that have been suggested and patches that have been tried. This should prevent circular discussions on the chat, because people can be directed to this page to get an update of the current state of affairs.

The benchmarks used to time the effect of changes can be downloaded from [L1 ]

JE - 20071105. In particular, it's important to note that the point of the exercise is not to make a "fast cat" in Tcl or to make a "fast grep" in Tcl: these are both easily doable. The point of the exercise is to find where the real bottlenecks are in real text processing filters, and to make those hotspots cooler.


KBK - 20071105:

Hypothesis: The variable access in the three-argument form of gets is a significant contributor to the observed poor performance.

Experiment: A disc file was created containing 1 million lines, each comprising the single letter 'x'. A modification was made to Tcl_GetsObjCmd as shown:

RCS file: /cvsroot/tcl/tcl/generic/tclIOCmd.c,v
retrieving revision 1.44
diff -u -r1.44 tclIOCmd.c
--- generic/tclIOCmd.c  15 Oct 2007 21:27:48 -0000      1.44
+++ generic/tclIOCmd.c  5 Nov 2007 21:02:36 -0000
@@ -287,9 +287,13 @@
        lineLen = -1;
     }
     if (objc == 3) {
-       if (Tcl_ObjSetVar2(interp, objv[2], NULL, linePtr,
-               TCL_LEAVE_ERR_MSG) == NULL) {
-           return TCL_ERROR;
+       if (*Tcl_GetString(objv[2]) != '\0') {
+           if (Tcl_ObjSetVar2(interp, objv[2], NULL, linePtr,
+                              TCL_LEAVE_ERR_MSG) == NULL) {
+               return TCL_ERROR;
+           }
+       } else {
+           Tcl_DecrRefCount(linePtr);
        }
        Tcl_SetObjResult(interp, Tcl_NewIntObj(lineLen));
        return TCL_OK;

The following script, roughly equivalent to 'wc -l' was used as benchmark.

proc read1 {} {
    set lc 0
    set f [open ~/tmp/1mlines r]
    while {[gets $f line] >= 0} {
	incr lc
    }
    close $f
    return $lc
} 

proc read2 {} {
    set lc 0
    set f [open ~/tmp/1mlines r]
    while {[gets $f {}] >= 0} {
	incr lc
    }
    close $f
    return $lc
} 

set methods {read1 read2}
foreach method $methods {
    set min($method) Inf    
}
for {set i 0} {$i < 5} {incr i} {
    foreach method $methods {
        set us [lindex [time $method] 0]
	puts [list $method: $us]
	if {$us < $min($method)} {
	    set min($method) $us
	}
    }
}
foreach method $methods {
    puts [list $method $min($method)]
}

Results:

The minimum observed run time for the script where the line was returned into a variable was 2.46 seconds; the minimum observed time where the line was discarded was 2.08 seconds. (The maxima were 2.52 and 2.17 respectively, so system load was presumed not to be an issue).

Conclusion:

Storing the return string in a variable appears to consume roughly 0.28 microseconds per line, or about 12% of the total time spent in this micro-benchmark.


Further experiments by kbk as reported on the chat:

  • insuring that buf is a compiled local variable (read1a) gives the times: read1: 2.44 s, read1a 2.20 s, read2 2.09 s
  • the same as above, on larry's larger data file: read1: 2.85 read1a 2.52 read2 2.57
  • with fakegets and readm as below: 1.91 s This means that the call/setVar/return is no more than 1.91s
  • finally, the experiment with readm2 below puts a tighter upper bound of 0.50s on call/setVar/return: it timed at 0.69 s, versus 0.19 s for an empty loop ([incr ct -1] instead of [$x ct -1], insuring that it is bytecompiled so that there is no call out of TEBC)

As interim conclusion: The overhead of loop/call/setVar/return is below 30% (0.7s of 2.5s), 70% of the time is spent doing the actual read/convert within gets itself.

proc fakegets {chan v} {
    global ct
    if {![incr ct -1]} {return -1}
        return 1
    }
}

proc readm {} {
    set ::ct 1000000
    set f [open ~/tmp/DATA r]
    while {[fakegets $f {}] >= 0} {
        incr lc
    }
    close $f
    return $lc
}

proc readm2 {} {
    set x incr
    set ct 1000000
    set f [open ~/tmp/DATA r]
    while {[$x ct -1] > 0} {
	incr lc
    }
    close $f
    return $lc
}