[DKF]: Sometimes you come across something that makes you think “wow!” Here's one such thing: comparing the flat-out single-threaded performance of [Tcl] and [Python]. The problem I was looking at was to compute the sum of all prime numbers less than ten million (there are quite a few of them!) and the limiting factor is an efficient method for generating all the primes in the range. I present here two implementations for doing this in the languages under consideration, based on code originally from http://code.activestate.com/recipes/117119/ by way of [https://stackoverflow.com/questions/567222/simple-prime-generator-in-python%|%StackOverflow].

*** Tcl ***

======tcl
proc sum_primes_to {n {i 1}} {
    set total 0
    incr n 0
    # d maps each upcoming composite to the primes that divide it
    for {set q [expr {$i + $i}]} {$q < $n} {incr q} {
        if {![info exists d($q)]} {
            # q is prime: count it and record its square as a composite
            incr total $q
            lappend d([expr {$q*$q}]) $q
        } else {
            # q is composite: move each of its primes on to the next multiple
            foreach p $d($q) {
                lappend d([expr {$p + $q}]) $p
            }
            unset -nocomplain d($q)
        }
    }
    return $total
}
puts [sum_primes_to 10000000]
======

*** Python ***

======none
def sum_primes_to(n):
    total = 0
    d = {}  # maps each upcoming composite to the primes that divide it
    q = 2
    while q < n:
        if q not in d:
            # q is prime: count it and record its square as a composite
            total += q
            d[q * q] = [q]
        else:
            # q is composite: move each of its primes on to the next multiple
            for p in d[q]:
                d.setdefault(p + q, []).append(p)
            del d[q]
        q += 1
    return total

print(sum_primes_to(10000000))
======

So… comparing the performance (overall for the script, with `time`) on a single system with production-grade builds of both languages, I get this:

Tcl 8.6: 13.250s <<br>>
Python 2.7: 20.369s <<br>>
Python 3.5: 22.204s <<br>>
Python 3.6: 13.874s

These are all production builds that I've built locally to be as fast as possible on my hardware. (Also, they all produce the correct result, `3203324994356`.)

----

gonwalf 2018.01.31: I tried the same code on Python 3.5 and Tcl 8.6.6 and 8.6.8, with different results:

Tcl 8.6.6: 10.07s <<br>>
Python 3.5: 10.80s <<br>>
on Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz

and

Tcl 8.6.8: 8.53s <<br>>
Python 3.5: 8.08s <<br>>
on Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz

Which compiler flags did you use for the Tcl build?

[DKF]: 04-Feb-2018: I used the builds of both Tcl and Python built by macports, all running on my laptop (on mains power). All should be in release as-much-optimisation-as-usually-reasonable mode, and with chunks of computation as large as this, the CPU should be scaled up pretty equally. (NB: The system Tcl build on OSX is actually very slow; they enable an option that adds close tracking of low-level metrics but at great performance overhead.)

I've got additional experimental builds where I make the Tcl code quite a lot faster, but they're definitely not used by anyone else yet (and aren't yet correct, semantically).

----

I recently made a post on [https://codegolf.stackexchange.com/questions/188133/bentleys-coding-challenge-k-most-frequent-words/223212#223212%|%StackExchange] about the classic "find the N most frequent words (\[A-Za-z]+) in a text" problem and found that Tcl was quite slow. Any diagnosis?

*** Tcl ***

======tcl
#!/usr/bin/env tclsh

proc wordcount {path head} {
    set data [string tolower [read [open $path]]]
    foreach word [regexp -all -inline {[a-z]+} $data] {
        dict incr wordcount $word
    }
    set sorted [lsort -stride 2 -index 1 -int -decr $wordcount]
    lrange $sorted 0 [expr {$head * 2 - 1}]
}

foreach {count word} [wordcount {*}$argv] {
    puts "$word\t$count"
}
======

*** Python ***

======none
#!/usr/bin/env python3.9
import collections, re, sys

filename = sys.argv[1]
k = int(sys.argv[2])
reg = re.compile('[a-z]+')

counts = collections.Counter()
counts.update(reg.findall(open(filename).read().lower()))
for i, w in counts.most_common(k):
    print(i, w)
======

These are the times I get:

======
$ time ./wordcount.tcl /tmp/ulysses64.txt 10
...
./wordcount.tcl /tmp/ulysses64.txt 10  24.27s user 1.02s system 99% cpu 25.287 total
$ time ./wordcount.py /tmp/ulysses64.txt 10
...
./wordcount.py /tmp/ulysses64.txt 10  10.42s user 0.90s system 99% cpu 11.329 total
======

and timing the various parts of the Tcl code gets me:

   * A massive slowdown (25 -> 40 s)
   * file read: 1.2 s, regexp: 15.9 s, dict incr loop: 21.6 s, lsort: 10 ms, lrange: 40 µs

<<categories>>Performance | Tcl | Python