Multi-Core Processor Test using Threads and Critcl

I recently purchased a new desktop computer for home use that has a Core 2 Quad processor. Every once in a while I would check the task manager to see whether more than one core was working, and I seldom found the processor using more than one. This started to make me feel like a Ferrari owner who lives in the city and never gets a chance to see what the car can do on the highway.

I decided to take my quad-core processor for a drive on the highway using Critcl and the Thread package. The approach was to write a C function that takes an iteration count and does some useless work that many times. The C function was written using Critcl so that it could be run from Tcl. I then wrote a Tcl control procedure that creates a user-defined number of threads and runs the C function in each one. The control procedure also takes an iteration count as input; it divides that count by the number of requested threads and passes the result to each thread.
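
One detail of the split is worth spelling out: integer division discards any remainder, so a thread count that does not divide the iteration count evenly leaves a few iterations undone. The sketch below, written in C like the worker function, shows one way the remainder could be handed out. The name share_for_thread is hypothetical and is not part of the code on this page.

```c
#include <assert.h>

/* Hypothetical helper: number of iterations thread `index` should run
 * when `total` iterations are spread over `threads` threads.
 * The first (total % threads) threads each take one extra iteration,
 * so the shares always add up to `total`. */
int share_for_thread(int total, int threads, int index)
{
    int base;
    int extra;

    assert(total >= 0);
    assert(threads > 0);
    assert(index >= 0 && index < threads);

    base  = total / threads;   /* what plain integer division gives */
    extra = total % threads;   /* iterations left over */

    return base + (index < extra ? 1 : 0);
}
```

With 1000000 iterations and 3 threads, plain division gives each thread 333333 iterations (999999 in total); this helper would give the first thread 333334 instead, so nothing is lost.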

I then did multiple runs on my computer using different numbers of threads to see how much time it took to process the same number of iterations. The results are shown below followed by the code (from 3 files) that I used.

--tomk


Usage Results

The following pictures were taken from the task manager on my PC. My comments assume that the Tcl control procedure (which is also a thread) isn't using a core enough to affect the results, since it is just waiting for the other threads to complete.

This first plot was taken from a test using 1 thread. It appears that the thread was probably run on core 3 but there also is activity on other cores.

This plot was taken from a test using 2 threads. The threads appear to have run on cores 1 and 3, and the work pushed their usage to 100%.

This plot was taken from a test using 3 threads. The threads appear to have run on cores 1 and 3, and the work pushed their usage to 100%.

This plot was taken from a test using 4 threads. All cores appear to have been used, and the work pushed their usage to 100%.


Performance Results

The plot below shows the effect of using threads to do the work. Going from a single thread to four threads (the number of cores in my processor) cut the run time from 33 seconds to 9 seconds, which is about a factor of 3.7 improvement in throughput.
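
For readers who want to redo the arithmetic with their own timings, here is a small sketch in C. The function names speedup and efficiency are made up for illustration; the only figures taken from this page are the 33 and 9 second timings and the four cores.

```c
#include <assert.h>

/* Speedup: how many times faster the parallel run is than the
 * single-threaded run. */
double speedup(double t_single, double t_parallel)
{
    assert(t_single > 0.0 && t_parallel > 0.0);
    return t_single / t_parallel;
}

/* Parallel efficiency: speedup divided by the number of cores,
 * where 1.0 would mean perfect scaling. */
double efficiency(double t_single, double t_parallel, int cores)
{
    assert(cores > 0);
    return speedup(t_single, t_parallel) / cores;
}
```

Calling speedup(33.0, 9.0) gives roughly 3.67, and efficiency(33.0, 9.0, 4) gives roughly 0.92. Since the test script times runs in whole seconds, these figures are only approximate.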

It is also interesting to note that adding threads above the maximum number of cores in the processor initially reduces throughput until even more threads are added. I suspect this is an artifact of the way in which the testing is performed.

perf_vs_threads_plot.jpg


Control Procedure (test.tcl)

package require Thread
# note: when critcl builds the C code in ccode.tcl
# it places the dll in a local directory named 'lib'
lappend auto_path lib
package require ccode
# this is the thread management proc
proc main { num_threads iterations } {
        # spread the iterations over the threads
        # (integer division: any remainder is dropped)
        set iterations_per_thread [expr {${iterations} / ${num_threads}}]
        # Create threads
        puts stderr "Creating threads:"
        for {set t 0} {${t}<${num_threads}} {incr t} {
                set tid(${t}) [thread::create {
                        thread::wait
                }]
                puts stderr "...$tid(${t})"
        }
        puts stderr "Give each thread something to do:"
        for {set t 0} {${t}<${num_threads}} {incr t} {
                set thread_job "
                        # add lib dir to the thread's search path
                        lappend auto_path lib
                        # load ccode dll into the thread
                        package require ccode
                        # run the C routine
                        do_work ${iterations_per_thread}
                "
                puts stderr "...running job in thread($tid(${t}))"
                thread::send -async $tid(${t}) ${thread_job} done
        }
        puts stderr "Waiting for threads to complete:"
        for {set t 0} {${t}<${num_threads}} {incr t} {
                vwait done
        }
        puts stderr "All threads complete"
}
# run the test
set tstart [clock seconds]
lassign ${argv} threads iterations
puts "##### testing: (${threads}) threads running (${iterations}) iterations"
main ${threads} ${iterations}
set tstop [clock seconds]
puts "time: [expr {${tstop} - ${tstart}}]"

C Function (ccode.tcl)

package provide ccode 1.0
# This package defines some C code that will
# be run in each thread. The outer loop effectively
# multiplies the number of iterations by 1000.
# This package is compiled using critcl.
::critcl::config keepsrc 1
set    code ""
append code "#include <stdio.h>\n"
append code "#include <stdlib.h>\n"
::critcl::ccode ${code}
# Define a C routine that does some work
::critcl::cproc do_work {
        int max
} ok {
        int     error ;
        int     i,j,k,m,n ;

        error  = TCL_OK ;

        for ( i = 0; i<1000; ++i ) {
                for ( j = 0; j<max; ++j ) {
                        k = rand();
                        m = rand();
                        n = k * m  ;
                }
        }

        return error ;
}
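
A caveat about loops like this one: the value of n is never used, so an optimizing C compiler is allowed to drop the multiplication (the rand() calls must stay because they have side effects). If that matters for a benchmark, one option is to fold each result into a volatile variable. The standalone function below is only a sketch in plain C, outside Critcl; the names do_work_checked and sink are made up for illustration and are not part of the package above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variant of the work loop. Writing through a volatile
 * object forces the compiler to keep every multiplication. */
unsigned int do_work_checked(int max)
{
    volatile unsigned int sink = 0;
    int i, j;

    assert(max >= 0);

    for (i = 0; i < 1000; ++i) {
        for (j = 0; j < max; ++j) {
            /* unsigned arithmetic avoids signed overflow
             * on platforms where RAND_MAX is large */
            unsigned int k = (unsigned int) rand();
            unsigned int m = (unsigned int) rand();
            sink ^= k * m;
        }
    }

    return sink;
}
```

The return value has no meaning of its own; it only gives the caller something that depends on the work done.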

Makefile

I used this makefile to simplify the build and test process. After you build the code, you can change the THREADS variable and rerun the test.

THREADS := 1
ITERATIONS := 1000000
build: clobber
        @echo "### build"
        @tclsh critcl2.kit -pkg ccode.tcl
test:
        @echo "### test"
        @tclsh test.tcl ${THREADS} ${ITERATIONS}
clobber:
        @echo "### clobber"
        @tclsh critcl2.kit -clean
        @rm -rf lib