(Author SL)
This is a simple package to see whether MPI will work within Tcl. Even though using Tcl in a high-performance environment may not seem reasonable, it is still interesting.
CL disagrees: specifically, I believe that it can be quite reasonable to run Tcl in HPC. Datum: one of Python's early milestones was a paper [CL needs to supply reference] on the use of Python on global-class supercomputers.
AM (28 August 2008) Could Tcl benefit from OpenMP? I guess there are a number of places where the basic commands could be enhanced ... but that would probably not be a loadable extension anymore. Oh well, just musing.
Lars H: Do you by any chance mean MPI rather than OpenMP [L1 ]? They take rather different approaches to multiprocessing. I find it hard to see how the OpenMP style of parallelization could be applied to Tcl, given that Tcl doesn't allow interps to be shared between threads.
AM No, OpenMP - loops inside the functions that implement the various commands might be parallelized that way ... But I have not looked into it, I'm just toying with the idea.
Lars H: OK, for speeding up things like operations on large vectors it might work, but I find it hard to see how the core could make use of it: the core is typically very closely tied to the scripting level when it comes to execution order, so there isn't much parallelism to exploit. Extensions like NAP are another matter, though.
But since the package on this page uses MPI rather than OpenMP, we should move this discussion to a separate page.
Usage:
package require tmpi 0.1
(loadable binary extension)
You have to have a valid MPI installation on your machine! In my case (dual-core PC, Windows XP) this is [L2 ]. That gives you a runtime environment (e.g., mpiexec) to start a multi-processor process, e.g.,
mpiexec -n ?nb of processors/processes? tclsh85.exe script.tcl
This actually means you have several tclsh85.exe processes running, each on a different processor, and it is now the developer's job to get a benefit out of that. You use tmpi to synchronize the processes and to transfer data from one process to another.
The code is harder to read and write, and transferring data from one process to another is time-consuming. The trick is to let every process calculate a different part of the result; at the end you merge the partial results in one process.
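A minimal sketch of that pattern, using the tmpi commands documented below (the problem size and the loop body are just placeholders):

package require tmpi 0.1

tmpi::init
set size [tmpi::size]   ;# number of processes started by mpiexec
set rank [tmpi::rank]   ;# this process' number: 0 .. size-1

# the same script runs in every process; the rank decides which part
# of the work this process handles (items rank, rank+size, rank+2*size, ...)
set n 1000              ;# placeholder problem size
for {set i $rank} {$i < $n} {incr i $size} {
    # ... work on item $i ...
}

tmpi::finalize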
Let t1 be the time it takes to calculate X on a single-processor machine, and let X be parallelizable. Then, with n processes, t2 can come close to t1/n + transfer time + divide/merge time.
The big question: is t1 >> t2?
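For illustration, with the numbers from the pi example below: n = 2 and t1 ≈ 24.07 sec give t1/n ≈ 12.0 sec; the measured multi-process time of about 12.14 sec suggests that the transfer and divide/merge overhead is small for that problem.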
source distribution [L3 ]
binary (winXP) [L4 ]
Commands:
tmpi::init
Initialize the MPI execution environment. Returns 1 or fails. Possible options are:
-mpiqueue print out the state of the message queues when tmpi::finalize is called. All processors print; the output may be hard to decipher. This is intended as a debugging aid.
-mpiversion print out the version of the implementation (not of MPI), including the arguments that were used with configure.
-mpinice nn Increments the nice value by nn (lowering the priority of the program by nn). nn must be positive (except for root). Not all systems support this argument; those that do not will ignore it.
NOTE: The MPI standard does not say what a program may do before tmpi::init or after tmpi::finalize. In the MPICH implementation, you should do as little as possible. In particular, avoid anything that changes the external state of the program, such as opening files, reading standard input or writing to standard output. Because of that, the usage within Tcl is questionable, or even risky, because Tcl does a lot before it evaluates the actual Tcl input.
tmpi::size
Determines the size of the group associated with the communicator MPI_COMM_WORLD. Returns the size.
tmpi::rank
Determines the rank of the calling process in the communicator MPI_COMM_WORLD. Returns the rank.
Blocks until all processes have reached this routine. Returns 1 or fails.
tmpi::send -type type val dest
Performs a basic send of val from the calling process to the process with rank dest in MPI_COMM_WORLD. Send tags are not supported. Returns 1 or fails.
tmpi::recv -type type ?-length length? source
Counterpart of send; receives val from the process with rank source. If -length is supplied, a list is assumed. Returns the received value, possibly a list of values. (A short sketch of a matching send/receive pair follows the command list.)
tmpi::finalize
Terminates the MPI execution environment. All processes must call this routine before exiting. The number of processes running after this routine is called is undefined; it is best not to do much more than a plain return after calling it. Returns 1 or fails.
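A minimal sketch of a matching send/receive pair, assuming two processes and the command forms used in the pi example below:

package require tmpi 0.1

tmpi::init
if {[tmpi::rank] == 1} {
    # process 1 sends the integer 42 to process 0
    tmpi::send -type integer 42 0
} elseif {[tmpi::rank] == 0} {
    # process 0 receives one integer from process 1
    set val [tmpi::recv -type integer 1]
    puts "received $val"
}
tmpi::finalize

The -type given to send and recv must agree on both sides.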
EXAMPLE: hello_world.tcl
package require tmpi 0.1

proc hello_world {} {
    tmpi::init
    set size [tmpi::size]
    set rank [tmpi::rank]
    puts "Hello world from process $rank of $size."
    tmpi::finalize
}

hello_world
execute:
mpiexec -n 2 tclsh85.exe hello_world.tcl
prints:
>Hello world from process 0 of 2.
>Hello world from process 1 of 2.
EXAMPLE: approximation of pi [L5 ]
package require tmpi 0.1
Single process:
proc approx_pi_sp {n} {
    set circle_count 0
    for {set i 0} {$i < $n} {incr i} {
        set x [expr {rand()}]
        set y [expr {rand()}]
        # count random points that fall inside the unit quarter circle
        if {sqrt(pow($x,2) + pow($y,2)) < 1.0} {
            incr circle_count
        }
    }
    puts [expr {4.0 * $circle_count / $n}]
}

puts [time {approx_pi_sp 5000000} 1]
execute:
tclsh85.exe approx_pi_sp.tcl
prints:
>24.07 sec.
Multi process:
proc approx_pi_mp {n} {
    tmpi::init
    set size [tmpi::size]
    set rank [tmpi::rank]
    set nb [expr {$n / $size}]
    set circle_count 0
    # every process samples its own nb points
    for {set i 0} {$i < $nb} {incr i} {
        set x [expr {rand()}]
        set y [expr {rand()}]
        if {sqrt(pow($x,2) + pow($y,2)) < 1.0} {
            incr circle_count
        }
    }
    if {$rank == 0} {
        # the main process collects the counts from all other processes
        for {set i 1} {$i < $size} {incr i} {
            set circle_count [expr {$circle_count + [tmpi::recv -type integer $i]}]
        }
        puts [expr {4.0 * $circle_count / ($nb * $size)}]
    } else {
        # every other process sends its count to process 0, exactly once
        tmpi::send -type integer $circle_count 0
    }
    tmpi::finalize
}

puts [time {approx_pi_mp 5000000} 1]
execute:
mpiexec -n 2 tclsh85.exe approx_pi_mp.tcl
prints:
>12.12 sec.
>12.14 sec.
Comparison:
Multi process (2 processes): 12.14 sec.
Single process: 24.07 sec.