**tmpi - Message Passing Interface via Tcl** (Author [SL])

This is a simple package to see whether MPI will work within Tcl. Although it's not reasonable to use Tcl in a high-performance environment, it's still interesting.

[CL] disagrees: specifically, I believe that it can be quite reasonable to run Tcl in [HPC]. Datum: one of [Python]'s early milestones was a paper [[CL needs to supply reference]] on the use of Python on global-class supercomputers.

[AM] (28 August 2008) Could Tcl benefit from OpenMP? I guess there are a number of places where the basic commands could be enhanced ... but that would probably not be a loadable extension anymore. Oh well, just musing.

[Lars H]: Do you by any chance mean [MPI] rather than OpenMP [http://en.wikipedia.org/wiki/OpenMP]? They take rather different approaches to multiprocessing. I find it hard to see how the OpenMP style of parallelization could be applied to Tcl, given that Tcl doesn't allow interps to be shared between threads.

[AM] No, OpenMP - loops inside the functions that implement the various commands might be parallelized that way ... But I have not looked into it, just toying with the idea.

[Lars H]: OK, for speeding up things like operations on large vectors it might work, but I find it hard to see how the core could make use of that; it's typically very closely tied to the scripting level when it comes to execution order, so there isn't much parallelism to exploit. Extensions like [NAP] are another matter, though. But since the package on this page uses [MPI] rather than OpenMP, we should move this discussion to a separate page.

----
'''Usage:'''

======
package require tmpi 0.1
======

(loadable binary extension)

You have to have a valid MPI installation on your machine! In my case (dual-core PC, Windows XP) this is MPICH [http://www-unix.mcs.anl.gov/mpi/mpich/]. You then have a runtime environment (e.g., mpiexec) with which to start a multi-process run, e.g.,

======none
mpiexec -n ?nb of processors/processes? tclsh85.exe script.tcl
======

This actually means you have several tclsh85.exe processes running, each on a different processor, and it is now the developer's job to get some benefit out of that. You have to use tmpi to synchronize the processes and to transfer data from one process to another. The code is harder to read and write, and the data transfer from one process to another is time-consuming. The trick is to let every process calculate a different part of the result; at the end you merge the partial results in one process.

Let ''t1'' be the time it takes to calculate X on a single-processor machine, and let X be parallelizable. Then ''t2'' can come close to ''t1/n + transfer-time + divide-merge-time'' with ''n'' processes. The big question: is ''t1 >> t2''? (The pi approximation below is a concrete instance of this trade-off.)

----
source distribution [http://www.tfh-berlin.de/~sleuthold/files/tcl/tmpi01.tar.gz]

binary (winXP) [http://www.tfh-berlin.de/~sleuthold/files/tcl/tmpi01.zip]

----
'''Commands:'''

: '''tmpi::init''' ?''options''?

Initializes the MPI execution environment. Returns 1 or fails. Possible ''options'' are:

''-mpiqueue'' prints out the state of the message queues when tmpi::finalize is called. All processors print; the output may be hard to decipher. This is intended as a debugging aid.

''-mpiversion'' prints out the version of the implementation (not of MPI), including the arguments that were used with configure.

''-mpinice nn'' increments the nice value by nn (lowering the priority of the program by nn). nn must be positive (except for root). Not all systems support this argument; those that do not will ignore it.

''NOTE:'' The MPI standard does not say what a program can do before a tmpi::init or after a tmpi::finalize. In the MPICH implementation, you should do as little as possible. In particular, avoid anything that changes the external state of the program, such as opening files, reading standard input, or writing to standard output. In that light, the usage within Tcl is questionable, or even risky, because Tcl does a lot before it evaluates its actual Tcl input.

: '''tmpi::size'''

Determines the size of the group associated with the communicator MPI_COMM_WORLD. Returns the size.

: '''tmpi::rank'''

Determines the rank of the calling process in the communicator MPI_COMM_WORLD. Returns the rank.

: '''tmpi::barrier'''

Blocks until all processes have reached this routine. Returns 1 or fails.

: '''tmpi::send -type''' [['''integer'''|'''double''']] ?'''-list'''? ''val dest''

Performs a basic send of ''val'' from the current process to the ''dest'' process within MPI_COMM_WORLD. Send tags are not supported. Returns 1 or fails.

: '''tmpi::recv -type''' [['''integer'''|'''double''']] ?'''-length''' ''l''? ''source''

Counterpart of send; receives a value from the ''source'' process. If ''-length'' is supplied, a list of ''l'' values is assumed. Returns the received value, possibly a list of values.

: '''tmpi::finalize'''

Terminates the MPI execution environment. All processes must call this routine before exiting. The number of processes running after this routine is called is undefined; it is best not to perform much more than a return after calling it. Returns 1 or fails.
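To make the '''-list''' and '''-length''' options concrete, here is a minimal round-trip sketch. It is untested and simply assumes the commands behave exactly as described above; the file name ''exchange_list.tcl'' and the proc name are made up for illustration. Process 0 sends a list of three doubles to process 1, which receives it with '''-length''' and sends the mean back as a single double.

======
package require tmpi 0.1

proc exchange_list {} {
    tmpi::init
    set rank [tmpi::rank]
    if {$rank == 0} {
        # send a list of three doubles to process 1 (-list marks val as a list)
        tmpi::send -type double -list {1.5 2.5 3.5} 1
        # wait for the single double coming back
        puts "mean = [tmpi::recv -type double 1]"
    } elseif {$rank == 1} {
        # -length 3 tells recv to expect a list of three values
        set vals [tmpi::recv -type double -length 3 0]
        set sum 0.0
        foreach v $vals {
            set sum [expr {$sum + $v}]
        }
        tmpi::send -type double [expr {$sum / [llength $vals]}] 0
    }
    # make sure all processes are done before shutting down
    tmpi::barrier
    tmpi::finalize
}
exchange_list
======

execute: mpiexec -n 2 tclsh85.exe exchange_list.tcl - process 0 should print "mean = 2.5".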
----
'''EXAMPLE:''' ''hello_world.tcl''

======
package require tmpi 0.1

proc hello_world {} {
    tmpi::init
    set size [tmpi::size]
    set rank [tmpi::rank]
    puts "Hello world from process $rank of $size."
    tmpi::finalize
}
hello_world
======

execute: mpiexec -n 2 tclsh85.exe hello_world.tcl

prints:

>Hello world from process 0 of 2.
>Hello world from process 1 of 2.

----
'''EXAMPLE:''' ''approximation of pi'' [http://www.llnl.gov/computing/tutorials/parallel_comp/#ExamplesPI]

======
package require tmpi 0.1
======

'''Single process:'''

======
proc approx_pi_sp {n} {
    set circle_count 0
    for {set i 0} {$i < $n} {incr i} {
        set x [expr {rand()}]
        set y [expr {rand()}]
        # count the random points that fall inside the unit circle
        if {sqrt(pow($x,2) + pow($y,2)) < 1.0} {
            incr circle_count
        }
    }
    puts [expr {4.0 * $circle_count / $n}]
}
puts [time {approx_pi_sp 5000000} 1]
======

execute: tclsh85.exe approx_pi_sp.tcl

prints:

>24.07 sec.

'''Multi process:'''

======
proc approx_pi_mp {n} {
    tmpi::init
    set size [tmpi::size]
    set rank [tmpi::rank]
    set nb [expr {$n / $size}]
    set circle_count 0
    for {set i 0} {$i < $nb} {incr i} {
        set x [expr {rand()}]
        set y [expr {rand()}]
        if {sqrt(pow($x,2) + pow($y,2)) < 1.0} {
            incr circle_count
        }
    }
    if {$rank == 0} {
        # the main process collects the partial counts from all workers
        for {set i 1} {$i < $size} {incr i} {
            set circle_count [expr {$circle_count + [tmpi::recv -type integer $i]}]
        }
        puts [expr {4.0 * $circle_count / ($nb * $size)}]
    } else {
        # each worker sends its partial count to process 0 exactly once
        tmpi::send -type integer $circle_count 0
    }
    tmpi::finalize
}
puts [time {approx_pi_mp 5000000} 1]
======

execute: mpiexec -n 2 tclsh85.exe approx_pi_mp.tcl

prints:

>12.12 sec.
>12.14 sec.

'''Comparison:'''

   Multi process    12.14 sec.
   Single process   24.07 sec.

----
!!!!!!
%|[Category Performance] | [Category Concept] | [Category Interprocess Communication]| [Category Package]|%
!!!!!!