(Remark about the category) The idea is that the CUDA part of the graphics card is like a little supercomputer, with hundreds of gigaflops and fine-grained threads, which could run list commands. I don't mean a CUDA compiler or language port, such as the existing Python interface to CUDA processing, but rather Tcl working as a large-scale parallel compiled language, so that it runs really fast and can do things like inter-thread scheduling very quickly.
TV Jan 3 '09 Since no one else seems to be working on this particular kind of desktop supercomputing, I took a few hours to do a little testing myself.
First I compiled Tcl 8.6b1 from source (8.5 didn't seem to download; note that since some version there are backward-compatibility issues with canvas saving in BWise, which could be a problem). Then I added a CUDA example from the NVidia CUDA SDK to the source code of tclAppInit.c in the unix directory (I run this on Fedora 10/64, the processor is a Pentium D, the graphics card an FX9500GT), in principle by adding the source of the recursiveGaussian.cu example to the file and calling the result tclAppInitnv.cu [L1 ] . Then I added a Tcl command 'hello' (possibly breaking some style rules) which calls the CUDA test in -bench mode. That mode doesn't run the graphics, but it does test an actual CUDA function call, with data transported to the graphics card using cudaMemcpy.
To compile the result I used the command below (you need the CUDA development environment for that, see [L2 ], and of course a driver that can handle CUDA, in my case the latest CUDA 2.1 with OpenGL 3 and such). It can be compared with the Cinepaint plugin approach I described here [L3 ]:
nvcc -I"." -I/home/theo/Tcl/tcl8.6b1/unix -I/home/theo/Tcl/tcl8.6b1/generic -I/home/theo/Tcl/tcl8.6b1/libtommath -DPACKAGE_NAME=\"tcl\" -DPACKAGE_TARNAME=\"tcl\" -DPACKAGE_VERSION=\"8.6\" -DPACKAGE_STRING=\"tcl\ 8.6\" -DPACKAGE_BUGREPORT=\"\" -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DHAVE_LIMITS_H=1 -DHAVE_SYS_PARAM_H=1 -DTCL_CFGVAL_ENCODING=\"iso8859-1\" -DHAVE_ZLIB=1 -DMODULE_SCOPE=extern\ __attribute__\(\(__visibility__\(\"hidden\"\)\)\) -DTCL_SHLIB_EXT=\".so\" -DTCL_CFG_OPTIMIZED=1 -DTCL_CFG_DEBUG=1 -DTCL_TOMMATH=1 -DMP_PREC=4 -D_LARGEFILE64_SOURCE=1 -DTCL_WIDE_INT_IS_LONG=1 -DHAVE_GETCWD=1 -DHAVE_OPENDIR=1 -DHAVE_STRTOL=1 -DHAVE_WAITPID=1 -DHAVE_GETADDRINFO=1 -DUSE_TERMIOS=1 -DHAVE_SYS_TIME_H=1 -DTIME_WITH_SYS_TIME=1 -DHAVE_STRUCT_TM_TM_ZONE=1 -DHAVE_TM_ZONE=1 -DHAVE_GMTIME_R=1 -DHAVE_LOCALTIME_R=1 -DHAVE_MKTIME=1 -DHAVE_TM_GMTOFF=1 -DHAVE_TIMEZONE_VAR=1 -DHAVE_STRUCT_STAT_ST_BLKSIZE=1 -DHAVE_ST_BLKSIZE=1 -DHAVE_INTPTR_T=1 -DHAVE_UINTPTR_T=1 -DHAVE_SIGNED_CHAR=1 -DHAVE_LANGINFO=1 -DHAVE_SYS_IOCTL_H=1 -DTCL_UNLOAD_DLLS=1 -I ~/NVIDIA_CUDA_SDK/common/inc tclAppInitnv.cu -L ~/NVIDIA_CUDA_SDK/lib/ -L ~/NVIDIA_CUDA_SDK/common/lib/ -L /usr/lib64/ ~theo/NVIDIA_CUDA_SDK/common/lib/linux/libGLEW_x86_64.a -lcutil -lglut -L/usr/lib64 -L/home/theo/Tcl/tcl8.6b1/unix -ltcl8.6 -ldl -lz -lieee -lm
I also made a local data directory with the lena.ppm example available.
The result is an a.out that acts as a tclsh with an extra command, which executes the CUDA test code. With the file above this is a graphics-less test, after which one is returned to the tclsh prompt. When "-bench" is removed from the argv list, the CUDA graphics window closes after pressing <escape> (as in the CUDA SDK example), but then the interpreter is terminated because the GLUT graphics loop cannot be broken.
[theo@medion2 unix]$ ./a.out
% hello
Loaded 'data/lena.ppm', 512 x 512 pixels
Processing time: 350.947998 (ms)
74.70 Mpixels/sec
Hello, World!
%
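As a sanity check on these figures: a single pass over 512 x 512 pixels in 350.947998 ms would be only about 0.75 Mpixels/sec, so the reported 74.70 Mpixels/sec is consistent with the benchmark timing 100 passes over the image. The iteration count of 100 is an assumption inferred from the numbers, not something the output states. A small sketch of the arithmetic (the function name is made up for illustration):

```c
#include <assert.h>

/* Throughput in Mpixels/sec for `iterations` passes over a
 * width x height image that together took `elapsed_ms` milliseconds.
 * "Mega" is taken as 1.0e6 here. */
static double mpixels_per_sec(int width, int height, int iterations,
                              double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    double pixels = (double)width * (double)height * (double)iterations;
    double seconds = elapsed_ms / 1000.0;
    return pixels / seconds / 1.0e6;
}
```

Calling `mpixels_per_sec(512, 512, 100, 350.947998)` gives roughly 74.70, while the same call with a single iteration gives roughly 0.75.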
Note that in this case the CUDA function linked to the hello command can be run only once.
It is possible to install the Tcl from above, compile Tk too, then run a.out (the nvcc-compiled main Tcl app) and do:
package require Tk
to use Tk (and possibly BWise) in conjunction with the compiled CUDA .cu code.