**tarray - Typed Array extension for Tcl** Project is hosted at SourceForge [http://sourceforge.net/projects/tarray]. Latest download is V0.4. Web site and documentation is at http://tarray.sourceforge.net ***Overview*** ''Note that this extension is very much in an experimental stage. It should be robust enough with a fairly comprehensive test suite. However, the API is incomplete and subject to change.'' The tarray extension implements a new Tcl collection data type - typed array - and associated commands `column` and `table`. A typed array stores elements of a specific data type in native format. The primary motivation for this extension is efficient memory utilization and speed of certain operations in applications dealing with very large numbers of elements. Typed columns and tables do not really provide any additional capabilities over Tcl's built-in commands, particularly [list] and [dict] and the script libraries based on these. Their primary use is in applications with large number of data items where memory consumption and cost of operations, such as searching and sorting, are greatly reduced through the use of native types and parallelization (see preliminary numbers below). The tarray extension was inspired in part by [Speed Tables] and to a lesser extent by [TclRal]. The philosophy behind tarray is to provide efficient facilities on top of which more sophisticated data structures, possibly customized for specific applications, can be easily scripted and experimented with. Therefore, unlike [Speed Tables], tarray does not require creation and recompilation of a new extension for each table definition. Moreover, tarray provides value-based semantics so that columns and tables can be used as basic building blocks. Additional facilities that [Speed Tables] provides, like remote access, are expected to be implemented at the script level. 2013-05-25 The repository now has V0.4 which includes an experimental optional parallelized sort. On Windows this uses native calls and is built by default. Other platforms require the use of the libdispatch cross-platform library. This has also been tested on Windows but not on other platforms. Anybody having access to a multicore Linux box and willing to build and test it please let me know. Initial non-scientific benchmarks results show tarray sorts being more than 6 times as fast as array sizes grow larger. ====== Version: 0.4 Run date: Sun May 26 06:33:52 IST 2013 Size Type lsort singlethread parallel lsort+mt 1000 strings 893 920 (0.97) 1030 (0.87) 866 (1.03) 1000 doubles 750 408 (1.84) 320 (2.34) 563 (1.33) 1000 ints 689 384 (1.79) 301 (2.29) 506 (1.36) 10000 strings 13394 12318 (1.09) 8310 (1.61) 9199 (1.46) 10000 doubles 9897 4649 (2.13) 2995 (3.30) 4693 (2.11) 10000 ints 9168 4076 (2.25) 2374 (3.86) 3930 (2.33) 100000 strings 223131 197087 (1.13) 139956 (1.59) 153763 (1.45) 100000 doubles 140676 56714 (2.48) 30745 (4.58) 48894 (2.88) 100000 ints 130058 47122 (2.76) 26130 (4.98) 41090 (3.17) 1000000 strings 3248631 2861923 (1.14) 1962076 (1.66) 2152512 (1.51) 1000000 doubles 1982473 686388 (2.89) 378555 (5.24) 637827 (3.11) 1000000 ints 1829315 514018 (3.56) 276120 (6.63) 435518 (4.20) ====== The lsort column is the baseline showing the performance of lsort for each data type (using the -real, -integer etc. options). The numbers in parenthesis show performance relative to the lsort baseline. The next two columns show tarray performance without parallelizing and parallelized to 2 threads. The last column shows the performance of a wrapper that converts a list to a tarray and back. Note this last column shows that even if you prefer to keep your data as standard lists, a two way transform to do a sort is very much worthwhile from a performance perspective. <> [SEH] -- It would be useful to be able to do dumps of raw binary data once a TArray had been constucted. Then one could, for example, use it with a [reflected channel] to duplicate the function of [memchan]. Or to do device I/O. (Is this already possible?) [APN] Direct I/O files/databases to and from typed arrays is on the to-do list but low down for a couple of reasons. First, there are still a bunch of basic operations and optimizations that have to be implemented to make the package more useful as a building block. Second, it is not clear what the output / input format should be. Even just binary dumps raise questions of endianness etc. ---- '''[AK] - 2013-04-18 22:18:04''' Tcllib already has a memchan emulation based on reflected channels. Documentation @ https://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/virtchannel_base/tcllib_memchan.html SEH -- Since that package is pure Tcl, I was hoping a TArray-based solution would be faster. AK -- What makes this then different from the original memchan ? SEH -- Nothing at all, except that the author states above that TArray is intended to be a modular base for a range of specific solutions. If the function of memchan could be duplicated, that would be one less purpose-specific package to maintain, replaced by a flexible multi-purpose tool. I think it would be healthier for the Tcl ecosystem to have fewer of the former and more of the latter, for an equivalent range of applications. <> Command | Package | Data Structure