**Announcement** === ANNOUNCE: Colibri version 0.1 ============================= As part of my ongoing work on the Cloverfield project, I am pleased to announce the first version of colibri. WHAT IS CLOVERFIELD ? ===================== Wiki page: http://wiki.tcl.tk/Cloverfield WHAT DOES COLIBRI STANDS FOR ? ============================== Colibris, known in English as hummingbirds, are a family of bird known for their small size and high wing speed. The bee hummingbird (Mellisuga helenae), found in Cuba, is the smallest of all birds, with a mass of 1.8 g and a total length of about 5cm. They are renowed for their stationary and backward flight abilities on par with flying insects, which allows them to feed on the nectar of plants and flowers in-flight. I've chose this name for this project because its goal is to be fast and lightweight, and to follow the feather and bird theme shared with Tcl and many related projects. WHAT IS COLIBRI ? ================= Colibri is a string and data type infrastructure. It features: - Rope-based strings (see http://wiki.tcl.tk/20690) : * ropes are immutable ... but ... * extraction, insertion, deletion, concatenation... are fast * limited compatibility with native C strings allocated in static space (or guaranteed to survive the whole application lifetime) * support for the full Unicode range (up to 32-bit codepoints) * 1/2/4-byte fixed-width encodings as well as variable-width UTF-8 with transparent conversions and access * custom rope types allows for lazy or procedural representations * iterator interface for easy access - Extensible data type sytem dubbed "words" * similar to Tcl_Obj * minimum payload is identical to that of Tcl_Obj, i.e. 8 bytes, but can take more space if needed * words have values that can be of any type, including other words * values can form chains of arbitrary length, so a word can have several values with different representations -- no more shimmering! * predefined types such as ints, single chars and small strings (up to 3 chars on 32-bit systems) are represented as immediate values: the value is stored directly in the pointer rather than in an allocated structure - Fast and efficient cell-based allocator * page-based allocation for optimal locality of reference and cache use * 16-byte cells on 32-bit systems fit most needs, but elements can allocate more cells if needed (up to 63 cells on 32-bit systems) * overhead is small: 2 bits per 16-byte cell. * raw alloc performances are competitive with the stock system malloc, and in many cases much faster (up to 5 times as fast on small strings and on words vs. Tcl_Obj-like structs) * single cell allocation (the most frequent case) is very fast - Automatic memory management thanks to an exact (AKA accurate or precise), generational, copying, mark-and-sweep, garbage collector * exact GC implies that roots (externally referenced elements) and parent-child relations must be declared by the application, using a simple API * custom types can define a finalizer * the GC process is fully controllable (pause/resume) so that the application don't get interrupted unexpectedly * generational GC limits the overhead by restricting the collection to newer elements, which are typically short-lived * longer-living elements are collected less often and get promoted after two cycles * promotion is done by copying old elements to older pools, which performs compaction as a side effect, thus limiting the fragmentation and improving the cache-friendliness over time * contrary to reference-counting schemes (RC), GCs allow circular references without memory leaks * studies show that GCs consistently outperform RC in raw performances and real-world cases, because the cost (both space and time) of maintaining reference counters outweights the amortized GC overhead, especially with generational GCs Colibri is written in plain C and is free of any dependency besides system libraries. The compiled binary DLL on Windows is about 27kB. The source code is heavily commented and follows the Tcl quality standards. HOW DOES COLIBRI RELATES TO TCL =============================== From the Cloverfield announcement: " The last major point of the project is related to implementation and low level issues. The existing Tcl implementation is notoriously solid, however some changes are too radical to be performed on the current code base, so a reimplementation is certainly needed on vast portions. For example, the string representations, object structures, and virtual machines will certainly require complete rewrite to accommodate with the needed changes, which means that a lot of code won't be reused. However vast portions of library code and algorithms certainly will (clock scan, bigint, regexp...), as well as the high coding standards and QA that are the trademarks of our beloved language. " So Colibri is a candidate infrastructure for Tcl9 as an alternative to the current Tcl_Obj-based core implementation. I believe that the features provided by Colibri shall yield higher levels of performances than the current architecture, at the price of an ABI incompatibility (for which major versions are made anyway), but with a similar philosophy that should ease conversion of existing code. WHAT NEEDS TO BE DONE ? ======================= My main development platform is Windows, so for now the source archive only includes Microsoft Visual Studio project files. Microsoft provides a free edition of their development environment known as Visual Studio Express for those willing to compile and test the library without having to buy an expensive license. Other systems need a makefile and/or autoconf scripts. The code is fairly portable on 32-bit systems. 64-bit support will need more work because all the internals are fine-tuned and optimized at the bit level; however porting to 64-bit should be rather straightforward: the algorithms will remain unchanged, structure access is abstracted behing macros, and cell size is proportional to the word size (a cell should be able to store 4 pointers, which add up to 16 bytes on 32-bit systems). The only part that needs platform-specific code is the low-level page allocator. Colibri needs system calls that allocate boundary-aligned pages. At present only a Windows version is provided, but porting to other systems should require minimal effort, as the platform-specific code is limited to a handful of functions. Platform-specific peculiarities should not impact the overall architecture. Exception handling is poor, especially because Colibri is meant to be part of a larger system that provides the appropriate infrastructure. The current implementation is not thread-safe as it uses unprotected global structures. However I don't think that adding synchronization is necessary given that it should eventually be integrated into a larger application with the same appartment threading model as Tcl. The main advantage of having a global allocator is inter-thread data sharing, and Tcl typically shares no data between threads. As Colibri is not designed to be a general-purpose solution, and unless there is a real benefit in using a global thread-safe allocator, converting global structures to thread-local should fit our needs. Tests have been run extensively during development, both on stability and performance. However Colibri lacks a real test suite. The source includes a test application that has to be hand-edited to run or customize specific tests. Note that the current values defined by this application require a system with a large memory size (2GB). Last, it lacks user and design documentation, although the source code is extensively commented. WHERE CAN IT BE FOUND ? ======================= Wiki page: http://wiki.tcl.tk/Colibri Project page @ SourceForge.net: https://sourceforge.net/projects/tcl9/ Mailing list: https://sourceforge.net/mailarchive/forum.php?forum_name=tcl9-colibri Direct Download: - source: http://downloads.sourceforge.net/tcl9/colibri0.1.src.zip?use_mirror= - Windows binary: http://downloads.sourceforge.net/tcl9/colibri0.1.win32.zip?use_mirror= LICENSE ======= The license is the same as for Tcl. === ---- !!!!!! %| [Category Data Structure] | [Category Cloverfield] |% !!!!!!