Version 36 of Array enumeration

Updated 2016-11-29 20:10:04 by AMG

AMG: Tcl arrays are enumerated via the [array names] or [array get] commands or by the (virtually obsolete) combination of [array startsearch], [array nextelement], [array anymore], and [array donesearch]. At present, they cannot be enumerated via the Tcl C API. The FlightAware Tcl bounty programme [L1 ] seeks to address this and other Tcl shortcomings.

I have implemented a Tcl C API for enumerating array elements, patterned after the script-level [array] commands listed above and the existing C API for accessing variables. I started the implementation by rearranging existing code as much as possible, to minimize the risk of inadvertently introducing new bugs. All [array] commands have been reimplemented as calls into the new C API so that it can be exercised by the existing Tcl test suite.

This will be my first Tcl Improvement Proposal and my first case of Tcl core programming. Outside of occasional bug and documentation fixes, my Tcl C experience consists of writing proprietary Tcl extensions over the last 10-15 years, and my Tcl scripting experience goes back to 1999.


Code

http://core.tcl.tk/tcl/timeline?r=amg-array-enum-c-api


API

Here is my working API design. Please share any comments or questions you may have. I welcome opinions about the alternatives and wishes I outline below, and please feel free to add your own ideas.

Types

typedef struct ArraySearch *Tcl_ArraySearch;

Tcl_ArraySearch is an opaque pointer to struct ArraySearch, which holds the search internals.

Arguments

Tcl_Interp *interp (in)

Interpreter containing array variable.

Tcl_Obj *part1Ptr (in)

Points to a Tcl object containing the variable's name. The name may include a series of :: namespace qualifiers to specify an array variable in a particular namespace.

Tcl_Obj *part2Ptr (in)

Points to a Tcl object containing the element name filter. If NULL, no filtering is applied.

Tcl_Obj *listPtr (in/out)

Tcl list object to which array names are appended.

Tcl_Obj *dictPtr (in/out)

Tcl dict object from/to which array names and values are read/written.

Tcl_Obj *stringPtr (in/out)

Tcl string object to which hash table statistics information is appended.

Tcl_ArraySearch search (in)

Search token obtained from Tcl_ArraySearchStart().

int flags (in)

OR-ed combination of zero or more of the following bit values providing additional information:

TCL_GLOBAL_ONLY
The variable is looked up only in the global namespace even if there is a procedure call active.
TCL_NAMESPACE_ONLY
The variable is looked up only in the current namespace; if a procedure is active, its variables are ignored, and the global namespace is also ignored unless it is the current namespace.

OR-ed with zero or one of the following values:

TCL_MATCH_EXACT
The filter accepts only the element whose name exactly matches part2Ptr. This is the default.
TCL_MATCH_GLOB
The filter interprets part2Ptr as a [string match]-style glob expression.
TCL_MATCH_REGEXP
The filter interprets part2Ptr as a regular expression.
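As a rough illustration of how the two groups of flag bits combine, here is a self-contained C sketch that does not include tcl.h. The DEMO_ constants and demo_match_mode() are hypothetical stand-ins, not part of the API. The two lookup bits mirror the values of TCL_GLOBAL_ONLY and TCL_NAMESPACE_ONLY in tcl.h, but the numeric values of the match-mode bits are pure assumptions, since the design above does not state them.

```c
/* Lookup flags: same numeric values as TCL_GLOBAL_ONLY and
 * TCL_NAMESPACE_ONLY in tcl.h. */
#define DEMO_GLOBAL_ONLY     1
#define DEMO_NAMESPACE_ONLY  2

/* Match modes: ASSUMED values, chosen only for this sketch.  Exact
 * matching is modelled as "no mode bit set" because it is the default. */
#define DEMO_MATCH_EXACT     0
#define DEMO_MATCH_GLOB      (1 << 16)
#define DEMO_MATCH_REGEXP    (1 << 17)
#define DEMO_MATCH_MASK      (DEMO_MATCH_GLOB | DEMO_MATCH_REGEXP)

/*
 * Extracts the match mode from a flags word.  Returns one of the
 * DEMO_MATCH_* values, or -1 if more than one mode was requested
 * ("zero or one of the following values").
 */
int demo_match_mode(int flags)
{
    int mode = flags & DEMO_MATCH_MASK;

    if (mode == DEMO_MATCH_MASK) {
        return -1;              /* both GLOB and REGEXP: invalid */
    }
    return mode;                /* 0 means exact matching */
}
```

Under these assumptions, demo_match_mode(DEMO_GLOBAL_ONLY | DEMO_MATCH_GLOB) yields DEMO_MATCH_GLOB, while passing only lookup flags falls back to exact matching.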

Functions

int Tcl_ArraySet(interp, part1Ptr, dictPtr, flags)

Sets the elements of an array. If there are no elements to set, creates an empty array.

int Tcl_ArrayUnset(interp, part1Ptr, part2Ptr, flags)

Unsets array elements.

int Tcl_ArrayGet(interp, part1Ptr, part2Ptr, dictPtr, flags)

Loads the elements of an array into a dict. The dict need not be empty before calling this function; if it is not, the array elements are merged with (and supersede) the original contents of the dict.

int Tcl_ArrayNames(interp, part1Ptr, part2Ptr, listPtr, flags)

Appends array element names to the listPtr list object. part2Ptr can be used to specify an element filter.

The reason listPtr is not created or reinitialized but rather appended to is so Tcl_ArrayNames() can be called multiple times on a single list object, with a different filter each time, to build a list of all elements matching any one of the filters. In this usage, elements matching multiple filters will be listed multiple times. To reset listPtr before calling Tcl_ArrayNames(), truncate it by calling Tcl_SetListObj(listPtr, 0, NULL).
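The append-only behaviour can be modelled with a small self-contained C sketch. It does not use the Tcl library: DemoList, demo_glob(), and demo_array_names() are hypothetical stand-ins for a Tcl list object, [string match], and Tcl_ArrayNames() respectively, and the glob matcher only understands `*` and `?`.

```c
#include <stddef.h>

/* Simplified glob matcher: '*' matches any run of characters and '?'
 * matches exactly one.  Character classes and escapes are not handled. */
int demo_glob(const char *pat, const char *str)
{
    if (*pat == '\0') {
        return *str == '\0';
    }
    if (*pat == '*') {
        /* Try the rest of the pattern at every suffix of str. */
        for (;;) {
            if (demo_glob(pat + 1, str)) {
                return 1;
            }
            if (*str == '\0') {
                return 0;
            }
            str++;
        }
    }
    if (*str == '\0') {
        return 0;
    }
    if (*pat == '?' || *pat == *str) {
        return demo_glob(pat + 1, str + 1);
    }
    return 0;
}

#define DEMO_LIST_MAX 32

/* Stand-in for a Tcl list object holding element names. */
typedef struct {
    const char *items[DEMO_LIST_MAX];
    size_t count;
} DemoList;

/* Appends every name matching pattern to list without clearing it first.
 * Returns the number of names appended by this call. */
size_t demo_array_names(const char *const names[], size_t n,
                        const char *pattern, DemoList *list)
{
    size_t added = 0;

    for (size_t i = 0; i < n && list->count < DEMO_LIST_MAX; i++) {
        if (demo_glob(pattern, names[i])) {
            list->items[list->count++] = names[i];
            added++;
        }
    }
    return added;
}
```

With names alpha, beta, and gamma, calling demo_array_names() first with `a*` and then with `*a` on the same list leaves four entries, with alpha listed twice, which is the duplication described above. Setting list.count to 0 plays the role of the truncation step.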

int Tcl_ArraySize(interp, part1Ptr, part2Ptr, intPtr, flags)

Obtains the array size. part2Ptr can be used to specify an element filter, in which case the size obtained is the number of array elements matching the filter.

int Tcl_ArrayExists(interp, part1Ptr, part2Ptr, intPtr, flags)

Checks if an array exists or if at least one array element matches an optional filter.

Tcl_ArraySearch Tcl_ArraySearchStart(interp, part1Ptr, part2Ptr, flags)

Returns a Tcl_ArraySearch on success or NULL on error. part2Ptr can be used to specify an element filter.

Tcl_Obj *Tcl_ArraySearchPeek(search)

Returns the name of the next element, or NULL if finished, without advancing the search.

Tcl_Obj *Tcl_ArraySearchNext(search)

Returns the name of the next element, or NULL if finished. The search state is updated so successive calls to Tcl_ArraySearchNext() will return successive array element names.

void Tcl_ArraySearchDone(search)

Cleans up array search internals.
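The intended difference between peeking and advancing can be sketched with a toy iterator in plain C. This is not the real struct ArraySearch: DemoSearch and the demo_search_* functions are hypothetical, they walk a fixed array of names rather than a hash table, and the sketch assumes that peeking leaves the search position untouched.

```c
#include <stddef.h>

/* Toy search state: a cursor over a fixed array of element names. */
typedef struct {
    const char *const *names;   /* names being enumerated */
    size_t count;               /* number of names */
    size_t pos;                 /* index of the next name to hand out */
} DemoSearch;

/* Returns the next name without consuming it, or NULL when finished. */
const char *demo_search_peek(const DemoSearch *s)
{
    return s->pos < s->count ? s->names[s->pos] : NULL;
}

/* Returns the next name and advances the cursor, or NULL when finished. */
const char *demo_search_next(DemoSearch *s)
{
    const char *name = demo_search_peek(s);

    if (name != NULL) {
        s->pos++;
    }
    return name;
}
```

A caller would typically loop with `while ((name = demo_search_next(&s)) != NULL)`, using demo_search_peek() only when it needs to look ahead before deciding whether to consume an element.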

int Tcl_ArrayStatistics(interp, part1Ptr, stringPtr, flags)

Obtains array hash table statistics.


Scope creep

The baseline [array names] supports filtering by exact, glob, and regular expression matches. The baseline [array get] and [array unset] support filtering only by glob matches. The baseline [array exists], [array size], and [array startsearch] don't support filtering at all. Because I implemented a common infrastructure for the array enumeration commands, I need to support the general case: filtering by exact, glob, and regular expression matches. In order to properly test this functionality, I need to expose it to the script level via options to all the commands listed in this paragraph.

Even though exposing [array get] at the C level was not part of my original plan, it's arguably part of array enumeration. Furthermore, it should be able to benefit from the new common infrastructure. Thus I added a function for it as well.

Since I did that much, I went ahead and completed the set by providing C interfaces to [array exists], [array set], [array unset], and even [array statistics]. All of [array] is now C-callable.

Now that I have multiple kinds of changes (new C API functions, new Tcl command arguments), do I need to write multiple TIPs? Or does the fact that they are closely and usefully related mean a single TIP will suffice?

Lastly, I am considering changing [array unset] to start by making a list of elements it intends to unset before it actually unsets them. This works around various pathological trace problems and is in accordance with DKF's recommendation found at [L2 ], dated 2010-02-03 16:46:16. It's now easy to do this. Just look at Tcl_ArrayGet() which already does this. Would this have to be a third TIP? It is a potential incompatibility in case any scripts actually depend on the current ill-defined interaction between [array unset] and traces.
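The snapshot-then-unset idea can be illustrated with a toy model in plain C. Nothing here is taken from the actual implementation: DemoArray, DemoTrace, and the demo_* functions are hypothetical, the "array" is just a small table of names, and the callback only imitates an unset trace that modifies the array while it is being emptied.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX 16

/* Toy array: an unordered table of element names. */
typedef struct {
    const char *names[DEMO_MAX];
    size_t count;
} DemoArray;

/* Callback imitating an unset trace; it may modify the array. */
typedef void DemoTrace(DemoArray *a, const char *name);

/* Removes one element by name.  Returns 1 if it was present. */
int demo_unset_one(DemoArray *a, const char *name)
{
    for (size_t i = 0; i < a->count; i++) {
        if (strcmp(a->names[i], name) == 0) {
            a->names[i] = a->names[--a->count];  /* fill the gap */
            return 1;
        }
    }
    return 0;
}

/* Phase 1: copy the current names.  Phase 2: unset each copied name,
 * skipping any that a trace has already removed.  Elements created by
 * traces are not in the snapshot and therefore survive. */
size_t demo_unset_all(DemoArray *a, DemoTrace *trace)
{
    const char *snapshot[DEMO_MAX];
    size_t n = a->count;
    size_t removed = 0;

    memcpy(snapshot, a->names, n * sizeof snapshot[0]);
    for (size_t i = 0; i < n; i++) {
        if (demo_unset_one(a, snapshot[i])) {
            removed++;
            if (trace != NULL) {
                trace(a, snapshot[i]);
            }
        }
    }
    return removed;
}

/* Example trace: when "b" is unset, it removes "c" and creates "new". */
void demo_meddling_trace(DemoArray *a, const char *name)
{
    if (strcmp(name, "b") == 0) {
        demo_unset_one(a, "c");
        if (a->count < DEMO_MAX) {
            a->names[a->count++] = "new";
        }
    }
}
```

Because the loop walks the snapshot rather than the live table, the meddling trace cannot derail the enumeration: in this model, unsetting an array holding a, b, and c removes a and b directly, skips c (already gone), and leaves only the element the trace created.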


Progress

20 November 2016
Began work, implemented Tcl_ArraySize().
24 November 2016
Published commits, fixed [array size] reporting of trace errors, improved Wiki page, added filtering. Implementation of above-listed functions is complete.
25 November 2016
Implemented Tcl_ArrayExists(), Tcl_ArraySet(), Tcl_ArrayUnset(), and Tcl_ArrayGet().
27 November 2016
Implemented Tcl_ArrayStatistics(), refined API, added man page for C API.
28 November 2016
Updated [array] man page, added filtering to [array exists] and Tcl_ArrayExists().

Remaining work

Write test suite, write TIP, respond to comments and any other issues that may arise.

I should help merge the recent [array for] work, though one of these two projects will have to be accepted first.

Also, I am interested in using this work as a platform from which to implement the per-array default value feature requested by FlightAware. Is it too soon to claim this? I'm not trying to be greedy here; I just think I'm in a good position to get it done because of the rest of my array work.