Using array syntax for, well, array manipulation

Arjen Markus (17 June 2018) In a recent discussion on the Tcl core mailing list about adding RBC to Tk, the idea of data objects as implemented in the original BLT extension was brought to my attention - see [L1 ]. Another extension that uses data objects as an alternative to lists, associative arrays and dicts is VecTcl, and there are others as well. We could conclude that handling large or largish amounts of data is a ubiquitous topic.

While the actual implementation of such extensions is interesting in its own right, we can also look at the syntax that is used. One intriguing aspect of BLT's vector data type is that you can use array syntax to access the values.

This aspect inspired the following toy program, which rests on two ideas:

  • At the script level we want to use a suitable syntax, such as the array syntax to access the values and even manipulate the values.
  • At the script level the actual implementation is irrelevant.

The toy program uses the trace command to separate the implementation of data storage from the syntax. Borrowing from Fortran, it allows "array sections" as if they were array elements:

    set array(1:10) {1 2 3 4 5 6 7 8 9 10}

does not set a single array element "1:10". Instead it is expanded to 10 virtual elements that take the values 1, 2, 3, etc. from the given list.
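To illustrate the mechanism in isolation, here is a minimal sketch of a read trace that produces array elements on demand. The names squares and computeSquare are made up for this illustration and are not part of the toy program; the point is only that the trace procedure receives the element name and can fill in a value before the read completes.

```tcl
# Minimal sketch: a read trace computes array elements on demand.
# "squares" and "computeSquare" are hypothetical names for illustration.
proc computeSquare {name element op} {
    # name and element identify what is being read; op is "read"
    upvar 1 $name arr
    if { $element ne "" } {
        set arr($element) [expr {$element * $element}]
    }
}

array set squares {}
trace add variable squares read computeSquare

puts $squares(7)    ;# prints 49, although the script never stored element 7
```

The script only ever sees ordinary array syntax; where the value comes from is decided entirely inside the trace procedure.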

The implementation is rather limited - ideally you would want to have multidimensional arrays, and performance issues are completely ignored. Think of storing 10 million data values, filtering them by some criterion and summing the result. You would want to have most of that done in a low-level language like C, just not the specification:

    puts "Sum of positive values: [sum [filter x {$x > 0.0} $data]]"

or some such functionality.
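For small data sets such a specification can be tried out in pure Tcl. The sketch below is only a stand-in for what a compiled extension would do: lfilter and lsum are hypothetical names chosen for this example, and the condition can only refer to the one variable that is named in the call.

```tcl
# Pure-Tcl stand-ins for the filter/sum idea (hypothetical helpers,
# not part of any existing extension).
proc lfilter {varName condition list} {
    # Turn the condition into a lambda with a single parameter
    set lambda [list $varName [list expr $condition]]
    set result {}
    foreach item $list {
        if { [apply $lambda $item] } {
            lappend result $item
        }
    }
    return $result
}

proc lsum {list} {
    set total 0
    foreach item $list {
        set total [expr {$total + $item}]
    }
    return $total
}

set data {-1.5 2.0 -3.0 4.5}
puts "Sum of positive values: [lsum [lfilter x {$x > 0.0} $data]]"
# prints: Sum of positive values: 6.5
```

The specification stays the same if the two procedures are later replaced by compiled commands; only the implementation behind the names changes.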

Well, enough explanation. Here is the program:

# array_obj.tcl --
#     Use the ideas behind BLT's vector objects in a pure Tcl fashion:
#     - a Tcl array is backed up by "private" data
#     - the array syntax is used to access these data
#     - traces allow access to the actual data
#
proc mkarray {name size} {
    global _$name
    upvar 1 $name arrayName

    set _$name [lrepeat $size 0]

    trace add variable arrayName read  readElement
    trace add variable arrayName write writeElement
}

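proc readElement {arrayName element op} {
    global _$arrayName
    upvar 1 $arrayName array

    # Copy the requested value(s) from the private list into the array
    # element, so that the pending read returns them
    if { [string first : $element] < 0 } {
        # Single element
        set array($element) [lindex [set _$arrayName] $element]
    } else {
        # Section first:last - return the corresponding sublist
        regexp {(\d+):(\d+)} $element => first last
        set array($element) [lrange [set _$arrayName] $first $last]
    }
}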

proc writeElement {arrayName element op} {
    global _$arrayName
    upvar 1 $arrayName array

    if { [string first : $element] < 0 } {
        # Single element: store the value at that index
        lset _$arrayName $element $array($element)
    } else {
        # Section first:last
        regexp {(\d+):(\d+)} $element => first last

        # From here on, first is the last index before the section
        # and last is the first index after it
        incr first -1
        incr last   1
        if { [llength $array($element)] > 1 } {
            # A list of values replaces the section
            set _$arrayName [concat [lrange [set _$arrayName] 0 $first]   \
                                    $array($element)                      \
                                    [lrange [set _$arrayName] $last end]]
        } else {
            # A single value is repeated over the whole section
            set _$arrayName [concat [lrange  [set _$arrayName] 0 $first]             \
                                    [lrepeat [expr {$last-$first-1}] $array($element)] \
                                    [lrange  [set _$arrayName] $last end]]
        }
    }
}

#
# Demonstrate the idea ...
#
mkarray x 100

set x(1) 2
puts "x: $x(1)"

set x(2:11) {1 2 3 4 5 6 7 8 9 10}
puts $x(0:15)

set x(0:4) 11
puts $x(0:15)

# Now iterate over a part of the array
set sum 0
foreach v $x(0:20) {
    set sum [expr {$sum + $v**2}]
}
puts "Sum of squares: $sum"

# Or:
proc sum {lambda list} {
    set result 0
    foreach item $list {
        set result [expr {$result + [apply $lambda $item]}]
    }
    return $result
}

puts "Sum of squares: [sum {v {expr {$v**2}}} $x(0:20)]"
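One caveat of the program above: writeElement does not check that a list fits the section it is assigned to, so something like set x(0:4) {1 2 3} would shrink the private list from 100 to 98 elements. A possible guard is sketched below. checkSection is a hypothetical helper, not part of the program, and it would have to be called from writeElement before the private list is rebuilt.

```tcl
# Hypothetical helper: verify that the values fit the section first:last.
# A single value is accepted, because writeElement repeats it over the
# whole section.
proc checkSection {first last values} {
    set size  [expr {$last - $first + 1}]
    set count [llength $values]
    if { $count > 1 && $count != $size } {
        return -code error \
            "section $first:$last has $size elements, got $count values"
    }
    return $size
}

puts [checkSection 2 11 {1 2 3 4 5 6 7 8 9 10}]          ;# prints 10
puts [checkSection 0 4 11]                                ;# prints 5
puts "[catch {checkSection 0 4 {1 2 3}} msg]: $msg"
# prints: 1: section 0:4 has 5 elements, got 3 values
```

Whether a mismatch should be an error, or whether the list should be truncated or padded instead, is a design choice that the toy program leaves open.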

arjen - 2018-06-19 07:28:06

I intend to write a small C extension that uses these ideas, but that may take some time.


joheid - 2018-06-19

I think R offers some more ideas for good functionality ....


arjen - 2018-06-20 07:17:34

Certainly. For now this is merely an experiment to see if Tcl's current features can be exploited to work with array operations. When this becomes something more serious, we will have to look at all the possibilities other languages already demonstrate.