Low-level vector data in Tcl

The "mvec" command is now part of the mvector package in CritLib [L1 ]


What is a vector, and how does it differ from a list/array/whatever? Is this the same as BLT's vector?

The essential difference is the internal representation. For a discussion on arrays, see Adding a hashed datatype and Better Arrays for Tcl9. BLT only had vectors of doubles, and was not using Tcl_Obj*'s, last time I looked. On the other hand, BLT also tries to make vectors and arrays play well together. --jcw


See also VKit is a vector engine.

See also NAP.

See also VecTcl.

See also tarray.


This demo illustrates an experimental "mvec" command and datatype for Tcl, which provides efficient access and storage to a variety of binary formats: 1/2/4/8/16/32-bit ints, 32/64-bit floats, and reversed endian.

The interface is essentially read-only (but stay tuned for a couple of ideas on how to deal with modifiable data).

Some key features are:

  • a natural integration between "mvec" lists and "lindex" lists
  • the C code extensions is optional, there is also a pure-Tcl version
  • integration with memory-mapped file access (see [L2 ])

Mvec hinges on the dual-representation model of Tcl in a fundamental way. The basis is the storage of a little "descriptor list in a variable". That descriptor defines the access mode and optionally also the data.

An example: the list "8 abc 3" is a descriptor of a vector of size 3, with 8-bit elements. So the values are the character codes matching the ASCII chars "abc". Let's try to access it:

    set a {8 abc 3}
    puts [mvec a 0]   ;# returns "97", the numeric code for "a"
    puts [mvec a 1]   ;# returns "98"

Another example, accessing as 1-bit ints this time:

    set a {1 abc 24}
    puts [mvec a 0]   ;# returns "1", the lowest bit of "a"
    puts [mvec a 1]   ;# returns "0"

Lets get all the data, using that last example:

    puts [mvec a 0 -1]
      # returns "1 0 0 0 0 1 1 0 0 1 0 0 0 1 1 0 1 1 0 0 0 1 1 0"

Or the 4 items starting at position 8:

    puts [mvec a 8 4]
      # returns "0 1 0 0"

Here is a very different example:

    set a {lindex {un deux trois} 3}
    puts [mvec a 0]   ;# returns "un"
    puts [mvec a 1]   ;# returns "deux"

Explanation: descriptors can also be a Tcl call. What "mvec" does, if the first value is not one it recognizes, is treat the whole thing as a virtual descriptor instead. This unifies the concept of "stored data" and "calculated data". For mvec, indexing works in one consistent way.

Some more background:

    set a {1 abc 24}
    puts [mvec a]     ;# returns "24 1 0"
    set a {3 abc 3}
    puts [mvec a]     ;# returns "3 8 3"
    set a {lindex {un deux trois} 3}
    puts [mvec a]     ;# returns "3 {} {lindex {un deux trois} 3}"

The three values returned in the above examples are:

  • the size of the vector
  • its encoding mode (an empty string if calculated)
  • the bits/item, or the script

It might seem like overkill to return the script as third arg, after all that will be precisely the same as $a ... but it is quite important for reasons of performance, that the descriptor not be converted to a string by doing things like "puts $a" when "a" is used as an mvec descriptor. Instead, use "puts lindex [mvec a 2]".

There's far more to say about mvec one day (such as describing element- wise processing, and its filtering capability), but for now this'll have to do. For examples, see the "jolt.tcl" script - it has several small samples at the end.

JOLT stands for JC's Own Little Toolbox - it's a context I use to try out new things, especially when mixing Tcl and C and exploring possibilities. This demo includes the C code for mvec (and a few more pieces), with a binary build (just Linux for now).

But there's also a "jolt.tcl" script, containing a pure-Tcl implementation of mvec. It is of course far slower, but it's also platform-independent.

PERFORMANCE

Well, that's one of the things this is all about. Mvec is designed to soar as high as the Tcl C interface will let it go. It evolved from the MetaKit database library, which uses a highly vectorized (columnar) design.

Below are three sections to give some (preliminary) insight into the levels of performance that are being reached right now:

  • the script that generated the output
  • the run *with* the C-level Jolt extension
  • the run *without* the C-level Jolt extension (i.e. pure Tcl)

It's easy to miss, but the output lines marked "mvec {0,1,2,4,8}" show that mvec can unpack large vectors of small ints at the rate of 10 PER microSec.

Warning: don't draw too many conclusions from this right now. There is far more to it than meets the eye. Representation formats, memory use, coding loops, locality of reference, CPU caching - it all matters greatly!

The results are still very early. Vkit, Jolt, et al. is work in progress.


Script

  puts "version = [package require jolt]"

  if {[info commands nop] != ""} {
    puts "10 nops = [time {nop;nop;nop;nop;nop;nop;nop;nop;nop;nop} 10000]"
  }

  proc mvec_try {} {
    if {[info procs ::mvec] != ""} { set N 1000 } else { set N 10000 }

    set fd [open [info script]]

    foreach x {1 2 4 8 16 16r 32 32r} {
      set v [list $x $fd 256 1000]
      set a {}
      for {set i 0} {$i < 6} {incr i} {
        lappend a [format %08x [mvec v $i]]
      }
      puts "mvec [format %3s $x] = [join $a { }]"
    }

    set v [list 32 $fd 256 1000]
    set r [list 32r $fd 256 1000]
    set w [list 32 $fd 256]
    puts "mvec rawlist = [time {mvec v 10} $N]"
    puts "mvec reverse = [time {mvec r 10} $N]"
    puts "mvec convert = [time {set x $w; lappend x 1000; mvec v 10} $N]"
    puts "mvec shimmer = [time {set v $w; lappend v 1000; mvec v 10} $N]"
    close $fd

    set t [list 8 "Hello world!"]
    set v {}
    for {set i 0} {$i < 12} {incr i} {
      lappend v [mvec t $i]
    }
    puts "mvec vardata = $v"
    puts "mvec varname = [time {mvec t 10} $N]"

    proc identity {idx} {
      return $idx
    }

    proc showargs {args} {
      puts "showargs: $args";
      set x [lindex $args end]
      return [expr {$x * $x}]
    }

    set u [list showargs U 1234]
    puts "mvec virtual = [mvec u 11]"
    set q [list identity 1234]
    puts "mvec virsize = [mvec q]"
    puts "mvec virtime = [time {mvec q 123} $N]"
    puts "mvec vircall = [time {identity 123} $N]"

    set v [list 8 bingo]
    puts "mvec size = [mvec v]"
    puts "mvec data = [mvec v 0 5]"

    set x [string repeat x 10000]
    puts "splitchr = [time {split $x {}}]"
    puts "bytescan = [time {binary scan $x c* v}]"
    set y [list string index $x 10000]
    puts "strindex = [time {mvec y 0 -1}]"
    set y [list 0 123 10000]
    puts "mvec   0 = [time {mvec y 0 -1}]"
    set y [list 1 $x 10000]
    puts "mvec   1 = [time {mvec y 0 -1}]"
    set y [list 2 $x 10000]
    puts "mvec   2 = [time {mvec y 0 -1}]"
    set y [list 4 $x 10000]
    puts "mvec   4 = [time {mvec y 0 -1}]"
    set y [list 8 $x]
    puts "mvec   8 = [time {mvec y 0 -1}]"

    set z [list 16 $x$x]
    puts "mvec  16 = [time {mvec z 0 -1}]"
    set z [list 16r $x$x]
    puts "mvec 16r = [time {mvec z 0 -1}]"

    set z [list 32 $x$x$x$x]
    puts "mvec  32 = [time {mvec z 0 -1}]"
    set z [list 32r $x$x$x$x]
    puts "mvec 32r = [time {mvec z 0 -1}]"

    set z [list 64 $x$x$x$x$x$x$x$x]
    puts "mvec  64 = [time {mvec z 0 -1}]"
    set z [list 64r $x$x$x$x$x$x$x$x]
    puts "mvec 64r = [time {mvec z 0 -1}]"

    set ::g [mvec y 0 10000]
    puts "mvecsize = [llength $::g]"
    puts "base lx = [time {lindex $::g 10} $N]"
    set v [list lindex $::g [llength $::g]]
    puts "mvec lx = [time {mvec v 10} $N]"
    puts "vall lx = [time {mvec v 0 -1}]"
    proc p0 {i} { return 123 }
    puts "base p0 = [time {p0 10} $N]"
    set v [list p0 10000]
    puts "mvec p0 = [time {mvec v 10} $N]"
    puts "vall p0 = [time {mvec v 0 -1}]"
    proc p1 {i} { return [lindex $::g $i] }
    puts "base p1 = [time {p1 10} $N]"
    set v [list p1 10000]
    puts "mvec p1 = [time {mvec v 10} $N]"
    puts "vall p1 = [time {mvec v 0 -1}]"
    proc p9 {a1 a2 a3 a4 a5 a6 a7 a8 a9 i} { return [lindex $::g $i] }
    puts "base p9 = [time {p9 1 2 3 4 5 6 7 8 9 10} $N]"
    set v [list p9 1 2 3 4 5 6 7 8 9 10000]
    puts "mvec p9 = [time {mvec v 10} $N]"
    puts "vall p9 = [time {mvec v 0 -1}]"

    set z [list 0 9 99]
    puts "num = [mvec z 0 5]"
    set z [list 0 nine 99]
    puts "str = [mvec z 0 5]"
    set z [list 0 [list I X] 99]
    puts "two = [mvec z 0 5]"

    # various identity mapping comparisons
    set x [list set _ 10000]
    puts "id set _ = [time {mvec x 0 -1}]"
    set x [list format %d 10000]
    puts "id formt = [time {mvec x 0 -1}]"
    set x [list string trimleft 10000]
    puts "id tleft = [time {mvec x 0 -1}]"
    set x [list string map {} 10000]
    puts "id s map = [time {mvec x 0 -1}]"
    set i [list identity 10000]
    puts "o timing = [time {mvec i 0 -1}]"
    puts "orig len = [llength [mvec i 0 -1]]"
    proc even {v x y} { return [expr {$x % 2 == 0}] }
    puts "p timing = [time {mvec i 0 -1 even}]"
    puts "pred len = [llength [mvec i 0 -1 even]]"
    puts "e timing = [time {mvec i 0 -1 j {$j % 2 == 0}}]"
    puts "even len = [llength [mvec i 0 -1 j {$j % 2 == 0}]]"
    puts "h timing = [time {mvec i 0 5000}]"

    proc mkident {n} {
      set v {}
      for {set i 0} {$i < $n} {incr i} { lappend v $i }
      return $v
    }

    puts "proc step = [time {mkident $N}]"
    set i [list - 0 10000]
    puts "minus inc = [time {mvec i 0 -1}]"
    puts "minus val = [mvec i 0 10]"
    puts "minus big = [time {mvec i 0 $N}]"
    puts "minus len = [llength [mvec i 0 10000]]"
    puts "minus end = [lindex [mvec i 0 10000] end]"

    set ::v1 [list 0 3 1000]
    proc accum {i} { return [incr ::sum [mvec ::v1 $i]] }
    set ::sum 0
    set c [list accum 1000]
    puts "accum time = [time {mvec c 0 -1}]"
    set ::sum 0
    set d [mvec c 0 -1]
    puts "accum last = [lindex $d end]"

    set i [list lindex $::g 10000]
    puts "direct = [time {mvec i 0 -1}]"
    proc geti {v i} { return [lindex $v $i] }
    set i [list geti $::g 10000]
    puts "indirg = [time {mvec i 0 -1}]"
    proc getg {v i} { return [lindex [set $v] $i] }
    set i [list getg ::g 10000]
    puts "global = [time {mvec i 0 -1}]"
  }

Output (SuSE 7.1 Linux, PIII/650) - WITH MVEC IN C:


  version = 2001.11.06.194312
  10 nops = 7 microseconds per iteration

    MVEC
    ====

  mvec   1 = 00000001 00000000 00000001 00000001 00000000 00000001
  mvec   2 = 00000001 00000003 00000002 00000001 00000001 00000000
  mvec   4 = 0000000d 00000006 00000001 00000006 00000000 00000007
  mvec   8 = 0000006d 00000061 00000070 0000003a 00000020 00000066
  mvec  16 = 0000616d 00003a70 00006620 00006c61 0000626c 00006361
  mvec 16r = 00006d61 0000703a 00002066 0000616c 00006c62 00006163
  mvec  32 = 3a70616d 6c616620 6361626c 6f74206b 6c635420 0a7d2022
  mvec 32r = 6d61703a 2066616c 6c626163 6b20746f 2054636c 22207d0a
  mvec rawlist = 2 microseconds per iteration
  mvec reverse = 2 microseconds per iteration
  mvec convert = 8 microseconds per iteration
  mvec shimmer = 13 microseconds per iteration
  mvec vardata = 72 101 108 108 111 32 119 111 114 108 100 33
  mvec varname = 2 microseconds per iteration
  showargs: U 11
  mvec virtual = 121
  mvec virsize = 1234 {} {identity 1234}
  mvec virtime = 5 microseconds per iteration
  mvec vircall = 3 microseconds per iteration
  mvec size = 5 8 3
  mvec data = 98 105 110 103 111
  splitchr = 2372 microseconds per iteration
  bytescan = 3248 microseconds per iteration
  strindex = 22874 microseconds per iteration
  mvec   0 = 719 microseconds per iteration
  mvec   1 = 973 microseconds per iteration
  mvec   2 = 1270 microseconds per iteration
  mvec   4 = 970 microseconds per iteration
  mvec   8 = 917 microseconds per iteration
  mvec  16 = 4012 microseconds per iteration
  mvec 16r = 4096 microseconds per iteration
  mvec  32 = 5467 microseconds per iteration
  mvec 32r = 5067 microseconds per iteration
  mvec  64 = 14150 microseconds per iteration
  mvec 64r = 13884 microseconds per iteration
  mvecsize = 10000
  base lx = 2 microseconds per iteration
  mvec lx = 3 microseconds per iteration
  vall lx = 11362 microseconds per iteration
  base p0 = 2 microseconds per iteration
  mvec p0 = 5 microseconds per iteration
  vall p0 = 24318 microseconds per iteration
  base p1 = 5 microseconds per iteration
  mvec p1 = 7 microseconds per iteration
  vall p1 = 49533 microseconds per iteration
  base p9 = 6 microseconds per iteration
  mvec p9 = 8 microseconds per iteration
  vall p9 = 59702 microseconds per iteration
  num = 9 9 9 9 9
  str = nine nine nine nine nine
  two = {I X} {I X} {I X} {I X} {I X}
  id set _ = 21696 microseconds per iteration
  id formt = 45962 microseconds per iteration
  id tleft = 30672 microseconds per iteration
  id s map = 14276 microseconds per iteration
  o timing = 27590 microseconds per iteration
  orig len = 10000
  p timing = 62039 microseconds per iteration
  pred len = 5000
  e timing = 77364 microseconds per iteration
  even len = 5000
  h timing = 14642 microseconds per iteration
  proc step = 17689 microseconds per iteration
  minus inc = 2060 microseconds per iteration
  minus val = 0 1 2 3 4 5 6 7 8 9
  minus big = 2091 microseconds per iteration
  minus len = 10000
  minus end = 9999
  accum time = 10219 microseconds per iteration
  accum last = 3000
  direct = 12119 microseconds per iteration
  indirg = 34260 microseconds per iteration
  global = 51344 microseconds per iteration

Output (SuSE 7.1 Linux, PIII/650) - PURE TCL RUN:


  version = 0.0
  10 nops = 7 microseconds per iteration

    MVEC
    ====

  mvec   1 = 00000001 00000000 00000001 00000001 00000000 00000001
  mvec   2 = 00000001 00000003 00000002 00000001 00000001 00000000
  mvec   4 = 0000000d 00000006 00000001 00000006 00000000 00000007
  mvec   8 = 0000006d 00000061 00000070 0000003a 00000020 00000066
  mvec  16 = 0000616d 00003a70 00006620 00006c61 0000626c 00006361
  mvec 16r = 00006d61 0000703a 00002066 0000616c 00006c62 00006163
  mvec  32 = 3a70616d 6c616620 6361626c 6f74206b 6c635420 0a7d2022
  mvec 32r = 6d61703a 2066616c 6c626163 6b20746f 2054636c 22207d0a
  mvec rawlist = 135 microseconds per iteration
  mvec reverse = 128 microseconds per iteration
  mvec convert = 136 microseconds per iteration
  mvec shimmer = 134 microseconds per iteration
  mvec vardata = 72 101 108 108 111 32 119 111 114 108 100 33
  mvec varname = 90 microseconds per iteration
  showargs: U 11
  mvec virtual = 121
  mvec virsize = 1234 {} {identity 1234}
  mvec virtime = 52 microseconds per iteration
  mvec vircall = 2 microseconds per iteration
  mvec size = 5 8 3
  mvec data = 98 105 110 103 111
  splitchr = 2629 microseconds per iteration
  bytescan = 3275 microseconds per iteration
  strindex = 127347 microseconds per iteration
  mvec   0 = 246942 microseconds per iteration
  mvec   1 = 460335 microseconds per iteration
  mvec   2 = 460603 microseconds per iteration
  mvec   4 = 467320 microseconds per iteration
  mvec   8 = 473284 microseconds per iteration
  mvec  16 = 487373 microseconds per iteration
  mvec 16r = 483201 microseconds per iteration
  mvec  32 = 486319 microseconds per iteration
  mvec 32r = 484024 microseconds per iteration
  mvec  64 = 518690 microseconds per iteration
  mvec 64r = 511418 microseconds per iteration
  mvecsize = 10000
  base lx = 2 microseconds per iteration
  mvec lx = 49 microseconds per iteration
  vall lx = 110759 microseconds per iteration
  base p0 = 2 microseconds per iteration
  mvec p0 = 49 microseconds per iteration
  vall p0 = 142445 microseconds per iteration
  base p1 = 5 microseconds per iteration
  mvec p1 = 52 microseconds per iteration
  vall p1 = 184868 microseconds per iteration
  base p9 = 7 microseconds per iteration
  mvec p9 = 55 microseconds per iteration
  vall p9 = 200334 microseconds per iteration
  num = 9 9 9 9 9
  str = nine nine nine nine nine
  two = {I X} {I X} {I X} {I X} {I X}
  id set _ = 134189 microseconds per iteration
  id formt = 177690 microseconds per iteration
  id tleft = 137569 microseconds per iteration
  id s map = 116554 microseconds per iteration
  o timing = 144760 microseconds per iteration
  orig len = 10000
  p timing = 304220 microseconds per iteration
  pred len = 5000
  e timing = 288705 microseconds per iteration
  even len = 5000
  h timing = 72741 microseconds per iteration
  proc step = 1817 microseconds per iteration
  minus inc = 245631 microseconds per iteration
  minus val = 0 1 2 3 4 5 6 7 8 9
  minus big = 24741 microseconds per iteration
  minus len = 10000
  minus end = 9999
  accum time = 85872 microseconds per iteration
  accum last = 3000
  direct = 111821 microseconds per iteration
  indirg = 151856 microseconds per iteration
  global = 188127 microseconds per iteration