Version 1 of Experimenting with numPy

Updated 2020-11-02 11:01:04 by arjen

Arjen Markus (16 october 2020) In the Tclers' chat the other day, Steve Blinkhorn mentioned an article in Nature on the numPy package that is used for scientific and technical calculation in Python. The question he posed was whether it would be an idea to build an interface to numPy, so that we can use it from a Tcl program. One reason for this: numPy is a well-respected and well-known package and it would potentially make it easier for Tclers to get their work accepted.

Steve Landers told me of the tclpy package that enables you to call Python from within Tcl and vice versa (see Accessing Tcl and Python from one another for more information, as there are solutions too). So, this provided me a first step towards using numPy in a Tcl program. Well, "program" is a bit of a big word, as I have not tried anything serious yet. Still, it is a useful first step to investigate how this package can be used.

The preparations were simple: build the package and load it into Tcl. My Cygwin environment presented a few problems, though, as the makefile required a file python-config to provide the Python include files and libraries. After some experimenting I used the following command to build it:

gcc -o libtclpy0.3.so -shared generic/tclpy.c -I/usr/include/tcl8.6 -I/usr/include/python2.7 -ltcl8.6 -lpython2.7 -Wl,--export-all-symbols

and introduced a few missing macros in the source code:

#define PACKAGE_VERSION "0.3"
#define PY_LIBFILE "libpython2.7.dll"

Then it was a matter of trying a few commands. Like from the documentation:

package require tclpy

py eval {import numpy as np}
py eval {def xx(x,y): return x+y}
puts [py call xx string1 string2]

which diligently prints string1string2

Defining an array in the Python interpreter is a trifle more demanding:

set a [list 4 2 3]
py eval "a = np.array(\[[join $a ,]\])"

to convert the data in the list into the right format for Python.

Now, however, the limitation: the tclpy package currently passes strings, not variables. This makes it a trifle difficult to manipulate variables on the Python whose names we pass from the Tcl side. For instance: to sort the numbers in the array we just created, this does not work:

puts [py call np.sort a]

Instead we need to convert the string "a" into the Python variable "a". Luckily Python does offer that possibility via the globals() and locals() function.

So what we can do:

py eval {def tolist(array): var = globals().get(array); return var.tolist()}
puts [py call tolist a]

This works also with two-dimensional arrays

set a {[[4,2,3],[1,2,3]]}
py eval "z = np.array($a)"
puts [py call tolist z]

prints:

{4 2 3} {1 2 3}

Of course, this is not very efficient - with a large array or matrix you would need to convert everything to a long string, pass it to the Python interpreter, do the computations and pass the result back.

But the fact that it is so easy to do these first few steps makes it possible to think about the next steps: provide more direct transfer methods to pass large amounts of numerical data to and fro.


AM (2 november 2020) Here is a small program to test the performance of the interface. Note: this still uses the Python 2 version - in the meantime Aidan has updated the code to support Python 3, but I have no tested that yet.

# perform.tcl --
#     Simple checks on the performance of the interface to Python
#
#     Check: setting up matrices of different sizes
#     Idea:
#     Does it make a difference if you fill the matrix all at once or
#     fill it one element at a time?
#     What about filling it by row or by column?
#     Matrix sizes: 10x10, 30x30 and 100x100
#
#     Clear result:
#     Setting the values in the whole matrix at once is 3x faster
#     than setting the entries one by one
#
#     Further checks:
#     - Getting back the matrix
#     - Solving a linear system (comparing with math::linearalgebra?)
#

proc setMatrix {size} {
    set matrix {}

    for {set r 0} {$r < $size} {incr r} {
        set row {}
        for {set c 0} {$c < $size} {incr c} {
            lappend row [expr {rand()}]
        }
        lappend matrix "\[[join $row ,]\]"
    }
    set pythonMatrix "\[[join $matrix ,]\]"

    py eval "matrix = np.array($pythonMatrix)"
}

proc setByElement {size} {

    for {set r 0} {$r < $size} {incr r} {
        for {set c 0} {$c < $size} {incr c} {
            set rnd [expr {rand()}]
            py eval "matrix\[$r,$c\] = $r"
        }
    }
}

proc getMatrix {size} {
    py call tolist matrix
}

proc getByElement {size} {

    for {set r 0} {$r < $size} {incr r} {
        for {set c 0} {$c < $size} {incr c} {
            py call tostr matrix $r $c
        }
    }
}

proc solveEquations {} {

    py eval "x = np.linalg.solve(matrix,b)"

    set x [py call tolist x]
}

set auto_path [concat . $auto_path]
package require tclpy

py eval {import numpy as np}
py eval {def tolist(array): aa = globals().get(array); return aa.tolist()}
py eval {def tostr(array,r,c): aa = globals().get(array); rr = int(r); cc = int(c); return aa[rr,cc]}


# Test:
#     First set up the matrix on the Python side, then
#     use the interface to fill the entries
#     Use random numbers to make sure no string representation
#     exists.
#
foreach size {10 30 100} counts {100 30 10} {
    set size2 [expr {$size * $size}]
    py eval "matrix = np.zeros($size2).reshape($size,$size)"

    set result1 [time {setMatrix    $size} $counts]
    set result2 [time {setByElement $size} $counts]
    #set result3 [time {setByRow     $size} $counts]
    #set result4 [time {setByColumn  $size} $counts]

    puts "Size: $size"
    puts "    Fill the matrix at once:              $result1"
    puts "    Fill the matrix one entry at a time:  $result2"
    #puts "    Fill the matrix one row at a time:    $result3"
    #puts "    Fill the matrix one column at a time: $result4"
}

#
# Now retrieve the values of a matrix
#
foreach size {10 30 100} counts {100 30 10} {
    set size2 [expr {$size * $size}]
    py eval "matrix = np.random.random(($size,$size))"

    set result1 [time {getMatrix    $size} $counts]
    set result2 [time {getByElement $size} $counts]
    #set result3 [time {setByRow     $size} $counts]
    #set result4 [time {setByColumn  $size} $counts]

    puts "Size: $size"
    puts "    Retrieve the matrix at once:              $result1"
    puts "    Retrieve the matrix one entry at a time:  $result2"
    #puts "    Fill the matrix one row at a time:    $result3"
    #puts "    Fill the matrix one column at a time: $result4"
}

#
# Solve a linear system
#
foreach size {10 30 100} counts {100 30 10} {
    py eval "matrix = np.random.random(($size,$size))"
    py eval "b      = np.random.random($size)"

    set result [time {solveEquations} $counts]

    puts "Size: $size"
    puts "    Solve a linear system: $result"
}

#puts [solveEquations]

The output on my system, for easy reference:

Size: 10
    Fill the matrix at once:              508.48 microseconds per iteration
    Fill the matrix one entry at a time:  1149.97 microseconds per iteration
Size: 30
    Fill the matrix at once:              2593.766666666667 microseconds per iteration
    Fill the matrix one entry at a time:  10133.833333333334 microseconds per iteration
Size: 100
    Fill the matrix at once:              33364.6 microseconds per iteration
    Fill the matrix one entry at a time:  110351.6 microseconds per iteration
Size: 10
    Retrieve the matrix at once:              76.88 microseconds per iteration
    Retrieve the matrix one entry at a time:  337.51 microseconds per iteration
Size: 30
    Retrieve the matrix at once:              731.1 microseconds per iteration
    Retrieve the matrix one entry at a time:  3083.6 microseconds per iteration
Size: 100
    Retrieve the matrix at once:              7080.7 microseconds per iteration
    Retrieve the matrix one entry at a time:  35484.4 microseconds per iteration
Size: 10
    Solve a linear system: 81.17 microseconds per iteration
Size: 30
    Solve a linear system: 87.33333333333333 microseconds per iteration
Size: 100
    Solve a linear system: 396.8 microseconds per iteration