Experimenting with numPy

[Arjen Markus] (16 october 2020) In the Tclers' chat the other day,
[Steve Blinkhorn] mentioned an [https://www.nature.com/articles/s41586-020-2649-2%|%article in Nature] on the ''numPy'' package
that is used for scientific and technical calculation in [Python]. The
question he posed was whether it would be an idea to build an interface
to ''numPy'', so that we can use it from a Tcl program. One reason for
this: ''numPy'' is a well-respected and well-known package and it would
potentially make it easier for Tclers to get their work accepted.

[Steve Landers] told me of the [https://github.com/aidanhs/libtclpy%|%tclpy] package that enables you
to call [Python] from within Tcl and vice versa (see [Accessing Tcl and Python from one another]
for more information, as there are solutions too). So, this provided me
a first step towards using ''numPy'' in a Tcl program. Well, "program"
is a bit of a big word, as I have not tried anything serious yet. Still,
it is a useful first step to investigate how this package can be used.

The preparations were simple: build the package and load it into Tcl. My
Cygwin environment presented a few problems, though, as the makefile
required a file `python-config` to provide the Python include files and
libraries. After some experimenting I used the following command to
build it:

======none
gcc -o libtclpy0.3.so -shared generic/tclpy.c -I/usr/include/tcl8.6 -I/usr/include/python2.7 -ltcl8.6 -lpython2.7 -Wl,--export-all-symbols
======

and introduced a few missing macros in the source code:

======none
#define PACKAGE_VERSION "0.3"
#define PY_LIBFILE "libpython2.7.dll"
======

Then it was a matter of trying a few commands. Like from the
documentation:


======tcl
package require tclpy

py eval {import numpy as np}
py eval {def xx(x,y): return x+y}
puts [py call xx string1 string2]
======

which diligently prints `string1string2`

Defining an array in the Python interpreter is a trifle more demanding:

======tcl
set a [list 4 2 3]
py eval "a = np.array(\[[join $a ,]\])"
======

to convert the data in the list into the right format for Python.

Now, however, the limitation: the ''tclpy'' package currently passes
strings, not variables. This makes it a trifle difficult to manipulate
variables on the Python whose names we pass from the Tcl side. For
instance: to sort the numbers in the array we just created, this does
not work:

======tcl
puts [py call np.sort a]
======

Instead we need to convert the string "a" into the Python variable "a".
Luckily Python does offer that possibility via the `globals()` and
`locals()` function.

So what we can do:

======tcl
py eval {def tolist(array): var = globals().get(array); return var.tolist()}
puts [py call tolist a]
======

This works also with two-dimensional arrays

======tcl
set a {[[4,2,3],[1,2,3]]}
py eval "z = np.array($a)"
puts [py call tolist z]
======

prints:

======none
{4 2 3} {1 2 3}
======

Of course, this is not very efficient - with a large array or matrix you
would need to convert everything to a long string, pass it to
the Python interpreter, do the computations and pass the result back.

But the fact that it is so easy to do these first few steps makes it
possible to think about the next steps: provide more direct transfer
methods to pass large amounts of numerical data to and fro.

**Performance**

[AM] (2 november 2020) Here is a small program to test the performance of the interface. Note: this still uses the Python 2 version - in the meantime Aidan has updated the code to support Python 3, but I have no tested that yet.

======tcl
# perform.tcl --
#     Simple checks on the performance of the interface to Python
#
#     Check: setting up matrices of different sizes
#     Idea:
#     Does it make a difference if you fill the matrix all at once or
#     fill it one element at a time?
#     What about filling it by row or by column?
#     Matrix sizes: 10x10, 30x30 and 100x100
#
#     Clear result:
#     Setting the values in the whole matrix at once is 3x faster
#     than setting the entries one by one
#
#     Further checks:
#     - Getting back the matrix
#     - Solving a linear system (comparing with math::linearalgebra?)
#

proc setMatrix {size} {
    set matrix {}

    for {set r 0} {$r < $size} {incr r} {
        set row {}
        for {set c 0} {$c < $size} {incr c} {
            lappend row [expr {rand()}]
        }
        lappend matrix "\[[join $row ,]\]"
    }
    set pythonMatrix "\[[join $matrix ,]\]"

    py eval "matrix = np.array($pythonMatrix)"
}

proc setByElement {size} {

    for {set r 0} {$r < $size} {incr r} {
        for {set c 0} {$c < $size} {incr c} {
            set rnd [expr {rand()}]
            py eval "matrix\[$r,$c\] = $r"
        }
    }
}

proc getMatrix {size} {
    py call tolist matrix
}

proc getByElement {size} {

    for {set r 0} {$r < $size} {incr r} {
        for {set c 0} {$c < $size} {incr c} {
            py call tostr matrix $r $c
        }
    }
}

proc solveEquations {} {

    py eval "x = np.linalg.solve(matrix,b)"

    set x [py call tolist x]
}

set auto_path [concat . $auto_path]
package require tclpy

py eval {import numpy as np}
py eval {def tolist(array): aa = globals().get(array); return aa.tolist()}
py eval {def tostr(array,r,c): aa = globals().get(array); rr = int(r); cc = int(c); return aa[rr,cc]}


# Test:
#     First set up the matrix on the Python side, then
#     use the interface to fill the entries
#     Use random numbers to make sure no string representation
#     exists.
#
foreach size {10 30 100} counts {100 30 10} {
    set size2 [expr {$size * $size}]
    py eval "matrix = np.zeros($size2).reshape($size,$size)"

    set result1 [time {setMatrix    $size} $counts]
    set result2 [time {setByElement $size} $counts]
    #set result3 [time {setByRow     $size} $counts]
    #set result4 [time {setByColumn  $size} $counts]

    puts "Size: $size"
    puts "    Fill the matrix at once:              $result1"
    puts "    Fill the matrix one entry at a time:  $result2"
    #puts "    Fill the matrix one row at a time:    $result3"
    #puts "    Fill the matrix one column at a time: $result4"
}

#
# Now retrieve the values of a matrix
#
foreach size {10 30 100} counts {100 30 10} {
    set size2 [expr {$size * $size}]
    py eval "matrix = np.random.random(($size,$size))"

    set result1 [time {getMatrix    $size} $counts]
    set result2 [time {getByElement $size} $counts]
    #set result3 [time {setByRow     $size} $counts]
    #set result4 [time {setByColumn  $size} $counts]

    puts "Size: $size"
    puts "    Retrieve the matrix at once:              $result1"
    puts "    Retrieve the matrix one entry at a time:  $result2"
    #puts "    Fill the matrix one row at a time:    $result3"
    #puts "    Fill the matrix one column at a time: $result4"
}

#
# Solve a linear system
#
foreach size {10 30 100} counts {100 30 10} {
    py eval "matrix = np.random.random(($size,$size))"
    py eval "b      = np.random.random($size)"

    set result [time {solveEquations} $counts]

    puts "Size: $size"
    puts "    Solve a linear system: $result"
}

#puts [solveEquations]

======
The output on my system, for easy reference:

======none
Size: 10
    Fill the matrix at once:              508.48 microseconds per iteration
    Fill the matrix one entry at a time:  1149.97 microseconds per iteration
Size: 30
    Fill the matrix at once:              2593.766666666667 microseconds per iteration
    Fill the matrix one entry at a time:  10133.833333333334 microseconds per iteration
Size: 100
    Fill the matrix at once:              33364.6 microseconds per iteration
    Fill the matrix one entry at a time:  110351.6 microseconds per iteration
Size: 10
    Retrieve the matrix at once:              76.88 microseconds per iteration
    Retrieve the matrix one entry at a time:  337.51 microseconds per iteration
Size: 30
    Retrieve the matrix at once:              731.1 microseconds per iteration
    Retrieve the matrix one entry at a time:  3083.6 microseconds per iteration
Size: 100
    Retrieve the matrix at once:              7080.7 microseconds per iteration
    Retrieve the matrix one entry at a time:  35484.4 microseconds per iteration
Size: 10
    Solve a linear system: 81.17 microseconds per iteration
Size: 30
    Solve a linear system: 87.33333333333333 microseconds per iteration
Size: 100
    Solve a linear system: 396.8 microseconds per iteration 
======

**An expanded example: a simple partial differential equation**

[AM] (4 november 2020) Here is a further experiment - solving a simple reaction-diffusion equation. The transport (diffusion) part is done on the Python side, but the reaction part is done via Tcl. This necessitates the transfer to and from Python of the whole solution, but it is the most variable part of such problems. So, a Tcl implementation is preferred.

''Note:'' 

During the development of the code below I was confronted with mysterious error messages from the Python parser/interpreter.
These problems quite likely originated from my Tcl-based intuition on how the commands are interpreted and my poor 
understanding of Python. Still, the ability to express the finite difference formula for the transport step in such a 
compact form is encouraging.
 
======tcl
# chkpde.tcl --
#     Use NumPy and libtclpy to solve a simple PDE
#
#     The PDE in question is a reaction-diffusion equation defined
#     on a square grid. The equation:
#
#     dC/dt = D nabla C - k C + 1
#
#     The discretisation is straightforward:
#
#     Cij(new) = Cij + Dt * (D/Dx^2 (Ci-1,j + Ci+1,j + Ci,j-1 + Ci,j+1 - 4 Cij) - k Cij + 1)
#
#     Defined on a square grid and the boundary conditions on the four sides are
#     simply: C = 0.
#
#     D  =  1.0e-3     m2/s
#     Dx =  0.1/size   m
#     Dt =  0.1/size   s
#     k  =  0.1        /s
#     T  = 10          s
#
#     With the experience on the performance in mind, we send full matrices back and forth
#     between Tcl and Python
#

proc setMatrix {name matrix} {
    set content {}

    foreach row $matrix {
        lappend content "\[[join $row ,]\]"
    }
    set pythonMatrix "\[[join $content ,]\]"

    py eval "$name\[:,:\] = $pythonMatrix"
}

proc setByElement {size} {

    for {set r 0} {$r < $size} {incr r} {
        for {set c 0} {$c < $size} {incr c} {
            set rnd [expr {rand()}]
            py eval "matrix\[$r,$c\] = $r"
        }
    }
}

proc getMatrix {name} {
    py call tolist $name
}

proc calculateReaction {conc} {
    global decay

    set reaction {}
    foreach row $conc {
        set newrow {}
        foreach col $row {
            lappend newrow [expr {-$decay * $col + 1.0}]
        }
        lappend reaction $newrow
    }

    return $reaction
}

#
# Load the package
#
set auto_path [concat . $auto_path]
package require tclpy

py eval {import numpy as np}
py eval {def tolist(array): aa = globals().get(array); return aa.tolist()}
py eval {def tostr(array,r,c): aa = globals().get(array); rr = int(r); cc = int(c); return aa[rr,cc]}

#
# Try different grid sizes
#

set T 10.0

foreach size {5 10 30 100} {
    puts "Size: $size"

    set size2 [expr {$size * $size}]

    set dt    [expr {1.0 / $size}]
    set dx    [expr {1.0 / $size}]
    set diff  1.0e-3
    set decay 0.1

    #
    # Set the parameters
    #
    py eval "dt    = $dt"
    py eval "dx    = $dx"
    py eval "diff  = $diff"
    py eval "decay = $decay"

    set rcentre [expr { $size / 2 }]
    set ccentre [expr { $size / 2 }]

    #
    # Initial condition
    #
    py eval "matrix   = np.zeros($size2).reshape($size,$size)"
    py eval "dmatrix  = np.zeros($size2).reshape($size,$size)"
    py eval "reaction = np.zeros($size2).reshape($size,$size)"

    #
    # Loop over time
    #
    set t 0.0
    while { $t < $T } {

        #
        # Set a time step
        #
        set reaction [calculateReaction [getMatrix "matrix"]]
        setMatrix "reaction" $reaction

        #py eval {dmatrix[1:-1,1:-1] = diff / (dx*dx) * (matrix[0:-2,1:-1] + matrix[2:,1:-1] + matrix[1:-1,0:-2] + matrix[1:-1,2:] - 4.0 * matrix[1:-1,1:-1])}
        #py eval {dmatrix[1:-1,1:-1] = dmatrix[1:-1,1:-1] + reaction[1:-1,1:-1]}
        py eval {dmatrix[1:-1,1:-1] = diff / (dx*dx) * ( matrix[0:-2,1:-1] + matrix[2:,1:-1] + matrix[1:-1,0:-2] + matrix[1:-1,2:] - 4.0 * matrix[1:-1,1:-1] ) + reaction[1:-1,1:-1]}
        py eval {matrix[1:-1,1:-1] = matrix[1:-1,1:-1] + dt * dmatrix[1:-1,1:-1]}
        set t [expr {$t + $dt}]

        #
        # Examine the midpoint
        #
        set centre [py call tostr matrix $rcentre $ccentre]
        if { abs($t - int($t+0.5)) < 0.5 * $dt } {
            puts "$t\t$centre"
        }
        #puts $t
    }
}
======

And the output from the above:

======none
Size: 5
1.0        0.960599948
1.9999999999999998        1.8271874881419155
3.0000000000000004        2.6071575895203023
4.000000000000001        3.30769748349534
5.000000000000002        3.935698818258164
6.000000000000003        4.497697658329985
7.0000000000000036        4.999836700287916
8.000000000000004        5.447844946834249
9.0        5.847030867503363
9.999999999999996        6.202285768638312
Size: 10
0.9999999999999999        0.9561787941780185
2.0000000000000004        1.8209076395834303
3.0000000000000013        2.602822500607819
4.000000000000002        3.3096203144671
4.999999999999998        3.9481744379070376
5.999999999999995        4.524655800808933
6.999999999999991        5.044640537773083
7.999999999999988        5.513198185316464
8.999999999999984        5.934960816904726
9.99999999999998        6.31417603667851
Size: 30
0.9999999999999999        0.9531371154286179
2.0000000000000027        1.8154271946171865
2.999999999999999        2.5955291739697137
3.9999999999999956        3.301275936820813
4.999999999999992        3.9397486027667545
5.9999999999999885        4.51733657940186
6.999999999999985        5.039788170428689
7.999999999999981        5.512261607539829
8.999999999999979        5.939382226970672
9.999999999999975        6.325304268761792
Size: 100
1.0000000000000007        0.9520785288629056
2.0000000000000013        1.8135117052136498
2.99999999999998        2.5929296783229674
3.9999999999999587        3.2981409141647706
4.999999999999938        3.9362097981062054
5.9999999999999165        4.513523291350504
6.999999999999895        5.035844462033559
7.999999999999874        5.5083581162268365
8.999999999999853        5.93571804353677
9.999999999999831        6.322100126712278 
======



----
'''[arjen] - 2020-11-06 08:04:42'''

Some additional information: Aidan has updated the code for Python 3 and Paul Obermeier has added support for CMake so that the library can be built for all manner of environments (notably the ones on Windows: Windows itself, Cygwin, MinGW)

'''[PO] - 2021-01-08'''

The tclpy package is now part of the [BAWT] 1.3.0 framework.

<<categories>>Category Numerical analysis