Manipulating blocks of data

Arjen Markus (15 april 2003) Inspired by a remark in the Tcl'ers chatroom I started thinking about a smallish extension that can handle blocks of data or gridded data or whatever you want to call it. Such data arise in geographical information systems as the spatial distribution of, say, population density or terrain level. They arise in meteorology as the result of computer models and they can be found in the field of partial differential equations, which is my professional field of interest. Another application is digital image processing.

(I want to stress that I am not talking about matrices and linear algebra, though both enter the picture. I want to manipulate data in a uniform way, without having to explicitly deal with for-loops and the like.)

So how can we represent such blocks of data? What operations do we need?

To answer the first question: I think we need an extension in C or Fortran for this. At the script level we manipulate these blocks of data via handles, much like files. The actual storage is handled by the system programming language of choice (or by Tcl, if we use binary arrays as an opaque data type). We may need some special arrangements to prevent memory leaks, though.

With regard to the second question: I am biased towards differential equations, so the list of possible operations that I come up with may not be complete from your point of view, but I would say (for esthetic reasons, the words look "nicer", I use the suffix mat to indicate this type of variables):

initmat: set the sizes of the matrix variables - a global operation and perhaps arrange for variables like the X and Y coordinates
setmat: copy values into a new or existing matrix variable (the second argument is a matrix variable or a scalar)
addmat, subtractmat, multiplymat, dividemat: elementwise arithmetic operations on two matrix variables or a matrix and a scalar
maxmat, minmat: elementwise maximum and minimum
backwardDiffX, forwardDiffX, centralDiffX: first-order difference in X direction (various flavours). Note that these are not estimates of the derivatives (to keep the operation general)
backwardDiffY, forwardDiffY, centralDiffY: ditto in Y direction
secondDiffX, secondDiffY: second-order difference in X and Y direction
exprmat: evaluate an expression in matrix variables
setelem: set a particular element to a new value
getelem: get the value of a particular element
matToImage: create a gray-scale image for visualisation
shift operations?
overall properties, such the maximum over the whole matrix

What is also required, is a way to handle the boundary conditions, any difference operation will need to deal with this:

The value on the boundary is a prescribed value
The value on the boundary is the same as the value just inside
The value on the boundary differs by a known amount from that just inside

As a simple example, the mean slope of a terrain could be determined like this:

   setmat slopeX [backwardDiffX $level]
   setmat slopeY [backwardDiffY $level]
   set meanSlopeX [overallMean $slopeX]
   set meanSlopeY [overallMean $slopeY]

Or, a differential equation like the diffusion equation in two dimensions (forgive me the clumsy notation):

   dC           d2C        d2C
   --    =    D ---   +  D ---
   dt           dx2        dy2

The derivative in time of the concentration C can be calculated as:

   setmat derivc [scalemat \
      [addmat [secondDiffX $c] [secondDiffY $c]] $factor]

(where for simplicity, the grid sizes in X and Y direction are taken the same, and the variable factor absorbes all the scalar parameters involved).

Solving this equation for the stationary solution:

   ... Set up the computational grid

   #
   # Initial condition
   #
   setmat c 0.0

   ... Define boundary conditions

   #
   # Continue until convergence is achieved
   # (checked in the proc "convergence?")
   #
   while { ! $converged } {
      setmat derivc    [scalemat ...]
      setmat cnew      [addmat $c $derivc]
      set    converged [convergence? $c $cnew]
      setmat c         $cnew
   }

Implementation notes

Of course, the extension needs to deal with memory management: if a temporary Tcl variable goes out of scope, because the proc returns, then the memory will get lost, unless we can cleverly reclaim the memory or some other means of accessing the variable's contents still exists.

I can see a number of reasonable ways for implementing this:

Via a compiled extension:

Use a specific extension written in C or Fortran (my personal idea is to do it in Fortran, as this has excellent array operation facilities, making the implementation so much more elegant).
Use Harvey Davies' NAP extension. I do not know at the moment if this extension has all the facilities we need for this, but it might be a nice application. (From the documentation, I think that differences are going to be a difficulty)
By letting Tcl deal with it as much as it can - store the actual matrix in a binary array and let the extension do the calculations, not the storage.

Via a pure Tcl solution (thanks to Andreas Kupries and Kevin Kenny):

Use the storage facilities of struct::matrix in Tcllib.
Use Ed Hume's LA package.

Both provide the basic storage facilities we would need for this.

And, why not, solutions abound - Vkit (see remark below) should not be left out.

TV Just a humorously tainted side remark: in serious profi software land I would have known of no one, really, no one who would even even consider using Fortran in favour of a more modern language except for compatibility reasons....

Why not use C? Maybe you'd want to try finding the sources for Khoros, which should be somewhere on the web for Unix for sure (for free, was a project at a Mexico university years ago); it even does nice enough graphical block diagrams to combine lots of field filters. Then again, never hurts to take a fresh angle, and tcl is nicer and in ways nicer than unix shells.

AM Why indeed? Consider this:

    real, dimension(:,:) :: matrix1, matrix2, result

    result = matrix1 + matrix    ! Assuming shape compatibility, but let the compiler
                                         ! make sure via runtime array bounds checking

versus:

    float *matrix1, *matrix2, *result ;
    int i,j,k,n1,n2 ;

    k = 0 ; 
    for ( j = 0 ; j < n2 ; j ++ ) {
        for ( i = 0 ; i < n1 ; i ++ ) {
            result[k] = matrix1[k] + matrix2[k] ; /* Trusting the programmer to supply 
                                                             sufficient large arrays */
            k ++ ; 
        }
    }

Arguments as to "Fortran is not a modern language, whereas C is" (or C++ or Java or C# or ....) remind me of similar statements regarding Tcl :D.

Seen Vkit is a vector engine?

escargo 17 Apr 2003 - See Playing APL for a look at matrix operations in Tcl.

Other matrix operations that I could imagine wanting are:

Matrix inverse
Identity matrices of size set by a parameter
Solutions to simultaneous equations

Category Mathematics