Version 2 of Manipulating blocks of data

Updated 2003-04-17 06:47:33

Arjen Markus (15 april 2003) Inspired by a remark in the Tcl'ers chatroom I started thinking about a smallish extension that can handle blocks of data or gridded data or whatever you want to call it. Such data arise in geographical information systems as the spatial distribution of, say, population density or terrain level. They arise in meteorology as the result of computer models and they can be found in the field of partial differential equations, which is my professional field of interest. Another application is digital image processing.

So how can we represent such blocks of data? What operations do we need?

To answer the first question: I think we need an extension in C or Fortran for this. At the script level we manipulate these blocks of data via handles, much like files. The actual storage is handled by the system programming language of choice (or by Tcl, if we use binary arrays as an opaque data type). We may need some special arrangements to prevent memory leaks, though.

With regard to the second question: I am biased towards differential equations, so the list of possible operations that I come up with may not be complete from your point of view, but I would say (for esthetic reasons, the words look "nicer", I use the suffix mat to indicate this type of variables):

  • initmat: set the sizes of the matrix variables - a global operation and perhaps arrange for variables like the X and Y coordinates
  • setmat: copy values into a new or existing matrix variable (the second argument is a matrix variable or a scalar)
  • addmat, subtractmat, multiplymat, dividemat: elementwise arithmetic operations on two matrix variables or a matrix

and a scalar

  • maxmat, minmat: elementwise maximum and minimum
  • backwardDiffX, forwardDiffX, centralDiffX: first-order difference in X direction (various flavours). Note that these are not estimates of the derivatives (to keep the operation general)
  • backwardDiffY, forwardDiffY, centralDiffY: ditto in Y direction
  • secondDiffX, secondDiffY: second-order difference in X and Y direction
  • exprmat: evaluate an expression in matrix variables
  • setelem: set a particular element to a new value
  • getelem: get the value of a particular element
  • matToImage: create a gray-scale image for visualisation
  • shift operations?
  • overall properties, such the maximum over the whole matrix

What is also required, is a way to handle the boundary conditions, any difference operation will need to deal with this:

  • The value on the boundary is a prescribed value
  • The value on the boundary is the same as the value just inside
  • The value on the boundary differs by a known amount from that just inside

As a simple example, the mean slope of a terrain could be determined like this:

   setmat slopeX [backwardDiffX $level]
   setmat slopeY [backwardDiffY $level]
   set meanSlopeX [overallMean $slopeX]
   set meanSlopeY [overallMean $slopeY]

Or, a differential equation like the diffusion equation in two dimensions (forgive me the clumsy notation):

   dC           d2C        d2C
   --    =    D ---   +  D ---
   dt           dx2        dy2

The derivative in time of the concentration C can be calculated as:

   setmat derivc [scalemat \
      [addmat [secondDiffX $c] [secondDiffY $c]] $factor]

(where for simplicity, the grid sizes in X and Y direction are taken the same, and the variable factor absorbes all the scalar parameters involved).

Solving this equation for the stationary solution:

   ... Set up the computational grid

   # Initial condition
   setmat c 0.0

   ... Define boundary conditions

   # Continue until convergence is achieved
   # (checked in the proc "convergence?")
   while { ! $converged } {
      setmat derivc    [scalemat ...]
      setmat cnew      [addmat $c $derivc]
      set    converged [convergence? $c $cnew]
      setmat c         $cnew

Implementation notes

Of course, the extension needs to deal with memory management: if a temporary Tcl variable goes out of scope, because the proc returns, then the memory will get lost, unless we can cleverly reclaim the memory or some other means of accessing the variable's contents still exists.

I can see a number of reasonable ways for implementing this:

Via a compiled extension:

  • Use a specific extension written in C or Fortran (my personal idea is to do it in Fortran, as this has excellent array operation facilities, making the implementation so much more elegant).
  • Use Harvey Davies' NAP extension. I do not know at the moment if this extension has all the facilities we need for this, but it might be a nice application. (From the documentation, I think that differences are going to be a difficulty)
  • By letting Tcl deal with it as much as it can - store the actual matrix in a binary array and let the extension do the calculations, not the storage.

Via a pure Tcl solution (thanks to Andreas Kupries and Kevin Kenny):

  • Use the storage facilities of struct::matrix in Tcllib
  • Use Ed Hume's LA package

Both provide the basic storage facilities we would need for this.

[ Category Mathematics ]