Version 5 of Ideas for a numerical analysis package

Updated 2004-03-22 16:31:27

Arjen Markus: This page is meant for collecting ideas on how to deal with numerical analysis in Tcl.

The rationale

There are quite a few attempts to implement numerical analysis methods in Tcl, but so far there is no framework (conceptual or otherwise) that you can readily use. Everybody tries to do it in his or her own way.

If we look at our Perl and Python colleagues, they have PDL (Perl Data Language) and Numpy (or Numarray, as it seems to be called nowadays). To my knowledge there is no equivalent in Tcl, though there are quite extensive packages like la and NAP that might qualify as such.

The basic problem

Numerical methods often deal with collections of data:

  • Arrays (in the C or Fortran sense) of numbers
  • Vectors in an N-dimensional space
  • Matrices (square or rectangular)
  • Higher-dimensional structures (but these tend to be used mainly in specialised areas)

One could think of nested lists (see Playing APL for an example) to represent them, but the la package uses plain lists for good reasons:

  • More efficient
  • The possibility to represent row and column vectors, important for a linear algebra package
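The difference between the two representations can be sketched in plain Tcl (8.5 or later assumed). The row-major layout and the helper names nestedGet and flatGet are made up for illustration; this is not claimed to be the format the la package actually uses.

```tcl
# A 2x3 matrix as a nested list: one sublist per row.
set nested {{1.0 2.0 3.0} {4.0 5.0 6.0}}

# The same matrix as a flat list in row-major order (an assumption);
# the number of columns has to be carried along separately.
set flat  {1.0 2.0 3.0 4.0 5.0 6.0}
set ncols 3

# Hypothetical helper: element (i,j) of a nested list, zero-based.
proc nestedGet {m i j} {
    return [lindex $m $i $j]
}

# Hypothetical helper: element (i,j) of a flat row-major list.
proc flatGet {data ncols i j} {
    return [lindex $data [expr {$i * $ncols + $j}]]
}

puts [nestedGet $nested 1 2]
puts [flatGet $flat $ncols 1 2]
```

With the flat form, the same data list can serve as a 1x3 row vector or a 3x1 column vector simply by changing the shape that is stored next to it.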

Standard libraries exist in both C and Fortran for many problems (LAPACK and FFT libraries, for instance). We can access these via small wrappers generated with SWIG or Critcl, and in fact a few such wrappers already exist.

The solution?

We should decide what the best methods are for dealing with numerical data and build an easy-to-use framework from them. The design issues are:

  • Comfortable use from within Tcl
  • Acceptable performance, even with large sets of data
  • Easy to pass to and from binary extensions

My first guess is that we need a hybrid solution:

  • For small sets of data, a nested list may be the easiest way
  • For large sets of data, the la approach or even an approach with binary strings (these are opaque to the scripting side but easy to pass to binary extensions) can be used
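As a rough sketch of the binary-string idea, the standard binary command can pack a list of numbers into a block of native doubles and unpack it again. The helper names packDoubles and unpackDoubles are hypothetical, and how a real extension would want its data laid out is an assumption.

```tcl
# Hypothetical helper: pack a list of numbers into a binary string
# of native double-precision values.
proc packDoubles {values} {
    return [binary format d* $values]
}

# Hypothetical helper: unpack such a binary string into a Tcl list.
proc unpackDoubles {bytes} {
    binary scan $bytes d* values
    return $values
}

set data  {1.0 2.5 -3.0}
set bytes [packDoubles $data]

# Typically 8 bytes per double, so three values occupy 24 bytes.
puts [string length $bytes]
puts [unpackDoubles $bytes]
```

The packed value is compact and can be handed to C or Fortran code as a plain buffer, but a script has to unpack it before it can look at individual numbers.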

Lars H: I'd suggest using a flat list to hold the numerical data, with a separate "shape specification" that tells commands (those that care about the shape of the data) how to treat it. I recall Fortran has some operation which, for example, lets you say that the thing declared as a 6x6 matrix should be treated as a 36-element vector, or even as a 4x9 matrix. Saying "it's all basically vectors, but with a shape specification" makes that kind of thing easy.
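A minimal sketch of that idea, assuming the value is simply a two-element list of shape and flat data. The helpers makeArray and reshape are hypothetical names, not part of any existing package.

```tcl
# Hypothetical helper: build an array value from a shape and flat data,
# checking that the shape matches the number of elements.
proc makeArray {shape data} {
    set count 1
    foreach dim $shape {
        set count [expr {$count * $dim}]
    }
    if {$count != [llength $data]} {
        error "shape {$shape} does not match [llength $data] elements"
    }
    return [list $shape $data]
}

# Hypothetical helper: reinterpret the same data under a new shape.
# The data list itself is not copied or reordered.
proc reshape {array newShape} {
    return [makeArray $newShape [lindex $array 1]]
}

# 36 numbers: 1 2 ... 36
set data {}
for {set i 1} {$i <= 36} {incr i} {
    lappend data $i
}

set m6x6 [makeArray {6 6} $data]
set vec  [reshape $m6x6 {36}]
set m4x9 [reshape $m6x6 {4 9}]

puts [lindex $m4x9 0]
puts [llength [lindex $m4x9 1]]
```

Because only the shape changes, reshaping costs nothing beyond the consistency check.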

As for how to handle the Tcl <-> binary conversions, I'd suggest making the string representation of the thing look like a vector of numbers, or like a list containing such a vector. E.g. a 2x2 identity matrix might be

  {2 2} {1.0 0.0 0.0 1.0}

where the {2 2} part specifies the shape and the rest is the data. Note that this wouldn't have to be a list-of-lists-of-numbers as Tcl_Objs, but could be a single object of some new type whose string representation just happens to be possible to parse as lists. (In practice one shouldn't apply list operations on it, because that would discard the internal representation as "numerical array", but being able to do this anyway can be very useful when debugging.)
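Pure Tcl cannot define a new Tcl_Obj type, but the script-level view of such a value can be sketched with ordinary list commands. The helper element is a hypothetical name, and row-major storage is an assumption that the text above does not settle.

```tcl
# The 2x2 identity matrix in the string form shown above.
set identity {{2 2} {1.0 0.0 0.0 1.0}}

# Hypothetical helper: element (i,j), zero-based, of a two-dimensional
# array value, assuming row-major storage.
proc element {array i j} {
    lassign $array shape data
    lassign $shape nrows ncols
    if {$i < 0 || $i >= $nrows || $j < 0 || $j >= $ncols} {
        error "index ($i,$j) out of range for shape {$shape}"
    }
    return [lindex $data [expr {$i * $ncols + $j}]]
}

puts [element $identity 0 0]
puts [element $identity 0 1]

# Ordinary list commands still work on the value, which is handy
# when debugging.
puts "shape: [lindex $identity 0]  size: [llength [lindex $identity 1]]"
```

In a C implementation the same string would be produced on demand from the internal representation, so commands like these would only be used for inspection.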

AM: Your solution comes very close to what several packages are doing. (I wanted to keep the discussion open by not proposing a "definite" solution :-) But yes, it is the sort of solution you find a lot.

DKF: Actually, for an N-dimensional matrix, your printed representation only needs to state N-1 dimensions, since you have the overall number of items "for free".
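This can be sketched as follows: given the data and all dimensions but one, the remaining dimension follows from the element count. The helper completeShape is a hypothetical name, and taking the first dimension as the omitted one is an assumption.

```tcl
# Hypothetical helper: given the data and all dimensions but one,
# return the complete shape. The missing dimension is taken to be
# the first one here; that choice is an assumption.
proc completeShape {partialShape data} {
    set known 1
    foreach dim $partialShape {
        set known [expr {$known * $dim}]
    }
    set total [llength $data]
    if {$known == 0 || $total % $known != 0} {
        error "$total elements cannot be divided over shape {$partialShape}"
    }
    return [linsert $partialShape 0 [expr {$total / $known}]]
}

puts [completeShape {2} {1.0 0.0 0.0 1.0}]
puts [completeShape {3 4} [lrepeat 24 0.0]]
```

For a plain vector the partial shape is empty and the full shape is just the number of elements.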


AM: In Clustering data I use an approach with a list of lists, where each sublist represents the "coordinates" of the data point. As the algorithm only needs the data point by point and does not do anything "across" points, this works very well.
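A small sketch of that point-by-point style, assuming Tcl 8.5 or later. The helpers distance and nearest are hypothetical and are not taken from the Clustering data page; they only show how naturally a list of coordinate lists can be traversed.

```tcl
# Each sublist holds the coordinates of one data point.
set points {{0.0 0.0} {2.0 0.0} {2.0 2.0} {0.0 2.0}}

# Hypothetical helper: Euclidean distance between two points of the
# same dimension.
proc distance {p q} {
    set sum 0.0
    foreach a $p b $q {
        set sum [expr {$sum + ($a - $b) * ($a - $b)}]
    }
    return [expr {sqrt($sum)}]
}

# Hypothetical helper: the point in a non-empty list that lies
# closest to a given location.
proc nearest {points location} {
    set best     [lindex $points 0]
    set bestDist [distance $best $location]
    foreach p [lrange $points 1 end] {
        set d [distance $p $location]
        if {$d < $bestDist} {
            set best     $p
            set bestDist $d
        }
    }
    return $best
}

puts [distance {0.0 0.0} {3.0 4.0}]
puts [nearest $points {1.8 0.3}]
```

Every step touches one point at a time, so the nested list costs nothing extra compared with a flat layout.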


[ Category Numerical Analysis ]