Version 27 of Ideas for a numerical analysis package

Arjen Markus This page is meant for collecting ideas on how to deal with numerical analysis in Tcl.

The rationale

There are quite a few attempts to implement numerical analysis methods in Tcl, but so far there is no framework (conceptual or otherwise) that you can readily use. Everybody tries to do it in his or her own way.

If we look at our Perl and Python colleagues, they have PDL (Perl Data Language) and NumPy (or Numarray, as it seems to be called nowadays). To my knowledge there is no equivalent in Tcl, though there are quite extensive packages like la and NAP that might qualify as such.

The basic problem

Numerical methods often deal with collections of data:

Arrays (in the C or Fortran sense) of numbers
Vectors in an N-dimensional space
Matrices (square or rectangular)
Higher-dimensional structures (but these tend to be used mainly in specialised areas)

One could think of nested lists (see Playing APL for an example) to represent them, but the la package uses plain lists for good reasons:

More efficient
The possibility to represent row and column vectors, important for a linear algebra package

Standard libraries exist in both C and Fortran for many problems (LAPACK and FFT libraries, for instance). We can access these via small wrappers generated with SWIG or Critcl, and in fact a few such wrappers already exist.

The solution?

We should decide what the best methods are for dealing with numerical data and create an easy to use framework out of this. The design issues are:

Comfortable use from within Tcl
Acceptable performance, even with large sets of data
Easy to pass to and from binary extensions

My first guess is that we need a hybrid solution:

For small sets of data, a nested list may be the easiest way
For large sets of data, the la approach (plain lists) or even an approach with binary strings (these are opaque to the scripting side but easy to pass to binary extensions) can be used
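As an illustration of the nested-list option for small data sets, here is a minimal sketch using only core list commands (Tcl 8.4 or later for lset). The variable names are made up for the example.

```tcl
# A 2x3 matrix as a nested list: one sublist per row.
set m {{1.0 2.0 3.0} {4.0 5.0 6.0}}

# Element access with several indices (row 1, column 2, zero-based).
set x [lindex $m 1 2]

# In-place update of a single element through the variable name.
lset m 0 1 20.0

# Shape: number of rows and length of the first row.
set shape [list [llength $m] [llength [lindex $m 0]]]
```

This is comfortable at the script level, but every element is a separate Tcl value, which is where the memory and conversion costs for large data sets come from.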

Lars H: I'd suggest using a flat list to hold the numerical data, with a separate "shape specification" that tells commands (that bother about the shape of data) how to treat it. I recall Fortran has some operation which e.g. allows you to say that the thing declared as a 6x6 matrix should be treated as a 36 element vector, or even 4x9 matrix. Saying "it's all basically vectors, but with a shape specification" makes that kind of thing easy.

As for how to handle the Tcl <-> binary conversions, I'd suggest making the string representation of the thing like a vector of numbers, or like a list containing such a thing. E.g. a 2x2 identity matrix might be

  {2 2} {1.0 0.0 0.0 1.0}

where the {2 2} part specifies the shape and the rest is the data. Note that this wouldn't have to be a list-of-lists-of-numbers as Tcl_Objs, but could be a single object of some new type whose string representation just happens to be possible to parse as lists. (In practice one shouldn't apply list operations on it, because that would discard the internal representation as "numerical array", but being able to do this anyway can be very useful when debugging.)
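A pure-Tcl sketch of this "shape plus flat data" idea might look as follows. The proc names shaped_index and shaped_reshape are hypothetical, and row-major ordering is an assumption made for the example (Fortran itself stores arrays column-major). A real implementation would presumably be a Tcl_Obj type written in C.

```tcl
# Hypothetical helpers. A value is a two-element list: {shape data}.

# Return the element at the given indices, assuming row-major order.
proc shaped_index {value args} {
    lassign $value shape data
    if {[llength $args] != [llength $shape]} {
        error "expected [llength $shape] indices, got [llength $args]"
    }
    set offset 0
    foreach i $args n $shape {
        if {$i < 0 || $i >= $n} { error "index $i out of range" }
        set offset [expr {$offset * $n + $i}]
    }
    return [lindex $data $offset]
}

# Reinterpret the same data with a new shape; the data list is untouched.
proc shaped_reshape {value newshape} {
    lassign $value shape data
    set count 1
    foreach n $newshape { set count [expr {$count * $n}] }
    if {$count != [llength $data]} {
        error "shape {$newshape} does not match [llength $data] elements"
    }
    return [list $newshape $data]
}

set identity {{2 2} {1.0 0.0 0.0 1.0}}
set corner [shaped_index $identity 1 1]
set vector [shaped_reshape $identity 4]
```

Reshaping only replaces the shape list, which is what makes the "6x6 matrix as a 36 element vector" trick cheap.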

AM Your solution comes very close to what several packages are doing (I wanted to keep the discussion open by not proposing a "definite" solution :) But yes, it is the sort of solution you can find a lot.

DKF: Actually, for an N-dimensional matrix, your printed representation only needs to state N-1 dimensions, since you have the overall number of items "for free".
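To illustrate this remark: given the flat data and N-1 dimensions, the remaining one follows from the element count. The helper name infer_last_dim is hypothetical, and which dimension is left out (here the last one) is an assumption for the sketch.

```tcl
# Hypothetical helper: compute the missing last dimension from the
# number of elements and the N-1 dimensions that were stated.
proc infer_last_dim {data partial} {
    set known 1
    foreach n $partial { set known [expr {$known * $n}] }
    set total [llength $data]
    if {$known == 0 || $total % $known != 0} {
        error "$total elements cannot be split over dimensions {$partial}"
    }
    return [expr {$total / $known}]
}

# The 2x2 identity matrix, with only the first dimension given.
set last [infer_last_dim {1.0 0.0 0.0 1.0} {2}]
```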

AM In Clustering data I use an approach with a list of lists, where each sublist represents the "coordinates" of the data point. As the algorithm only needs the data point by point and does not do anything "across" points, this works very well.

disneylogic I would not look to APL as a model for doing numerical analysis, or even J, its successor. There are aspects of it which fall short and are, frankly, rather old compared to the state of knowledge of numerical methods, particularly numerical linear algebra. Instead, MATLAB should be the model. It is very well done, although the scope of the language is, in my opinion, now overextended.

One very nice thing about Tcl/Tk is that you have what is basically a simple language to which you can add packages of arbitrary complexity, but you don't have to have them there.

The corresponding book references are G.H. Golub, C.F. Van Loan, MATRIX COMPUTATIONS, ISBN 0-818-3010-9; J. Dongarra, J.R. Bunch, C.B. Moler, G.W. Stewart, LINPACK USERS GUIDE, ISBN 089871172X; and J.E. Dennis, Jr, R.B. Schnabel, NUMERICAL METHODS FOR UNCONSTRAINED OPTIMIZATION AND NONLINEAR EQUATIONS, ISBN 0-13-627216-9.

Although it's not well known, Didier Besset, OBJECT-ORIENTED IMPLEMENTATION OF NUMERICAL METHODS: AN INTRODUCTION WITH SMALLTALK & JAVA, ISBN 1558606793, is really a nice layout of this kind of project in the context of specific languages.

Also, I would start out small.

The major topics are:

preliminaries of machine precision, roundoff, and error and their effects on things like convergence
how to represent complex numbers and shut off the representation when you don't want them
interpolation: you can never have too many kinds available
Householder transforms
polynomials and polynomial fits
least squares
numerical integration: not as hard as it seems
basic linear algebra fashioned in a manner which does not rely on determinants
solving various systems of linear equations having specific structures
numerical differentiation, a delicate subject
various powerful but essential operators for linear systems, like the Singular Value Decomposition, QR decomposition
eigenvalues in the general case

So, one part could be "solved" at a time and, then, as pieces are provided, they could be cobbled together into a larger structure. Tclers are good at cobbling!

It's important, in my opinion, to keep an eye on the eventual goal and also not to be too seduced by APLish syntactic sugar. Sure, you want a good notation for, say, matrices but lists of lists is already there and is similar to MATLAB's notation, apart from MATLAB's use of commas. Indeed, I would argue such a package should be able to import and export datasets of matrices in MATLAB's notation.
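A rough sketch of what such import and export could look like for a single matrix held as a list of lists is given below. The proc names to_matlab and from_matlab are hypothetical, and only plain numeric literals of the form [1, 2; 3, 4] are handled: no ranges, expressions, or nested brackets.

```tcl
# Hypothetical helper: list of rows -> MATLAB-style matrix literal.
proc to_matlab {matrix} {
    set rows {}
    foreach row $matrix {
        lappend rows [join $row ", "]
    }
    return "\[[join $rows "; "]\]"
}

# Hypothetical helper: MATLAB-style matrix literal -> list of rows.
proc from_matlab {text} {
    # Strip the surrounding brackets, split rows on ";" and split
    # elements on commas or whitespace.
    set body [string trim [string trim $text] "\[\]"]
    set matrix {}
    foreach rowtext [split $body ";"] {
        set row {}
        foreach item [split [string map {"," " "} $rowtext]] {
            if {$item ne ""} { lappend row $item }
        }
        if {[llength $row] > 0} { lappend matrix $row }
    }
    return $matrix
}

set text [to_matlab {{1 2} {3 4}}]
set back [from_matlab $text]
```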

I would go on and list specialized functions and polynomial systems, linear differential equations and higher order equations, but the project given above is quite large. If available, it also would go a huge way towards providing the Tcl/Tk world with a very competitive numerical capability.

I have begun to make a living again using numerical methods, am wedded to Tcl/Tk, and so am interested in this project.

AM (30 March 2004) With respect to the topics above:

preliminaries of machine precision, roundoff, and error and their effects on things like convergence

The math::fuzzy package in Tcllib may be of help, as well as Running error analysis
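For the machine-precision preliminaries, a small experiment in plain Tcl already shows the effect of roundoff. This sketch does not use math::fuzzy; the proc name machine_epsilon and the tolerance factor of 4 are choices made for the example.

```tcl
# Halve eps until adding half of it to 1.0 no longer changes the result.
proc machine_epsilon {} {
    set eps 1.0
    while {1.0 + $eps / 2.0 > 1.0} {
        set eps [expr {$eps / 2.0}]
    }
    return $eps
}

set eps [machine_epsilon]

# Roundoff in action: 0.1 + 0.2 is not exactly 0.3 in binary floating
# point, so compare with a tolerance rather than with ==.
set sum   [expr {0.1 + 0.2}]
set exact [expr {$sum == 0.3}]
set close [expr {abs($sum - 0.3) <= 4 * $eps}]
```

Convergence tests in iterative methods run into the same issue: a stopping criterion tighter than a few multiples of eps times the size of the quantities involved can never be met.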

how to represent complex numbers and shut off the representation when you don't want them

Several pages on the Wiki work with complex numbers or rational numbers
My "mathematical workbench" (see starkit archive: tclmath) uses a tagged list to represent them (and in fact implements a whole infrastructure to deal with things other than plain numbers)
Martin Russell has a package that deals with complex numbers and quaternions

interpolation: you can never have too many kinds available

I have little to add to that - except that I am interested in that sort of thing myself

Householder transforms

The la package?

polynomials and polynomial fits

My page on Discrete Fourier Transforms is a start (this morning I played a bit with "discrete" orthogonal polynomials :)

least squares

Again the la package

numerical integration: not as hard as it seems

See math::calculus in Tcllib
This includes: definite integrals, solving ordinary differential equations
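To give an idea of how little code a basic definite integral needs, here is a sketch of the composite Simpson rule in plain Tcl (8.5 or later). It is not the math::calculus interface; the proc names simpson and square are made up for the example.

```tcl
# Composite Simpson rule on [a,b] with n (even) subintervals.
# func is a command prefix that takes x and returns f(x).
proc simpson {func a b n} {
    if {$n <= 0 || $n % 2 != 0} {
        error "n must be a positive even integer"
    }
    set h   [expr {($b - $a) / double($n)}]
    set sum [expr {[{*}$func $a] + [{*}$func $b]}]
    for {set i 1} {$i < $n} {incr i} {
        set x [expr {$a + $i * $h}]
        # Interior points alternate between weights 4 and 2.
        set w [expr {$i % 2 ? 4.0 : 2.0}]
        set sum [expr {$sum + $w * [{*}$func $x]}]
    }
    return [expr {$sum * $h / 3.0}]
}

proc square {x} { expr {$x * $x} }

# Integral of x^2 over [0,1]; the exact value is 1/3.
set area [simpson square 0.0 1.0 10]
```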

basic linear algebra fashioned in a manner which does not rely on determinants, solving various systems of linear equations having specific structures

Both: la

numerical differentiation, a delicate subject

Not as difficult as many people make you believe - at least from an engineer's point of view :)
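The delicate part is mostly the choice of step size. A sketch with a central difference (proc names are made up for the example) shows that the error first shrinks as h gets smaller and then grows again once roundoff dominates:

```tcl
# Central difference approximation of the derivative of func at x.
proc central_diff {func x h} {
    set fplus  [{*}$func [expr {$x + $h}]]
    set fminus [{*}$func [expr {$x - $h}]]
    expr {($fplus - $fminus) / (2.0 * $h)}
}

proc sine {x} { expr {sin($x)} }

# Compare against the exact derivative cos(1.0) for several step sizes.
set exact  [expr {cos(1.0)}]
set errors {}
foreach h {1e-1 1e-5 1e-13} {
    set approx [central_diff sine 1.0 $h]
    lappend errors [expr {abs($approx - $exact)}]
    puts [format "h = %-6s error = %.3e" $h [lindex $errors end]]
}
```

The moderate step is typically the most accurate of the three: the large step suffers from truncation error, the tiny one from cancellation in the subtraction.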

various powerful but essential operators for linear systems, like the Singular Value Decomposition, QR decomposition, eigenvalues in the general case

The la package again ...

Other stuff that may be mentioned:

The math::combinatorics package for various functions that pop up in combinatorial and statistical problems.
Numerical array operations
Experiments with matrix operations deals with various representations of matrices

TV Well, er, what's the actual point of this page? A collection of linear algebra or (discrete, finite) matrix-definable algorithms coded in Tcl or as an extension? Or a summary of what already exists in that field? (Several pages and extensions, each with their raison d'être, are there already.)

Or a short overview or summary of the field?

A good 'n decent set of tcl coded NA routines for the not so expert to smell the field and play around with?

Numerical analysis for me encompasses about half of my first and second years, and in fact significant portions of the subsequent years, of my EE study: starting with two semesters of general linear algebra and a percentage of four semesters of general math (analysis) containing everything from Taylor approximation through integral approximation and numerical Navier-Stokes, followed by complex matrix solutions (the MNA method) for general electrical networks, practica with supercomputer programming to numerically solve network equations, and standard existing electronics programs for very advanced (and long existing, just like the LA package) circuit analysis.

That was followed by a host of research-like topics in new or specialized fields, and subsequently a wealth of solutions in nuclear physics, which has been almost from its start a century ago a matter of measurements and subsequent numerical analysis. They have about the biggest research centre in Europe (CERN) doing such fundamental analysis with 10,000 scientists on a daily basis!

What is your proposal proposing in all this?

I like to have available a good set of general-purpose routines that I don't need to rewrite and test when I want to do some analysis. I don't mind writing some of my own, and I find the subject important enough to think about including external programs and some of my other programs in a good conglomerate to work with.

The danger exists of becoming like my former professor and project leader in this field, for whom the GUT (Grand Unifying Theory) of being an important person in life was to acquire as much theory as possible about matrices and things attached, like the methods to compute with them, and to more or less claim that everything is a matrix, and that therefore with top knowledge in that field one can compute everything and be supreme by definition.

That's an error, because the field of NA is huge and more involved with integral and differential equations and incredible amounts of other mathematical theory, to the point that important scientific progress usually does not depend primarily on knowledge of the things in the LA package.

Though such computations, and the way they are done in those craftily made packages, without question remain relevant.

PDR Having done some very simple analysis on moderately large datasets, it is clear to me that lists of numbers won't cut it. I had to use a binary representation to keep memory use at reasonable levels. I would say we need new basic Tcl object types for the basic elements (vectors, matrices) that use an efficient internal representation. Preferably this representation would be compatible with the desired external libraries. The string representation could be something list-like, but this would of course carry a serious risk of shimmering. As an ideal we would also have some functionality (Feather?) that prevents shimmering: interfaces so that the object can present a list (or array) interface without changing its internal representation, and some object flag that prevents the internal representation from being changed inadvertently (raising an error instead). These basic object types could form the basis for unified and efficient numerical analysis tools.

Lars H: Something like Feather to prevent shimmering is overkill in this respect (and probably a bit unTclish, denying programmers the excellent debugging asset of having complete string representations of every value). What one needs is just native commands for accessing "numerical analysis Tcl_Objs". In parallel of lists and dictionaries one might have:

na size obj: Total number of elements, cf. llength or array size.
na shape obj: Shape list, e.g. "3 3" for a 3x3 matrix. "10" for a 10 element vector.
na index obj index1 index2 ...: The value at position (index1, index2, ...) of obj. Cf. lindex and dict get.
na replace obj index1 index2 ... value: Return new object which is like obj except the value at position (index1, index2, ...) has been changed to value. Cf. lreplace and dict replace.
na set objvar index1 index2 ... value: Change the object stored in objvar by setting the value at position (index1, index2, ...) to value. Cf. lset and dict set.
na for index-var-list obj body: Cf. foreach and dict for.

And so on. (These are just some elementary manipulation commands; typically one would use higher level things most of the time.) na probably isn't the best name for this ensemble of commands.
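To make the proposed interface concrete, here is a pure-Tcl mock-up of part of such an ensemble (Tcl 8.5 or later). It assumes the value is a two-element list {shape data} with row-major ordering, as in the string representation suggested earlier in this discussion. The internal proc names are hypothetical, na for is left out, and a real implementation would be native code with its own Tcl_Obj type rather than list juggling.

```tcl
# Ensemble "na" whose subcommands map onto capitalised procs, so that
# the core commands "set" and "replace" are not shadowed inside ::na.
namespace eval na {
    namespace ensemble create -map {
        size    ::na::Size
        shape   ::na::Shape
        index   ::na::Index
        replace ::na::Replace
        set     ::na::Set
    }
}

# Row-major offset of the given indices into the flat data list.
proc na::Offset {shape indices} {
    if {[llength $indices] != [llength $shape]} {
        error "expected [llength $shape] indices, got [llength $indices]"
    }
    set offset 0
    foreach i $indices n $shape {
        if {$i < 0 || $i >= $n} { error "index $i out of range" }
        set offset [expr {$offset * $n + $i}]
    }
    return $offset
}

proc na::Size  {obj} { llength [lindex $obj 1] }
proc na::Shape {obj} { lindex $obj 0 }

proc na::Index {obj args} {
    lassign $obj shape data
    lindex $data [Offset $shape $args]
}

# args = index1 index2 ... value; returns a modified copy.
proc na::Replace {obj args} {
    set value   [lindex $args end]
    set indices [lrange $args 0 end-1]
    lassign $obj shape data
    lset data [Offset $shape $indices] $value
    return [list $shape $data]
}

# Same as replace, but updates the variable in the caller's scope.
proc na::Set {objvar args} {
    upvar 1 $objvar obj
    set obj [Replace $obj {*}$args]
}

set m {{2 2} {1.0 0.0 0.0 1.0}}
set m2 [na replace $m 0 1 5.0]
na set m 1 0 7.0
```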

PDR Having "numerical analysis Tcl_Objs" with their specific commands is exactly what I mean. The "ideal" functionality described would be the icing on the cake. It would not prevent having a string representation. The idea is to prevent turning the efficient internal representation of your vector (or whatever) into, e.g., a list by using the lrange command on it. Preferably the vector object would support a list interface, so that lrange would just provide the desired result without changing the internals. If you applied a command that needs to convert the object to something for which it does not support an interface, an error could be raised.

AM Two things crossed my mind when I read Peter's contribution above:

If I talk of moderately large datasets, I usually think of one hundred to one thousand numbers. That may not be Peter's frame of reference though.
The problem with binary arrays is that though they are easy to create and pass on to C or Fortran routines, updating such arrays at the script level is cumbersome: you need to re-create them from scratch.

The first: Peter, could you tell me what order of magnitude you consider moderately large?

The second: a suitable modification/extension to the binary command could help us here, I guess

PDR In this case, moderately large was 4 times about 10k data points per sample, 96 samples per run. (A lot of people in data analysis would call these small data sets.) I actually used the binary command to encode and decode the data when needed, but this is neither fast nor easy. For small datasets, lists would be enough, but if we want a generally usable system, the bigger data sets should be accounted for as well. Anyway, the general system would be better for the smaller sets as well.
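For reference, this is roughly what working with the existing binary command looks like. It is only a sketch, not the code used for the analysis mentioned here; the proc names are hypothetical and it assumes native doubles of 8 bytes.

```tcl
# Pack a list of numbers into a binary string of native doubles.
proc pack_doubles {values} {
    binary format d* $values
}

# Decode the whole binary string back into a list.
proc unpack_doubles {bytes} {
    binary scan $bytes d* values
    return $values
}

# Read one element without decoding everything: "@" moves the cursor
# to an absolute byte offset (8 bytes per double assumed).
proc double_at {bytes i} {
    binary scan $bytes @[expr {8 * $i}]d value
    return $value
}

# "Update" one element. This builds a new binary string; there is no
# in-place modification at the script level.
proc double_replace {bytes i value} {
    set first [expr {8 * $i}]
    string replace $bytes $first [expr {$first + 7}] [binary format d $value]
}

set packed  [pack_doubles {1.0 2.5 3.0}]
set second  [double_at $packed 1]
set updated [double_replace $packed 1 9.0]
```

The storage is compact (8 bytes per number), but every access goes through a format or scan step, which is the awkwardness both contributors describe.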

AM This very strongly brings Critcl to my mind ... either the C or the Fortran variant (or both)

AM Here is an experiment with Slicing arrays

AM Some references that appeared on the French c.l.t (thanks to Gerard Sookahet):

By the way, since we are on the subject of a numerical package, I would like to point out that there are at least two Tcl bindings for GSL (GNU Scientific Library - http://sources.redhat.com/gsl/ ).

The first is being developed by Marco Maggi:

  http://web.tiscali.it/marcomaggi/software/index.html

The second is included in an astronomy application (Audela) that uses Tcl extensively:

  http://software.audela.free.fr/exten2.htm

[ Category Numerical Analysis ]