Advanced Optimisation - TestCase 3d polyhedra

PWQ 26 Aug 05

22 Sept 05, download location changed.

INTRODUCTION

I was recently translating some C code to Tcl to test out some tclogl demos.

In the process I was changing some of the constructs to 'Tclisms', but it occurred to me that as programmers we sometimes get stuck in the paradigm of the languages we grew up with (such as C) and do not make the shift to the dynamics of Tcl when writing code.

I was fixing the code from 3D polyhedra with simple tk canvas to make hidden-line removal work when I decided to try applying the dynamics of Tcl to this code to make it faster.

I have tried to leave as much of the original code untouched as possible so that you can get real-time comparisons of the speed increases, but some minor tweaks were necessary to support the two modes concurrently.

A starkit or starpack of this code can be found at (moved) http://qs.co.nz/Download.html . This is part of my as yet unreleased 3D showcase at: http://qs.co.nz/Tcl/3D/

INITIAL CONVERSION

In looking at the code we can see some undesirable constructs. These are addressed outside the main optimisation effort as they are just wrong.

  1. return [list ....]

This is quite common, and I must say that I still find myself doing this from time to time. Unlike C, procs return the value of the last command executed. So when a command such as list or expr is the last one in the proc we do not need to use (and should not use) return. I advocate using set x rather than return $x just to reinforce the practice even though there is no longer any performance penalty for the latter.
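
For example, where C habits produce return [list ...], the list command alone suffices as the final command (a trivial made-up proc):

        proc MidPoint {x1 y1 x2 y2} {
            # The list is the last command executed, so its value
            # becomes the proc's result; no return needed.
            list [expr {($x1 + $x2) / 2.0}] [expr {($y1 + $y2) / 2.0}]
        }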

  2. switch ... case x : do x()

We sometimes get lost in the woods with the switch statement because in C you need to use numeric enums in place of text strings. In the ReadData proc we simply default to sourcing the file with the same name as the value tested by the switch statement.
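
For example (the cube arm and its helper are hypothetical; the point is the default arm):

        proc ReadData {shape} {
            switch -- $shape {
                cube {
                    # A shape we build procedurally.
                    MakeCube
                }
                default {
                    # The switch value doubles as the file name,
                    # where C would need an enum per shape.
                    source $shape.tcl
                }
            }
        }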

  3. expr { ... [expr ...]}

There is generally never any need to use expr inside itself. While Tcl offers less infix notation than we might like, we have to trust that once we are inside expr we can cut loose with infix notation as we would in any other language.
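
For example, the inner expr below is pure overhead; once inside the braces we can stay in infix:

        # Redundant nested expr:
        set d [expr {sqrt([expr {$dx*$dx + $dy*$dy}])}]

        # Equivalent, and cheaper:
        set d [expr {sqrt($dx*$dx + $dy*$dy)}]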

  4. for vs foreach

When we have a fixed number of iterations, creating a list of loop items is always quicker than using for. When we are in a tight loop, we need to think 'foreach' rather than 'for'.
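
A trivial illustration of the two mindsets:

        # C thinking: index arithmetic and a lindex on every pass.
        for {set i 0} {$i < [llength $pts]} {incr i} {
            puts [lindex $pts $i]
        }

        # Tcl thinking: walk the list directly.
        foreach p $pts {
            puts $p
        }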

OPTIMISATIONS

We now turn to the code and look at how we can modify the existing code to increase performance.

Caching

Unlike in some other languages, there is nothing special about the expr command, so there is no common subexpression elimination, partial evaluation, or the like.

We should be kind to the language and provide assistance where necessary.

E.g.

     set x [expr {[lindex $M 1 2] * 45}]
     set y [expr {[lindex $M 1 2] * 63}]

Could become:

     set p [lindex $M 1 2]
     set x [expr {$p * 45}]
     set y [expr {$p * 63}]

My preference:

     set x [expr {[set p [lindex $M 1 2]] * 45}]
     set y [expr {$p * 63}]

While lindex may not be much of an overhead, one only has to change the above to use [averyslowproc ...] and the benefit becomes self-evident.

The latter example is the way specialisation would introduce such caching without otherwise affecting the execution of the code.

Memoizing is such a simple technique that any proc which is genuinely functional (its result depends only on its arguments) and is called numerous times during the life of an application should be memoized by default.

We have seen that we can auto-memoize a proc by including the command MEMOIZE as the first line of a proc. We could also simply add the line:

        memoize MyProc

at the end of the source file.
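
As a flavour of the technique (this sketch is illustrative, not the MEMOIZE code referred to above), a memoizing wrapper can be written in a handful of lines:

        proc memoize {name} {
            # Move the real proc aside and install a caching front end.
            # Only safe for functional procs: same arguments, same result.
            rename $name ${name}:orig
            proc $name args [string map [list ORIG ${name}:orig] {
                global memo
                set key [list ORIG $args]
                if {![info exists memo($key)]} {
                    set memo($key) [uplevel 1 [linsert $args 0 ORIG]]
                }
                set memo($key)
            }]
        }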

Colour lookup cache

While the natural string-based approach to colour generation is to use "#$r$g$b", there is no need to do this every frame when the colours are a fixed selection. We precompute the colours and then select the appropriate one from a list. This also allows Tk to convert the object into a colour type for faster processing.
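
A sketch, assuming 32 grey levels are enough for the flat shading (variable names are illustrative):

        # Build the palette once at start-up...
        set ::shades {}
        for {set i 0} {$i < 32} {incr i} {
            set c [format %02x [expr {$i * 8}]]
            lappend ::shades #$c$c$c
        }
        # ...then per frame it is just an index lookup.
        set colour [lindex $::shades $level]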

It would be nice if Tcl supported a numeric specification for colours, such as a three-number list, so that we don't have to resort to string concatenation all the time.

Flatten lists

While the new lindex syntax lets us index into nested lists much as we access arrays in C, it is not quick when we must perform multiple lindex commands to access the data.

While the notation of a list of lists models the way we view the data such as vertex arrays, we find that we are repeatedly converting the sublist back into its separate components (e.g., x y z) to process the data. This means constructs like:

        foreach v $lvtx {
                foreach {x y z} $v {break}
                ...

With a flattened list we can use one foreach with:

        foreach {x y z} $flat_lvtx {
                ...
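
The flattening itself need only happen once, when the model is loaded. Since each vertex is a simple {x y z} triple, a single-level join does the job:

        # {{1 2 3} {4 5 6}}  ->  {1 2 3 4 5 6}
        set flat_lvtx [join $lvtx]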

Foreach extension

Now that the lists are flattened, we can place the list scanning into one foreach statement.

Most Tcl programmers are familiar with scanning a single value with foreach, but the command is much more flexible than that.
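
For example, foreach will happily take several variables per list and several lists at once, stepping through them in parallel:

        foreach {x y} {1 2 3 4} name {a b} {
            puts "$name: $x,$y"
        }
        # prints "a: 1,2" then "b: 3,4"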

A standard loop may look something like:

        foreach v $vertex_list {
                foreach {x y z} $v {break}
                foreach f $face {
                ...

We can now merge all the above into one giant loop:

    foreach {x y z} $::flat_lnv kv $::flat_lvtx p $lpoly {
        ...

We still need to process each vertex with a separate loop with:

        foreach {x y z} $kv {
                ...

This is because each face has a variable number of vertices, while the normals and faces come in fixed-size groups.

Specialisation

In only 50 lines of raw code we specialise three procedures each frame. For objects with a small number of faces this reduces performance, due to the time taken to byte code the new procs.

Note that in this demonstration we create a new proc from the existing one. This is solely for the purpose of being able to run with the optimisations turned on or off for comparison. This has meant some minor changes to some procs to select the appropriate proc to call. There is nothing in this technique that would stop the changes from being applied to the original procs. This is the desired goal so that we do not need to adopt special coding conventions to allow specialisation to be effective.

I have also inserted special comment markers in the code to make it trivial for the specialisation procs to locate the correct code fragment to extract. Again, this is not necessary for the technique per se, and we could have applied analysis to the code to make the changes needed for specialisation.

In this application we specialise the Projection and VectorMatrix code. As a next level we specialise the Rendering code to inline the vertex projection code as well as take out the Shading code when rendering as wire frame.
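
To give a flavour of the mechanics (a simplified sketch, not the demo's marker-based extraction): a specialised variant can be produced by taking a proc's body with info body, baking a now-constant variable into it, and defining the result as a new proc:

        proc Specialise {name var value} {
            # Replace every $var reference in the body with its value.
            # (A sketch: ignores default arguments and assumes $var
            # appears only as a simple $-reference in the body.)
            set body [string map [list \$$var $value] [info body $name]]
            proc ${name}_fast [info args $name] $body
        }

The first call to the generated proc pays the byte-coding cost, which is why small objects can come out slower.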

Cull then transform vertices

Since we are going to cull the back-facing polygons, there is not much point in transforming the vertices for those polygons. We compute the face normal's z component and remove the poly by moving it to a place off screen. For visible faces we then continue with the x and y projection.

Since vertices are shared among faces, we now need to know which ones we have transformed so that we don't do it twice.

For this we use an associative array to provide memoisation of the vertex transformation. We could also have used a list but the multiple lindex commands would be slower than the single foreach.
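
A sketch of that bookkeeping (proc and variable names are illustrative):

        # Reset once per frame, then transform each shared vertex
        # at most once, keyed by vertex index.
        array unset done
        foreach face $visible {
            foreach i $face {
                if {![info exists done($i)]} {
                    set done($i) [TransformVertex $i]
                }
            }
        }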

Advantageous Changes

Some code was changed to make the rendering faster by doing less work than the previous version. This makes the end-result comparison somewhat unfair, but the dropped code does not make a huge difference to the overall time.

The changes are:

  1. Culled faces are removed.

In the previous version they were reduced to a point at the center of the poly. If we wished to retain this feature, a more optimal approach would be to store a vertex at the 3D center of each poly, transform that single point, and use it as the center, rather than looping through the transformed vertices to compute the center point once per frame.

  2. Light vector.

We don't use DotProduct to compute the lighting since we know the light vector is {1 1 -1}. We could optimise this further and only update the transformation when the light source moves.
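
With the light vector fixed at {1 1 -1}, the dot product with a face normal {nx ny nz} collapses to simple arithmetic:

        # DotProduct {1 1 -1} {nx ny nz}  =  nx*1 + ny*1 + nz*-1
        set intensity [expr {$nx + $ny - $nz}]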

  3. Transform BarryCenter3D for culling

Depth sorting will fail under some conditions regardless of how accurately we compute depth. Rather than having to project the center we just use the z value from the last vertex. The detrimental effects of this are seen on the teapot spout.

RESULTS

Totalling the number of milliseconds it takes to render a frame, we find that:

  1. For all but the simplest of polyhedra, the specialising of the matrix transformation provides large performance gains.
  2. At up to 100 faces, the specialising of the Main display proc takes as long to byte code as is saved by inlining the projection code.
  3. Over 100 faces there is no contest: the gains from the specialised code grow with the number of polygons.
  4. At over 2000 polygons the time taken to display the polygons in the Tk canvas starts to dominate the loop time, and the apparent frame rate does not increase.

CONCLUSIONS

With the changes made we realise up to a threefold performance increase without having to optimise for the particular type of application.

While some aspects of specialisation decrease performance, we can choose to apply specialisation based on input data at run time, so that we can get the maximal speed benefit for all types of input data.

   i.e.

        if {$numfaces > $breakpoint} { Specialise }  ;# else stay generic

While compiled languages make the decision to optimise at compile time and must have decision logic in the processing loop to select different code paths, in Tcl we have the opportunity to make the decision at run time.

Because of Tcl's code-as-data model, we can analyse the data given at this instant in time and then rewrite our processing routines to best match the given data.

With the ability to introspect our environment, we can apply heuristics to the running application, dynamically change the processing, and evaluate the performance of those changes.

The technique of specialisation can be applied to existing code, thus allowing the programmer to write code using a structured approach without having to think in terms of how to make it efficient.

The days of writing programs to process data are over. As programmers we need to focus more on writing programs that create programs that process data.


GS (050827) - Wow! Very, very impressive. Thanks for this wonderful and instructive contribution.

RS 2005-08-28 - Another argument for using set res instead of return $res: the latter, while easier to read for people not so acquainted with Tcl, is in fact two commands - $res being basically equivalent to [set res]. But this is a (sometimes) hotly debated matter of taste.

RHS 22Sept2005 - Just to add an opposing argument, the use of [return $x] is considered better style by some, since it gives a clear indication that the code is returning the value on purpose, rather than just because it happens to be the last command to run in the proc. I'll note that I prefer [return $x] myself, but rely on my automated tests to tell me what behaviour is intended, and what is an accidental effect of how I wrote the code.