''[PWQ] 26 Aug 05''

'''INTRODUCTION'''

I was recently translating some ''C'' code to ''Tcl'' to test out some [tclogl] demos. In the process I was changing some of the constructs to 'Tclisms', but it occurred to me that as programmers we sometimes get stuck in the paradigm of the languages we grew up with ''(such as C)'' and do not make the shift to the dynamics of ''Tcl'' when writing code.

I was fixing the code for [3D polyhedra with simple tk canvas], making hidden line removal work, when I decided to try to apply the dynamics of Tcl to this code to make it faster. I have tried to leave as much of the original code untouched as possible so you can get real-time comparisons of the speed increases, but some minor tweaks were necessary to support the two modes concurrently.

A starkit of this code can be found at http://qs.co.nz/Tcl/3D/3display_specialised.kit .

'''INITIAL CONVERSION'''

In looking at the code we can see some undesirable constructs. These are addressed outside the main optimisation effort as they are simply ''wrong''.

   1. '''return [[list ....]]''' This is quite common and I must say that I still find myself doing this from time to time. Unlike ''C'', procs return the value of the last command executed. So when a command such as ''list'' or ''expr'' is the last one in the proc we do not need to use (and should not use) ''return''. I advocate using ''set x'' rather than ''return $x'' just to reinforce the practice, even though there is no longer any performance penalty for the latter.
   1. '''select ... case x : do x()''' We sometimes get lost in the woods with the select statement because in ''C'' you need to use numeric enums in place of text strings. In the ''ReadData'' proc we simply default to sourcing the file with the same name as the switch statement test case.
   1. '''expr { ... [[expr ...]]}''' There is generally never any need to use ''expr'' inside itself.
While we find that ''Tcl's'' readability with infix notation is less than we would like, we have to trust that once we are inside ''expr'' we can cut loose with infix notation as we would in any other language.

   1. '''for vs foreach''' When we have a fixed number of iterations, creating a list of loop items is always quicker than using ''for''. When we are in a tight loop, we need to think 'foreach' rather than 'for'.

'''OPTIMISATIONS'''

We now turn to the code and look at how we can modify it to increase performance.

'''Caching'''

Unlike in other languages, there is nothing special about the ''expr'' command. As such there is no such thing as common subexpression elimination, partial evaluation and the like. We should be kind to the language and provide assistance where necessary. Eg.

   set x [expr {[lindex $M 1 2] * 45}]
   set y [expr {[lindex $M 1 2] * 63}]

Could become:

   set p [lindex $M 1 2]
   set x [expr {$p * 45}]
   set y [expr {$p * 63}]

My preference:

   set x [expr {[set p [lindex $M 1 2]] * 45}]
   set y [expr {$p * 63}]

While ''lindex'' may not be much of an overhead, one only has to change the above to use ''[[averyslowproc ...]]'' and the benefit becomes self-evident. The latter example is the way specialisation would occur without affecting the execution of the code.

[Memoizing] is such a simple technique that any proc that is in fact ''functional'' and may be called numerous times during the life of an application should be ''memoized'' by default. We have seen that we can auto-memoize a proc by including the command '''MEMOIZE''' as the first line of a proc. We could also simply add the line:

   memoize MyProc

at the end of the source file.

'''Colour lookup cache'''

While the natural string-based approach to colour generation is to use "#$r$g$b", there is no need to do this every frame when the colours are a fixed selection. We precompute the colours and then select the appropriate one from a list.
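As a minimal sketch of the idea (the variable names here are illustrative, not taken from the starkit), a grey-scale shade ramp can be built once at startup and indexed per frame:

   # Precompute 256 grey shades once, at startup.
   set ::shades {}
   for {set i 0} {$i < 256} {incr i} {
       lappend ::shades [format "#%02x%02x%02x" $i $i $i]
   }
   # Per frame: a plain lindex replaces per-frame string building.
   set colour [lindex $::shades $intensity]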
This also allows Tk to convert the object into a colour type for faster processing. It would be nice if Tcl supported a numeric specification for colours, such as a three-number list, so that we don't have to resort to string concatenation all the time.

'''Flatten lists.'''

While the new syntax of ''Tcl'' allows accessing nested lists with ''lindex'' much as we access arrays in ''C'', this is not quick when we must perform multiple ''lindex'' commands to access data. While the notation of a ''list of lists'' models the way we view data such as vertex arrays, we find that we are repeatedly converting the sublist back into its separate components (eg x y z) to process the data. This means constructs like:

   foreach v $lvtx {
       foreach {x y z} $v {break}
       ...

With a flattened list we can use one foreach with:

   foreach {x y z} $flat_lvtx {
       ...

'''Foreach extension'''

Now that the lists are flattened we can place the list scanning into one ''foreach'' statement. Most Tcl programmers are familiar with scanning a single value with foreach, but the command is much more flexible than that. A standard loop may look something like:

   foreach v $vertex_list {
       foreach {x y z} $v {break}
       foreach f $face {
           ...

We can now merge all the above into one giant loop:

   foreach {x y z} $::flat_lnv kv $::flat_lvtx p $lpoly ...

We still need to process each vertex with a separate loop with:

   foreach {x y z} $kv {
       ...

This is because we have a variable number of vertices and a fixed length for normals and faces.

'''Specialisation'''

In only 50 lines of raw code we specialise three procedures each frame. For objects with a small number of faces this reduces performance due to the time taken to byte-code the new procs.

Note that in this demonstration we create a new proc from the existing one. This is solely for the purpose of being able to run with the optimisations turned on or off for comparison. This has meant some minor changes to some procs to select the appropriate proc to call.
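To illustrate the general idea (the proc names here are invented for the example, not taken from the starkit), specialisation can be as simple as textual substitution on a proc body followed by redefinition, after which the byte compiler sees literals instead of variable reads:

   # A general-purpose proc that reads a global each call.
   proc Scale {x} {
       expr {$x * $::factor}
   }
   # Rewrite a proc so a global variable read becomes a literal.
   proc Specialise {name var value} {
       set body [string map [list "\$::$var" $value] [info body $name]]
       proc $name [info args $name] $body
   }
   set ::factor 3
   Specialise Scale factor 3
   # Scale's body is now {expr {$x * 3}}: no global lookup per call.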
There is nothing in this technique that would stop the changes from being applied to the original procs. This is the desired goal, so that we do not need to adopt special coding conventions for specialisation to be effective. I have also inserted special comment markers in the code to make it trivial for the specialisation procs to locate the correct code fragment to extract. Again, this is not necessary for the technique per se, and we could have applied analysis to the code to make the changes needed for specialisation.

In this application we specialise the Projection and VectorMatrix code. As a next level we specialise the Rendering code to inline the vertex projection code, as well as take out the Shading code when rendering as wire frame.

'''Cull then transform vertices'''

Since we are going to cull the back-facing polygons, there is not much point in transforming the vertices for these polygons. We compute the face normal z component and remove the poly by moving it to a place off screen. For visible faces we then continue with the x and y projection.

Since vertices are shared among faces, we now need to know which ones we have transformed so that we don't do it twice. For this we use an associative array to provide memoisation of the vertex transformation. We could also have used a list, but the multiple ''lindex'' commands would be slower than the single ''foreach''.

'''Advantageous Changes'''

Some code was changed to make the rendering faster by doing less work than the previous version. This makes the end result a somewhat unfair comparison, but the code dropped does not make a huge difference to the overall time. The changes are:

   1. '''Culled faces are removed.''' In the previous version they were reduced to a point at the center of the poly.
If we wished to retain this feature, a more optimal way would be to create an array of vertices at the 3D center of each poly, transform that point, and use it as the center, rather than looping through the transformed vertices to compute the center point once per frame.

   1. '''Light vector.''' We don't use DotProduct to compute the light normal since we know the light normal is {1 1 -1}. We could optimise this further and only update the transformation when the light source moves.
   1. '''Transform BarryCenter3D for culling''' Depth sorting will fail under some conditions regardless of how accurately we compute depth. Rather than having to project the center we just use the ''z'' value from the last vertex. The detrimental effects of this are seen on the ''teapot spout''.

'''RESULTS'''

Totalling the number of milliseconds it takes to render a frame, we find that:

   1. For all but the simplest of polyhedra, specialising the matrix transformation provides large performance gains.
   1. At up to 100 faces, specialising the main display proc takes as long to byte-code as is saved by inlining the projection code.
   1. Over 100 faces there is no contest, as the specialised code's gains increase with an increasing number of polygons.
   1. At over 2000 polygons the time taken to display the polygons in the Tk canvas starts to dominate the loop time and the apparent frame rate does not increase.

'''CONCLUSIONS'''

With the changes made we realise up to a three-times performance increase without having to optimise for the particular type of application. While some aspects of specialisation decrease performance, we can choose to apply specialisation based on input data at run time, so we can get the maximal speed benefit for all types of input data. ie

   if num faces > $breakpoint Specialise else do not.
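In Tcl that decision is ordinary run-time code; a sketch with hypothetical names (the procs and variables here stand in for whatever the application provides):

   # Pick the rendering proc once per object, not once per polygon.
   if {[llength $faces] > $breakpoint} {
       SpecialiseRender           ;# rebuild the fast, inlined proc
       set render RenderFast
   } else {
       set render RenderGeneric   ;# small object: byte-coding cost not worth it
   }
   $render $object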
While compiled languages make the decision to optimise at compile time and must have decision logic in the processing loop to select different code paths, in Tcl we have the opportunity to make the decision at run time. Because of the ''code as data'' model of Tcl we can analyse the data given at this instant in time and then rewrite our processing routines to best match it. With the ability to introspect our environment we can apply heuristics to the running application, dynamically change the processing, and evaluate the performance of those changes.

The technique of specialisation can be applied to existing code, allowing the programmer to write code using a structured approach without having to think in terms of how to make it efficient.

The days of writing programs to process data are over. As programmers we need to focus more on writing programs that create programs that process data.

----
[Category Example] | [Category Education] | [Category 3D Graphics]