Scripted Compiler

Note: Older discussions of the topic are stored in Scripted Compiler Discussions. This page contains distilled information and fresh discussion (see the section at the end).


To incite some discussion and controversy.

Basic question: with machines getting faster and memory cheaper, would it make sense to write compilers in which only bits and pieces are written in C, Ada, ..., while the main components (like the complex manipulation of whatever data structures are needed to represent and optimize code) are written in a scripting language, like Tcl, Python, ...?

And secondly, if we have such a compiler for, say, the C language, does it make sense to package it up as an extension of the scripting language the majority of it is written in, for example Tcl?

Have fun thinking about this -- AK :)


Note that this is not so much about taking an existing compiler and making it scriptable, thus allowing others to change its behaviour, but more about making use of the high-level data structures available in scripting languages to make the implementation of algorithms for data-flow analysis and the like more ... understandable and/or maintainable.
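
To make that concrete, here is a toy sketch of the kind of algorithm meant: constant folding over an expression tree represented as nested Tcl lists. The IR format and all names are invented purely for illustration.

    # Toy illustration: constant folding over a tiny expression IR
    # represented as nested Tcl lists, e.g. {add {const 1} {const 2}}.
    # The IR format and all names are invented for this example.
    proc fold {expr} {
        set op [lindex $expr 0]
        if {$op eq "const"} { return $expr }
        # Fold the operands first.
        set a [fold [lindex $expr 1]]
        set b [fold [lindex $expr 2]]
        # If both operands are now constants, evaluate at compile time.
        if {[lindex $a 0] eq "const" && [lindex $b 0] eq "const"} {
            set x [lindex $a 1]; set y [lindex $b 1]
            switch -- $op {
                add { return [list const [expr {$x + $y}]] }
                mul { return [list const [expr {$x * $y}]] }
            }
        }
        return [list $op $a $b]
    }

    puts [fold {add {const 1} {mul {const 2} {const 3}}}]  ;# -> const 7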

Also note that this is not about configuration files, but about implementing parsers, etc. for system languages like C. In a wider sense such parsers are also useful in tools like Source Navigator, which extract x-ref information out of sources.


Subprojects to think of ...


Related pages in this Wiki:


Other references

  • jcw has his notes about a similar topic at: Why compilers are doomed [L1 ].
  • Another way we might want to take this appears here: Generating Code at Run Time With Reflection.Emit [L2 ] (old, broken link [L3 ]).
  • DKF notes that the language SML uses an internal compiler, though its source format is not C but SML itself. OTOH, it does mean that building the binary is interesting, especially on supported binary architectures...

Note the obvious connection to Starkits.

In the context of using a scripted compiler to extend the interpreter of a scripting language there are three main components:

  1. The interpreter itself, possibly written in a combination of its scripting language and a system language. It has a mechanism for loading script code, and shared libraries as defined by the OS it is running on.
  2. The compiler package. Takes files containing code written in a system language, and/or files written in a mixture of the scripting language and the system language and compiles them. In Tcl this compiler can evolve out of Critcl. The compiler is able to generate three different results, listed below.
  3. A package for loading slim binaries. This package provides a command reading a slim binary, compiling its contents into (in-memory) machine code, and linking that into the interpreter (a sketch follows this list).
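
A minimal sketch of what component 3 could look like. The slimload namespace and every command in it are invented; they only make the component boundary concrete.

    # Hypothetical sketch of component 3, the slim binary loader.
    # The slimload namespace and every command in it are invented;
    # they only make the component boundary concrete.
    namespace eval slimload {
        # Stubs: the real work belongs to the compiler back end
        # (component 2).
        proc mapToNative {slim} { error "sketch only" }
        proc linkIntoInterp {code} { error "sketch only" }

        proc load {file} {
            set chan [open $file rb]      ;# read the slim binary ...
            set slim [read $chan]
            close $chan
            set code [mapToNative $slim]  ;# ... map to machine code ...
            linkIntoInterp $code          ;# ... link into the interpreter
        }
    }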

Compiler results:

  1. Slim binaries. Such files contain data close to machine code, but not quite: easy to compile (or map) to machine code, hence very efficient at runtime, yet still portable. If the source is a combination of scripting and system language code, the slim binaries could contain either the script code or the portable bytecode used by the interpreter.
  2. In-memory machine code. This can be achieved by combining the previous result with the package for loading slim binaries. For efficiency we just have to create a path where the slim binary need not be written to a file before being mapped to machine code.
  3. A binary library containing machine code in a format native to the target processor and OS. Note the emphasis on target processor: cross-compilation is well within our long-range goals. (A hypothetical driver sketch follows this list.)
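
A hypothetical driver for component 2, pinning down the three result kinds. The compiler namespace and the -result option are invented for this sketch.

    # Hypothetical driver for component 2. The compiler namespace and
    # the -result option are invented; they only pin down the three
    # result kinds listed above.
    namespace eval compiler {}

    proc compiler::compile {src args} {
        array set opt {-result native -target host}
        array set opt $args
        switch -- $opt(-result) {
            slim {
                # Result 1: write a small, portable, near-machine-code
                # file which a loader can map quickly at runtime.
            }
            memory {
                # Result 2: compile straight into in-memory machine
                # code, skipping the slim-binary-on-disk round trip.
            }
            native {
                # Result 3: emit a shared library for $opt(-target);
                # cross-compilation when the target is not the host.
            }
            default {
                error "unknown result kind: $opt(-result)"
            }
        }
    }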

Reasoning and scenarios behind the above:

  1. The interpreter core (component 1), written in a mixture of a system language and its own language, compiles itself, either for the processor/OS it is currently running on, or for a different processor/OS. The second case is standard cross-compiling, i.e. porting the core to a new platform after the compiler has been extended to generate code for that platform. The first case makes sense too: it can be used to have the interpreter core pick up changes in the compiler, like better optimization algorithms. It is these two cases for which we need the compiler (component 2) to be able to generate native binary libraries (result 3).
  2. A scenario for package authors: a package containing a mixture of a system language and the scripting language is loaded, and the system language parts are compiled into in-memory machine code for use by the other parts. This requires result 2 and, of course, the compiler itself.
  3. Extending the above to deployment, it makes sense, IMHO, to precompile the system language parts into a dense portable encoding like slim binaries, which can be shipped everywhere, are still as fast as machine code, and do not have the overhead of truly parsing the system language as in the scenario above. In this scenario we do not need the full-fledged optimizing compiler package (FFOCP) at the target, only a loader package for the slim binaries, i.e. component 3. Actually the FFOCP would be detrimental, as the overhead of optimizing would negate the gain we get from having to load only a small file.

The above are the scenarios I thought of when I wrote up the lists of required packages and compiler results. For a new scenario I thought of recently see below.

  1. If the target host of a deployed package has the FFOCP too, it could not only use slim binaries quickly mapped to machine code, but also a native library generated by the FFOCP in spare time, or in batch mode, containing more highly optimized machine code than the loader generates.

AK: It is not clear whether the gain from more highly optimized machine code outweighs having to load a large native binary library. The research regarding slim binaries suggests that it does not, at least for short-running processes, IMHO. For long-running processes the initial overhead could easily be matched by the gains in speed. The problem is to determine the break-even point, i.e. the point where it makes sense to switch from one to the other.
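
To make the break-even point concrete: if loading the native library costs some extra time and the better code saves some fraction of the runtime, the break-even runtime falls out of a one-line formula. A toy calculation, with all numbers invented:

    # Toy break-even estimate; all numbers are invented.
    #   extraLoad - additional seconds to load the large native
    #               library instead of the small slim binary
    #   speedup   - fraction of runtime saved by the better code
    proc breakEven {extraLoad speedup} {
        # The saving speedup*T must repay the extra load cost:
        # speedup*T >= extraLoad  =>  T >= extraLoad/speedup
        expr {double($extraLoad) / $speedup}
    }

    # 0.5s extra load, code 5% faster: pays off past 10s of runtime.
    puts [breakEven 0.5 0.05]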

I should note that the researchers in the area of slim binaries are also investigating optimization of the machine code of heavily used procedures at runtime, in parallel to the actual application, using spare cycles in a low-priority thread. The above scenario could be seen as an evolution of this, where the optimization results are written to disk for sharing with future instances of any application using the same procedures.

jcw: Thoughts... Maybe components 1+2 and result 3 are sufficient as a first step? Also: binary code would be a great way to secure part of an app, which in turn could then supply keys for decoding the rest. With slim binaries it gets even better, because the same "code" runs cross-platform.

AK: 1+2/3 is essentially Critcl without an external compiler, and as such a logical first step. But note that for result 3 we might need result 2 as source.


The relationship between a scripted compiler and Critcl.

  • Critcl enables inline C, i.e. C constructs in Tcl. It currently relies on an external compiler (gcc) to perform the translation to machine code.
  • Given that, it could certainly make use of an embedded scripted compiler instead of forking out. The components 1+2/3 above are essentially Critcl without an external compiler.
  • On the other hand, a scripted compiler can make use of Critcl to allow recoding of speed-critical parts in C.
  • In the end we can have a Critcl with an embedded scripted compiler, with fallbacks to C for critical parts, which thus lifts itself from pure Tcl to a combination of Tcl and C/machine code. This is bootstrapping. (A small usage example follows this list.)
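
For reference, typical Critcl usage, which would look exactly the same whether the translation behind it is done by an external gcc or by an embedded scripted compiler:

    package require critcl

    # Inline C via Critcl: a Tcl command whose body is C code. Today
    # Critcl hands this to an external gcc; with an embedded scripted
    # compiler the same script would need no external tool.
    critcl::cproc add {int a int b} int {
        return a + b;
    }

    puts [add 1 2]    ;# -> 3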


Comments, notes, and discussion

DKF: How would you go about debugging such a beast? Without debugging... Well, let's just say that's highly scary, shall we?

AK: Test suites for the components, trace logs (package require log), data structure dumps (trees, graphs) fed into visualization tools, and symbolic debuggers, like in TclPro.
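
The log package mentioned here comes from Tcllib; trace logging of, say, a compiler phase could look like this (the messages are invented):

    package require log

    # Trace logging of a (hypothetical) compiler phase.
    log::log debug "cfg: building basic blocks for proc 'fold'"
    log::log info  "cfg: 3 blocks, 2 edges"

    # Silence the debug chatter once the pass is trusted:
    log::lvSuppress debug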

TP I find tcc interesting. It might be small enough for a self-contained Tcl extension C compiler that wouldn't require exec'ing gcc. The current downside is that tcc generates x86 code only.

TP Another cool project to keep in mind for a Tcl compiler backend, LLVM http://llvm.cs.uiuc.edu/

AK See also LuaJIT [L4 ].

RC I am currently using Python to implement a C-to-VHDL optimizing compiler for FPGAs. I definitely recommend using "scripting" languages, at a minimum, as the glue between different algorithms; the built-in data structures of scripting languages (Python has lists, tuples, and dicts) certainly make some of the hairier optimization routines much easier to understand. (For more info, see [L5 ].)

The TJC compiler is a compiler for the TclJava project. It is written in Tcl and produces Java from Tcl. -TP

tclc compiles a Tcl script to a Tcl extension.


Zarutian adds a link to the online book The Art Of Assembly Language [L6 ]. Comments on the book appear here [L7 ].


Category Discussion
