Scripted Compiler proposes writing most parts of a compiler in a scripting language, with the goal of inciting some discussion and controversy on the topic.
Aged discussions surrounding the topic are stored in Scripted Compiler Discussions. This page contains distilled information and fresh discussions (See the section at the end).
With machines getting faster and memory cheaper, would it make sense to write compilers in which only bits and pieces are written in C, Ada, ..., while the main components (such as the complex manipulation of whatever data structures are needed to represent and optimize code) are written in a scripting language like Tcl, Python, ...?
Secondly, if we have such a compiler for, say, the C language, does it make sense to package it up as an extension for the scripting language the majority of it is written in, for example Tcl?
Have fun thinking about this -- AK :)
Note that this is not so much about taking an existing compiler and making it scriptable, and thus allowing others to change its behaviour, but more about making use of high-level data structures available in scripting languages to make the implementation of algorithms for data-flow analysis and such more ... understandable and/or maintainable.
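As an illustration of that point, here is a minimal sketch of one classic data-flow analysis (live-variable analysis) written with nothing but dicts and sets. Python is used only because it is one of the scripting languages named above. The function name `liveness`, the block representation, and the three-block example program are all assumptions made up for this sketch, not part of any existing compiler.

```python
def liveness(blocks, successors):
    """Iterate live-in/live-out sets to a fixed point.

    blocks:     dict block name -> (use set, def set), where "use" holds
                variables read before being written inside the block
    successors: dict block name -> list of successor block names
    """
    live_in = {name: set() for name in blocks}
    live_out = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name, (use, defs) in blocks.items():
            # live-out is the union of live-in over all successors
            out = set()
            for succ in successors.get(name, []):
                out |= live_in[succ]
            # live-in = use + (live-out minus whatever the block defines)
            new_in = use | (out - defs)
            if out != live_out[name] or new_in != live_in[name]:
                live_out[name], live_in[name] = out, new_in
                changed = True
    return live_in, live_out


# Hypothetical three-block program:
#   entry: a = 1; b = 2
#   loop:  c = a + b; a = c      (may repeat or fall through to exit)
#   exit:  return c
blocks = {
    "entry": (set(), {"a", "b"}),
    "loop": ({"a", "b"}, {"c", "a"}),
    "exit": ({"c"}, set()),
}
successors = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}

live_in, live_out = liveness(blocks, successors)
for name in blocks:
    print(name, sorted(live_in[name]), sorted(live_out[name]))
```

The whole analysis is the two set equations in the inner loop; in a systems language most of the code would go into implementing the sets and maps themselves.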
Also note that this is not about configuration files, but about implementing parsers, etc. for system languages like C. In a wider sense, such parsers are also useful in tools like Source Navigator, which extract x-ref information out of sources.
| Sub projects to think of ... | Examples |
|---|---|
| Scripted Lexing | Lexing C, Lexing SQL |
| Scripted Parsing | Parsing C, Parsing SQL |
| Scripted Code Generation | none yet |
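To give a feel for the first sub project, the following is a rough sketch of a scripted lexer for a small subset of C, built on Python's standard `re` module. It is not the code from the Lexing C page; the token names, the keyword list, and the `lex` function are assumptions chosen for this example, and large parts of real C (preprocessor directives, character literals, hex and suffixed numbers) are deliberately left out.

```python
import re

# Order matters: comments must be tried before the "/" operator,
# and the catch-all MISMATCH must come last.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/|//[^\n]*"),
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("STRING", r'"(?:\\.|[^"\\])*"'),
    ("OP", r"==|!=|<=|>=|&&|\|\||\+\+|--|->|[-+*/%=<>!&|^~?:.,;(){}\[\]]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
# Only a handful of keywords; a real lexer would list all of them.
KEYWORDS = {"int", "char", "void", "return", "if", "else", "while", "for"}

# One big alternation of named groups; DOTALL lets /* ... */ span lines.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def lex(source):
    """Return a list of (kind, text) pairs, dropping whitespace and comments."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind in ("SKIP", "COMMENT"):
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {text!r} at offset {match.start()}"
            )
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


tokens = lex("int main(void) { return x1 + 42; /* done */ }")
for token in tokens:
    print(token)
```

The token table is plain data, so adding a token class or switching the lexer to another language such as SQL is mostly a matter of editing the list.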
Note the obvious connection to Starkits.
In the context of using a scripted compiler to extend the interpreter of a scripting language there are three main components:
Compiler results:
Reasoning and scenarios behind the above:
The above are the scenarios I thought of when I wrote up the lists of required packages and compiler results. For a new scenario I thought of recently see below.
AK: It is not clear whether the gain to be had from more highly optimized machine code is outweighed by having to load a large native binary library. The research regarding slim binaries suggests that this is not so, at least for short-running processes, IMHO. For long-running processes the initial overhead could easily be matched by the gains in speed we get. The problem is to determine the break-even point, i.e. the point where it makes sense to switch from one to the other.
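One way to make the break-even point concrete is a deliberately simple linear cost model: the native library pays a one-time load cost but is cheaper per call, while the alternative has no extra load cost but is slower per call. The model itself, the function name `break_even_calls`, and the numbers below are assumptions for illustration only; they are not measurements and do not come from the slim binaries research.

```python
import math


def break_even_calls(load_cost, native_per_call, other_per_call):
    """Smallest call count at which the native library is no slower overall.

    Assumed model (all costs in the same unit, e.g. microseconds):
        native total = load_cost + n * native_per_call
        other total  = n * other_per_call
    Returns None if the native code is not faster per call, because
    then the load cost is never recovered.
    """
    if native_per_call >= other_per_call:
        return None
    return math.ceil(load_cost / (other_per_call - native_per_call))


# Made-up numbers: 50000 us to load the library, 20 us per call native,
# 120 us per call otherwise. Each call recovers 100 us of the load cost.
n = break_even_calls(50000, 20, 120)
print(n)
```

A process expected to make fewer calls than this would be better off not loading the native library at all, which matches the short-running versus long-running distinction made above.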
I should note that the researchers in the area of slim binaries also research the field of optimizing the machine code of highly used procedures at runtime, in parallel to the actual application, using spare cycles in a low-priority thread. The above scenario could be seen as an evolution of this where the optimization results are written to disk for sharing with future instances of any application using the same procedures.
jcw: Thoughts... Maybe components 1+2 and result 3 are sufficient as first step? Also: binary code would be a great way to secure part of an app, which in turn could then supply keys for decoding the rest. With slim binaries, it gets even better because the same "code" runs cross-platform.
AK: 1+2/3 is essentially Critcl without an external compiler, and as such a logical first step. But note that for result 3 we might need result 2 as source.
The relationship between a scripted compiler and Critcl.
DKF: How would you go about debugging such a beast? Without debugging... Well, let's just say that's highly scary, shall we?
AK: Testsuites for the components, trace log (package require log), data structure dumps (trees, graphs) fed into visualization tools, symbolic debuggers, like in TclPro.
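As a sketch of the "data structure dumps fed into visualization tools" idea, the snippet below turns a small syntax tree into Graphviz DOT text. Graphviz is only one possible visualization tool and is an assumption here, as are the nested-tuple tree representation and the name `tree_to_dot`.

```python
def tree_to_dot(tree):
    """Render a (label, child, child, ...) tuple tree as Graphviz DOT text.

    Leaves are plain strings. Labels are emitted unescaped, so this
    sketch assumes they contain no double quotes.
    """
    lines = ["digraph ast {"]
    counter = 0

    def visit(node):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        label = node[0] if isinstance(node, tuple) else node
        lines.append(f'  {node_id} [label="{label}"];')
        if isinstance(node, tuple):
            for child in node[1:]:
                child_id = visit(child)
                lines.append(f"  {node_id} -> {child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical tree for the statement: x = a + b * 2
ast = ("=", "x", ("+", "a", ("*", "b", "2")))
dot = tree_to_dot(ast)
print(dot)
```

The resulting text can be saved to a file and rendered with any DOT viewer, which gives a quick visual check of what a parser or optimizer pass actually produced.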
TP: I find tcc interesting. It might be small enough for a self-contained Tcl extension C compiler that wouldn't require exec'ing gcc. Current downside is tcc generates x86 only.
TP: Another cool project to keep in mind for a Tcl compiler backend: LLVM.
RC: I am currently using Python to implement a C-to-VHDL optimizing compiler for FPGAs. I definitely recommend using "scripting" languages, at a minimum, as the glue languages between different algorithms, although the built-in data structures of scripting languages (Python has lists, tuples, and dicts) certainly make some of the hairier optimization routines much easier to understand. (For more info, you can see [L2 ]).
TJC compiler is a compiler for the TclJava project. It is written in Tcl and produces Java from Tcl. -TP
tclc compiles a Tcl script to a Tcl extension.
Zarutian adds a link to the online book The Art Of Assembly Language. Comments on the book appear here [L3 ].