To incite some discussion and controversy.
Basic question: With machines getting faster and memory cheaper, would it make sense to write compilers which only have bits and pieces written in C, Ada, ..., and their main components (like complex manipulation of whatever data structures are needed to represent and optimize code) are actually written in a scripting language, like Tcl, Python, ... ?
And secondary, if we have such a compiler for, for example, C, does make it sense to package it up as an extension for the scripting language the majority of it is written in, for example, Tcl ?
Have fun thinking about this -- AK :)
Concrete sub projects ...
Absolutely IMO... (see also my notes at [L1 ]) -jcw
Lightning (written in C) is GPL, but the concept could be used to generate machine code from within Tcl, plus some bits to write ELF, COFF, etc. libraries the low-level backend would be done. - AK
Too much machinery - I'd generate raw into a bytearray. Store it that way too. ELF is for linkers and other arcane concepts <wink> -jcw
Do not forget pre-compiling of binaries. Even if the compiler is in an extension I do not want to deliver it with every application. Especially not wrapped applications. - AK.
Absolutely. Which is why I said: "store it that way too". -jcw
Hm, sort of like tbcload, but machine code ? ... A mixture of bytecode and machine code might be of interest. A loader format containing binary data in a script language wrapper ... Oh, I just remember Oberon's slim binaries [L2 ]. These store a portable AST of the input (not bytecode) and use a simple compiler/mapper to machine code when loading. Lower-level than Tcl bytecodes, and are still as efficient as machine code, because in the end they are machine-code. I think I will start a new thread below to collect and merge the thoughts into something cohesive. - AK.
DOS C compilers used to offer in-line assembly language as a matter or course. Why shouldn't a scripting language offer in-line C as a matter of course? - WHD
We already have inline-C with CriTcl, but that calls an external compiler, gcc for now. The secondary question is more about calling the/a scripted compiler, directly loaded into the language interpreter. - AK
True. It would be nice to have CriTcl's functionality while dispensing with any kind of external compiler. But other than providing an easy way to inline C in a Tcl script, what good would a scripted compiler be? E.g., how would you use scripts to change the compiler's behavior? - WHD
I was less thinking about changing its behaviour and more of making use of highlevel data structures available in scripting languages to make the implementation of algorithms for data-flow analysis and such more ... understandable and/or maintainable. - AK.
AM Because of Critcl I am working on a package that abstracts the concept of a compiler and a linker away from the platform-dependencies. This way Critcl will be able to support "any" compiler/linker without the user (or the Critcl programmer) having to jump through hoops.
Vince That sounds great!
NEM There is Babel, by Paul Duffin. However, like Feather, it may take a while to get hold of any code from Paul.
Merging the above threads and thoughts into a more cohesive picture:
Note the obvious connection to Starkit's.
In the context of using a scripted compiler to extend the interpreter of a scripting language I see three main components:
Compiler results:
Reasoning and scenarios behind the above:
The above are the scenarios I thought of when I wrote up the lists of required packages and compiler results. For a new scenario I thought of recently see below.
It is not clear if the gain to be had by using higher optimized machine code is outweighed by having to load a large native binary library. The research regarding slim binaries suggests that this is not so. At least for shortly running processes, IMHO. For long-running processes the initial overhead could easily be matched by the gains in speed we get. The problem is to determine the break-even point, i.e. the point where it makes sense to switch from one to the other.
I should note that the researchers in the area of slim binaries also research the field of optimizing the machine code of highly used procedures at runtime, in parallel to the actual application, using spare cycles in a low-priority thread. The above scenario could be seen as an evolution of this where the optimization results are written to disk for sharing with future instances of any application using the same procedures.
-- AK
Thoughts... Maybe components 1+2 and result 3 are sufficient as first step? Also: binary code would be a great way to secure part of an app, which in turn could then supply keys for decoding the rest. With slim binaries, it gets even better because the same "code" runs cross-platform. -jcw
1+2/3 is essentially CriTcl without an external compiler, and as such a logical first-step. But note that for result 3 we might need result 2 as source. - AK.
Comments on slim binaries: I was unable to find a specification of the slim binaries used by Oberon. The best I got from goog'ling the web, news:comp.lang.oberon , and news:comp.compiler were the authors (Michael Franz, Thomas Kistler) simply refering to the Oberon sources, a defunct reference to a slim binary decompiler [L4 ], and a proclamation of the author of said decompiler that the format as-is is a total hack and writing up a formal specification for it would make no sense [L5 ]. All of that was in 1999 and I can't seem to find newer references.
So if the we truly want to use something like slim binaries we might have to come up with our own variant. >:(
-- AK
I remember reading about the AST that the Oberon slim binaries use. You might also want to check out the Oberon books, I think Wirth collab'ed with someone on one of 'em. I'm sure there are papers on it on the web... check out citeseer. Ro
The papers are easy to find. I have now a whole collection of them. But a specification, or a more detailed explanation of their algorithm ? Nope. Zilch. ... Yesterday I read a paper which mentionend a prototype written in Python. I have now sent a mail to one of the authors of said paper, asking for the sources. We'll see. - AK.
Well hopefully you'll get an answer. Thanks for keeping us apprised of the situation ;) - Ro
Ok, none of the people I have sent mail to have answered so far. Seems we are on our own in this -- AK.
Comments on parsing C and other languages. The text processing capabilities of scripting languages can make this part easier as well. string map, regsub, split, etc. are powerful tools. Use the first, for example to the detect the various types of tokens and insert marker characters for separation. After this pre-tokenization phase we can listify the string into the token-list via split. Easy, fast, ... This takes care of the lexer. Parsing ... Yeti, or similar - AK.
I've never really put much time in learning all that arcana... especially since Tcl makes it dead easy to write config files as Tcl ;) Use the source Luke ;) - Ro
Well, the comment above are not about config-files, but the scripted compiler for C, etc. Also useful in Tools like Source Navigator which extract xref information out of sources. - AK.
AK Aug 14 2002 -- see Lexing C for an example
AK Aug 17 2002 -- see Parsing C for more examples
Comments on other approaches -- this is not unlike what Armin Rigo talked about at OSCON 2002 - see http://homepages.ulb.ac.be/~arigo/psyco/ for his home page for Psyco. In his talk, he particularly emphasized that rewriting most of the Python C internals in Python would, when run through the Psyco compiler, actually run faster than the current C implementation. Ideas worth sharing. DavidAscher
Notes: http://sourceforge.net/docman/display_doc.php?docid=8148&group_id=41036
Psyco seems to be a mix of generic translation to machine code + specialization of said machine code to concrete values in variables and function parameters.
The language SML uses an internal compiler, though its source format is not C but SML itself. OTOH, it does mean that building the binary is interesting, especially on supported binary architectures... DKF.
Aku, another way you might want to take this appears here [L6 ].
The PhD thesis of Michael Franz, which was the key work for Slim Binaries, is called "Code-Generation On-the-Fly: A Key to Portable Software" can be found here: http://www.inf.ethz.ch/research/publications/data/diss/th10497.ps.gz [Alternate link [L7 ] ] An essential reference from that paper is a text about Universal Symbol Files, pdf [L8 ]
Here is a reference to a an abstract of a MS thesis (german Diploma Thesis :-) implementing SlimBinaries on the Intel platform: http://www.cs.inf.ethz.ch/group/wirth/stud_work/96mfda01/ the referenced PostScript is in german and not a very good structured PostScript :-).
AK: Reference list @ http://greywire.8m.com/greywire/portable.htm
JBR: Can someone explain how slim binaries is different from forth?
AK: My understanding of SB's: They contain the abstract syntax tree of the sources in compressed form. At loading time the AST is uncompressed and compiled to machine code, on the fly. The AST never resides completely in memory. Forth code is not an AST, right ? Just a series of words which, when executed, perform the desired application.
JBR: Yes, forth is a series of "words", either bytecodes, addresses or jmp instructions depending on the implimentation, which manipulate the virtual CPU. I didn't understand the acronym AST. When a forth compiler is invoked the list can be radically optimized as target machine code fairly easily. I envisioned the scripted compiler as a classic translator C -> RPN -> machine code. RPN is essentially an AST laid down depth first order. - RPN in Tcl
DKF: How would you go about debugging such a beast? Without debugging... Well, let's just say that's highly scary, shall we?
AK: Testsuites for the components, trace log (package require log), data structure dumps (tree's, graph's) fed into visualization tools, symbolic debuggers, like in TclPro.
Some additional references: slim binaries [L9 ], Oberon and slim binaries [L10 ], Thesis contrasting Java bytecodes with slim binaries citing Franz[L11 ] -- escargo