[Peter Newman] 9 January 2005

----

[Unified Programming Language]

The language parsers are what parse and run the program scripts. Each language (Perl, Tcl, C, etc.) or variant thereof has its own parser/interpreter.

One defect of current versions of Tcl and Perl is that the parser/interpreter parses and then immediately executes the source code. I know that's not totally true; there's an intermediate byte-code step in between. But in effect it's what happens.

The problem with this is that it makes writing compilers, syntax checkers, language converters, etc. very difficult, because every such tool not only has to compile or syntax check (or whatever it does); it first has to parse the source code, interpreting it exactly as the real parser does. This is by no means an easy thing to do.

So UPL divides the whole parsing and execution thing up. There are:

* '''The Parser''' - which parses the source code and converts it into a stream of tokens (probably in tree format) that describes the commands/objects found and their arguments. That tokenised form of the program can then be passed directly to one of the tools below, or saved to disk for processing/analysis later.
* '''The Byte Code Compiler''' - which converts the tokenised source code into byte-code form, which can be either saved to disk for later execution or executed directly.
* '''The Executioner''' - which executes the byte code, received either on-the-fly from the byte-code compiler or read in from a file on disk.
* '''The C Dumper''' - which converts the tokenised form of the program into C code, which can then be compiled.

And then there are some optional things:

* '''The Reverse Parser''' - which converts the tokenised form of the program back into source code form.
* '''The Reverse Byte Code Compiler''' - which converts byte code back into tokenised form.
* '''The Obfuscator''' - which converts the variable and function names in the tokenised form of the program into meaningless trash, and then calls the '''Reverse Parser''' to convert this (hopefully now incomprehensible) mess back into source code form.
* '''The Language Translators''' - if different languages were able to share the same tokenised and/or byte-code forms, then automatically translating between them would become possible.
* '''The Syntax Checker''' - which analyses the tokenised form of the program.

----

[PWQ] ''10 Jan 05'' I would suggest a '''Tree Executioner''' to eliminate the need to have a bytecode compiler.

[DKF]: What bytecodes do you define? Should there be a mechanism for modules to define new bytecodes? If that's the case, how do you stop collisions between bytecodes defined by different modules when you transport a saved bytecode sequence from one system to another?

[LV] And how do you construct application safety, so that viruses don't generate dangerous bytecodes which trash a system?

----

[Peter Newman] 11 January 2005: I don't know. I haven't thought about those issues yet. Obviously it can be done. But with this spec the idea was to start at the top with a very high-level overview of the main features and goals of the language, and then gradually work down to spec out and then code up the details (coding, say, in Tcl first, and then in C once we're satisfied with the results).

Also, all the components suggested above are the sorts of things that are found in current implementations of scripting languages like Tcl and Perl. But '''UPL''' is modular, with every component an optional part. So if someone wanted the parser to directly execute its own output, they're free to write such a beast. Similarly with the '''Tree Executioner'''. There's nothing wrong with executing the tokenised form of the program either. And if multiple components are used, there's nothing wrong with different sets of components using different APIs.
In other words, there can be as many different structures/definitions of the tokenised and byte-coded forms of the program as people care to implement. Our language should be a dynamic, living beast, capable of evolving and improving all the time.

----

[Category Discussion]
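----

As a rough illustration of the split described at the top of this page, here is a minimal Python sketch of a parser, byte-code compiler, executioner, tree executioner and reverse parser that all share one tokenised form. It is only a sketch under stated assumptions, not the UPL design: the one-command-per-line toy language, the tuple-based token list, the PUSH/CALL opcodes and every function name below are made up for the example.

```python
# Hypothetical sketch: the toy language, the opcodes and all names here are
# illustrative assumptions, not part of the UPL proposal.
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, List[str]]      # (command name, argument words)
Instr = Tuple[str, object]        # (opcode, operand)


def parse(source: str) -> List[Node]:
    """The Parser: source text -> tokenised form (a list of command nodes)."""
    tree = []
    for line in source.splitlines():
        words = line.split()
        if words and not words[0].startswith("#"):   # skip blanks and comments
            tree.append((words[0], words[1:]))
    return tree


def compile_bytecode(tree: List[Node]) -> List[Instr]:
    """The Byte Code Compiler: tokenised form -> stack-machine instructions."""
    code = []
    for name, args in tree:
        for arg in args:
            code.append(("PUSH", arg))
        code.append(("CALL", (name, len(args))))
    return code


def execute(code: List[Instr], commands: Dict[str, Callable]) -> List[str]:
    """The Executioner: runs byte code against a table of commands."""
    stack, results = [], []
    for op, operand in code:
        if op == "PUSH":
            stack.append(operand)
        elif op == "CALL":
            name, argc = operand
            split = len(stack) - argc        # arguments sit on top of the stack
            args = stack[split:]
            del stack[split:]
            results.append(commands[name](*args))
    return results


def tree_execute(tree: List[Node], commands: Dict[str, Callable]) -> List[str]:
    """A Tree Executioner: runs the tokenised form directly, no byte code."""
    return [commands[name](*args) for name, args in tree]


def reverse_parse(tree: List[Node]) -> str:
    """The Reverse Parser: tokenised form -> source text."""
    return "\n".join(" ".join([name] + args) for name, args in tree)


COMMANDS = {
    "add": lambda a, b: str(int(a) + int(b)),
    "upper": lambda s: s.upper(),
}
SOURCE = "add 2 3\nupper hello"

tree = parse(SOURCE)
print(execute(compile_bytecode(tree), COMMANDS))   # ['5', 'HELLO']
print(tree_execute(tree, COMMANDS))                # ['5', 'HELLO']
print(reverse_parse(tree) == SOURCE)               # True
```

Because both executors consume the same tokenised form, the byte-code step is optional in this sketch, which is the point [PWQ] raises about a tree executioner. It does nothing to address the opcode-collision and safety questions from [DKF] and [LV].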