Peter Newman 2005-01-08:
Instead, every data type and command/function is a stand-alone, script level programmer selectable entity.
In other words, every programmer can create their own personal programming language - by selecting the data types and commands/functions they want.
These are selected by editing UPL: The Bootstrap File.
LV: Define what you mean by core. I know some people use the term to be the code that is compiled together to create a language. Are you saying that one won't be able to download code that compiles to create UPL?
Peter Newman: See below. And Yes you will be able to download the code that creates UPL.
DKF: You have to have some core because that core is responsible for the bootstrap process.
Peter Newman: Yeah, you're right. Also, I suppose the API between different components can be considered part of the core too. But note that NO scripting language data types or functions/commands are part of the core. That's the important thing.
aricb: Peter, this idea that "the core is the root of all evil" seems to be one of your main motivations for proposing UPL. But I have yet to see a convincing argument for this position. The two complaints I seem to remember hearing from you are: the core is too big, and the core is too complex because different parts of the core are too interdependent. I suppose these are matters of taste, but I don't see how the current proposal for UPL resolves either of these concerns. Okay, UPL has a minimal core, but somewhere in there is code to implement every programming language known to mankind. These pieces may or may not be interdependent, but on one or the other level there is going to be a lot of redundancy, which implies complexity of a different kind and a size a few orders of magnitude greater than the Tcl core will ever reach. Even if all I want out of UPL is Tcl, I can't imagine that the end result (UPL minimal core + Tcl module + whatever else it takes to make it run) will be smaller than today's Tcl.
Maybe this is an unfair assessment of your position on the core and/or I'm missing something key about UPL. If so, can you explain why the core is the root of all evil and how UPL fixes it?
DKF: It just occurred to me that we've already got the kind of core Peter's after.
It just happens to be machine-readable instead of human-readable. It's usually called something like crt0.o ...
WHD: I might add that the different parts of the core are interdependent for a reason. Consider the regular expression module, which is used all over the core by lots of different commands. With the proposed UPL architecture, you'd have to choose one of these possibilities to get the same functionality:
PWQ 2005-01-12: While I don't want to answer for Peter, it would be both facile and trivial to engineer a system that allowed regexp to be a module that is included or not, but at the same time shared by those commands (i.e. lsearch, string, glob) that can use it (if present).
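One way to picture the "shared if present" arrangement is a registry that commands consult at call time. The following is only a minimal sketch in Python, not anything UPL or Tcl actually provides: `ModuleRegistry`, its methods, and this `lsearch` stand-in are hypothetical names invented for illustration.

```python
import fnmatch
import re


class ModuleRegistry:
    """Hypothetical table of optional modules that commands may share."""

    def __init__(self):
        self._modules = {}

    def register(self, name, impl):
        self._modules[name] = impl

    def get(self, name):
        # Returns None when the module was not selected by the programmer.
        return self._modules.get(name)


def lsearch(registry, items, pattern, mode="glob"):
    """Illustrative stand-in: index of the first matching item, or -1."""
    if mode == "regexp":
        regexp = registry.get("regexp")
        if regexp is None:
            # The command still works in glob mode without the module.
            raise RuntimeError("regexp module is not loaded")

        def matches(s):
            return regexp(pattern, s)
    else:
        def matches(s):
            return fnmatch.fnmatchcase(s, pattern)

    for i, item in enumerate(items):
        if matches(item):
            return i
    return -1


registry = ModuleRegistry()
glob_hit = lsearch(registry, ["alpha", "beta"], "b*")

# Selecting the optional module makes it available to every command.
registry.register("regexp", lambda pat, s: re.search(pat, s) is not None)
regexp_hit = lsearch(registry, ["alpha", "beta"], "^be", mode="regexp")
```

The command degrades gracefully when the module is absent, but the price is a lookup and a failure path in every command that can use it.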
Peter Newman 2005-01-12: I've got a feeling that there might be a little bit of confusion as to what is meant by the term core.
To me, the core is what the script level programmer sees: the script level Tcl commands (set, expr, string, file, etc.); the data types (scalar (variables with a single value), list, array, dict, etc.); the variable scopes (local and global); and so on.
But it doesn't include what from now on I'll refer to as the kernel, which in Tcl/Tk comprises:
LV: You do realize, I assume, that you are not going to be able to implement the script level Tcl commands without some underlying language that, eventually, gets turned into machine code. Generally that is done in C because C code moves between compilers and platforms more easily than C# or C++. One might, in theory, be able to do what you want in Java, but the underlying concept of a set of code that implements script level functionality still exists.
There is no removing the C API, unless your plan is to replace it with Java or something. Certainly, you can HIDE the C API by not documenting it. Or you could make it so obtuse (like at least one major OS vendor who has a lion's share of the market world wide) that people are afraid to use it.
But an API is just a social contract: to accomplish a task, one can depend on passing certain arguments to a function to get something to happen. The only "complexity" removed is the ability to write truly new script functions... which seems to run counter to the usefulness of a language.
So in UPL Tcl/Tk there will still be a kernel - which the core script-level commands will no doubt make use of to do their magic.
The difference is simply that, instead of compiling all the core commands into a single monolithic executable (with supporting DLLs/.so files), each core command or data type will be a completely independent stand-alone module (in DLL/.so format), which the script-level programmer can include in or exclude from their own personal version of the language as they see fit.
I've got the feeling that some people thought I was suggesting we throw away the Tcl and Tk C APIs - which isn't the case.
aricb: I'm fairly certain I understand your proposal, and I think your ideas are interesting. But I personally don't see a need for UPL; I disagree with what I understand to be the motivations for UPL, and I also think that the proposal as it now stands doesn't adequately address your concerns with the current state of Tcl/Tk. I'm hoping you'll explain in more detail what you feel are the shortcomings of Tcl/Tk and how UPL fixes them.
The size of the core has less relevance than the size of the language as a whole, don't you think? Somewhere, somehow, you've got to implement a useful language, whether in one big piece or several little ones.
The "monolithic" model has some distinct advantages over the "many small pieces" model. Code fragments that are applicable to multiple parts of the core need only be present once in the monolith. For example, several core commands internally use hash tables. With a monolithic core, you need only code up one hash table framework and all commands that need it have access to it. Under your proposal, either you have to provide a separate hash table framework as part of each "command module" that uses one, or you have to make each relevant module depend on a separate hash table module. If you go with the first plan, your code mushrooms and becomes highly redundant. If you go with the second plan, your code not only retains the interdependencies you were concerned about, but makes matters worse by requiring one more module for everybody to deal with.
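The second plan (command modules depending on a shared hash table module) amounts to resolving a dependency graph every time a language is assembled. A rough sketch in Python follows; the module names and the `DEPENDS` map are hypothetical and are not taken from any UPL design.

```python
# Hypothetical dependency map: module name -> modules it needs at load time.
DEPENDS = {
    "array": ["hashtable"],
    "dict": ["hashtable"],
    "lsearch": ["regexp"],
    "regexp": [],
    "hashtable": [],
    "set": [],
}


def modules_to_ship(selected, depends=DEPENDS):
    """Return every module needed, directly or indirectly, by the selection."""
    needed = set()
    stack = list(selected)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        # Unknown modules are treated as having no dependencies.
        stack.extend(depends.get(name, []))
    return sorted(needed)


shipped = modules_to_ship(["array", "dict"])
```

Selecting two commands pulls in a third module that the script-level programmer never asked for, which is the interdependency described above in a different guise.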
I understand that you want to be able to cobble together your own custom language for each program you write. That sounds painful to me, but to each his own. But somewhere, either on your computer or on the end user's computer or both, is going to be not just the language you've put together, but all the commands of all the languages in UPL. Otherwise, how can you write or run a UPL program? And that truckload of code, whether it's in the core or elsewhere, is going to be huge.
But even if you figure out some way so that the only code that has to be on anybody's computer is the code that provides whatever language you put together, if that language is going to allow the programmer to do something useful, at best it is going to approach the size of the Tcl/Tk core, and I suspect it is going to be much bigger.
So my understanding is that the UPL core is going to be small, the UPL library is going to be gargantuan, and any given practical command set built with UPL (prerequisite for any useful program written in UPL) is going to range from Tcl/Tk-size to very much bigger. And to write a UPL program, in addition to the time you spend writing the program, you're going to have to invest the time to piece together a working language, as opposed to sitting down and writing a Tcl/Tk program, assured that the entire pre-packaged, fully functional core is at your disposal. As I see it, UPL is less efficient in terms of time and space. How is this better than what we have now?
I don't mean to be rude or to belittle the considerable time and effort you've put into this. But I can't "buy" UPL until I understand what's wrong with Tcl/Tk and how UPL fixes it (without introducing so many more problems that I'm forced to stay with the lesser of two evils :).
DKF: The other problem with doing lots of DLLs is that it becomes quite a bit slower to start the system up (where that counts as the point when, say, you get the splash screen up or start accepting network connections if you're a server app.)
Peter Newman 2005-01-13: Thanks aricb. I've obviously got a lot more explaining to do. But it's going to take some time (as in weeks/months). I've only got limited time; so must chip away at it slowly.
USCode 2005-01-15: "I Have A Dream": I know exactly (I think... ;-) what Peter is proposing here as I've been thinking about the same thing myself the last few months. I call it my "I Have A Dream" language. For me it's a matter of packaging. I currently distribute all my applications via Freewrap or Starkits, with my script and the Tcl/Tk environment all wrapped up in one nice executable. It's a matter of convenience for my users and makes my life easier too. While Freewrap and Starkits take advantage of UPX compression, the required Tcl/Tk environment that must be wrapped with my script is still significantly larger than need be. If I don't use a particular command, then I don't want it to be part of my executable. Ideally, I'd like to wrap the minimal UPL core along with a small package for every command I need (no more, no less), my scripts and any other packages or resources needed. All wrapped up in one nice, tidy and minimal package. If a command such as eval is utilized then you might need to include all UPL commands. Perhaps even some kind of preprocessor could auto-generate the required list of commands by parsing your script code? This would be utilized when you're ready to distribute your final application, however during development you could just use for example, a "package require TclTk" command that would make all installed commands available for convenience. Small executables and portable source code, yet with the power of a scripting language. It's all a distribution mindset. =)
I don't know Tcl/Tk internals, but given its current command-based design, would something like this be reasonably do-able with a large part of the current codeline? Just a matter of breaking it up into the appropriate pieces?
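The preprocessor idea can be sketched roughly as follows, in Python. This is a deliberately naive scan, not a Tcl parser: `KNOWN_COMMANDS`, `DYNAMIC`, and `required_commands` are hypothetical names, the command set is a small illustrative sample, and comments, quoting, and list literals containing command names will fool it.

```python
import re

# Illustrative sample only; a real tool would need the full installed set.
KNOWN_COMMANDS = {
    "set", "puts", "expr", "string", "lsearch",
    "regexp", "foreach", "if", "proc", "eval",
}

# Commands that can run arbitrary code, so static scanning cannot be trusted.
DYNAMIC = {"eval"}

# A word at the start of a line, or right after ';', '[' or '{'.
_CANDIDATE = re.compile(r"(?:^|[;\[{])\s*([A-Za-z_][\w:]*)", re.MULTILINE)


def required_commands(script, known=KNOWN_COMMANDS):
    """Guess which known commands a script uses (naive heuristic)."""
    found = {m.group(1) for m in _CANDIDATE.finditer(script)} & known
    if found & DYNAMIC:
        # As noted above: once eval appears, include every command.
        return set(known)
    return found


script = """set total 0
foreach n {1 2 3} {
    set total [expr {$total + $n}]
}
puts $total
"""
needed = required_commands(script)
```

Even in this toy form the eval rule dominates: a single dynamic command anywhere in the script, or in a package it loads, brings the whole command set back.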
Lars H: I suspect that reducing the Tcl command set but not the underlying C library (as seems to be the meaning of Peter Newman's "no core, but a kernel") won't buy you much in terms of overall size. Perhaps (but here I'm just guessing) something like a 9% reduction in executable size if you drop 90% of the language commands. In order to do better, he would have to go beyond reasoning in terms of script level commands.
aricb: In theory I understand the desire for a smaller executable. But in practice, given the current state of affairs, I don't see that it's an issue. First of all, if you use starkits rather than starpacks, it's not an issue at all because the core isn't in the kit (on the other hand, you need some other way to get the appropriate executable to your end user; how hard that is depends on who the end users are). But even if you use starpacks, how much can you save, for example, by throwing out the canvas? Remember that as much as possible, the canvas takes advantage of code provided by other widgets within Tk. Sure, there's a lot of canvas-only stuff there, too, but it's not like the wheel was reinvented for every widget; and what that means is that you may not get the benefit you anticipated if you remove the canvas (or some other given set of widgets).
The Windows UPX versions of tclkit 8.4.9 are 923 kb with Tk and 461 kb without it. So Tk is taking up 462 kb. How much of that is the canvas?
I think we can make an educated guess. Looking at the .o files generated from compiling Tk, it looks like the canvas code accounts for a little less than 13% of the size of the compiled code. (By the same method, the text widget accounts for about 11.6% of total Tk code). Let's assume the parts of Tk that aren't compiled (Tcl scripts) won't significantly lower these percentages. Let's also assume that the size of each .o file is roughly proportional to the size of the corresponding compressed code in the tclkit binary. By this reasoning, the canvas takes up around 60 kb of your starpack. When you add any actual functionality to your starpack, 60 kb is going to represent an even smaller portion of your executable. For example, I wrapped the canonical example starkit, fractal.kit, as a starpack; it came to 945 kb. If our assumptions are correct, the canvas is about 6% of that starpack. This number is probably most relevant if you're distributing your program via the Internet--according to download estimate times here [L1 ], with a 9.6Kbps modem, the canvas is going to take 55 seconds to download (out of 14 minutes and 24 seconds to download your whole program).
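The arithmetic behind these figures can be checked with a short Python sketch. It assumes 1 kb = 1024 bytes, takes 13% as an upper bound for "a little less than 13%", and ignores protocol overhead; the raw transfer times it gives come out somewhat below the quoted estimates, which presumably include some overhead.

```python
# Sizes quoted above, in kb.
TCLKIT_WITH_TK_KB = 923
TCLKIT_WITHOUT_TK_KB = 461
STARPACK_KB = 945

# Assumption: canvas share of compiled Tk code, rounded up to 13%.
CANVAS_SHARE_OF_TK = 0.13

tk_kb = TCLKIT_WITH_TK_KB - TCLKIT_WITHOUT_TK_KB
canvas_kb = tk_kb * CANVAS_SHARE_OF_TK
canvas_share_of_starpack = canvas_kb / STARPACK_KB


def transfer_seconds(size_kb, bits_per_second=9600):
    """Raw transfer time, assuming 1 kb = 1024 bytes and no overhead."""
    return size_kb * 1024 * 8 / bits_per_second


print(f"Tk: {tk_kb} kb, canvas: about {canvas_kb:.0f} kb")
print(f"canvas share of starpack: {canvas_share_of_starpack:.1%}")
print(f"canvas at 9.6 kbps: {transfer_seconds(canvas_kb):.0f} s")
print(f"starpack at 9.6 kbps: {transfer_seconds(STARPACK_KB) / 60:.1f} min")
```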
55 seconds is the worst-case scenario. Most of your users probably have internet connections at least three times that fast. 60 kb of hard drive space is inconsequential. Being forced to distribute your program on a CD instead of a floppy is arguably inconsequential, as the cost is virtually identical. And we're assuming that we could actually get rid of the canvas. A lot of scripts that require Tk can't.
Have you actually encountered a situation where it would make a significant difference that your starpack be 60 kb smaller? If so, I'd like to hear about it.
PWQ 2005-01-17: It is unfortunate that Peter combined a number of disparate ideas in the one post. Most people cannot see the wood for the trees. I propose another track for his base idea in Tcl Core - Is All