Version 11 of How much of Tcl is Fluff?

Updated 2015-02-13 09:48:57 by plwork

Interesting read up at http://developers.slashdot.org/story/15/02/11/1744246/your-java-code-is-mostly-fluff-new-research-finds - this study comes to the interesting conclusion that in the average Java program (and, by implication, the average C++, C, or whatever variant you might think of) less than 5% of the code actually implements the job the program is meant to do. The rest, say they, is "fluff" - boilerplate - set up, strip down, marshaling, demarshaling, and so on. This strikes me as way lopsided, but in looking back at my C++ code I think I see why they come to this conclusion - a lot of code is spent getting ready to do things, or cleaning up after doing things, but not very much is actually doing the real task. A lot of this is GUI code to get information in and out of a program, and the toolkits in such languages are very verbose, usually requiring as much as one whole line of code for every parameter. Tcl is far more concise and to-the-point; the UI in particular usually takes far less code and work. Does anyone have a handle on how this evaluation might come out for Tcl? My initial reaction is 50% or more to do the work, but there is a fair amount of overhead in the form of [expr {...}] and the like, which puts gingerbread around code that isn't needed in Java or other languages - but then, exprs aren't a heavy part of Tcl code, either. Comments?

PL 2012-02-12: one of the goals of Tcl was to hide fluff inside the C implementation level. Programs written in lower-level languages will necessarily contain a lot of fluff; higher-level / scripting languages allow the programs written in them to seem more fluff-free since the fluff is within module code or inside the language runtime.

(PL) 2012-02-13: I may have to retract my comments after a first reading of the actual paper, as opposed to the article about the paper. The paper doesn't seem to be about what we call "fluff" here, but rather about uniqueness of code fragments and the ability to measure it by identifying a minimal set of tokens. I'll read it a couple of times and try to comment on it.


arjen - 2015-02-12 08:14:41

I do a fair amount of calculations in Tcl (data manipulation, not-too-voluminous numerical simulations) and I would say that the fraction of fluff is rather limited - depending of course on what exactly you consider to be fluff. Boilerplate-ish stuff includes: getting arguments from the command-line, setting up a window, importing packages, initialising variables. One could argue that [expr] is fluff as is [set] - x = y is of course more concise than set x $y, but think of all the semicolons that are required in C-like languages!
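
As a rough illustration of the boilerplate categories listed above, here is a minimal sketch of a small Tcl script with each part labelled. The script itself, the proc name mean and the fallback data are hypothetical, and the "fluff" / "core" labels are only one possible reading of the term, not the metric used in the paper.

```tcl
# Hypothetical example: average the numbers given on the command line.
# The fluff/core labels are this sketch's own judgment.

proc mean {values} {
    # fluff: initialise the accumulator
    set sum 0.0
    foreach v $values {
        # core: the actual calculation, wrapped in [expr {...}]
        set sum [expr {$sum + $v}]
    }
    # core: the result
    return [expr {$sum / [llength $values]}]
}

# fluff: get the arguments from the command line, with a fallback
# so the script also runs without arguments
if {[info exists argv] && [llength $argv] > 0} {
    set data $argv
} else {
    set data {1 2 3 4}
}

puts [mean $data]
```

Even in this tiny case the split depends on what one counts: if [set] and [expr] are treated as plain syntax, only the initialisation and the argument handling are left as boilerplate.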

(PL): AFAICT, fluff is code that doesn't directly contribute to the core "action" inside a Java method, but still can't be edited out without breaking the method. It's not about excessive syntax as such: variable assignment is variable assignment regardless of whether your language makes you type x = y or set x $y (but if you have to declare the variable first, that's fluff).


Aside: reinventing very high-level programming languages

(PL): The article mentions that the researchers ponder applications for this research including "keyword-based programming", which is basically to say that they are reinventing languages like AWK some forty years later...

arjen - 2015-02-12 12:40:44

I seem to have heard of such reinventions more often, usually regarding wheels ... (Mind you: a good idea deserves being reinvented from time to time)

(PL): Reinventing isn't bad in itself, but is likely to be ineffective and misguided if you don't have any idea of prior art. The researchers may of course be more knowledgeable than this article makes them out to be.

Larry Smith: I noticed that, too. Oddly enough, I think Tcl may be much more in line with "keyword-based programming" than not. Tcl has a lot of "keywords" but none of them are really "reserved" in the sense that if, else, while and so on are in C-like languages. In Tcl, they occur in specific patterns, which makes recalling them and using them much easier on the brain cells - leaving more to think about the actual programming problem. And it reduces the overall amount of "cruft".

(PL): Yes. Again, the central idea for Tcl is to have a language (a "very high-level language") that lets the programmer concentrate on the "core action" of the code, while the "fluff" is hidden within the C implementation level. John Ousterhout talks about this kind of "core-to-fluff" ratio in some paper, I believe (well, sort of: in "Scripting: Higher-Level Programming for the 21st Century" he uses a "code ratio" comparison that simply compares line counts in C/C++/Java code to line counts in corresponding Tcl code; while what this actually measures is code bloat, the terseness of C/C++/Java ought to mean that the bloat is due to the presence of fluff). As we all know, not even Tcl can get rid of all the fluff, but a Tcl program does come close to the programming ideal they are talking about here. A language like AWK comes even closer to being fluff-free, but the trade-off is less generality.