How much of Tcl is Fluff?

Interesting read up at http://developers.slashdot.org/story/15/02/11/1744246/your-java-code-is-mostly-fluff-new-research-finds - this study comes to the interesting conclusion that in the average Java program (and, by implication, the average C++, C, or whatever variant you might think of) less than 5% of the code actually implements the job the program is meant to do. The rest, they say, is "fluff" - boilerplate - set up, strip down, marshaling, demarshaling, and so on. This strikes me as way lopsided, but in looking back at my C++ code I think I see why they come to this conclusion - a lot of code is spent getting ready to do things, or cleaning up after doing things, but not very much is actually doing the real task. A lot of this is GUI code to get information in and out of a program, and the toolkits in such languages are very verbose, usually requiring as much as one whole line of code for every parameter. Tcl is far more concise and to-the-point; the UI in particular usually takes far less code and work. Does anyone have a handle on how this evaluation might come out for Tcl? My initial reaction is that 50% or more of the code does the work, but there is a fair amount of overhead in the form of [expr {...}] and the like, gingerbread around the code that isn't needed in Java or other languages - but then, exprs aren't a heavy part of most Tcl code, either. Comments?

2015-02-15: The paper in question isn't actually about "fluff" in the sense used here (the ITWorld article about the paper does talk about "fluff", but the article writer seems to have misunderstood the paper's content). The "5% of the code" isn't the part of the code directly implementing the solution, but the part of the code that makes a method unique. The research deals with techniques for finding a "minimal distinguishing subset" ("MINSET") of "words" (more like tokens, really) in Java methods that is sufficient to determine the "meaning" of the code within the method. The writers use a somewhat confusing terminology where the words that make up the MINSET are referred to as "wheat", while the rest of the words are "chaff". This usage certainly seems to suggest "core action" versus "fluff", but this interpretation is explicitly rejected by the authors. While the authors also deny that the research is about practical code searches, the theoretical foundations of code searching and indexing are their main interest.

Intuitively, I'd say this paper doesn't offer much of interest to the Tcl community since Tcl source code is more dynamic and less regular than e.g. Java code. To my knowledge, Tcl projects also have far fewer lines of code than typical C++/Java projects, so the need for effective code searches should be smaller.


arjen - 2015-02-12 08:14:41

I do a fair amount of calculations in Tcl (data manipulation, not-too-voluminous numerical simulations) and I would say that the fraction of fluff is rather limited - depending of course on what exactly you consider to be fluff. Boilerplate-ish stuff includes: getting arguments from the command-line, setting up a window, importing packages, initialising variables. One could argue that [expr] is fluff as is [set] - x = y is of course more concise than set x $y, but think of all the semicolons that are required in C-like languages!
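To make the [set]/[expr] point concrete, here is a minimal sketch of a small calculation written two ways. The proc names mean and mean2 are made up for illustration, and the second version assumes Tcl 8.5 or later (for {*} and tcl::mathop). What counts as fluff here is a matter of taste.

```tcl
# Hypothetical example: arithmetic mean of a list of numbers.
# Version 1: explicit loop; every arithmetic step is wrapped in [expr].
proc mean {values} {
    set total 0.0
    foreach v $values {
        set total [expr {$total + $v}]
    }
    return [expr {$total / [llength $values]}]
}

# Version 2: the same calculation, with the summation handed to
# tcl::mathop::+ and the list expanded into its arguments with {*}.
proc mean2 {values} {
    expr {[tcl::mathop::+ 0.0 {*}$values] / [llength $values]}
}

puts [mean {1 2 3 4}]    ;# prints 2.5
puts [mean2 {1 2 3 4}]   ;# prints 2.5
```

Even in the longer version, the only lines that are not the calculation itself are the proc header and the initialisation of total.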


dkf - 2015-02-15 16:59:00

I do a fair amount of programming in C and Java, and Tcl certainly has less fluff than them. In particular, in C and Java you do a lot of marshalling between types (such as with Tcl_GetIntFromObj) which we mostly avoid in Tcl because we tend to just pass the word tokens around directly. This avoidance of impedance mismatches (because we're not type-strict) makes high-level programming a lot easier. It also tends to drive the type-strict people absolutely wild, as it indicates that their way of doing things might not be the best…
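A minimal sketch of what "passing the word tokens around directly" looks like at the script level. The variable names and the input string are made up for illustration. In C, each of these steps would need an explicit conversion such as Tcl_GetIntFromObj; in the script, the same value is simply used as a string, a list and an integer in turn.

```tcl
# Hypothetical input: a line of text, e.g. as read from a file or a socket.
set line "42 17 8"

set total 0
# The string is used directly as a list...
foreach n $line {
    # ...and each word is used directly as an integer.
    incr total $n
}

# The integer result goes straight back into a string, no formatting call needed.
puts "sum=$total"                     ;# prints sum=67
puts "digits=[string length $total]"  ;# prints digits=2
```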

AMG: I have had extensive discussions with somebody who admires Tcl (to the point of having designed and implemented a quasi-Tcl language) but cannot abide EIAS ("everything is a string"). (I say he admires, but he doesn't truly understand.) I explained EIAS to him dozens of times from every approach I could think of, not with the intention of "converting" him, but rather to address his questions and criticisms. He conceded all my points but still remained uneasy to the point of refusing to adopt EIAS simply because it feels wrong to him, even though he cannot explain why.


Larry Smith - 2018-09-28

One of the things that drew me to Tcl, and away (FAR away) from Java, is this very issue. Tcl has very, very little "fluff" in this sense of the word, whereas Java is ENDLESS fluff; the boilerplate was enough to drive me nucking futz! It is literally more verbose than COBOL - which is, of course, the first programming language ever invented where the compiled assembler was actually more concise than the source code was. And Java is worse. Partly, this is a legacy of C, and C++ has similar problems, where you spend most of your time impedance matching - the libraries tend to be haphazard in design, and the parameters passed from one to another often don't match up the way it would be convenient for them to. Also, since Java does not have an extensible syntax, trying to factor out some particularly nasty bit of ugliness leaves smatterings of "paste" where the original code was, where you have to set up or marshal for your cleanup - so the problem is not solvable in principle, because it seems to propagate wherever you try to clean it up. Short of re-designing the whole thing from scratch (and I often want to) I see no solution.

Tcl is short, to the point, and dense in terms of code and logic. While ensembles do bloat the code a bit, they at least make it more readable, and they are much easier to sink under some factoring that hides the cruft while making the higher-level code more readable. I do feel this is somewhat due to EIAS. AMG's friend might consider this: EIAS takes note that everything that goes into a program, and everything that comes out, came from, or is going to, a human being. And a string is the "universal" data type because it is the language of talking to humans. Yes, you can use graphical representations - but even those have to be generated from data, and stored and moved around as data sets. Tcl becomes more portable and more forthright with its duck typing than even highly polymorphic languages - it is easier to read than even, say, Smalltalk. Almost anything I have seen written in Tcl is easier to read/follow/modify than the equivalent program in any other language.