** Announcement **

The official announcement for the Cloverfield project can be found here: [Cloverfield - Announcement]

----

** Goals **

Along with the general goals listed in the above announcement, here are a few more specific technical goals:

*** Language ***

Improve the Tcl language syntax on several points to address common criticisms as well as implement missing features. For example:

   * 'Fix' the comments;
   * Auto-expand the first word of a command recursively. This will simplify currying and can give great results if namespaces become regular commands (spaces would thus become a valid namespace separator);
   * Improve variable access: allow e.g. `$$var`, and subscript access such as `var[[1]]` or `var(a)` along with interfaces (see Data structures below);
   * Allow variable references using the syntax `$&var`. This can fill the gap between current value/reference access semantics, e.g. `lindex` vs `lappend`, and solve many mutability vs. immutability problems;
   * Add a new quoting rule using parentheses, and drop the `list` command as we know it. For example, `(a b $c)` should be equivalent to `[[list a b $c]]`. The semantics of quotes and braces is preserved (minus changes needed for e.g. comments). Incidentally, this is the same syntax as LISP.
   * Extend the metasyntax pioneered by the argument expansion operator. This is the most controversial syntax change, but is unfortunately needed by the nature of some changes, like references or LISP-like delayed evaluation.
   * Define a syntax for specifying references. This can be used for example to serialize circular references, or keep references to variables that go out of scope; for example, `{ref self}(a {ref self}{})` specifies a list whose second element points to its parent.

For more detailed information, see [Cloverfield - Tridekalogue]

*** Data structures ***

Use [rope]s as the internal string representation. Ropes will use B-trees of immutable strings.
This will give fast concatenation, slicing, insertion, and should dramatically reduce the memory usage and data copying.

Use interfaces (à la [Feather]) instead of Tcl_Obj. This should eliminate most cases of shimmering.

*** Runtime ***

Implement the runtime on existing virtual machines. Primary target is [LLVM]. Secondary target could be Java, .NET, Parrot. LLVM is the most interesting solution since it gives access to JIT compiling, platform independence, native performance, and allows total control over the internal model (contrary to the JVM). Moreover, other languages such as C or C++ are already supported, which means that we could get cross-platform [Critcl]-like features for free. To achieve the goal of VM independence, internal data structures should be sufficiently high level.

Provide a VM-less, purely interpreted reference platform for embedded and small footprint solutions.

The runtime should provide advanced execution modes such as coroutines, stackless, lightweight threads, etc. See [Radical reform of the execution engine] for some ideas.

----

** Related information **

See [Cloverfield - The Gathering] for all other pages related to language improvement.

----

** General Discussion **

[George Peter Staplin]: Hi FB! I think you have some good ideas. I've read some of your code for [TkGS]. I'm hoping that you can get developers behind this project, and it doesn't become moribund. I am interested. Cloverfield is a good name, and I think it gets away from many old misconceptions about Tcl.

[FB]: Thank you! Yes, I hope Cloverfield will get more attention. TkGS's scope was a bit too narrow to really get developers on the project. But I've learned a lot working on it, even if the project never completed due to lack of free time (building a family needs a lot of commitment). Anyway I think it is a bit obsolete now, since most of the work involved the creation of a new graphic layer, and I feel that Cairo would do the job perfectly.
I even planned to port Tk to Cairo a few months ago, but given the success of [Tile] I came to the conclusion that Tk no longer needed significant improvements (at least for now), whereas Tcl was losing ground, so I moved on to what became Cloverfield.

About the name: I chose Cloverfield only a couple of days ago after realizing that the date for the announcement was 18-1-08. But prior to that I made a list of possible names, see: [Cloverfield - Alternate names]

----

[Lars H]: Wow, I think there isn't a single thing on that list that doesn't strike me as completely wrong for Tcl. Fascinating! Well, as long as you're just starting it up as a separate project I suppose I can happily ignore all of it…

''[FB]: I don't want to sound too harsh, but after reading your contributions to all the discussions I found on this wiki regarding language improvements, you seem to be very conservative when it comes to anything that might impact the [Dodekalogue]. Frankly, I don't think that auto-expansion of leading words is totally un-Tclish, given that the same suggestion has been made by several reputable Tclers such as [DKF] or [NEM], and that it currently requires `unknown` hacks to work (see [Let unknown know] and [An implicit {*}$ prefix]). And the use of parentheses has been debated in [Tcl 9.0 WishList], see #67. I understand you were against this change, but you should also concede that this change would greatly improve readability.''

[Lars H]: No, it would not noticeably improve readability.

''[FB]: Let's have a look at the following code:''

    # Tcl:
    set l [list \
        [list 1 2 $somevar] \
        [list 3 4 [someproc]] \
    ]

    # Cloverfield:
    set l (
        (1 2 $somevar)
        (3 4 [someproc])
    )

''Don't you agree that the latter form is more readable? Moreover the former is more error-prone (you can easily forget a backslash), and this is a simplistic case. Removing the noise created by `list` and backslashes leaves only meaningful data.
Add new-style comments and you get a more declarative way of defining data vs. the old procedural style. I also feel that the new style would be faster to parse and interpret, because the whole tree is now a single word, versus a collection of subwords in the former case (and this is a very important condition for the proposed reference declaration syntax).''

----

[LES] Dude, if this entire business is about making Tcl more popular (and it looks like it is), a little more effort spent on Tk/Tile and a handful of useful and good-looking desktop applications would probably be a lot more effective than any sort of fragmentation. Fragmentation is usually a very good strategy to '''stifle''' an endeavor.

[KJN] Better desktop applications would clearly be an asset. The [OLPC] adopted [GTK] as its main graphics toolkit, even though Tk is a much better fit for a resource-constrained system such as OLPC; but it had no choice, because GTK has the applications (Abiword, Gnumeric, Firefox...), and Tk does not.

However, it is always worth thinking about what we would like Tcl to become. Most suggestions will be explored and eventually rejected (see pages on this Wiki for many examples); a few will be adopted, after lengthy debate. Fragmentation has not occurred in the past (except for a few brave souls who still use Tcl 7 or even 6 because of their smaller footprint) - there aren't enough of us to maintain two major codebases.

----

'''Ian@''': When casting about for a language that isn't object orientated, I happened upon Tcl. Apart from Perl, which also seems to have become frozen in time, this seems to be the only one. But it's like stepping back in time. As far as I can see, there's one book that's modern (2004) and all the rest are from about 1997. I presume when they say they work with Windows they mean 98? There doesn't seem to be an equivalent of CPAN, eggs, gems either, or if there is, it's not obvious. But that goes for the whole language, not obvious.
It's like it's pitched at C programmers and that's it. The only thing that will work is a compelling application that takes a share of mind. Other than that it will remain an overlooked language. Its problems are: no books, no tutorials, no obvious way to do things. Perhaps it's had its day, legacy only, isn't that the way of it?

----

** Specific subtopics **

*** Data structures ***

**** Ropes ****

[DKF]: Experience with strings-implemented-as-trees in the past makes me point out that you'd better make sure that you take care to keep the trees balanced. Otherwise you'll have terrible performance. And using C arrays of characters seems to actually work quite well in practice...

[FB]: Flat strings (i.e. C arrays of chars) give good performance in Tcl's current context. Object sharing, COW, the lack of references, and the impossibility of building circular structures: all these factors suit flat strings perfectly. However, when you introduce references and mutability, you cannot use COW semantics anymore, because changing an object's value implies invalidating the string rep of all objects that reference it. This can represent a huge performance hit, as data sharing is obviously more likely with languages that allow references. With [rope] structures, you only have to invalidate the substring that has changed, by rebuilding the tree (or one of its leaves in the simplest cases). Moreover some platforms like Java only provide immutable strings, and allowing string mutability implies a huge performance hit. This can be a serious problem if we want to implement a Tcl interpreter over the JVM. In this case, a [rope] structure can be modified but the underlying data is stored in immutable strings.

You can read the following paper for more info on a real world rope implementation, especially section 'C Cords': http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf

----

**** See also ****

   * [Tcl9 and annotated strings].
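
To make the rope argument above more concrete, here is a minimal sketch of the idea: immutable leaf strings joined by inner nodes, so that an edit rebuilds only the path to the changed part and shares everything else. It is written in Python purely for brevity and is not Cloverfield code. All names (`Leaf`, `Concat`, `concat`, `substring`, `replace`, `flatten`) are hypothetical, and the sketch assumes a plain binary tree with no rebalancing, so it ignores both the B-tree layout mentioned in the goals and the balancing concern raised by DKF.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Leaf:
    """An immutable flat string fragment."""
    text: str

    def __len__(self) -> int:
        return len(self.text)


@dataclass(frozen=True)
class Concat:
    """An inner node joining two ropes; the total length is cached."""
    left: "Rope"
    right: "Rope"
    length: int

    def __len__(self) -> int:
        return self.length


Rope = Union[Leaf, Concat]


def concat(a: Rope, b: Rope) -> Rope:
    """Join two ropes without copying any character data."""
    if len(a) == 0:
        return b
    if len(b) == 0:
        return a
    return Concat(a, b, len(a) + len(b))


def substring(r: Rope, start: int, end: int) -> Rope:
    """Return the rope for r[start:end], reusing untouched subtrees."""
    start, end = max(start, 0), min(end, len(r))
    if start >= end:
        return Leaf("")
    if start == 0 and end == len(r):
        return r  # the whole subtree is shared as-is
    if isinstance(r, Leaf):
        return Leaf(r.text[start:end])
    mid = len(r.left)
    return concat(substring(r.left, start, mid),
                  substring(r.right, start - mid, end - mid))


def replace(r: Rope, start: int, end: int, new: str) -> Rope:
    """Build a new rope with r[start:end] replaced; r itself is unchanged."""
    return concat(concat(substring(r, 0, start), Leaf(new)),
                  substring(r, end, len(r)))


def flatten(r: Rope) -> str:
    """Produce the flat string, iteratively to avoid deep recursion."""
    parts, stack = [], [r]
    while stack:
        node = stack.pop()
        if isinstance(node, Leaf):
            parts.append(node.text)
        else:
            stack.append(node.right)  # pushed first, so it is visited last
            stack.append(node.left)
    return "".join(parts)


# Usage: editing the tail leaves the head fragment shared, not copied.
greeting = concat(Leaf("Hello, "), Leaf("world"))
edited = replace(greeting, 7, 12, "Tcl")
```

In this sketch the old rope stays valid after the edit and the new rope reuses the untouched `"Hello, "` leaf, which is the property FB relies on when the underlying platform only offers immutable strings. A real implementation would also have to bound the tree depth.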
----

*** Comments ***

[FB]: There is a slight misunderstanding about comments in Cloverfield. The new ''word comment'' `{#}` syntax is not at all meant to replace the existing syntax, but to complement it. Besides, "fixing" the comment only involves changing the way braces are matched and allowing them at the beginning of words, so that comments work in a less surprising way. For example:

    proc foo {} {
        # The following works as expected: the close brace after true doesn't close the proc.
        #if {true} {
        if {false} {
            switch $v {
                # This is a comment.
                default {
                }
            }
        }
        {#}{This comments out a whole block of code
            someproc $somevar
            return [foo] # Will loop forever! (notice the lack of semicolon)
        }
        return
    }

[AMG]: Regarding the #if comment: I see rule [[5]] causes Cloverfield to treat all braces on that line the same way Tcl treats braces preceded by backslashes. In a compiler I wrote that accepts a Tcl-like language, in order to find the end of a brace-quoted word, I count opening and closing braces; backslashed braces do not contribute to the brace count. I see that you also updated the brace-counting rule to skip braces contained inside double quotes. You also skip braces quoted using the raw data word modifier, which is pretty much a new thing to this Wiki; I'm not prepared to discuss it yet!

''[FB]: In short, `{data}` is roughly equivalent to XML's CDATA. The goal is to ease the inclusion of foreign data into Tcl to improve its status as a glue language. Mixed with [Critcl] and [tcc]-like features it would simplify its use as a [Scripted compiler].
Currently, when defining data in a foreign format, you have to properly quote the significant characters, which can be tedious and lead to [Quoting hell].''

In Cloverfield, the following code works fine, but in Tcl it has a mismatched brace:

    proc foo {} {
        puts "}"
    }

In Tcl, it's weird (but understandable) that the following code:

    proc foo {} {
        puts -nonewline "digits = {"
        for {set x 0} {$x < 10} {incr x} {
            puts -nonewline " $x"
        }
        puts " }"
    }

works great unless I delete either the first or the last puts. But in Cloverfield, it's fine. Am I understanding that right?

''[FB]: Yes. I've tried to remove the localities of the brace and comment syntaxes so that they work in the least astonishing way.''

----

[Lars H]: I'm a bit surprised, though, that you choose to label [{expand}]-style syntax as the most controversial part — I quite agree it would be a technical necessity after dropping [Everything Is A String] — but perhaps this just means that it is the thing that isn't directly borrowed from some other language.

''[FB]: Well, I felt this was controversial wrt. the debate that preceded the adoption of this syntax change. Many Tclers were concerned that this would open a Pandora's box. Regarding my suggestions, I tried to limit them to cases where no alternative was viable, i.e. when the changes interfered with the way Tcl interprets the data. I think I managed to get fair compromises. However dropping [EIAS] is totally out of the question, because it is at the heart of the Tcl way. On the contrary, if you re-read my suggestions carefully,'' — [Lars H]: I can't say I care enough to bother. — ''you'll see that I took great care to enforce this principle to solve some hairy problems (e.g. the representation of circular structures and references).'' — [Lars H]: IMNSHO references are the data structure equivalent of assembly language, and I'm glad Tcl frees me of the dangers inherent in these.
— Also, I'm a bit surprised about the way in which you propose to "fix" comments; I can't recall ever seeing any requests for ''comment words'' in commands, and it is already perfectly possible to put comments even ''inside'' words, using command substitution:

    proc \# {args} {} ; # No-op command, for comments.
    $w[\# {That's the window name}].toolbar[\# {This is a frame}].fire[\# {The actual button widget}] configure -repeatinterval[\# {Delays in auto mode}] 10[\# {A veritable machine gun, this button; 6000 shots per minute}]

''([MS] hopes you don't have a comment like '[[exec rm -rf ~]]')''

''[FB]: have you ever had to explain to a newcomer why the following code:''

    #if {true} {
    if {false} {
        puts something
    }

''works in an interactive shell but not from source or inside a proc? And why comments inside `switch` blocks sometimes don't work, or give unexpected results?'' — [Lars H]: This is really a [RTFM]. Being a teacher, I would suggest that the beginner (under supervision) apply the [dodekalogue] to the scripts in question. It's not a difficult exercise, but quite instructive. — ''To be successful Tcl needs to follow the principle of least astonishment whenever possible.'' — [Lars H]: On the contrary, most of Tcl's perceived defects are the result of literates in other languages not being astonished enough by the specific nature of Tcl that they learn the language properly, and instead struggle with analogies to other languages; in this case that braces work as in [C] ''et al.'' — ''To be more specific, the problem is not really to "fix" comments but to "fix" the way braces are matched. Hence the changes I proposed to brace matching. The ''comment word'' thing is needed for `switch`-like cases. But my proposal takes great care not to break the [EIAS] principle. OTOH, I find the way you chose to implement comments in words neither Tclish nor readable (not to mention dangerous).'' — [Lars H]: Not according to your understanding of Tcl, you mean?
(My experience only goes back to Tcl 7, but I'm pretty sure this has been possible much earlier than that, so it's implicit quite deeply in the core of the language.) I'd be the first to admit that it is '''near-unreadable''', but `[[\# {...}]]` isn't much worse than `{#}{...}` in that department, and it was rather this that was the point: how did this "fix" anything? — (If I recall correctly, [dkf] has mentioned in discussions of the [K] combinator that $somevar[[command-returning-empty-string]] does not even cause shimmering in Tcl 8.5, hence no quality degradation of code.)

Alternatively, one can do that with ordinary comments, provided one inserts the necessary newlines:

    $w[# That's the window name
    ].toolbar[# This is a frame
    ].fire[# The actual button widget
    ] configure -repeatinterval[# Delays in auto mode
    ] 10[# A veritable machine gun, this button; 6000 shots per minute
    ]

But perhaps the point is rather to allow comments in lists written as strings? ''That'' I would often have found useful, but Cloverfield rather seems to turn away from this practice.

''[FB]: on the contrary, that's exactly what Cloverfield proposes. The idea is to modify the way comments are parsed in braced strings. For example:''

    # Tcl:
    set v {# The next brace closes the string }
    set v {" The next brace closes the string }
    set v {# This isn't a comment} # and this is just excess arguments
    set v {# This isn't a comment}; # but this is

    # Cloverfield:
    set v {# The next brace doesn't close the string }
        "neither does this one }"
        but this one does }
    set v {"# This is not a comment"}
    set v {\# Neither is this} # but this is!

''This introduces an incompatibility, but for the benefit of a less surprising behavior. But EIAS is preserved, as the comment is not stripped off the string, but only alters the way braces are matched.''

[AMG]: It is my understanding from [Cloverfield - Tridekalogue] that Cloverfield comments are words preceded by the {#} modifier, which isn't represented in the above example.
Oh wait, I see what you're getting at. You ([FB]) also changed the way "line comments" work. Whereas Tcl only recognizes a comment when # appears as the first character of the first word of a command, Cloverfield recognizes a comment wherever # is the first character of any word. Watch out for [uplevel]! :^)

''— [Lars H]: Ah, I missed that; saw only the {#} style of comment. OK, I concede that's more a kind of "fixing" that I can imagine having been requested. Still doesn't lead to the behaviour shown in the first example above, but there could be other rules yet on commenting lurking in that tridekalogue. However, I don't care enough to look. —''

[FB]: Exactly. If you re-read the Tridekalogue, you'll see that "fixing" the comments only needs slight amendments to the existing Dodekalogue: rule [[6]] becomes [[5]] and changes the way braces are matched, and rule [[10]] allows the hash character wherever the beginning of a word is expected. And your code above would still work as expected.

— ''[FB]: Yes. Changes to line comments are necessary to make them work in the least surprising way. To do so # must be allowed as the first char of a word, to allow for in-list comments (case in point: `switch`). Unfortunately `uplevel` and friends are collateral victims of this choice (but URL fragments are not), as they will now need proper quoting of the #, for example with backslash. Which many editors do automatically anyway (because their Tcl parsers usually fail to recognize comments properly, QED), and which is pretty harmless compared to the potential gain.
But this single change is sufficient to turn Cloverfield into a distinct language because it impacts one of the fundamental rules of the [Dodekalogue].''

''The new line comment rules allow the following code:''

    dict create {
        FirstName John
        LastName Smith
        DateOfBirth 1/18/08 # In mm/dd/yy format
    }

''As for the so-called "word comment", I think it is a bit of a misnomer, but chosen for overall consistency of the meta-syntax (''word modifiers''). The goal is not to allow the commenting of individual words (which seems pretty useless), but rather to make comments be recognized as individual words which are subsequently ignored. The typical use case for word comments would actually be block commenting. See below for examples.''

But for the sake of completeness, here's (what I think is) a Cloverfield {#} comment:

    set v {#}{This is a comment} value

I guess the following is the Tcl 8.5 analog for "in-line" comments. It's quite similar to something proposed above, except it doesn't work by appending empty string to existing words. It uses [{*}] to produce zero words, which makes it very much like Cloverfield's {#}.

    proc # {comment} {}
    set v {*}[\# {This is a comment}] value

In an attempt to defang [[exec rm -rf ~]], I encourage the caller to brace the comment text. I do this by making [[#]] only accept one argument.

''[FB]: exactly, the only difference being the substitution rules, as `{#}` would skip all the substitution phase.''

I also added to your above comment example. I hope you don't mind.

''[FB]: I've added what I think was a missing semicolon before the hash in the last Tcl example. Back to word comments, here is an example of block commenting. Note that braces must be properly balanced inside comment blocks, as the commented words must follow all formatting rules.''

    proc foo {} {
        {#}{This is a comment that spans multiple lines. Regular parsing rules apply, e.g.
        #} this brace doesn't close the block
        but this one does}

        # In the following code the call to bar was commented out and replaced by baz.
        # Typical use case: debugging sessions.
        return {#}[bar] [baz]
    }

----

[KJN] In the example above, I can see what Cloverfield is trying to do with

===
set v {# The next brace doesn't close the string }
    "neither does this one }"
    but this one does }
===

and it is more appealing than Tcl if the braced quantity is code; but if the braced quantity is data, this is like allowing a comment inside a quoted string, which is a bit painful.

Tcl has an intrinsic problem, which is:

   1. We use braces to delimit a data "word"
   2. Often that data is to be interpreted as code
   3. The flexibility of Tcl means that when a braced word is first parsed we cannot in general know whether the contents are intended to be executed as code

I don't think the comments/bracing problem is fixable: unless you throw away the power of Tcl, or add complexity, all you can do is swap one kind of pain for another. In this case, the suggested parsing rules for Cloverfield remove the pain from braced code, but they transfer it to braced data: sometimes the programmer will want to brace an arbitrary data string which is to be interpreted verbatim, without "comments".

''[FB]: In this case you may want to use Cloverfield's raw data word modifier `{data}`, which is designed to allow the inclusion of arbitrary data (see also my comments higher on this page). That way you can have your cake (sensible brace parsing) and eat it (eliminate quoting hell):''

    set s {data}SomeArbitraryTag the rest is ignored $}[{"\
    #include
    int main() {
        const char string[] = "$\[} {data}SomeOtherTagWontMatchTheAboveOne";
        return 0;
    }
    this is also ignored $}[{"\
    until the tag SomeArbitraryTag
    # here we're back to the interpreter.

''Basically you can use the `{data}` modifier to enclose arbitrary data between a user-defined tag, in a similar way to MIME multi-part messages.
The exact syntax is not final, but the concept is powerful. The goal of Cloverfield is obviously not to add but to remove complexity; moreover, only a global proposal could address all these issues together; each proposal taken individually would only provide marginal gain, if any.''

''This concept is known as [here document] or [heredoc] in other languages such as [Perl] or [Python].''

[KJN]: Here's a suggestion that adds complexity, and probably has more negatives than positives: interpret braces with the Cloverfield rules; but also allow «guillemets/chevrons» to delimit words using the existing Tcl rules for braces. In all cases, the Tcl idea of words is preserved; but programmers will be discouraged from thinking of braced text as a {quoted string}, and will still have a mechanism for quoting literal strings without substitution. Naturally «chevrons» are not as pleasant as ASCII delimiters, but I think we have run out of suitable ASCII codes, except possibly `these'

''[FB]: I'd rather suggest that chevrons (or whatever) remove all Tcl-sensitive syntax and take verbatim data. In this case their behavior is close to my proposed `{data}`. The problem is that there must be a way to properly escape existing chevrons in the included data (=> quoting hell), notwithstanding the fact that they are hard to input on regular keyboards. But conceptually both ideas are similar.''

[KJN]: I don't like the word modifiers very much (''see Word modifiers section below''). How many rules could you ''remove'' from Cloverfield, and still fix the problems that you want to?

''[FB]: regarding comments and literal strings, rewriting the brace matching rules and allowing `{data}`-like syntax is all that's needed. The other rules are orthogonal and target other issues. Anyway the two previous enhancements are incompatible changes and need a major version bump.
But the merit of Cloverfield is to introduce the concept of word modifier to solve a range of problems where the typical solution would use ad-hoc techniques like opaque tokens (=> no more EIAS) or string manipulations (=> shimmering).''

''[FB]: Moved word-modifier-related discussion to Word modifiers section below''

[KJN]: If rules for chevrons correspond to those for Tcl braces, then chevrons would enclose verbatim data, and unmatched chevrons would have to be escaped. One of the places where my chevron suggestion unwinds is the treatment of lists: if a list is represented as a string, should it use chevrons or braces as delimiters? I'm not sure there are any easy answers - the more I use Tcl, the more I appreciate how the different rules fit together (but at the cost of the comment/bracing hell). It is difficult to fix the comment/bracing hell without introducing a lot of extra complexity. But it is well worth trying!

----

[KJN]: Problems with [Cloverfield - Tridekalogue] re comment/bracing hell.

Rule 5 for finding the matching brace, specifically, the rule to skip any braces in comments. The rule is to follow rule 10 when identifying a comment. This seems to require that the braced text must be parsed into commands and words, in order to identify when '#' occurs at a place where the first character of a word is expected. The parsing of embedded braced text must be recursive, because embedded braces are required to match, and a comment string may occur at any depth in the nested braces.

However the Tridekalogue does not require that braced text must be parsable into commands and words, only that braces match. A problem therefore arises when the recursive parsing encounters text that does not have Tcl/Cloverfield word form, e.g.

    someCmd {"xx data not commands {
        # but these are commands }
        pwd
        set a foo
        # where is matching bracket? Is the next bracket surplus? Is there a missing "
    }

If a nested braced data string does not have code form, it may not be possible for the parser to decide how to match the braces. To avoid this problem, I think Cloverfield is forced to require that a braced string must be parsable into commands and words (i.e. into Cloverfield code, except that the commands need not be defined).

Also, to eliminate comment hell, Cloverfield cannot escape parsing the contents of a quoted string in the same way, because code such as the following may occur:

    if {$a eq $b} {
        proc foo {bar} "
            # a comment }
            puts \"Error foo in \$bar\"
        "
    }

If we modify Cloverfield by adding rules for quoted strings, this seems to solve both these problems, but adds problems of its own:

   * a string such as "{" is now illegal, but may be rendered "\{"
   * braced text must have matching quotes as well as matching braces, making the string {"} illegal, but expressible as "\""
   * if we need to define a non-code-like data string, we will have to use either quotes and escapes, or string processing, or the {data} word modifier (rule 11).

It seems to me that Cloverfield removes the pain for braced code, but increases the pain for braced data.

----

*** Word modifiers ***

[KJN] I'm not keen on the word modifiers - except {*}. {*} was accepted into 8.5 because, despite introducing syntax, it simplified a lot of code. I see now the purpose served by {data} (providing a way to define a literal string, since Cloverfield braces no longer do that - ''[FB]: Yes they still do, only brace matching rules change, and "comments" are not stripped off the string but remain part of it''), but the other word modifiers in Cloverfield introduce a lot of syntax. Also, in some cases (like {*}) the modifier determines what is done with the word after substitution, while in other cases it modifies the substitution itself. I think this adds too much complexity.

[FB]: The merit of [Cloverfield] is to introduce the concept of word modifier to solve a range of problems where the typical solution would use ad-hoc techniques like opaque tokens (=> no more EIAS) or string manipulations (=> shimmering). For example, see [Jim References]. The strings returned by `[[ref]]` are similar to Cloverfield's word modifier, with the difference that Jim's are opaque tokens whereas Cloverfield's are just metasyntax that preserve [EIAS] and prevent the loss of internal rep due to shimmering. Another example is `[null]` and TIP #185 [http://tip.tcl.tk/185], which propose a very similar syntax.

Word modifiers are just a consistent syntax to tag words with special meaning. Sometimes they modify the behavior of the parser, like `{data}`; sometimes word substitution, like `{*}` or `{#}`; sometimes evaluation, like `{delay}`. I'm afraid there aren't many ways of enhancing the language to solve these problems without introducing a lot of syntax or breaking the semantics, and word modifiers don't introduce new syntactic rules but rather capitalize on the existing `{*}`.

Concerning the `{data}` modifier, I've just come across an example that illustrates a strikingly similar syntax enhancement to [Jim] (near the end of the page):

    set v {<