Cloverfield

Announcement

The official announcement for the Cloverfield project can be found here: Cloverfield - Announcement


SourceForge project and mailing lists

The tcl9 SourceForge project holds the main mailing list for discussion around Cloverfield:


Goals

Along with the general goals listed in the above announcement, here are a few more specific technical goals:

Language

Improve the Tcl language syntax on several points to address common criticisms as well as implement missing features. For example:

  • 'Fix' the comments;
  • Auto-expand the first word of a command recursively. This will simplify currying and can give great results if namespaces become regular commands (spaces would thus become a valid namespace separator);
  • Improve variable access: allow e.g. $$var, and subscript access such as var[1] or var(a) along with interfaces (see Data structures below);
  • Allow variable references using the syntax $&var. This can fill the gap between current value/reference access semantics, e.g. lindex vs lappend, and solve many mutability vs. immutability problems;
  • Add a new quoting rule using parentheses, and drop the list command as we know it. For example, (a b $c) should be equivalent to [list a b $c]. The semantics of quotes and braces is preserved (minus changes needed for e.g. comments). Incidentally, this is the same syntax as LISP.
  • Extend the metasyntax pioneered by the argument expansion operator. This is the most controversial syntax change, but is unfortunately needed by the nature of some changes, like references or LISP-like delayed evaluation.
  • Define a syntax for specifying references. This can be used for example to serialize circular references, or keep references to variables that go out of scope; for example, {ref self}(a {ref self}{}) specifies a list whose second element points to its parent.
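For comparison, two of the items above already have explicit counterparts in Tcl 8.5: a command prefix can be expanded with {*}, and [list] builds what (a b $c) is proposed to mean. The following is only a sketch of today's behaviour; the variable names strlen, c and l are made up for the illustration, and the Cloverfield forms appear only in comments because they are not valid Tcl.

```tcl
# Tcl 8.5 or later. All variable names here are hypothetical.

# A command prefix stored in a variable must be expanded explicitly.
set strlen {string length}
set n [{*}$strlen "hello"]     ;# same as: string length "hello"
puts $n
# Proposed Cloverfield form (implicit expansion of the first word):
#   $strlen "hello"

# [list] keeps element boundaries even when a value contains spaces.
set c "x y"
set l [list a b $c]
puts [llength $l]
puts [lindex $l 2]
# Proposed Cloverfield form:
#   set l (a b $c)
```

The explicit {*} is what the auto-expansion proposal would make unnecessary for the first word of a command.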

For more detailed information, see Cloverfield - Tridekalogue

Data structures

Use ropes as the internal string representation. Ropes will use B-trees of immutable strings. This will give fast concatenation, slicing, insertion, and should dramatically reduce the memory usage and data copying.
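To make the idea concrete, here is a deliberately simplified rope sketched in Tcl. It is an assumption-laden toy: it uses a plain binary concatenation tree rather than the B-trees mentioned above, does no balancing, and all proc names (rope_leaf, rope_cat, rope_len, rope_index, rope_str) are invented for this example. It only shows why concatenation needs no copying of the leaf strings.

```tcl
# A rope is either {leaf <string>} or {cat <left> <right> <length>}.
# Leaves are never modified; concatenation only builds a new node.
proc rope_leaf {s} { list leaf $s }

proc rope_len {r} {
    if {[lindex $r 0] eq "leaf"} {
        return [string length [lindex $r 1]]
    }
    return [lindex $r 3]
}

# Constant-time concatenation: no character data is copied.
proc rope_cat {a b} {
    list cat $a $b [expr {[rope_len $a] + [rope_len $b]}]
}

# Return the character at index i by walking down the tree.
proc rope_index {r i} {
    while {[lindex $r 0] eq "cat"} {
        set left [lindex $r 1]
        set n [rope_len $left]
        if {$i < $n} {
            set r $left
        } else {
            set r [lindex $r 2]
            incr i -$n
        }
    }
    return [string index [lindex $r 1] $i]
}

# Flatten the rope back into an ordinary string.
proc rope_str {r} {
    if {[lindex $r 0] eq "leaf"} { return [lindex $r 1] }
    return [rope_str [lindex $r 1]][rope_str [lindex $r 2]]
}

set r [rope_cat [rope_leaf "Hello, "] \
                [rope_cat [rope_leaf "Tcl "] [rope_leaf "world"]]]
puts [rope_str $r]
puts [rope_index $r 7]
```

A real implementation would keep the tree balanced and share subtrees between ropes; neither is attempted here.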

Use interfaces (à la Feather) instead of Tcl_Obj. This should eliminate most cases of shimmering.

Runtime

Implement the runtime on existing virtual machines. The primary target is LLVM; secondary targets could be Java, .NET, or Parrot. LLVM is the most interesting solution since it gives access to JIT compilation, platform independence, and native performance, and allows total control over the internal model (unlike the JVM). Moreover, other languages such as C or C++ are already supported, which means that we could get cross-platform Critcl-like features for free.

To achieve the goal of VM independence, internal data structures should be sufficiently high level.

Provide a VM-less, purely interpreted reference platform for embedded and small footprint solutions.

The runtime should provide advanced execution modes such as coroutines, stackless, lightweight threads, etc. See Radical reform of the execution engine for some ideas.


Related information

See Cloverfield - The Gathering for all other pages related to language improvement.


General Discussion

George Peter Staplin: Hi FB! I think you have some good ideas. I've read some of your code for TkGS. I'm hoping that you can get developers behind this project, and it doesn't become moribund. I am interested. Cloverfield is a good name, and I think it gets away from many old misconceptions about Tcl.

FB: Thank you! Yes, I hope Cloverfield will get more attention. TkGS's scope was a bit too narrow to really get developers on the project. But I learned a lot working on it, even if the project was never completed due to lack of free time (building a family needs a lot of commitment). Anyway I think it is a bit obsolete now, since most of the work involved the creation of a new graphics layer, and I feel that Cairo would do the job perfectly. I even planned to port Tk to Cairo a few months ago, but given the success of Tile I came to the conclusion that Tk no longer needed significant improvements (at least for now), whereas Tcl was losing ground, so I moved on to what became Cloverfield.

About the name: I chose Cloverfield only a couple of days ago after realizing that the date for the announcement was 18-1-08. But prior to that I made a list of possible names, see: Cloverfield - Alternate names


Lars H: Wow, I think there isn't a single thing on that list that doesn't strike me as completely wrong for Tcl. Fascinating! Well, as long as you're just starting it up as a separate project I suppose I can happily ignore all of it…

FB: I don't want to sound too harsh, but after reading your contributions to all the discussions I found on this wiki regarding language improvements, you seem to be very conservative when it comes to anything that might impact the Dodekalogue. Frankly, I don't think that auto-expansion of leading words is totally un-Tclish, given that the same suggestion has been made by several reputable Tclers such as DKF or NEM, and that it currently requires unknown hacks to work (see Let unknown know and An implicit {*}$ prefix). And the use of parentheses has been debated in Tcl 9.0 WishList, see #67. I understand you were against this change, but you should also concede that this change would greatly improve readability.

Lars H: No, it would not noticeably improve readability.

FB: Let's have a look at the following code:

# Tcl:
set l [list \
    [list 1 2 $somevar] \
    [list 3 4 [someproc]] \
]

# Cloverfield:
set l (
    (1 2 $somevar)
    (3 4 [someproc])
)

Don't you agree that the latter form is more readable? Moreover the former is more error-prone (you can easily forget a backslash), and this is a simplistic case. Removing the noise created by list and backslashes leaves only meaningful data. Add new-style comments and you get a more declarative way of defining data vs. the old procedural style. I also feel that the new style would be faster to parse and interpret, because the whole tree is now a single word, versus a collection of subwords in the former case (and this is a very important condition for the proposed reference declaration syntax).


LES Dude, if this entire business is about making Tcl more popular (and it looks like it is), a little more effort spent on Tk/Tile and a handful of useful and good-looking desktop applications would probably be a lot more effective than any sort of fragmentation. Fragmentation is usually a very good strategy to stifle an endeavor.

KJN Better desktop applications would clearly be an asset. The OLPC adopted GTK as its main graphics toolkit, even though Tk is a much better fit for a resource-constrained system such as OLPC; but it had no choice, because GTK has the applications (Abiword, Gnumeric, Firefox...), and Tk does not.

However, it is always worth thinking about what we would like Tcl to become. Most suggestions will be explored and eventually rejected (see pages on this Wiki for many examples); a few will be adopted, after lengthy debate. Fragmentation has not occurred in the past (except for a few brave souls who still use Tcl 7 or even 6 because of their smaller footprint) - there aren't enough of us to maintain two major codebases.


Specific subtopics

Data structures

Ropes

DKF: Experience with strings-implemented-as-trees in the past makes me point out that you'd better make sure that you take care to keep the trees balanced. Otherwise you'll have terrible performance. And using C arrays of characters seems to actually work quite well in practice...

FB: Flat strings (i.e. C arrays of chars) give good performance in Tcl's current context. Object sharing, COW, the lack of references, and the impossibility of building circular structures: all these factors suit flat strings perfectly. However, when you introduce references and mutability, you cannot use COW semantics anymore, because changing an object's value implies invalidating the string rep of all objects that reference it. This can represent a huge performance hit, as data sharing is obviously more likely in languages that allow references. With rope structures, you only have to invalidate the substring that has changed, by rebuilding the tree (or one of its leaves in the simplest cases).

Moreover some platforms like Java only provide immutable strings, and allowing string mutability implies a huge performance hit. This can be a serious problem if we want to implement a Tcl interpreter over the JVM. In this case, a rope structure can be modified while the underlying data is stored in immutable strings.

For more information on a real-world implementation, read the paper "Ropes: an Alternative to Strings" by Hans-J Boehm, Russ Atkinson and Michael Plass (especially the section "C Cords"): http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9450&rep=rep1&type=pdf

FB 20090112: I've created a rope package in C. A preview of the code is available here:

http://sourceforge.net/project/showfiles.php?group_id=216988


See also


Comments

FB: There is a slight misunderstanding about comments in Cloverfield. The new word comment {#} syntax is not at all meant to replace the existing syntax, but to complement it. Besides, "fixing" the comment only involves changing the way braces are matched and allowing them at the beginning of words, so that comments work in a less surprising way. For example:

proc foo {} {
    # The following works as expected: the close brace after true doesn't close the proc.
    #if {true} {
    if {false} {
        switch $v {
        # This is a comment.
            default {
            }
        }
    }
    
    {#}{This comments out a whole block of code
    someproc $somevar
    return [foo] # Will loop forever! (notice the lack of semicolon)
    }
    return
}

AMG: Regarding the #if comment: I see rule [5] causes Cloverfield to treat all braces on that line the same way Tcl treats braces preceded by backslashes. In a compiler I wrote that accepts a Tcl-like language, in order to find the end of a brace-quoted word, I count opening and closing braces; backslashed braces do not contribute to the brace count.

I see that you also updated the brace-counting rule to skip braces contained inside double quotes. You also skip braces quoted using the raw data word modifier, which is pretty much a new thing to this Wiki; I'm not prepared to discuss it yet!
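As a rough illustration of the brace counting described in the first paragraph, here is a minimal Tcl sketch. The proc name brace_end is hypothetical, and the code implements only the counting rule as stated (unescaped braces are counted, backslashed ones are skipped); it is neither AMG's compiler nor Tcl's actual parser.

```tcl
# Return the index of the brace that closes the one at index $start,
# or -1 if the braced word is unterminated.
proc brace_end {s start} {
    set depth 0
    set n [string length $s]
    for {set i $start} {$i < $n} {incr i} {
        set c [string index $s $i]
        if {$c eq "\\"} {
            incr i                 ;# skip the escaped character
        } elseif {$c eq "\{"} {
            incr depth
        } elseif {$c eq "\}"} {
            incr depth -1
            if {$depth == 0} { return $i }
        }
    }
    return -1
}

puts [brace_end {{a {b} c} d} 0]   ;# nested braces are balanced
puts [brace_end {{a \} b}} 0]      ;# the backslashed brace is not counted
puts [brace_end "\{a" 0]           ;# unterminated word
```

Note that quotes and comment characters play no role here: a brace inside double quotes is counted like any other.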

FB: In short, {data} is roughly equivalent to XML's CDATA. The goal is to ease the inclusion of foreign data into Tcl to improve its status as a glue language. Mixed with Critcl and tcc-like features it would simplify its use as a Scripted compiler. Currently, when defining data in a foreign format, you have to properly quote the significant characters, which can be tedious and lead to Quoting hell.

In Cloverfield, the following code works fine, but in Tcl it has a mismatched brace:

proc foo {} {
    puts "}"
}

In Tcl, it's weird (but understandable) that the following code:

proc foo {} {
    puts -nonewline "digits = {"
    for {set x 0} {$x < 10} {incr x} {
        puts -nonewline " $x"
    }
    puts " }"
}

works great unless I delete either the first or the last puts. But in Cloverfield, it's fine. Am I understanding that right?

FB: Yes. I've tried to remove the localities of the brace and comment syntaxes so that they work in the least astonishing way.
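One possible reading of the modified brace-matching rule can be sketched in Tcl as follows. This is an interpretation, not the Tridekalogue itself: the proc name cf_brace_end is hypothetical, and the sketch assumes that a double quote or a hash is only significant where a word can begin (after whitespace, a semicolon or an opening brace), and that a comment runs to the end of its line.

```tcl
# Return the index of the brace closing the one at $start, skipping
# braces inside quoted words and line comments; -1 if unterminated.
proc cf_brace_end {s start} {
    set depth 0
    set atWordStart 0
    set n [string length $s]
    for {set i $start} {$i < $n} {incr i} {
        set c [string index $s $i]
        if {$c eq "\\"} {
            incr i                      ;# skip the escaped character
            set atWordStart 0
        } elseif {$c eq "\{"} {
            incr depth
            set atWordStart 1
        } elseif {$c eq "\}"} {
            incr depth -1
            if {$depth == 0} { return $i }
            set atWordStart 0
        } elseif {$atWordStart && $c eq "\""} {
            # Quoted word: ignore everything up to the closing quote.
            for {incr i} {$i < $n} {incr i} {
                set q [string index $s $i]
                if {$q eq "\\"} { incr i } elseif {$q eq "\""} { break }
            }
            set atWordStart 0
        } elseif {$atWordStart && $c eq "#"} {
            # Line comment: ignore everything up to the end of the line.
            set eol [string first "\n" $s $i]
            set i [expr {$eol < 0 ? $n : $eol}]
            set atWordStart 1
        } else {
            set atWordStart [expr {[string is space $c] || $c eq ";"}]
        }
    }
    return -1
}

# Body of the first proc foo example: the brace inside the quotes is skipped.
set body "\{\n    puts \"\}\"\n\}"
puts [expr {[cf_brace_end $body 0] == [string length $body] - 1}]
```

Under these assumptions the quoted brace no longer terminates the body, which is the behaviour the examples above expect from Cloverfield.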


Lars H: I'm a bit surprised, though, that you choose to label {expand}-style syntax as the most controversial part — I quite agree it would be a technical necessity after dropping Everything Is A String — but perhaps this just means that it is the thing that isn't directly borrowed from some other language.

FB: Well, I felt this was controversial wrt. the debate that preceded the adoption of this syntax change. Many Tclers were concerned that this would open a Pandora's box. Regarding my suggestions, I tried to limit them to cases where no alternative was viable, i.e. when the changes interfered with the way Tcl interprets the data. I think I managed to get fair compromises. However dropping EIAS is totally out of the question, because it is at the heart of the Tcl way. On the contrary, if you re-read my suggestions carefully (Lars H: I can't say I care enough to bother.), you'll see that I took great care to enforce this principle to solve some hairy problems (e.g. the representation of circular structures and references). (Lars H: IMNSHO references are the data structure equivalent of assembly language, and I'm glad Tcl frees me of the dangers inherent in these.)

Also, I'm a bit surprised about the way in which you propose to "fix" comments; I can't recall ever seeing any requests for comment words in commands, and it is already perfectly possible to put comments even inside words, using command substitution:

proc \# {args} {} ; # No-op command, for comments.
$w[\# {That's the window name}].toolbar[\# {This is a frame}].fire[\# {The actual button widget}] configure -repeatinterval[\# {Delays in auto mode}] 10[\# {A veritable machine gun, this button; 6000 shots per minute}]

(MS hopes you don't have a comment like '[exec rm -rf ~]')

FB: have you ever had to explain to a newcomer why the following code:

#if {true} {
if {false} {
    puts something 
}

works in an interactive shell but not from source or inside a proc? And why comments inside switch blocks sometimes don't work, or give unexpected results? (Lars H: This is really a RTFM. Being a teacher, I would suggest that the beginner (under supervision) apply the dodekalogue to the scripts in question. It's not a difficult exercise, but quite instructive.) To be successful Tcl needs to follow the principle of least astonishment whenever possible. (Lars H: On the contrary, most of Tcl's perceived defects are the result of people literate in other languages not being astonished enough by the specific nature of Tcl to learn the language properly; instead they struggle with analogies to other languages, in this case the assumption that braces work as in C et al.)

To be more specific, the problem is not really to "fix" comments but to "fix" the way braces are matched. Hence the changes I proposed to brace matching. The comment word thing is needed for switch-like cases. But my proposal takes great care not to break the EIAS principle. OTOH, I find the way you chose to implement comments in words neither Tclish nor readable (not to mention dangerous). (Lars H: Not according to your understanding of Tcl, you mean? (My experience only goes back to Tcl 7, but I'm pretty sure this has been possible much earlier than that, so it's implicit quite deeply in the core of the language.) I'd be the first to admit that it is near-unreadable, but [\# {...}] isn't much worse than {#}{...} in that department, and it was rather this that was the point: how did this "fix" anything?)

(If I recall correctly, dkf has mentioned in discussions of the K combinator that $somevar[command-returning-empty-string] does not even cause shimmering in Tcl 8.5, hence no quality degradation of code.) Alternatively, one can do that with ordinary comments, provided one inserts the necessary newlines:

$w[# That's the window name
].toolbar[# This is a frame
].fire[# The actual button widget
] configure -repeatinterval[# Delays in auto mode
] 10[# A veritable machine gun, this button; 6000 shots per minute
]

But perhaps the point is rather to allow comments in lists written as strings? That I would often have found useful, but Cloverfield rather seems to turn away from this practice.

FB: On the contrary, that's exactly what Cloverfield proposes. The idea is to modify the way comments are parsed in braced strings. For example:

# Tcl:
set v {# The next brace closes the string }
set v {" The next brace closes the string }
set v {# This isn't a comment}  # and this is just excess arguments
set v {# This isn't a comment}; # but this is

# Cloverfield:
set v {# The next brace doesn't close the string }
       "neither does this one }"
       but this one does }
set v {"# This is not a comment"}
set v {\# Neither is this}   # but this is!

This introduces an incompatibility, but for the benefit of less surprising behavior. EIAS is preserved, as the comment is not stripped from the string; it only alters the way braces are matched.

AMG: It is my understanding from Cloverfield - Tridekalogue that Cloverfield comments are words preceded by the {#} modifier, which isn't represented in the above example. Oh wait, I see what you're getting at. You (FB) also changed the way "line comments" work. Whereas Tcl only recognizes a comment when # appears as the first character of the first word of a command, Cloverfield recognizes a comment wherever # is the first character of any word. Watch out for uplevel! :^)

Lars H: Ah, I missed that; saw only the {#} style of comment. OK, I concede that's more a kind of "fixing" that I can imagine having been requested. Still doesn't lead to the behaviour shown in the first example above, but there could be other rules yet on commenting lurking in that tridekalogue. However, I don't care enough to look. (FB: Exactly. If you re-read the Tridekalogue, you'll see that "fixing" the comments only needs slight amendments to the existing Dodekalogue: rule [6] becomes [5] and changes the way braces are matched, and rule [10] allows the hash character wherever the beginning of a word is expected. And your code above would still work as expected.)

FB: Yes. Changes to line comments are necessary to make them work in the least surprising way. To do so # must be allowed as the first char of a word, to allow for in-list comments (case in point: switch). Unfortunately uplevel and friends are collateral victims of this choice (but URL fragments are not), as they will now need proper quoting of the #, for example with a backslash. Many editors do this automatically anyway (because their Tcl parsers usually fail to recognize comments properly, QED), and it is pretty harmless compared to the potential gain. But this single change is sufficient to turn Cloverfield into a distinct language because it impacts one of the fundamental rules of the Dodekalogue.

The new line comment rules allow the following code:

dict create {
    FirstName John
    LastName Smith
    DateOfBirth 1/18/08 # In mm/dd/yy format
}
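In Tcl 8.5 the in-list comment above would simply become extra dictionary elements. A rough approximation of the proposed behaviour is to filter the comments out before using the data. The helper below is a hypothetical sketch (the name strip_line_comments is invented) and is much cruder than the proposed rule: it treats any # at the start of a line or after whitespace as a comment and does not track braces or quotes.

```tcl
# Remove "#" comments that begin where a word could begin and run to
# the end of the line. Simplification: braces and quotes are not tracked.
proc strip_line_comments {text} {
    regsub -all -line {(^|\s)#.*$} $text {\1} result
    return $result
}

set data {
    FirstName John
    LastName Smith
    DateOfBirth 1/18/08 # In mm/dd/yy format
}

# The original string still contains the comment; only the parsed
# view handed to dict create has it removed.
set person [dict create {*}[strip_line_comments $data]]
puts [dict get $person DateOfBirth]
puts [dict size $person]
```

Unlike this filter, the proposal leaves the string untouched and teaches the list parser itself to skip the comment.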

As for the so-called "word comment", I think it is a bit of a misnomer, but chosen for overall consistency of the meta-syntax (word modifiers). The goal is not to allow the commenting of individual words (which seems pretty useless), but rather to make comments be recognized as individual words which are subsequently ignored. The typical use case for word comments would actually be block commenting. See below for examples.

But for the sake of completeness, here's (what I think is) a Cloverfield {#} comment:

set v {#}{This is a comment} value

I guess the following is the Tcl 8.5 analog for "in-line" comments. It's quite similar to something proposed above, except it doesn't work by appending empty string to existing words. It uses {*} to produce zero words, which makes it very much like Cloverfield's {#}.

proc # {comment} {}
set v {*}[\# {This is a comment}] value

In an attempt to defang [exec rm -rf ~], I encourage the caller to brace the comment text. I do this by making [#] only accept one argument.

FB: exactly, the only difference being the substitution rules, as {#} would skip all the substitution phase.

I also added to your above comment example. I hope you don't mind.

FB: I've added what I think was a missing semicolon before the hash in the last Tcl example. Back to word comments, here is an example of block commenting. Note that braces must be properly balanced inside comment blocks, as the commented words must follow all formatting rules.

proc foo {} {
    {#}{This is a comment
    that spans multiple lines.
    Regular parsing rules apply, e.g. #} this brace doesn't close the block
    but this one does}

    # In the following code the call to bar was commented out and replaced by baz.
    # Typical use case: debugging sessions.
    return {#}[bar] [baz]
}

KJN In the example above, I can see what Cloverfield is trying to do with

set v {# The next brace doesn't close the string }
       "neither does this one }"
       but this one does }

and it is more appealing than Tcl if the braced quantity is code; but if the braced quantity is data, this is like allowing a comment inside a quoted string, which is a bit painful.

Tcl has an intrinsic problem, which is:

  1. We use braces to delimit a data "word"
  2. Often that data is to be interpreted as code
  3. The flexibility of Tcl means that when a braced word is first parsed we cannot in general know whether the contents are intended to be executed as code

I don't think the comments/bracing problem is fixable: unless you throw away the power of Tcl, or add complexity, all you can do is swap one kind of pain for another. In this case, the suggested parsing rules for Cloverfield remove the pain from braced code, but they transfer it to braced data: sometimes the programmer will want to brace an arbitrary data string which is to be interpreted verbatim, without "comments".

FB: In this case you may want to use Cloverfield's raw data word modifier {data}, which is designed to allow the inclusion of arbitrary data (see also my comments earlier on this page). That way you can have your cake (sensible brace parsing) and eat it (eliminate quoting hell):

set s {data}SomeArbitraryTag the rest is ignored $}[{"\
#include <stdio.h>
int main() {
    const char string[] = "$\[} {data}SomeOtherTagWontMatchTheAboveOne";
    return 0;
}
this is also ignored $}[{"\ until the tag SomeArbitraryTag # here we're back to the interpreter.

Basically you can use the {data} modifier to enclose arbitrary data between two occurrences of a user-defined tag, in a similar way to MIME multi-part messages. The exact syntax is not final, but the concept is powerful. The goal of Cloverfield is obviously not to add complexity but to remove it. Moreover, only a global proposal could address all these issues together; each proposal taken individually would only provide marginal gain, if any.

This concept is known as a here document or heredoc in languages such as Perl and the Unix shells; Python's triple-quoted strings serve a similar purpose.
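Since the exact {data} syntax is not final, the following Tcl sketch only illustrates the tag-delimited idea under stated assumptions: the data is taken to be the lines strictly between the line containing the opening tag and the line containing the next occurrence of the same tag. The proc name extract_data and the tag EOT are hypothetical.

```tcl
# Return the lines between the line holding the first occurrence of
# $tag and the line holding its next occurrence. Nothing inside the
# data is interpreted: braces, quotes and dollars are kept verbatim.
proc extract_data {text tag} {
    set openAt [string first $tag $text]
    if {$openAt < 0} { error "opening tag not found" }
    set start [string first "\n" $text $openAt]
    if {$start < 0} { error "no data after the opening tag" }
    incr start
    set closeAt [string first $tag $text $start]
    if {$closeAt < 0} { error "closing tag not found" }
    set end [string last "\n" $text $closeAt]
    return [string range $text $start [expr {$end - 1}]]
}

set script "set s {data}EOT ignored\nint main() \{\n    return 0;\n\}\nalso ignored EOT\n"
puts [extract_data $script EOT]
```

Because the terminator is chosen by the author of the script, the data itself never needs escaping as long as it does not contain the tag.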

KJN: Here's a suggestion that adds complexity, and probably has more negatives than positives: interpret braces with the Cloverfield rules; but also allow «guillemets/chevrons» to delimit words using the existing Tcl rules for braces.

In all cases, the Tcl idea of words is preserved; but programmers will be discouraged from thinking of braced text as a {quoted string}, and will still have a mechanism for quoting literal strings without substitution.

Naturally «chevrons» are not as pleasant as ASCII delimiters, but I think we have run out of suitable ASCII codes, except possibly `these'

FB: I'd rather suggest that chevrons (or whatever) remove all Tcl-sensitive syntax and take verbatim data. In this case their behavior is close to my proposed {data}. The problem is that there must be a way to properly escape existing chevrons in the included data (=> quoting hell), notwithstanding the fact that they are hard to input on regular keyboards. But conceptually both ideas are similar.

KJN: I don't like the word modifiers very much (see Word modifiers section below). How many rules could you remove from Cloverfield, and still fix the problems that you want to?

FB: regarding comments and literal strings, rewriting the brace matching rules and allowing {data}-like syntax is all that's needed. The other rules are orthogonal and target other issues. Anyway the two previous enhancements are incompatible changes and need a major version bump. But the merit of Cloverfield is to introduce the concept of word modifier to solve a range of problems where the typical solution would use ad-hoc techniques like opaque tokens (=> no more EIAS) or string manipulations (=> shimmering).

FB: Moved word-modifier-related discussion to Word modifiers section below

KJN: If rules for chevrons correspond to those for Tcl braces, then chevrons would enclose verbatim data, and unmatched chevrons would have to be escaped.

One of the places where my chevron suggestion unwinds is the treatment of lists: if a list is represented as a string, should it use chevrons or braces as delimiters?

I'm not sure there are any easy answers - the more I use Tcl, the more I appreciate how the different rules fit together (but at the cost of the comment/bracing hell). It is difficult to fix the comment/bracing hell without introducing a lot of extra complexity. But it is well worth trying!


KJN: Problems with Cloverfield - Tridekalogue re comment/bracing hell.

The problem lies in rule 5 for finding the matching brace, specifically the rule to skip any braces in comments. The rule is to follow rule 10 when identifying a comment. This seems to require that the braced text must be parsed into commands and words, in order to identify when '#' occurs at a place where the first character of a word is expected. The parsing of embedded braced text must be recursive, because embedded braces are required to match, and a comment string may occur at any depth in the nested braces. However the Tridekalogue does not require that braced text must be parsable into commands and words, only that braces match. A problem therefore arises when the recursive parsing encounters text that does not have Tcl/Cloverfield word form, e.g.

someCmd {"xx data not commands {
    # but these are commands }
pwd
set a foo
# where is matching bracket?  Is the next bracket surplus?  Is there a missing "
}

FB: This example is simple actually. The first double quote just after the opening brace suspends brace matching until the next double quote, i.e. the one just before the closing brace. The "comment" isn't one because it's between double quotes. So Tcl and Cloverfield give the same result. However the following behaves differently in Tcl and Cloverfield (the leading double quote was moved after the second brace):

someCmd {xx data not commands {"
     # but these are commands }
pwd
set a foo
# where is matching bracket?  Is the next bracket surplus?  Is there a missing "
}

Here Cloverfield needs a second closing brace because the second opening brace is outside the pair of double quotes. The same goes for the following (the double quote was moved to the last line):

someCmd {xx data not commands {
     # but these are commands }
pwd
set a foo
"# where is matching bracket?  Is the next bracket surplus?  Is there a missing "
}

But this time it is because the first closing brace, which matches the second opening brace in Tcl, is commented out in Cloverfield.

BTW, EIAS is preserved, meaning that whatever the commented chars may be, they are still part of the expression:

% set l {
    1 2
    # 3 4
    5 6
}
% llength $l; # Gives 7 in Tcl and 4 in Cloverfield, as the list parser recognizes the comments.
4
% lindex [split $l \n] 2; # Returns the "commented" line because it is part of the string rep.
    # 3 4

KJN: If a nested braced data string does not have code form, it may not be possible for the parser to decide how to match the braces. To avoid this problem, I think Cloverfield is forced to require that a braced string must be parsable into commands and words (i.e. into Cloverfield code, except that the commands need not be defined).

FB: That's a good point. However this does not necessarily imply that braced expressions be parsed into commands and words, only that the beginnings of words be recognized. For example, a double-quote character does not start a string when in the middle of a word, and thus does not suspend brace matching. However I'm conscious that rule [5] may sound a bit ambiguous and needs to be ironed out. A reference parser implementation, along with a good test suite, should help sort things out. (KJN: that's a good idea - why not write the reference parser in Tcl, it should not be too difficult. FB: that was the plan ;-)) Anyway the goal of these new rules is to turn the language rules from parser-friendly into human-friendly: the rules should follow the principle of least astonishment in reading order.

KJN: Also, to eliminate comment hell, Cloverfield cannot escape parsing the contents of a quoted string in the same way, because code such as the following may occur:

if {$a eq $b} {
    proc foo {bar} "
        # a comment }
        puts \"Error foo in \$bar\"
    "
}

FB: The above doesn't work in Tcl but works as expected (if we trust the indentation) in Cloverfield: the if block contains the whole proc body, and the latter also behaves as expected (it puts the right message, skipping the commented line).

KJN: try instead

if {$a eq $b} {
    proc foo {bar} "
        # a "comment"
        puts \"Error foo in \$bar\"
    "
}

FB: The braced expression is valid in both Tcl and Cloverfield, because it follows both sets of rules:

  • In Tcl, the first brace matches the last one, because the inner braces around "bar" are well-balanced.
  • Ditto for Cloverfield, which also requires that the double quotes be well-balanced.

However, both Tcl and Cloverfield fail to evaluate the expression, because there are extra characters after the first close-quote (here, "comment"). Tcl and Cloverfield rules aren't that different, really ;-) They should only diverge in fringe cases or in very predictable ways (e.g. when using quotes and hashes in braced expressions).

KJN: This is my point - if Cloverfield aims to "fix" comments, then it should not fail with that code, or with

foreach name $list {
     proc $name {} "
         # Using "quotes" for the proc body sets the return value when the proc is defined
         return $name
     "
}

FB: Cloverfield aims to "fix" comments within braces (notice how I always put "fix" between quotes, as comments are not really broken but rather parser-friendly instead of human-friendly). However, comments are not recognized within quoted strings (and I don't know of any mainstream language that allows that). Rule [5] specifies that characters (not just braces) lose their significance between quotes. So the above code will fail anyway because of the extra chars after the quotes. However, let's consider the following code:

foreach name $list {
    proc $name {} "
        # Using " quotes " for the proc body sets the return value when the proc is defined
        return $name
    "
}

Here the only differences are the extra spaces around the quoted string on the "commented line" (which isn't one). But now proc complains of extra arguments. Indeed, it is given 5 args instead of the expected 3:

  • 3rd one (the expected body) is "\n # Using " (notice the "comment")
  • 4th one is "quotes"
  • 5th one is "for the proc body sets the return value when the proc is defined\n return $name\n"
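The argument count can be checked in plain Tcl, where quoted words behave the same way. This is a small sketch with a made-up proc name (demo) and a shortened comment text; catch is used so that the failure can be inspected rather than aborting the script.

```tcl
set name demo
set code [catch {
    proc $name {} "
        # Using " quotes " for the proc body
        return $name
    "
} msg]

# code is 1 (an error was raised) and msg holds proc's usage message.
puts "$code: $msg"
```

The first quoted word ends at the quote after "Using", so proc receives two more words than it accepts and reports a wrong-number-of-arguments error; no procedure is created.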

KJN: If we modify Cloverfield by adding rules for quoted strings, this seems to solve both these problems, but adds problems of its own:

  • a string such as "{" is now illegal, but may be rendered "\{"

FB: On the contrary. In Tcl the string "{" is legal at top level but not in braced code (e.g. in a proc), however it is always legal in Cloverfield. That's an area where the latter improves the overall consistency of the language.

KJN: No, the string is legal in Tcl at any level, but if it is inside braced code it may have an unexpected interpretation when that code is parsed. It is legal in Cloverfield too, but not in a Cloverfield that is modified as suggested above.

FB: I missed the "modified Cloverfield" part, sorry for the confusion. Anyway, let's consider the following code:

# Legal in both Tcl and Cloverfield.
set v "{"

# Legal in Cloverfield, but illegal in Tcl (requires an extra closing brace, and fails when evaluated anyway).
proc foo {} {
    set v "{"
}

KJN: bracing hell is that the code above is not illegal in Tcl, but incomplete: as you said, Tcl expects another close brace. If Cloverfield is modified as I have suggested to 'fix' comments inside quoted strings, then the code above becomes a problem - I think the modified-Cloverfield parser should interpret both the quotes as 'open quotes' and fail at the last '}' because it is a close-brace with no open-brace inside a quoted string. This is an improvement over Tcl, because at least the parser fails close to the problematic code, and not at the end of the file when it cannot find the 'missing' close-brace. Hunting for missing braces has sometimes driven me to drink.

FB: But why and how exactly would you want to modify the Cloverfield rules? I fail to see the problem it tries to solve. Do you mean braces should also be balanced in quoted strings?

# Legal in both.
proc foo {} {
    set v "\{"
}

So the string "{" cannot be used verbatim in braced expressions with Tcl, it needs to be escaped as "\{". However both forms are valid in Cloverfield and give the same value whatever the context.
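The Tcl side of this exchange can be checked with the standard info complete command, which reports whether a script has unmatched braces, quotes or brackets. The following is a minimal sketch for present-day Tcl only (Cloverfield's rules are not implemented anywhere in it); the variable names are illustrative, and the snippets are built as strings so that the example itself stays brace-safe.

```tcl
# A lone open brace, written so that this script is itself brace-balanced.
set ob \{

# The three snippets discussed above, built as plain strings.
set top     "set v \"$ob\""
set braced  "proc foo {} {\n    set v \"$ob\"\n}"
set escaped "proc foo {} {\n    set v \"\\$ob\"\n}"

puts [info complete $top]      ;# 1: braces are not counted inside top-level quotes
puts [info complete $braced]   ;# 0: the quoted brace is counted, so a close brace is missing
puts [info complete $escaped]  ;# 1: a backslash-escaped brace is not counted
```

This matches KJN's reading: the braced form is not rejected outright, it is simply incomplete until another close brace turns up.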

  • braced text must have matching quotes as well as matching braces, making the string {"} illegal, but expressible as "\""

FB: True. And the same for all words that start with #, such as uplevel arguments, which will need to be escaped or put between quotes. As a side note, this will make most editors happier, since they often require hashes to be escaped in order to avoid rendering the remainder of the line as a comment, but of course this is NOT the rationale for this change, just a side effect.

KJN: the Cloverfield rules do not clearly state that {"} is illegal, but this follows from analysis of the rules. Your idea of writing a reference parser is a good one, because cases like this would become obvious either when defining the parser or when running the test cases.

FB: Indeed. The test case will be a nice combination of test-oriented development along with a tutorial for the new rules. Ultimately, all the test cases will be fine-tuned to give the least surprising result from a human POV, and could provide side-by-side comparison with Tcl as a porting guide. Anyway, I believe that existing code can be rewritten in such a way that it complies with both sets of rules in most cases (as with the above example).
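As a small illustration of a spelling that should comply with both sets of rules, the sketch below shows that in present-day Tcl the braced form {"} and the quoted form "\"" produce the same one-character value. Only the Tcl behaviour is demonstrated; that the quoted form remains legal under the modified rules is the claim made in the discussion above, not something this code verifies.

```tcl
set q1 {"}     ;# legal in Tcl today; illegal under the rules discussed above
set q2 "\""    ;# the portable spelling

puts [string equal $q1 $q2]   ;# 1
puts [string length $q2]      ;# 1
```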

  • if we need to define a non-code-like data string, we will have to use either quotes and escapes, or string processing, or the {data} word modifier (rule 11).

It seems to me that Cloverfield removes the pain for braced code, but increases the pain for braced data.

FB: When handling "problematic" data, the use of heredoc features is strongly encouraged in most languages. The problem is that Tcl doesn't have a heredoc feature, whereas Cloverfield provides one with {data}. If we make the (reasonable IMHO) assumption that most braced expressions in Tcl are not arbitrary data but code, then this problem disappears anyway. Moreover, Cloverfield provides a third quoting rule [6] using parentheses that combines some of the effects of quotes and braces (and paves the way for the mythical list refactoring).
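For comparison, here is a minimal sketch of the two workarounds KJN lists (quotes and escapes, or string processing), written in present-day Tcl for a data string containing an unbalanced brace. The LBRACE placeholder is an arbitrary token chosen for this example.

```tcl
# Option 1: quotes and escapes.
set a "if (x) \{ unbalanced"

# Option 2: string processing on a brace-balanced template.
# LBRACE is a hypothetical placeholder that string map turns into a brace.
set b [string map [list LBRACE \{] {if (x) LBRACE unbalanced}]

puts [string equal $a $b]   ;# 1
```

Both forms work, but neither lets the data be pasted in verbatim, which is the gap a heredoc-style {data} word is meant to fill.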


Word modifiers

KJN: I'm not keen on the word modifiers - except {*}. {*} was accepted into 8.5 because, despite introducing syntax, it simplified a lot of code. I see now the purpose served by {data} (providing a way to define a literal string, since Cloverfield braces no longer do that - FB: Yes they still do, only brace matching rules change, and "comments" are not stripped off the string but remain part of it), but the other word modifiers in Cloverfield introduce a lot of syntax. Also, in some cases (like {*}) the modifier determines what is done with the word after substitution, while in other cases it modifies the substitution itself. I think this adds too much complexity.

FB: The merit of Cloverfield is to introduce the concept of word modifier to solve a range of problems where the typical solution would use ad-hoc techniques like opaque tokens (=> no more EIAS) or string manipulations (=> shimmering). For example, see Jim References. The strings returned by [ref] are similar to Cloverfield's word modifiers, with the difference that Jim's are opaque tokens whereas Cloverfield's are just metasyntax that preserves EIAS and prevents the loss of internal rep due to shimmering. Another example is null and TIP #185, which propose a very similar syntax. Word modifiers are just a consistent syntax to tag words with special meaning: sometimes to modify the behavior of the parser like {data}, word substitution like {*} or {#}, or evaluation like {delay}. I'm afraid there aren't many ways of enhancing the language to solve these problems without introducing a lot of syntax or breaking the semantics, and word modifiers don't introduce new syntactic rules but rather capitalize on the existing {*}.
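To illustrate why a {prefix}word form has room to grow, the sketch below (present-day Tcl 8.5 or later, not Cloverfield) shows {*} at work and then shows that any other braced prefix glued to a word is currently a parse error. New modifiers in that position would therefore not change the meaning of scripts that are valid today. The {foo} prefix is an arbitrary stand-in, not a Cloverfield modifier.

```tcl
# {*} expands a list into separate words.
set items {b c}
set expanded [list a {*}$items]
puts [llength $expanded]          ;# 3

# Any other braced prefix directly followed by text is rejected by the parser.
set rc [catch {eval "list {foo}bar"} msg]
puts "$rc: $msg"                  ;# 1: extra characters after close-brace
```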

Concerning the {data} modifier, I've just come across an example that illustrates a strikingly similar syntax enhancement to Jim (near the end of the page):

set v {<<EOT}{
blah blah...
any old stuff: $$$ {{{ [[[ \\\ ]]] etc.
EOT}

This quoting style is known as heredoc, or here document, so it seems that my intuition was right (see also Quoting Heaven!). Given the properties of the Tcl parser, I'm afraid there isn't much alternative to a word-modifier-like syntax enhancement. The Jim patch follows the same path as Cloverfield by modifying the brace matching rules.


KJN: Is Cloverfield compatible with EIAS? (FB: Yes, EIAS is the most important Tcl principle.) If so, what is the string representation of the following values?

  1. {null}{any word} # Rule 11 states that this is "a special null value which is distinct from any other value, including the empty string"

FB: null is the only value that has no string representation, which means that it can't be passed to puts, or escape from the reach of the interpreter. On this point it resembles Smalltalk's nil. The only way to create such a value is when evaluating an expression prefixed by the {null} modifier. For example, the string "1 2 {null}3 4", when fed to list, gives a list whose 3rd element is null. The only gray area in the Tridekalogue is how to handle side effects. Does {null} totally discard all that follows, or does the word get evaluated as usual, but its value replaced by null? I.e., does {null}[puts foo] output foo? Does {null}[puts [incr foo]] increment variable foo? Word modifiers such as {null} need more complete specifications.

  2. {meta foo}bar # this value has data {bar} and metadata {foo}; are both expressed by its string representation?

FB: the metadata is not part of a word's string representation; puts {meta foo}bar prints bar. The only way to get a word's metadata is to prefix it with {meta}. Metadata are transient, arbitrary client data that can be associated with words, notably for debugging purposes. Library code should avoid using metadata and leave it to the application.

  3. {delay}$bar # what is the string representation of this value that forces substitution when the value is read?

FB: Think of {delay} as a "safer" future. Futures are objects created with no string rep, whose expressions are evaluated when the string rep is required. The problem is that string rep generation can occur in unpredictable ways (e.g. when shimmering), whereas {delay} expressions are only evaluated when the word is explicitly accessed. So the string rep is the original word until its value is queried. For example, the string "foo {delay}[bar]", when fed to list, gives a list where the second element is a delayed call to bar. It can be turned into a dict where the delayed expression is the value associated with foo, causing shimmering, but without evaluating the delayed expression. However, a list or dict operation that accesses this element will evaluate it. So here we have a string rep that may change dynamically within a larger string, and that's the main reason for using ropes as the primary internal string representation.

However, as with {null}, there are still gray areas in the spec. For example, does {delay} delay the evaluation of the topmost word only, or of all the subwords? E.g. does {delay}[foo [bar]] also delay [bar], or is it substituted in the delayed expression? I'd favor the latter. Also, how does it behave with variables: does it delay the variable substitutions, or do variables have to contain an expression to be evaluated (à la subst)? And the same for other substitution rules. The rules for {delay} must be chosen so that they allow Scheme-style lazy lists, closures, and tail call optimizations.

set v 1
proc foo {} {
    set v 2
    return [list {delay}[set v] {delay}"$v" {delay}$v]
}
foo ; # Should it return {1 1 1}, {1 2 1} or {1 2 2}?
proc bar {} {
    set e {[set v]}
    return {delay}$e 
}
bar ; # Should it return "1" or "[set v]" ?

set w 2
set x v
proc baz {} {
    set x w
    return {delay}[set $x]
}
baz ; # Should it return 1 or 2?
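The two candidate answers to these questions can be emulated in present-day Tcl, which may help when comparing them. This is only a sketch under stated assumptions: {delay} does not exist in Tcl, so a delayed word is modelled as an unevaluated script, and force, foo_late and foo_early are hypothetical helpers invented for this example. It also assumes the delayed word is forced in the global scope, which is itself one of the open questions.

```tcl
set ::v 1

# Hypothetical helper: evaluate a delayed script in the global scope.
proc force {script} { uplevel #0 $script }

# Late binding: nothing is substituted when the delayed word is created.
proc foo_late {} {
    set v 2
    return {set v}
}

# Early binding: $v is substituted at creation, only the command is delayed.
proc foo_early {} {
    set v 2
    return [list format %s $v]
}

puts [force [foo_late]]    ;# 1: the global v is read when the word is forced
puts [force [foo_early]]   ;# 2: the local v was captured at creation time
```

A closure-like {delay} would need a third behaviour, in which the delayed script keeps a reference to the defining scope; plain scripts as used here cannot express that.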


References

See Cloverfield - References