syntax highlighting

AMG: My preferred method of Tcl syntax highlighting is to only highlight the syntax as defined by the dodekalogue.

  • $Variable ${subst}$itut(ions)
  • [Brackets] (not including the enclosed text)
  • {Braces} (not including the enclosed text)
  • "Quotes" and "text between quotes"
  • \Backslash sequences
  • Line continuation\
  • {*} Expand operator
  • Semicolons;
  • # Comments
  • Numbers (even though they're not in the dodekalogue)
  • Illegal text following close quote or brace

Within a "quoted" word, text inside [brackets] should revert to the normal top-level syntax highlighting and should not be colored the same as the other text between quotes.

Another note about quotes and braces: they should not be highlighted if they don't appear as the first character of the word. For example, puts abc"hel lo" shouldn't have any special highlighting, since it does not actually have any quoting. It is in fact illegal, unless there's a channel called abc"hel :^) My point is that it would be bad for the syntax highlighter to confuse the programmer into thinking he's using quotes correctly.
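To make the point concrete, here is a small sketch of how Tcl itself splits such a line into words. It uses list in place of puts so that nothing depends on a channel existing; the variable names words and msg are only illustrative.

```tcl
# A quote that is not the first character of a word is an ordinary character,
# so the parser splits this command at the space into two arguments.
set words [list abc"hel lo"]
puts [llength $words]    ;# number of arguments that list received
puts [lindex $words 0]   ;# the first argument, quote character included

# Text directly after a closing quote is a parse error, which is why the
# highlighting list above flags it. catch returns 1 when the script fails.
set failed [catch {list "abc"def} msg]
puts "$failed: $msg"
```

The first command receives the two words abc"hel and lo", which is why a highlighter that colors hel lo as a quoted string would mislead the reader.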

puts "this text [concat "contains a" subst$itut::ion]" ;# comment
     122222222223       122222222221      455555555531 6777777777
1: quotes
2: quoted text
3: brackets
4: variable dollar sign, braces, or parentheses
5: variable name or array element name
6: semicolon
7: comment
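The legend above can be produced mechanically. The following is a minimal sketch, not the Vim script discussed below: the procs classify, scanScript, and scanVar are hypothetical names, and the sketch handles only quotes, brackets, simple variable names, semicolons, and comments. Braces, backslash sequences, array indices, and braced variable names are left out.

```tcl
# Recognise a dollar-name substitution at index i. On success, append "4" for
# the dollar sign and one "5" per name character, advance i, and return 1.
proc scanVar {s iVar outVar} {
    upvar 1 $iVar i $outVar out
    set start [expr {$i + 1}]
    if {![regexp -start $start {\A(?:[A-Za-z0-9_]|::+)+} $s name]} {
        return 0
    }
    append out 4 [string repeat 5 [string length $name]]
    set i [expr {$start + [string length $name]}]
    return 1
}

# Classify characters of s from index i onward, appending one legend digit or
# a space per character. With nested set, return after the closing bracket.
proc scanScript {s iVar outVar nested} {
    upvar 1 $iVar i $outVar out
    set n [string length $s]
    set cmdStart 1    ;# true where a hash sign would begin a comment
    set wordStart 1   ;# true where a quote would begin a quoted word
    set inQuote 0
    while {$i < $n} {
        set c [string index $s $i]
        if {$c eq "\$" && [scanVar $s i out]} {
            set cmdStart 0
            set wordStart 0
            continue
        }
        if {$c eq "\["} {
            # Bracketed text reverts to top-level rules, even inside quotes.
            append out 3
            incr i
            scanScript $s i out 1
            set cmdStart 0
            set wordStart 0
            continue
        }
        if {$inQuote} {
            if {$c eq "\""} {
                append out 1
                set inQuote 0
            } else {
                append out 2
            }
            incr i
            continue
        }
        if {$c eq "\]" && $nested} {
            append out 3
            incr i
            return
        }
        if {$c eq "#" && $cmdStart} {
            while {$i < $n && [string index $s $i] ne "\n"} {
                append out 7
                incr i
            }
            continue
        }
        if {$c eq "\"" && $wordStart} {
            append out 1
            set inQuote 1
        } elseif {$c eq ";"} {
            append out 6
        } else {
            append out " "
        }
        incr i
        set isSpace [expr {$c eq " " || $c eq "\t"}]
        set isEnd [expr {$c eq ";" || $c eq "\n"}]
        set cmdStart [expr {$isEnd || ($isSpace && $cmdStart)}]
        set wordStart [expr {$isEnd || $isSpace}]
    }
}

# Return a string as long as script holding the legend digit for each position.
proc classify {script} {
    set i 0
    set out ""
    scanScript $script i out 0
    return $out
}

set line {puts "this text [concat "contains a" subst$itut::ion]" ;# comment}
puts $line
puts [classify $line]
```

If the sketch behaves as intended, the second line printed reproduces the digit row in the example above. Because a quote only opens a quoted word when wordStart is set, puts abc"hel lo" comes out with no highlighting at all.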

I wonder if namespace qualifiers should be highlighted the same as variable names or if they should be shown a little differently. Try it and see, I guess.

Part of the reason I like this approach is that it doesn't get hung up worrying about whether any given word is a "keyword". Tcl doesn't have keywords anyhow. For example, is "set" a keyword? How should "set set set" be highlighted? Or, is "yieldm" a keyword? It's (currently) unsupported, but some programs (Wub, at least) use it anyway. Is "image" a keyword? It's Tk only, and maybe the program doesn't use Tk. And so on.
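A short illustration of the "set set set" case; nothing here is specific to any highlighter, it only shows that the same word plays three different roles in one command.

```tcl
# First "set" is the command name, second is a variable name, third is a value.
set set set
puts $set

# Words that look like keywords elsewhere are just as ordinary here.
set for variable
puts $for
```

A keyword-based highlighter would color all three words of the first command identically, even though only the first one names a command.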

I suggest that this Wiki adopt this minimal approach to syntax highlighting, especially considering that it contains many preformatted code blocks that don't actually contain Tcl code. It's very distracting to have random words like "for" and "variable" be highlighted.


jdc - 2010-06-15 03:55:59

Do you have code doing highlighting this way?

AMG: Yes, but it's a Vim syntax highlight script, and it has a few bugs.

jdc: Maybe add a screenshot?

AMG: Okay, good idea:

[Image: Syntax highlighting tcl.vim screenshot]

This doesn't implement everything in my list. For example, it doesn't color the braces and semicolons. I see that it does color numbers and namespace delimiters, but it doesn't color the variable name separately from the dollar sign.

Here's the syntax highlight script, if you want to experiment: [L1 ] [L2 ]


AMG: I think jdc's change to Textjam [L3 ] is an interesting case study in problems caused by incorrect syntax highlighting. He had to change the code to "keep emacs Happy", presumably because Emacs gets confused by the fact that the line had three double quotes in it. (GNU Emacs 23.2.1 doesn't seem to have this problem, by the way.) Actually he just added a comment, but that still counts as a change. ;^) However, whether or not Emacs gets confused, the current Wiki syntax highlighter definitely gets it wrong. It fails to recognize that the string is not quoted, and it continues the quoted region all the way to the "closing" double quote in the comment. This results in the words "keep emacs Happy" being highlighted as if they were ordinary arguments to a command, rather than comment text, greatly confusing the reader.

This weekend I hope to spend some time putting together a proper syntax highlighter for this Wiki. In my opinion, the one we have isn't doing us any favors. The syntax highlighting it implements reflects many of the incorrect assumptions about Tcl that I have observed in the wild. In particular, it assumes sh-like quoting, where quotes can start and stop and restart anywhere, and single quotes are allowed. Comments have similar problems. I think having correct syntax highlighting* would help to correct these misconceptions. (*) Having the correct highlighting on this Wiki would be nice, but really we need to get the correct highlighting into popular text editors.
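Two of the sh-style assumptions mentioned above are easy to check in tclsh. This is only a sketch; the variable names words, failed, msg, and y are illustrative.

```tcl
# Single quotes have no special meaning, so this list has two elements.
set words [list 'hello world']
puts [llength $words]

# A hash sign begins a comment only where a command could begin. Here it is
# just another argument, so set is called with too many arguments and fails.
set failed [catch {set x 1 # not a comment} msg]
puts "$failed: $msg"

# After a semicolon a new command begins, so this one really is a comment.
set y 2 ;# a real comment
puts $y
```

A highlighter that colors 'hello world' as one string, or everything after any hash sign as a comment, would hide both of these behaviors from the reader.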

AMG: I had a look at the implementation of the syntax highlighter we've got, and I feel rather bewildered. It's JavaScript; I should have expected this. :^) I know how to write a simple Tcl parser (we have several on this Wiki), but I'm not comfortable doing it (or anything) in JavaScript. I'm hesitant to use SHJS as a basis for my work, because I'd be obligated to use the GPLv3 license for the result.