Version 30 of hat0

Updated 2010-03-28 05:03:19 by AMG

hat0 writes the brick engine.

hat0 has also written a simple ssh launcher and a pure-tcl BMP reader/writer.

Here is a link to a page about a new packaging system called Ultimate Package Blast-o-rama.

And here is a link to a little package which can be used to preload DLLs on a Windows-based machine: dllfix


AMG: Happy belated birthday!

hat0: thank you very much!!


It's March of 2010. I've been asking around today about what sort of vision people have for Tcl five years out.

Alex Ferrieux says, "freeze as perfect, like /bin/sh"

Colin says, "if I have any vision at all, it's about making Tcl smaller by making its innards more introspective and able to be implemented in Tcl. If you can generate bytecode from Tcl, then you can write all (or some, or many) of the compilation of Tcl commands to Bytecode *in* Tcl." Benefit is an advantage in speed of writing/improving the standard library. Drawback is a period of lack of stability in the core.

SEH: Are you soliciting further comment?

hat0: Heck yeah! Please, speak your mind, I'd love to hear your thoughts/ideas.


A five-year plan for Tcl

These are my thoughts about things that could improve Tcl, or directions Tcl could go, so that in 2015 it is not just another language but one of growing importance and relevance.

Note to anyone who sees this page: this isn't supposed to be another cloverfield or "tcl 9"-styled effort. I'm not interested in telling anyone what to do, or trying to force my ideas onto anyone. This is my own personal brainstorm, and all are welcome to add to it.

Syntax and syntactic sugar

- some syntax- or command-level concurrency (e.g. a foreach that parallelizes the body?)

AMG: Parallel foreach? I'd leave foreach and friends alone, instead introducing a pair of commands to (1) execute a script in a separate thread and (2) execute a script and collect information from or about the threads spawned inside that script. Further levels of nesting, inside the parent thread or in the newly spawned threads, can make this confusing. So this is a truly half-baked idea. accumulate and collect is a possible source of inspiration.

hat0: #2 seems like a good option if it could be made to parallelize the usual constructs very simply, maybe even as simply as prepending the command name to your ordinary command, like so: foreach x list { bleee } -> runmany foreach x list { bleee } ... As for further levels of nesting getting confusing, sure, but just don't do that. :)

SEH: do coroutines provide part of what you're looking for?

AMG: Coroutines share an interpreter but do not run simultaneously. Threads run simultaneously but do not share an interpreter. hat0 is looking for simultaneous execution, so I don't know how coroutines would help. Coroutines offer interleaved execution; that's probably the best way to describe it.

hat0: That is correct. Also, I'm interested in a very simple way to access and manage the simultaneous execution (such as a single command prepended to the looping construct). That is to say, if the concurrency is achieved through a mechanism that requires some auxiliary code for 90% of use cases, but allows the remaining 10% to happen at all, I'd rather scrap the 10% and auxiliary code and make the 90% accessible.
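
AMG's distinction between interleaved and simultaneous execution can be seen in a small sketch. It assumes Tcl 8.6 (for [coroutine] and [yield]); the proc name worker, the coroutine names a and b, and the round-robin loop are made up for illustration. Both coroutines share one interpreter and one global variable, but only one of them runs at any moment:

```tcl
# Each worker appends to a shared log, then yields control.
proc worker {name count} {
    yield    ;# pause right after creation; the loop below resumes us
    for {set i 1} {$i <= $count} {incr i} {
        lappend ::log "$name step $i"
        yield
    }
}

set log {}
coroutine a worker A 2
coroutine b worker B 2

# A trivial round-robin scheduler: resume each coroutine in turn until
# both have finished (a finished coroutine's command is deleted).
while {[llength [info commands a]] || [llength [info commands b]]} {
    foreach co {a b} {
        if {[llength [info commands $co]]} { $co }
    }
}

puts $log    ;# {A step 1} {B step 1} {A step 2} {B step 2}
```

The steps alternate strictly because each coroutine gives up control explicitly; nothing here runs on a second core, which is why coroutines alone do not give the parallel foreach hat0 is asking for.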

- some syntactic sugar on "expr"?

AMG: I support expr sugar because expr is a critical command that is often misused (see: Brace your expr-essions). The sugar would be the recommended way to do math, and the expr command would be retained for situations where the math expression is intentionally the result of substitution. This closely parallels the [set var] versus $var situation: one-argument set is useful when the variable name is computed, and (for both one- and two-argument set) it's a common bug to prefix the variable name with $ when it doesn't need one.

hat0: Nice parallel/explanation!
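
For readers who haven't run into the problem, here is a minimal sketch of the misuse AMG refers to, along with the one-argument [set] parallel. The proc sideEffect and the variable names are invented for the example:

```tcl
set calls 0
proc sideEffect {} { incr ::calls; return 1 }

set input {[sideEffect]}    ;# data that happens to look like a command

# Unbraced: Tcl substitutes $input first, then expr parses the resulting
# text "[sideEffect] + 1" and runs the embedded command.
set unbraced [expr $input + 1]

# Braced: expr does its own substitution and treats the value of $input
# purely as an operand, so the non-numeric string is rejected.
set status [catch {expr {$input + 1}} msg]

# One-argument set with a computed variable name, the legitimate
# counterpart of passing a computed expression to expr.
set name calls
set viaSet [set $name]

puts "unbraced=$unbraced calls=$calls status=$status viaSet=$viaSet"
# prints "unbraced=2 calls=1 status=1 viaSet=1"
```

Any sugar for expr would presumably behave like the braced form by default.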

- extend the {*} operator in various discussed ways ({n} as sugar for "lindex", etc)

AMG: {n} for lindex? What forms can n take? Is it anything that's a valid index argument to lindex? Is arbitrary substitution allowed? Is it a math expression? (In my opinion, Cloverfield naming of list elements is a cleaner and more flexible alternative.)

hat0: I'll take another look at Cloverfield. My first thought is that n is limited to variable interpolation, no script execution, and n can specify a number, range, or combination of them, e.g. {1} or {2-end} or {1, 3-end}.
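
As a point of comparison, this is roughly what hat0's three example forms would expand to with today's commands. The mapping is an assumption (in particular, whether {1} would yield a single element or a one-element list is not settled above), and the variable names are illustrative:

```tcl
set items {a b c d e}

# proposed {1}: a single index
set one [lindex $items 1]

# proposed {2-end}: a range
set tail [lrange $items 2 end]

# proposed {1, 3-end}: a combination of an index and a range
set mixed [list [lindex $items 1] {*}[lrange $items 3 end]]

puts "$one | $tail | $mixed"    ;# prints "b | c d e | b d e"
```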

AMG: Many of my views on Cloverfield aren't published on this wiki, so I'll write a little bit on references. It's great to be able to substitute in the such-and-such element of a list, a dict, or a nested mix, but it's also important to be able to identify that element to a command like [set]. (This way [set] subsumes [dict set], [lset], etc.) FB and I differ on the notation for identifying an element without actually substituting in its value. I think avoiding expr-like performance and security problems requires a special notation for naming variables or elements thereof. When the interpreter sees this notation (leading &), it enters variable parse mode, but instead of immediately substituting in the value (leading $), it constructs a Tcl_Obj to contain the parse tree information. Later, the command may pass that reference object to a function like Tcl_ObjSetVar2(), which processes the parse tree embedded in the reference object to find the actual named element. For example:

set &num (en (zero one two) es (cero uno dos) fr (zéro un deux))
set &num(de) (null eins zwei)
set &num(en){end+1} three
set &var num
set &key fr
set &idx 2
set &"$var"($key){$idx}     # returns "deux"
puts $"$var"($key){$idx}    # prints "deux"

I put this information here as a possible source of inspiration. Take from it what you will.

hat0: Interesting, very interesting. I like the idea of references, and of claiming the & punctuation to note them. The last two examples seem a little punctuation-soup-ish to me, but I could get used to it, especially considering the flexibility presented.

AMG: C++ uses & in a variable or parameter declaration to signify that it is an indirection that is transparently constructed and dereferenced. C uses & and * in an expression to explicitly construct and dereference an indirection. My idea is a mixture of the two: construction is explicit by putting & in the expression, and dereferencing is transparent. I also have an idea for pointers which have to be explicitly dereferenced, so they have a string representation that is independent of the pointed-to value. My recollection is getting fuzzy here, but maybe there was also another difference: references perhaps referred directly to the value (e.g. a variable name is a reference), but pointers referred to the variable name (they're capable of dangling, and they don't affect reference counts). I think my pointer notation was to construct with a leading @ (instead of $) and to dereference with a trailing @, which could be intermixed with other access notations.

set &langs (@num(en) @num(es) @num(fr) @num(de))
puts $langs{3}          # prints "@num(de)"
puts $langs{3}@         # prints "null eins zwei"
puts $langs{3}@{1}      # prints "eins"

As for punctuation soup, I was just showing off a variety of methods of indirection, including having the variable name being the result of substitution. This flexibility makes it possible to always use $ substitution instead of single-argument set.

hat0: Sure, I didn't mean to suggest that it was gratuitous, just that I personally would probably err on the side of simplicity and reduced expressiveness.

- deprecate now and eventually remove arrays (confusing, unnecessary, dicts are better, etc)

AMG: I have a question about removing support for arrays. Arrays support traces on individual elements, whereas dicts do not. Do you have any thoughts on this? Do we need a means for attaching traces to dict elements? That can get really hairy really fast. I'll summarize the difference between arrays and dicts: an array is a quasi-variable that is a collection of variables, whereas a dict is a value which is itself a collection of values. (I say "quasi-variable" because arrays don't have all properties of variables; for instance, they don't have values.)

hat0: Agreed, traces on arrays/array elements are useful. I'd suggest extending traces to work on dicts/dict elements.

AMG: This is a fundamentally different kind of trace, since it's attached to a value rather than a variable. Variables are named and distinct, values are anonymous and shared. But obviously you don't want the trace to be on all shared copies of a value; maybe attaching a trace causes it to become unshareable, as if you had modified it. However, this seems messy and wasteful of memory. And in order to create the trace and to usefully process a traced event, you'd need a way of naming values, despite them being anonymous in principle. The only way I can think of is to special case the data types you'd like to support (dict and "top-level" variable values, but consider lists as well). Then the name (or a reference thereto) would have to be placed in the Tcl_Obj struct. (By the way, Cloverfield, or at least my vision of it (I've been out of contact with FB, so we've surely diverged), includes the ability to name list and dict elements.)

hat0: How does Jim do it, do you know off-hand? I seem to recall that Jim does use dicts for arrays. There's probably a solution that, at the outset, seems inelegant but in practice works out all right.

AMG: Jim doesn't have traces.

hat0: Ah ha.
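
For reference, the per-element array trace that this exchange is about looks like the following in current Tcl. The callback name onWrite and the sample data are invented for the sketch:

```tcl
set seen {}
proc onWrite {name1 name2 op} {
    # name1 is the array name, name2 the element key that was written
    upvar 1 $name1 arr
    lappend ::seen [format "%s(%s) = %s" $name1 $name2 $arr($name2)]
}

array set config {host localhost port 80}
trace add variable config(port) write onWrite

set config(port) 8080       ;# fires the trace
set config(host) example    ;# no trace on this element, nothing recorded

# A dict is a value stored in one variable, so there is no per-key hook;
# the closest equivalent today is a trace on the whole variable.
set d [dict create host localhost port 80]
dict set d port 8080

puts $seen    ;# {config(port) = 8080}
```

The trace attaches to the element config(port) because each array element is a variable in its own right, which is exactly the property a dict value lacks.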

- AMG: alternative syntax to [list a b c] for constructing lists

hat0: you mean a syntactical equivalent to the functionality of the list command (with expected substitutions and all)?

AMG: Yeah, I had in mind using matched parentheses as a quoting mechanism which preserves word boundaries. The trouble is that it wouldn't produce a pure list, because both a string and list internal representation would be generated. Maybe it would be okay to drop the string representation. I don't know; I never asked FB why he wanted to preserve the string representation.

hat0: I like it. I'm not too keen on worrying about internals right now.. from a programmer's perspective, some syntactical sugar on [list] is nice.

AMG: This could facilitate resurrection of TIP 251, minus the hack described in the "Incompatibilities" section which should stay dead.

- some syntactic sugar to enable dicts to be treated like structs (e.g. set value $dict.key)

SEH: You may find Tclx's keyed lists interesting.

hat0: Yes, so true, the keyed lists are a good approach, given the syntax as it is now. I'm not afraid to add another syntax rule (placing an additional restriction on variable names/dict key names), though, in order to gain direct access to the dict value for a given key. Simplifying access to values within dicts (that is, treating them like named variables) is a good thing in my opinion.
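
A short sketch of where things stand today, for comparison with the proposed $dict.key form. The variable names are illustrative, and the last line shows why the sugar would need the extra syntax rule hat0 mentions: a dot currently just ends the variable name.

```tcl
set point [dict create x 3 y 4]

# Today: an explicit accessor command.
set x [dict get $point x]

# Today: [dict with] temporarily maps the keys onto variables.
dict with point {
    set distSq [expr {$x*$x + $y*$y}]
}

# Today, "$point.x" is the whole dict followed by the literal text ".x".
set s "$point.x"

puts "$x $distSq"    ;# prints "3 25"
puts $s              ;# prints "x 3 y 4.x"
```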

How can Tcl/Tk be improved for developers? How can this be a more developer-friendly system?

- improve error reporting in bad "expr" expressions

- improve error reporting in general

AMG: One thing I've dreamed about is the ability to identify the origin of any given Tcl_Obj. Is it a literal found in a particular line of a source file? Is it the product of a concatenation that happened on a certain line? Was it generated by a command (e.g. [read] or [expr]) somewhere in the sources? Etc. Might be interesting. (Might be expensive!)

hat0: Maybe Tcl_Obj-tracing is enabled/disabled via a pragma, so that the expense would be programmer-determined.

AMG: I imagine it would be expensive even when disabled, if the pragma had to be checked all the time. If it was a compile-time option, it would be expensive to test and maintain, and it would be almost useless in practice because it would never be enabled when you need it.

hat0: Agreed that it wouldn't make sense as a compile-time option, but I'm not convinced it'd be too expensive as a run-time option. Then again, I don't know what all would be involved.

- fix the unbalanced brace issue

AMG: Fixing the unbalanced brace issue would require greatly increasing the complexity of the rules for determining whether or not any given brace counts toward the opening/closing brace count. See Cloverfield - Parser for code which skips braces inside double quotes and comments.
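
A minimal illustration of the issue, using [info complete] to ask the parser whether a script's braces balance. The two script strings are made up; the open brace in the second one is written as \{ inside the double quotes only so that this example itself stays balanced if pasted inside a braced body.

```tcl
# A proc definition with an ordinary comment in its body.
set balanced "proc demo {} {\n    # nothing odd here\n    return ok\n}"

# The same definition, but the comment mentions an open brace.
# The brace-matching rules do not know about comments, so it is counted.
set unbalanced "proc demo {} {\n    # an open brace: \{\n    return ok\n}"

puts [info complete $balanced]      ;# prints 1
puts [info complete $unbalanced]    ;# prints 0
```

The second script is reported incomplete even though a human reader sees a harmless comment, which is the behaviour that surprises newcomers.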

- work alongside or replace javascript as an in-browser DOM-manipulation language?

AMG: I'd love to have Tcl be native in all popular browsers! But I fear we missed that boat a long, long time ago. However, I will settle for having Tcl be native in the browser running on my computer. Here's a neat idea: if there's a Tcl+DOM plugin, then we can write Tcl/Tk applications that require or bundle the plugin and embed browsers in the GUI, and the script running outside the browser can interact with the script running inside the browser, and as a side effect users of the applications will have the plugin and will be able to access Web sites that make use of it.

hat0: I don't think the ship has sailed. Or, I guess I'd say, there could be another one coming along. There's a ton of mindshare for JavaScript right now, but in the early 2000s there was a perl->php transition for doing server-side web programming. I'm not holding my breath for any sort of victory here, just acknowledging that anything's possible five years out.

Performance/build considerations

- modularization of the core (e.g. easily-discarded timezones, etc)

- some sort of script-level marker that code block X has no traces/doesn't redefine commands and can therefore be more heavily optimized

- some sort of JIT compilation?

AMG: Don't we already have JIT compilation? (Script is converted to bytecode as it is executed.) You imply that we don't have JIT, so I assume your definition of JIT is to convert from script or bytecode to native machine code. Would this simply consist of concatenating machine code blocks corresponding to the bytecodes? This would cut down on overhead time spent branching through the bytecode execution dispatch loop (which I have never examined), but you need to compare the magnitude of the overhead time with the useful execution time. If 1% of the total time is overhead, you won't gain much of a speedup, and the cost will be a dramatic increase in the size of a compiled script. Also, how would you handle constructs for conditional and repeated execution? C compilers tend to compile such things as body code bracketed by conditional branches, whereas concatenated machine code blocks would (probably) result in function calls or long-distance gotos embedded in a tight loop. To be honest, I'm not sure if this difference would significantly impact performance, so long as the instruction cache can hold more than one contiguous block of code.

- why is Jim so small? what does it lack that is a deal-breaker?

SEH: Speed.

AMG: Here's the result of a simpleminded comparison of [info commands] in Jim 0.51 and Tcl 8.6b1.

Commands in both: append, break, catch, concat, continue, dict, error, eval, exit, expr, for, foreach, format, global, if, incr, info, join, lappend, lindex, linsert, list, llength, load, lrange, lreverse, lset, lsort, package, proc, puts, rename, return, scan, set, source, split, string, subst, switch, tailcall, time, unset, uplevel, upvar, while

Commands in Jim only: *, +, -, /, collect, debug, env, finalize, getref, lambda, lambdaFinalizer, lmap, rand, range, ref, setref

Commands in Tcl only: after, apply, array, auto_execok, auto_import, auto_load, auto_load_index, auto_qualify, binary, case, cd, chan, clock, close, coroutine, encoding, eof, exec, fblocked, fconfigure, fcopy, file, fileevent, flush, gets, glob, history, interp, lassign, lrepeat, lreplace, lsearch, namespace, open, pid, pwd, read, rechan, regexp, regsub, seek, socket, tclLog, tell, throw, trace, try, unknown, unload, update, variable, vlerq, vwait, yield, zlib

hat0: Good point, good way of distinguishing between the two.
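
A comparison like the one above can be reproduced with a few lines of Tcl. This is only a sketch of one way to do it, not necessarily how AMG produced the lists; the proc name compareCommands and the stand-in data are hypothetical, and in practice each list would be the output of [info commands] captured from the respective interpreter.

```tcl
# Split two command lists into "in both", "only in A" and "only in B".
proc compareCommands {listA listB} {
    set both {}
    set onlyA {}
    set onlyB {}
    foreach cmd [lsort -unique $listA] {
        if {$cmd in $listB} { lappend both $cmd } else { lappend onlyA $cmd }
    }
    foreach cmd [lsort -unique $listB] {
        if {$cmd ni $listA} { lappend onlyB $cmd }
    }
    return [dict create both $both onlyA $onlyA onlyB $onlyB]
}

# Stand-in data, much shorter than the real lists.
set jim {append lmap range set}
set tcl {append lassign set trace}
set result [compareCommands $jim $tcl]

puts [dict get $result both]     ;# prints "append set"
puts [dict get $result onlyA]    ;# prints "lmap range"
puts [dict get $result onlyB]    ;# prints "lassign trace"
```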

Tk

- qt/gnome integration for Tk

- printing for Tk

- Tk gui builder

AMG: We already have a Tk GUI builder. It's called Tcl. :^) Obviously, there are people coming from other environments which strongly encourage or practically require GUI builders which are themselves GUI applications. Having such a tool in Tcl/Tk would be a fine invitation. I would suggest that it prominently show the generated Tcl alongside the Tk GUI being assembled, since I want to showcase how easy it is to directly make a Tk GUI. Also it would be neat if each view updates automatically as the other is edited. But... don't we already have a few programs that do this? Are you suggesting bundling one with the standard Tk distribution? Unless it turns out to be a really simple program (on a par with the demos), I think it would be more appropriate to put it into ActiveTcl.

hat0: I like that idea a lot - to have a GUI builder that shows the Tcl as the interface is built, and vice versa. I strongly dislike all suggestions that "Tk is so easy to build, you just write it!!". I think this attitude makes Tcl/Tk seem primitive and crude, and I think it also leads to very lazy interface design. It's also nice, if you're toying with interface ideas for a program, to be able to drag and drop things to see what best suits your fancy. I'm aware of some of the GUI builder projects, too, but my intent behind that idea is to have a single blessed GUI builder that's included with the core distribution, so that anyone who gets Tcl/Tk automatically gets the dev tools.

SEH: I agree a good visual GUI builder similar to Delphi would be the best possible gateway drug for getting new users to start using Tcl/Tk. Even if experienced Tcl'ers think scripting Tk is the superior way to build GUIs, the influx of new users would be worth it IMO.

hat0: Also, thanks very much for the good feedback/ideas! Keep 'em coming.


Lars H: In my opinion, the syntax ideas are red herrings. If something is awkward to code, then it's typically better to come up with a new little language to help with the task than to complicate the base syntax (which should probably be frozen as perfect).

The idea of a parallel foreach is interesting, but probably not practical. Loop-level parallelisation is rather fine grained, so one would probably want to get the impression of having all threads seeing the same interpreter, as opposed to the current situation where each thread has a separate interpreter. Rewriting Tcl's internals to handle concurrency is going to be difficult (e.g., it would as I understand it be necessary to move away from hash tables). Coroutines could be a first step in this direction however (i.e., separate iterations of the parallel loop could experience the same level of separation as coroutines do).

Looking at it another way, it is rather likely that anything that would benefit from loop-level parallelisation would first benefit from being rewritten in C, and then it's rather the C code that one needs to parallelise.

hat0: It sounds to me like you are in agreement with Alex Ferrieux, in saying, "present-day Tcl is perfect/almost perfect," and that if present-day Tcl doesn't do it right, don't use Tcl (i.e. invent your own little language, or write it in C). While this is certainly a fair attitude to have, and I can sympathize to some extent with it, I guess I don't agree. I don't think that the syntax is perfect, just familiar.

I cringe whenever I see "write it in C" as the recommended course of action. It feels like a copout to me. The developers of other languages/environments seem much less willing to go to C (e.g. Hiphop for PHP, Pyrex for Python, LuaJIT) for performance, seeking out scripting-level performance increases first. As for it being difficult, sure, I don't doubt it, but I'm not really worried about implementation details right now.

Lars H: Just to clarify, I believe "invent your own little language" is an important part of the Tcl way: don't require one language to suit all problems, instead make it easy to mix and match. Core features that make it easier to create little languages at the script level are an interesting area for future development.


Just to clarify..

Just to clarify, the goal of my "wish I had a pony" list on this page is not to say, "this is what the language should look like," but rather, to say, "Tcl should be more popular/widespread in 2015" and ask, what could be done to get there? (If anyone reading this page does not share that goal, then this is probably not the discussion for them.)

I don't think telling programmers new to Tcl, "it's difficult/unlike anything you're used to, and that's just how it is," is a very good way to reach that goal. I do think that adapting the language a little to non-Tcl developer tastes (like some of the syntax changes I mentioned) would help reach that goal.