[NEM] '''21 June 2004''': Some thoughts on [TOOT], values and types in general. In particular, how TOOT allows ''interpretations'' to be associated with ''representations'' (values) in a general but flexible manner.

Tcl is "untyped" or mono-typed - there is only the string. The "type" of a value is determined by its usage; if it is used as an integer (successfully) then it is one, but it may also be something else (e.g., a string, a variable name, etc.). Other languages are usually typed; values have some type that determines the operations that can be performed on them, helps disambiguate some syntactic forms, and allows for polymorphic command/operator overloading. In most languages, "type" is assumed to be an ''intrinsic'' property of a value -- e.g., "2" ''is'' an integer. Tcl takes the view that "type" is an ''extrinsic'' property -- "2" is a string, which may be used as an integer. TOOT expands on this principle.

TOOT's view is that the only thing you ever store in a computer is a ''"representation"''. For instance, "2" is a string representation consisting of a single character. Note that you never actually store the number 2 in the computer -- that's physically impossible, as numbers are abstract concepts and have no physical presence that could be stored. Even at the lowest machine level, integers are stored as a series of bytes (which in turn are a series of bits, and they themselves are manifest as streams of electrons in the underlying circuitry) that ''represent'' a given number. When you give a representation a type, you are applying an ''"interpretation"'' to that representation.

In some languages, you are only allowed to manipulate values that have some interpretation attached.
Some languages even require you to declare this interpretation in advance:

 int a = 1;
 String foo = new String("Hello, World!");
 Object bar = foo;
 String bars = (String)bar;

Note that you can up-cast the value to a weaker (more general) interpretation, and later down-cast it (with the possibility of error) to a stronger interpretation, but only if the new interpretation is compatible with the original interpretation that the value was given. So, the value always has a "type" (an interpretation) associated with it, and you are forced to comply with that interpretation, even if you later change your mind, or disagree with whoever wrote the code that created the value in the first place.

In Tcl, values have no intrinsic interpretation. They just are values. Actually, they are strings, but as strings are just sequences of bytes, this representation covers all possible data that a computer can actually represent, and it's useful to have ''some'' base representation to build on. Now, Tcl commands can impose whatever interpretation they like on this representation, independent of any other commands that operate on the same value. So, for instance, you can do:

 set a "24"
 string length $a
 llength $a
 expr {$a + 2}

Everything there works. Values are interpreted in different ways by different commands, without affecting the behaviour of other commands operating on the same value -- "type" is an extrinsic property, and this notion is used to good effect in plenty of Tcl code.

The problem with Tcl's approach is that it requires each command to know what interpretation it wants to give a value. This may seem obvious, but there are occasions where you want to perform an operation on a value where the behaviour of that operation (and the interpretation of the value) is defined elsewhere. For instance, suppose I have a value that represents a resource on the internet, which I want to fetch.
Some example code might be:

 set url1 "http://www.foo.com/index.html"
 set url2 "ftp://ftp.tcl.tk/blah/foo.tar.gz"
 # Get both URLs
 http::get $url1
 ftp::get $url2

(Assuming implementations of http::get and ftp::get.) Now, if we want to wrap this into a general proc that can handle any URL then we might do something like:

 proc get {url} {
     regexp {([^:]+):(.*)$} $url -> proto rest
     switch -- $proto {
         http {
             # Fetch via http
         }
         ftp {
             # Fetch via ftp
         }
     }
 }

This shows the problem - we need special-purpose code that dissects the value and determines which protocol it refers to (the "type" of URL it is), and based on this interpretation calls the correct bit of code. In a typed language that supports runtime polymorphic dispatch, such as many OO languages, you might instead code this as:

 URL url1 = URL.createURL("http://www.foo.com/index.html");
 URL url2 = URL.createURL("ftp://ftp.tcl.tk/blah/foo.tar.gz");
 url1.get();
 url2.get();

In this example, a different "type" of URL subclass would be returned by the factory method in each case (e.g., HttpURL and FtpURL classes), each providing the appropriate get() method. So, the interpretation of the URL is done once at creation time, and subsequent operations do not need to determine it for themselves.

In TOOT, you can have the best of both worlds. Values (representations) are not typed by default. However, you can associate a type with a value, to create a ''representation of an interpretation'' (in other words, a new value that associates a type with some other value), which can then be passed around. For instance:

 set url "http://www.foo.com/index.html"   ;# Un-typed representation
 http::geturl $url                         ;# Command interprets $url as an HTTP URL
 set http_url [Url create $url]            ;# Returns {HttpUrl: http://www.foo.com/index.html}
 $http_url get                             ;# Uses the interpretation

So TOOT allows you to take arbitrary representations, and package them up with an indication of how the value should be interpreted.
But, crucially, this package becomes a new value with the type explicitly becoming part of the representation, rather than an intrinsic, behind-the-scenes property. This allows you to do clever things, like totally ignore the interpretation given. In TOOT, the interpretation (type) is a prefix that is a command name unique to each type you create. In this scheme, the Tcl ''interpreter'' really is just that - it executes commands that apply operations to a value under some interpretation. For instance:

 $http_url get
 # Becomes
 HttpUrl: http://www.foo.com/index.html get

This applies the operation "get" to the value "http://www.foo.com/index.html" under the interpretation that the value is a URL using the HTTP protocol.

Comments, criticism, etc. welcome.

----

[[string repeat nod 1000]] ! -[jcw]

One more comment from me: REBOL seems to have gone in this direction. It has Tcl's minimalism in syntax (well... almost), but it does associate types with strings. I think it's a bit like Tcl without its dual-rep & shimmering - or to put it in context: REBOL appears to do what TOOT does behind the scenes. The interesting aspect in REBOL is that type also seems to drive the parsing and precedence of parsing, somehow. REBOL's types are in C and not as explicit as TOOT's, so it is perhaps harder to play tricks with them.

One thing that does seem to come out of this approach is conciseness. You get the ability to say "$foo length", without having to specify $foo's type at every turn. So the lesson so far seems to be: we need to base everything around tuples, right? Funny how - coming from a different way - this is OOP again!

[NEM] Interesting. I've never looked at REBOL (honest!), but it sounds similar. I'll have a look at what they do there. Regarding tuples: yup, that is the key of TOOT - packaging up a type with a value representation to create an ''interpretation''. Yes, it is OOP in a way (hence the "OO" in TOOT). I must confess to having a soft spot for OO as an ''idea''.
The problem with discussing OO, though, tends to be the mess of different (and often orthogonal) concepts that are associated with that term. This page, for instance, makes no reference to inheritance (or delegation), mutability of state, or a host of other concepts. Instead, I'm concentrating on how operations on values are interpreted in a given context. Of course, I have ideas for most of the others too, but they'll have to wait for other days!

----

[Lars H]: I like the discussion of intrinsic versus extrinsic interpretations. Sounds like something one should keep in mind when explaining Tcl to people coming from other languages. (But then I'm mostly in favour of the extrinsic approach, so I suppose I would place intrinsic interpretations under "prejudices you should let go of when you program in Tcl".) One place where Tcl does rely on intrinsic interpretations is in [expr], for the distinction between "integer /" and "float /". I find that mostly a bad thing, since it means forgetting a .0 in some other part of the program can have rather unexpected consequences.

The URL example got me thinking, though: there seems to be more than one kind of intrinsicality. The scheme part (http, ftp, etc.) of a URL provides information about how the rest of the URL should be interpreted, so it serves as a "type". This type does come with the value, and is thus intrinsic, but it is also rather different from the TOOT interpretations and from datatypes in other languages. Whereas URL schemes and the integer/float status of numbers are explicit parts of such values, classical data types tend to be (more or less) hidden. (Perhaps easily accessible to the typing system, but not part of the value's public information.) Thus one could say that there are three possibilities: extrinsic, explicit intrinsic, and implicit intrinsic.

[jcw] - Tcl == "everything is a string". TOOT == "everything is an interpretation".
TOOT seems to come down to the recipe "look at the first (or second) list item, use it as type to define what to do with the rest". The "HttpUrl: http://www.foo.com/index.html get" example illustrates that there is room for ambiguity and redundancy still. It might be better to use "Url: http://www.foo.com/index.html get". Types can be complex beasts (nested, recursive even), but as far as TOOT is concerned, all that matters is a standard way of extracting a type from a complete value. That type determines what calls/methods there are. In the case of a URL, these may well decide to take the http: etc. prefix first, and re-apply the method to the resulting subtype.

[NEM] The URL example was probably a bad choice, given that there is a "type" associated with a URL to begin with. Actually, TOOT is beginning to look a bit like URLs, with the new {type: data} syntax I've been leaning towards, so the URL could be represented as {http: //foo.com/blah.html}. In other words, simply introducing a space between the protocol and the path. Of course, you could break it down further, as jcw suggests, with {url: {http: $path}} and probably further still (hostname, port, path, query etc.).

I'd disagree that TOOT == "everything is an interpretation". TOOT allows you to create interpretations, but doesn't force it; you can still pass around untyped values (strings), but you can also add arbitrary nestings of type identifiers to encapsulate an interpretation. You can also dynamically rearrange, add to, or completely ignore the type annotations attached to a value, as they are just normal values themselves (which happen to be command names). So, I guess, to use Lars's terminology, TOOT provides a method to convert extrinsically typed values to explicitly intrinsically typed values (what a mouthful!). But you can still discard the type information (''because'' it is explicit) and treat the underlying value as extrinsically typed.
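The {type: data} and {url: {http: $path}} forms discussed above can be sketched in a few lines of plain Tcl. This is only an illustration under stated assumptions, not TOOT's actual implementation: the type commands `http:` and `url:` and the methods `scheme`, `string` and `host` are hypothetical names made up for this example, and dispatch is done explicitly with `{*}` (Tcl 8.5 or later) rather than by writing `$value method` as in the examples earlier on this page.

```tcl
# Hypothetical type command: "http:" interprets its first argument as the
# part of an HTTP URL after the scheme, and applies the named method to it.
proc http: {data method args} {
    switch -- $method {
        scheme { return http }
        string { return "http:$data" }
        host {
            # Take the text between the leading "//" and the next "/".
            if {![regexp {^//([^/]+)} $data -> host]} {
                error "no host part in \"$data\""
            }
            return $host
        }
        default { error "unknown method \"$method\" for type http:" }
    }
}

# Hypothetical outer type: "url:" wraps another typed value and simply
# re-applies the method to it, along the lines jcw suggests.
proc url: {inner method args} {
    return [{*}$inner $method {*}$args]
}

set plain  "//foo.com/blah.html"    ;# untyped representation
set typed  [list http: $plain]      ;# {http: //foo.com/blah.html}
set nested [list url: $typed]       ;# {url: {http: //foo.com/blah.html}}

puts [{*}$typed host]               ;# dispatch on the explicit type tag
puts [{*}$nested string]            ;# outer type delegates to the inner one

# The type tag is ordinary data, so it can be discarded again and the
# payload treated as an untyped string:
puts [string length [lindex $typed 1]]
```

Because the typed value is just a list whose first element happens to be a command name, nothing stops other code from ignoring the tag with `lindex`, or from re-wrapping the payload under a different type command.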
[Lars H]: Another aspect I would like to point out is the problem with binary operations. With extrinsic interpretations, this is no harder than unary operations, because it is the ''operation'' which decides how the operands should be interpreted. With intrinsic interpretations, any binary operation will have to deal with ''two'' interpretations, which may in principle be distinct. When the number of possible interpretations is small and fixed (e.g. number types in Tcl) it is usually possible to tabulate each pair of interpretations that may occur, but when that is not the case then things get messy.

----

[[ [Category Concept] | [Category Object Orientation] ]]