''Representation is the essence of programming...'' : Fred Brooks, [http://en.wikipedia.org/wiki/The_Mythical_Man-Month%|%The Mythical Man-Month]

At the [script] level, in [Tcl], '''Everything is a String'''. The value of every variable is a string, every [word%|%argument] to every [command] is a string, and the `[return]` value of every command is also a string. Command names themselves are just strings. This page lays out the implications of the ''everything is a string'' paradigm, as well as the details of working with this paradigm when using the [C] API, where both the string representation and the structured representation of values are exposed.

** Description **

At the script level, [Tcl]'s type system consists of exactly one type: the '''string'''. This is fundamental to the design of the language, and sets Tcl apart from the vast majority of [dynamic language%|%dynamic languages], where there are at least two data types that can be distinguished.

The [Dodekalogue] states that a '''[script]''' is a string, which implies each '''[command]''' in a script is a string, which in turn implies that each '''[word]''' in each command is a string. It could therefore be said that "everything is a word", or that "everything is a command", but the fundamental building block is the string.

Other languages provide tokens for types such as ''number'', ''identifier'', ''keyword'', ''function'', ''list'', ''dictionary'', or the special value, ''[null]''. Tcl only provides ''string''.

In Tcl there are three kinds of substitutions that happen to words at runtime: variable substitution, command substitution, and backslash substitution. Since variable values, procedure results, and interpreted backslash sequences can be substituted into words, all these things must be strings.

This design provides a great deal of flexibility and power, and gives Tcl a unique flavour that can take some time to understand and appreciate.
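
The three substitutions can be seen in a short sketch. The variable names (`name`, `greeting`, `count`, `tabbed`) are made up for illustration only; the point is that each substitution produces a string that becomes part of a word.

======
# Variable, command, and backslash substitution each yield a string.
set name World                              ;# the value of name is the string "World"
set greeting "Hello, $name"                 ;# variable substitution
set count "[string length $name] chars"     ;# command substitution
set tabbed "a\tb"                           ;# backslash substitution: a, TAB, b

puts $greeting                  ;# Hello, World
puts $count                     ;# 5 chars
puts [string length $tabbed]    ;# 3
======
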
Those coming from other language styles might at first feel constrained by things "missing" from Tcl, but climbing the Tcl learning curve often leads to the realization that these "limitations" are actually strengths.

Everything is a string, but not ''just'' a string. Each command is free to interpret its arguments (which are words) in arbitrary ways. Here are some examples:

   `[expr]`:   interprets words as numbers, operators, or strings.
   `[lassign]`, `[lindex]`, `[linsert]`, `[lmap]`, `[llength]`, `[lrange]`, `[lreplace]`, `[lsort]`:   interpret some or all of their arguments as lists.
   `[lappend]`, `[lset]`:   interpret some of their arguments as the ''names'' of variables whose values are lists.
   `[lassign]`:   interprets some of its arguments as the names of variables to create and assign values to.
   `[eval]`:   interprets its argument(s) as a script.

Thus, we see that a string can be interpreted in many ways, e.g., as a number, a list, or even a script.

If one desires more types, many systems have been built in [Tcl] that provide multi-type facilities. See [object orientation] for a list of such systems. That's the nice thing about Tcl. It provides fundamental mechanisms on top of which virtually any programming paradigm can be implemented.

** What's in a String **

[list%|%Lists] and [dict%|%dictionaries] are strings that conform to a particular format. Other values are merely tokens (names) that can be used to access and manipulate data structures or resources that are not directly accessible at the script level.

The following resources are accessed only by tokens:

   * variables
   * [namespace%|%namespaces]
   * [array%|%arrays]
   * [proc%|%procedures]
   * [chan%|%channels]
   * Tk widgets
   * encodings
   * interpreters

** Types in the World of Strings **

Programmers coming from languages where values are typed wonder how to get information about the type of a value in Tcl. The answer is, you don't.
Instead, each command should understand how a value should be used, and use it appropriately. Where the intended use must depend on additional information about the value, there are myriad ways to design that information into a program.

----

[tcl chatroom] 2013-04-30:

[DGP]: I find it useful to think of "types" in Tcl as being subsets of the value universe. So it doesn't make sense to ask what type a value is. Instead, you can identify those types where a value is a member, and where the value is not a member.

[CMcC]: Right, subsets, not partitions

[DGP]: "Everything is a String" is just the trivial observation that all values are in the same value universe.

** Implementation **

For performance, Tcl internally tracks how the value was most recently used, and stores the relevant internal format (often a [C] data structure) alongside the string representation. Tcl keeps the string representation and the internal format synchronized. Thus, e.g., a list can be modified either by changing its string representation or by using a command like [lappend], which works directly with the internal format version of the value. For performance, Tcl only updates one of the representations when that particular representation is needed and the other representation is newer.

The internal format of a value is not exposed at the script level, does not have any semantic impact on the language, and is just an implementation detail. The internal format simply has no purpose at script level. Two objects with the same string representation are indistinguishable in every other respect at the script level.

Tcl handles all the messy details of tracking and synchronizing the script-level values and the internal format(s) of those values so that the user can work in the "seamless" world of words. A user of Tcl's C API will gain an appreciation for the way Tcl values are handled at the C level, each one having both a string interface and a structured interface.
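
The point that one value can be handled either as a string or as a list, with Tcl keeping the two views consistent, can be sketched at the script level. This is only an illustration (the variable `v` is arbitrary); it shows the observable behaviour, not how the internal representation is managed.

======
set v "1 2 3"
puts [string length $v]     ;# 5  (five characters)
puts [llength $v]           ;# 3  (three list elements)

lappend v 4                 ;# list-oriented change: v is now "1 2 3 4"
append v " 5"               ;# string-oriented change: v is now "1 2 3 4 5"

puts [llength $v]           ;# 5
puts [expr {[lindex $v 0] + [lindex $v end]}]   ;# 6
puts [expr {$v eq "1 2 3 4 5"}]                 ;# 1
======

Whichever command touched the value last, every other command still sees the same string, which is all that matters at the script level.
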
** The Magic of EIAS ** EIAS is one of the grand unifying concepts of Tcl. As [Edsger Dijkstra] noted in [http://www.cs.utexas.edu/~EWD/transcriptions/EWD10xx/EWD1036.html%|%On the cruelty of teaching computer science], a program can be viewed as a formula that must be derived by the programmer, and the only known reliable way of doing that is by symbol manipulation. Hence, we construct mechanical symbol manipulators by means of human symbol manipulation. '''EIAS''' facilitates such a mathematical style of programming by merging the concepts of code and data more completely than even [Lisp], as a Tcl script itself morphs to become its own result. When everything is a string, every kind of data is readily accessible: When some new data type is introduced in a language like [C] or [Java], it usually has to come with its own library for printing values, doing I/O, initialising variables, and often even for copying values. In Tcl all that is immediately available, since it can be done with strings and the new data type is represented using strings. Everything just works! Strings are general. The standard computing models are all readily expressible in terms of strings. What is currently on the tape of a [Turing machine] is a finite string of symbols. [Lambda] calculus is manipulation of [http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Post.html%|%Post] production systems model computability by replacing parts of strings with other strings. [NEM] 2010-12-15: One aspect of [EIAS] that is worth consideration is how it has kept Tcl "pure" in some sense. Part of EIAS that is little mentioned is that Tcl's strings are ''immutable''. This means that Tcl's value space is purely [functional programming%|%functional], in the [Haskell] sense. All side-effects are confined to the shadow world of commands and variables and other second-class entities. 
What this means is that Tcl now possesses some very powerful purely functional data-structures that are somewhat better than those available in other languages. For instance, I cannot think of another popular language that supplies O(1) purely functional dictionaries and lists (arrays) out of the box (or even in the library). Not to mention efficient unicode and binary strings.

** Peeking Behind the Curtain **

[DKF]: See also [representation%|%tcl::unsupported::representation], which can peek behind this veil. If you use this, feel dirty!

[AMG]: In testing and debugging high-performance applications I use this to confirm that I'm avoiding [shimmering].

** EIAS the Misunderstood **

Programmers more familiar with other languages sometimes criticize Tcl's EIAS design, usually because they assume that complex algorithms requiring data structures are not going to be possible in Tcl. What they might be missing is that although they can't directly translate some of their idioms into Tcl, equally powerful Tcl idioms exist, and are waiting to be discovered. By sticking to EIAS, Tcl elegantly disposes of problematic "features" of other languages, such as [C] features that make [http://en.wikipedia.org/wiki/Aliasing_(computing)%|%aliasing] a possibility. The data structures that others may think Tcl is missing are simply expressed in another way, but that is difficult to see at the outset.

However [LV] would like to point out that the true philosophy of Tcl says ''Do all that you can in Tcl - but then, do the rest in C/assembly/whatever and create glue and handles to it for Tcl.''

** Misc **

[Donald Porter] remarked in the [Tcl chatroom]: ''More precisely, every value has a string representation. Tcl arrays are not values; they are special types of variables.''

[lvirden]: I guess there are other things that fit into the same category as arrays - created items like procs, and in Tk all sorts of widgets, etc.
[aku]: But most have a way to serialize them into a value, and back (array set|get, proc|info body|arg|default)

[kennykb]: And the ones that don't have natural serialization generally are managing external resources ([channel] handles are the most obvious example)

[Shin The Gin]: If everything was a string, then one could easily [save the whole runtime environment to a file and restore it later].

----

[RS]: likes the ditty '''"I'm not afraid of anything, if everything is a string"'''. In fact, the Tcl mantra often relieves fears of complexity: anything that can be brought to the prototype "string in, string out", can be nicely done in Tcl. Arabic, Korean? Of course, everything is a Unicode string! Geographic mapping? Just give me a string with the latitudes, longitudes, and whatever other data, and presto - [Tclworld]. Images can in many ways be rendered as strings (XBM, PNM...); one pretty intuitive way is in [strimj - string image routines].

----

[Todd Coram]: Data typing is an illusion. Everything is a sequence of bytes. Call 'em ints, floats, symbols, strings, whatever. Tcl exposes both code and data to the ''user'' as sequences of bytes (called strings). This is Tcl's choice of abstraction. And it's quite a powerful choice IMHO.

[BR]: Hm, isn't it actually the case that a string is a sequence of characters, and bytes (in Tcl) are just characters with the values 0 - 255? I think that's the model of binary data in Tcl. IOW bytes are not fundamental in Tcl, but characters and strings are. ''Except that characters could be [Unicode] instead of [ASCII].''

----

2003-05-13: Recently, '''Bruce Eckel''' in [http://web.archive.org/web/20100209072034/http://mindview.net/WebLog/log-0025%|%Strong Typing vs. Strong Testing], and '''Robert C. Martin''' in [http://www.artima.com/weblogs/viewpost.jsp?thread=4639%|%Are Dynamic Languages Going to Replace Static Languages?] talk about weak typing and dynamic languages.
[CL] thinks these two make mistakes, but hasn't time now to explain more. In any case, yes, these are good noteworthy references. ---- [escargo] 2003-05-13: Another way that ''everything is a string'' can be an issue is where a string representation can only be an approximation of what is being represented. The main instances of this that come to mind are floating point numbers (for which there are already some existing wiki pages). There may be other examples as well. What? There is no reason a string can't fully represent a floating point number. And Kevin Kenny has a TIP in the works to ensure that Tcl does always indeed achieve exactness in this case - Roy Terry, But really, it seems a waste of time to make fine points about "everything is a string" which is merely a programmer's cliche and doesn't begin to express the power of Tcl. [escargo] 2003-05-14: Sorry; there was a slip of the finger there. I said "floating point" and what I meant was "real". [Lars H] 2003-05-15: Real numbers are ''beyond'' what is computable. The number of possible outputs from a [Turing machine] (and thus the set of real numbers which one can specify in any way whatsoever) is merely countable, whereas the set of all real numbers is uncountable. But this view does provide an answer to why ''"Everything is a string"'' is such a powerful idea. Many languages (most notably C) take the approach that ''"Everything is a number (with native machine representation) or some fixed aggregation of such numbers"'', but all such representations are limited. In order to support general strings, it is necessary to venture into some scheme of dynamic memory allocation and pointers to allocated objects. The ''string'', on the other hand, achieves the maximal generality of a Turing machine (the tape always has an obvious representation as a string) and thus if something wouldn't be representable as a string, it wouldn't be computable either. 
---- [escargo] 2003-05-16: What would it take to make [Tk] [widget]s serializable? I was thinking about [xml2gui] and wondering what it would take to make a widget produce an [XML] description of itself. Further, what would it take to have widgets that contain other widgets produce XML of themselves? This would seem to me to be one useful goal. Another goal would be the converse, what XML would need to be used to create all the Tk widgets (and pack them the right way, etc.)? (This would be a suitable storage format for [GUI Building Tools].) [jcw] 2003-05-17: There already is a serialized form of Tk, able to cope with any complexity of widget hierarchies: the Tcl script that creates them. [jmn]: Yes, but is there a canonical form for it? [escargo]: I am reminded of one-way hashes. You can have a function that given an input can produce a hash value that cannot be used to derive the original input. Just because I have a widget does not make it clear to me that I can derive in an algorithmic way Tk and Tcl code to recreate the widget. Perhaps this is something for the [Tk 9.0 WishList], but I would certainly like to see whatever changes would be necessary to allow this (if it's practical at all). ---- [jcw] 2003-05-17: While [EIAS] is indeed a wonderfully powerful and flexible abstraction, I'd like to point out that [LISP]'ers and [Scheme]'rs have a very similar set of self-contained mechanisms at their disposal, based on "everything is made up of cons cells" (it's more of a mouthful, though...). IMO, "strings" as convention to represent data in a certain way is not inherently different from other representation choices - one could even use neurons and synapses if that were practical. What EIAS does imply is "code is data" and "data can be used as code", which is why one can play so many tricks in Tcl (and in [LISP]). [NEM] 2005-07-25: replying to this a couple of years too late... 
The difference with [Lisp] is that cons cells aren't universal; as I understand it, some basic data types like numbers are not represented as cons cells. You could build up everything from cons cells, in a similar way to building everything from set theory, but [Lisp] doesn't, and so you can't treat an integer as a list. In Tcl, though, the string is the universal medium of representation, so I can treat an integer as a list (of one element).

----

[FW]: Come to think of it, what are some other typeless languages in the "everything is a string" sense? RS has already submitted and thoroughly documented the antique TRAC in [Playing TRAC].

[JE]: [http://en.wikipedia.org/wiki/MUMPS%|%MUMPS], or '''M''', is another [EIAS] language. [Forth] and BCPL are also typeless, but there the fundamental type is a "cell" or native machine word instead of strings. (BCPL seems to be extinct, but [Forth] and MUMPS are still around.)

----

'''If "everything is a string," then how can you tell what's an object?'''

[escargo] 2005-07-23: That's what I woke up to this morning. I was thinking that Tcl lacks what I have seen called a "meta-object protocol," something that allows some object-oriented languages (like [Smalltalk]) to do some useful operations on objects and classes. I like [Snit] because of what it allows me to do to compose objects using '''delegation'''. However, if I'm operating in Tcl (or in Tk) and I have an identifier, how can I tell if its value represents an object from an object system like Snit (or any of the other object systems added onto Tcl)? And if it is an object, how can I tell which object system it is an object in, so that I can guess what behavior it has (which functions it understands or implements)? The only way I can see something like this working is if there were some agreed-upon standard for names (or [reference]s) such that a classifier (say '''[[string is object ...]]''') could return a yes or no answer.
Even better would be one that could tell which object system implemented the object (say '''[[string is objectsystem ...]]'''). This might be possible in a system like [Jim] if the [Jim References] encoded the object system and whether something was an object. Even without add-on object systems, it would be nice to be able to determine if there could be '''[[string is command ...]]''', but that in some respects defeats the purpose of [unknown]. (I'm still fuzzy from sleep, so maybe there is something that does this already, otherwise how would [unknown] get called?)

[Lars H] 2005-07-24: I think the best way of pointing out how your analysis here is wrong is to point out that

   * Tcl has no '''whattype''' command; indeed, it follows from ''everything is a string'' that there cannot be such a command (at least not one that doesn't just return "string" or whatever regardless of input), since nothing but the string may define the value. If you want to type-tag values, then you must include that tag in the value itself.

Put another way: Values in Tcl are what you make of them. Values don't "know"[[*]] that they're command names, integers, or variable names--they become whatever you decide to treat them as (or cause an error to be thrown when they cannot be interpreted the way you claimed). A consequence of this is that one shouldn't write programs that just throw all data in a huge bowl and let [unknown] (or whatever) sort it out later; one should write programs so that at every point in the program it is clear what type of data is going to be passed around. It is sometimes useful to let the type of some argument be "either an A or a B", but then one must also have sorted out whether it may happen, and if so what it means, when data comes along that is both A and B.

So what has this to do with objects, then?
Everything, since whatever one uses to identify an object is just another string (even though few eyebrows are raised these days when people request magical behaviour from objects--for some reason it seems politically correct to regard objects as nobler than ordinary data). You ''can'' ask an object system whether it recognises a particular string as identifying one of its objects (but this assumes the system is implemented in such a way that this is possible), and you could start an object system registry that goes around asking all known object systems whether they recognise a particular object as theirs, but that's about it, and I doubt it would be of much use outside debugging.

In a sense, the proper response to "how can you tell what's an object?" should be:

   * What design error did you make that made you ask that question in the first place? Where did you (or someone else) throw away the information that you now find you need?

[[*]] Technically, on the C level, most Tcl values ''kind of'' know from their [Tcl_Obj] internal representation whether they are command names, integers, variable names, etc., but it is more accurate to describe this information as ''if I'm a command name/integer/whatever, then I'm the name of '''that''' command/integer/whatever'' since this type information is shimmered away whenever the [Tcl_Obj] is used in a different sense.

[escargo] 2005-07-25: This is closer to the problem that I felt I was dealing with. In a unified object system (or alternatively, where only one object system is possible), you don't have to speculate about what kind of behavior an arbitrary object might exhibit. In Tcl (especially if you are using [Snit] to delegate to some arbitrary objects), you don't know (and as you pointed out, perhaps ''cannot know'') what object behavior a particular object (for which you have a string to use as a [reference]) might exhibit. If I have a string, I can use '''winfo exists''' to see if it's a [Tk] window.
I can use '''info procs''' to see if it is a proc. If it were an ''object'', then I expect that it has some behavior (otherwise what's the point of it being an object). But without knowing more about it, it's not safe to try different probes into its behavior to see what it can do. The irony is that at least some object systems for Tcl provide some kind of [introspection], but I doubt that they provide it the same way, so you can't just use it to find out more about the object. ''Why does this matter?'' - The reason I feel that it matters is that there has to be somewhere where knowledge of the type of object has to be carried around so that you can write your programs correctly. (The ''type'' in this sense being the add-on object system that implements the object.) If you can't determine the type of object from the object itself, then you have to code that information into comments or else invent some other means of doing it. It's not that this can't be done, but it's a wish I have that the answer were within the language itself, either by implementation (e.g., you could deconstruct a reference to determine the object system) or convention (all object systems implemented an '''info''' command that all objects responded to that could, as one of the items that might be returned, respond with the name, and maybe revision level) of the implementing object system. I realize that's not going to happen, but if enough people agreed with the need, then progress could be made in that direction. [NEM] 2005-07-25: This all boils down to fundamental philosophical beliefs about the nature of values and types. What really marks Tcl out from most other languages, and what is at the heart of this debate is not that strings are such wonderful things that they should be used for everything, but rather a recognition that the notion of a "type" of a value is ''extrinsic'' to the value itself. In other words, a type is an indication of some ''interpretation'' of a value. 
Any representation of a value can have multiple different interpretations, and so to talk of ''the'' type of a value without reference to the particular system doing the interpretation is difficult. Conversely, any abstract type can have multiple possible representations (the key idea of abstraction/encapsulation). So, the connection between values and types is a many-to-many connection. Most languages assume a 1-to-many connection, so each value has a single type which is associated with it by the ''language'', with less categorisation left up to individual commands/functions (although it is not true to say that no choice is left; every function performs an interpretation of its arguments to some degree). Tcl, however, is different in that it performs almost no interpretation of the values it is passed. It does basic tokenization and grouping, but leaves values as they are found: as strings. The only further bit of interpretation that Tcl does is to treat the first word of each line (talking loosely) as the name of a command. (Well, there is also variable substitution and other items in Tcl.n, but we'll ignore those for now). It is then the individual commands which take care of any further interpretation. You can think of this as a form of extreme lazy evaluation: even parsing is left to the last possible moment. So, what are the trade-offs? On the negative side, the fact that Tcl does less interpretation for you means that it makes fewer guarantees (e.g., it's hard to do garbage collection of references if you can't guarantee that X is a reference and Y isn't). Another difficulty, is that it is possible to break abstractions in Tcl: you can always drop down to the level of strings and manipulate the representation of a value, rather than use any higher-level interface. I actually think this is one of Tcl's strengths, but it is a longer argument. 
You can also get around this by using opaque [handle]s, which hide the representation behind a layer of indirection, that may or may not be introspectable. On the plus side, the fact that Tcl has an ultimate fear of commitment means that commands have more free rein in deciding how ''they'' will interpret the values. This, I suggest, is the heart of what makes Tcl a good glue language: by not committing to a single interpretation of a value it allows multiple components to make their own, possibly conflicting, interpretations. (As an aside, an interesting parallel can be drawn here with Daniel Dennett's ''Multiple Drafts'' theory of cognition/consciousness).

Another way to look at this is to say that by providing a common representation medium you reduce the number of explicit conversions that have to be done. If you have N distinct types, then in order to convert between them you potentially need N!/(N-2)! = N(N-1) different conversion functions (i.e. one for each ordered pair of types, e.g. int2double, double2int, int2string, string2int, etc). If you have a common representational medium, then you can use that as an intermediate, thus reducing the number of conversion functions needed to just 2(N-1): two functions for each type, toString and fromString (the string type itself obviously doesn't need these).

Can we combine the benefits of both approaches? I think we can. [TOOT] was about doing just this, and [Interpreting TOOT] has my earlier thoughts on the subject. I've been thinking about this some more since, and will hopefully soon have time to write some more code and an essay detailing my further thoughts. For now, I will point at [Monadic TOOT], which contains some clues to a possible way forward. Those who know about monads will know that they are useful for confining effects and enforcing abstraction boundaries. I think we can use the same techniques in Tcl to create packets in which abstractions can be enforced and guarantees can be made, if needed.
The other side of this process is [partial evaluation]: a TOOT bundle of

======none
type: value
======

can be partially evaluated (or partially applied), to yield a new function specialised for that interpretation of that value. This can be optimised and can enforce a type abstraction.

[Lars H]: Well put. The part about late "commitment" puts a name on something I think is very important in understanding the strengths of Tcl. I'll see if I can find a good place to put this idea for easy access.

[DKF]: Actually, in 8.6 there is `[tcl::unsupported]::[representation]`, which includes type cache information in its result. Don't use it for anything other than debugging. Or if you do, feel very naughty. It is ''very'' bad style to write code that depends on types (albeit inevitable for solving certain types of problem in the support of [Java] and [JSON] correctly, alas).

----

[SYStems] 2005-07-23: These are not very complete thoughts, but I think that to really answer and understand the idiom ''everything is a string'', we need to identify the context, or perspective.

A Tcl script is a series, a sequence of statements. Each statement receives input:

   1. A string.
   1. An event.

performs an action on this input, and then does one of the following:

   1. Produces output.
   1. Causes a side effect.
   1. Produces output and causes a side effect.
   1. Raises an error.

Every statement's input and/or output is a string; only side effects (and maybe input events) can be NOT A STRING, but all input and output must have a string representation. Each input and output can have a different in-memory or on-disk representation. But inside a script it must have a string, or as I prefer to say, textual representation, since everything written in a Tcl script is textual. A script can be the input of a Tcl command, for example control structure commands.
Every input must have an inline textual representation. This is why a command must have a textual output, so that when it's substituted it produces a string, the only thing that is good as an inline input. All input and output must have an inline textual representation (this is the part I am hesitant about, I am not really sure this is correct, I am using the word inline loosely here, I probably mean infile!!)

`[[[set]]` is the command used to manage a Tcl script's memory; all `[[[set]]` variables must have a string value. This may sound weird, but I write this hoping that Tcl doesn't lose its primary principles. For example, I see many people talking about Tcl variables, but Tcl doesn't have variables. `[[[set]]` is a command that has a Tcl interface and gives a Tcl script the notion of variables by associating a name with a string value. Depending on the value's string representation, [set] will store it differently in memory. [set], not Tcl. `[[[set]]`, for example, doesn't store the variables on disk. A good Tcl'er might create a command that gives a Tcl script the notion of persistent data, data stored on disk! Tcl helps guide thinking by recognizing the syntax `$name`, and treating that as `[[[set] name]]`.

Anyway, back to the fact that set can only associate a name with a string. set is used to store another Tcl command's output, and pass it later as an input.

: [LV]: Uh - maybe that is how you _want_ it to work. But since I can say `set abc 123` then set doesn't just store another Tcl command's output...

So we can say that everything inline (a Tcl script, anything that can be passed around, a Tcl script's memory, a Tcl script's internal environment) must be a string. Or in other words, we can say that Tcl introduces a new in-tcl context, where everything must have a textual representation. Anything outside a Tcl script, outside the in-tcl context, for example a command's side effect or an external environment, can be not a string.
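
The `$name` shorthand mentioned in the discussion above can be checked directly. A minimal sketch, with variable names (`abc`, `n`, `m`) chosen only for illustration: `set` stores a literal word just as readily as the result of another command, and either way what comes back out is a string.

======
set abc 123                 ;# a literal word, as in LV's remark
puts $abc                   ;# 123
puts [set abc]              ;# 123  ($abc and [set abc] yield the same string)

set n 41
set m [expr {$n + 1}]       ;# the result of a command, stored by set
puts $m                     ;# 42
puts [string length $m]     ;# 2  (the stored result can be treated as a string)
======
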
** See Also **

   [shimmering]:   
   [Tcl_Obj]:   
   [homoiconic]:   
   [Is everything a list?]:   
   [How Tcl is special]:   

<<categories>> Concept | Discussion