Version 5 of Data structures: from the bit to the Web

Updated 2002-12-09 13:42:37

Richard Suchenwirth 2002-12-09 - If we think of structures as "things built from other things", the lowly bit does not qualify as data structure - it is the minimal, atomar part of all others, capable only of expressing two values: 0 or 1, true or false.

Bit sequences of length n can hold 2**n values, which makes them more suitable for most purposes (e.g. 8 bits, a byte, value range 0..255 - see Playing with bits). When joined to sequences again, bit sequences need not be of constant length, if they can be told apart: see Huffman coding, or bit vectors. Bytes are also used to cover all sort of "binary" data, although correctly said, everything composed of bits is binary...

Characters used to fit in 5/6/7/8 bits, but in the course of internationalization, one should use Unicode and UTF-8, where a character needs a bit sequence of 16 (or more) bits, resp. if coded as UTF-8, a byte sequence of 1..3.

Character sequences, or less formally: strings, are extremely useful for all kinds of things. In Tcl, everything is a string - among many other things, every script is a string. Under the hood, a Tcl object may well have another representation too - as pure signed integer (32 or 64 bits wide), or as double-precision float.. But for simplicity, you may still consider Tcl strings that look like numbers as strings.

Lists, sequences of items, which can in turn be strings, numbers, or other lists, can in Tcl be seen as a view on (well-formed) strings, although in memory they're more efficiently implemented. By virtue of recursively embedding lists again, they can be used for trees (e.g. every XML document is a tree - and a string !), tables, stacks and queues and just about any complex structure you can think of. Also, every Tcl command is a list, the first word the command name, the rest its arguments.

Hash tables can be imagined as tables of two columns and arbitrarily many rows - the first column contains the unique key, the second the associated value. Hash tables can be implemented in various ways.

And the Web, you ask? Well, one can see it as an extremely complex structure, that takes hours to describe, but one can also consider it as a black box with hash table functionality: give a string (the URL), receive a sequence of bytes back (very often text, but it could be binary image data etc... or a variation on "404 - Page not found" ;-)


NEM - And here is the web as a Tcl array (hashtable):

 package require http
 array set web {}
 trace add variable web read fetch
 proc fetch {v1 url op} {
     if {[info exists ::web($url)]} {
         # Do nothing
     } else {
         set t [http::geturl $url]
         set ::web($url) [http::data $t]
         http::cleanup $t
     }
 }