Version 101 of Wibble

Updated 2010-05-17 10:55:00 by dzach

Wibble web server

AMG: Wibble is a small, pure-Tcl web server inspired by Wub, DustMote, Coronet, and Templates and subst. One fine day I wanted to put together a site using Wub, but I needed help and couldn't find CMcC or JDC in the Tcl chatroom. The need to hack would not be denied! So I wrote this.

This code is intended to be customized for your application. Start by changing the zone handlers and root directory. Feel free to create your own zone handlers; it's very easy to do so. Just make another proc.

Name

"Wibble" is similar in sound to "Wub", and according to the Jargon File [L1 ], it is one possible pronunciation of "www".

Zone handlers

Zones are analogous to domains in Wub and TclHttpd. Zones are matched against the request path by prefix, with an important exception for directory names: zone /foo matches /foo and /foo/bar, but not /foobar. I use the name "zone" instead of "domain" because I don't want any confusion with DNS domain names. Someday I may add support for virtual hosts via the Host: header, which means that DNS domain/host names will be included in zone names.
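The prefix test can be sketched as a standalone proc. It mirrors the check in wibble::getresponse; the name zonematch is hypothetical and used only for illustration.

```tcl
# Sketch of Wibble's zone prefix test, mirroring wibble::getresponse.
proc zonematch {prefix path} {
    # Compare against the prefix plus a trailing slash, so that /foo
    # matches /foo/bar but not /foobar.
    set match $prefix
    if {[string index $match end] ne "/"} {
        append match /
    }
    # The bare prefix itself (e.g. /foo) is also accepted.
    expr {$path eq $prefix
          || [string equal -length [string length $match] $match $path]}
}

foreach path {/foo /foo/ /foo/bar /foobar /} {
    puts "$path -> [zonematch /foo $path]"
}
```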

Handlers can be stacked, so zones are defined by a combination of handlers which are executed in sequence. Each handler receives request and response dictionaries as its last two arguments. The request dictionary is derived from the HTTP request and augmented with configuration options and a few extra parameters; the extra parameters indicate the match prefix and suffix, plus the (possible) filesystem path of the requested object. The response dictionary passed to the handler is a tentative response that the handler can update, replace, or simply ignore. The handler then returns using the [nexthandler] or [sendresponse] command. [nexthandler] takes an even number of arguments, which are alternating request/response pairs to pass to subsequent handlers. [sendresponse] takes only one argument: the final response dict to send to the client.
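A minimal sketch of a custom zone handler. The nexthandler and sendresponse procs below are stand-ins copied from the Wibble source so the example runs on its own; the handler name hello and its behavior are hypothetical.

```tcl
# Stand-ins for wibble::nexthandler and wibble::sendresponse, copied from the
# Wibble source: both return two levels up, i.e. out of the calling handler.
proc nexthandler {args} {
    return -level 2 $args
}
proc sendresponse {response} {
    return -level 2 [list $response]
}

# Hypothetical handler: answer requests whose suffix is "hello" and pass
# everything else along unchanged.
proc hello {request response} {
    if {[dict get $request suffix] eq "hello"} {
        dict set response status 200
        dict set response header content-type text/plain
        dict set response content "Hello, world!\n"
        sendresponse $response
    } else {
        nexthandler $request $response
    }
}

# The caller tells the outcomes apart by list length: one element is a final
# response, an even number is a list of request/response pairs.
puts [llength [hello {suffix hello} {}]]   ;# 1
puts [llength [hello {suffix other} {}]]   ;# 2
```

Because both helpers use return -level 2, the rest of the handler is skipped as soon as either one is called, so no explicit return is needed after them.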

Zones also stack. For example, if no handlers for zone /foo return a response, then the handlers for / are tried. Just as the handlers within a zone must be specified in the order they are to be executed, the zones themselves must be specified in order of decreasing specificity. To inhibit this stacking behavior, be sure that a default handler is defined for the zone, e.g. notfound.

The $wibble::zones variable defines the zones and their handlers. $wibble::zones is structured as a dict mapping from each zone prefix to its list of handlers. Each handler is a two-element list: a command list of positional arguments (the first of which is the handler command name) and a dict of key/value options which are merged into the request dictionary. The [wibble::handle] command is used to update this variable.
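For illustration, here is the structure that [wibble::handle] builds. The proc is copied from the Wibble source; the zones and the root directory /srv/www are assumptions chosen for the example.

```tcl
# Register handlers the way wibble::handle does (copied from the Wibble
# source), then inspect the resulting zones dict.
namespace eval wibble {
    variable zones {}
}
proc wibble::handle {zone command args} {
    variable zones
    dict lappend zones $zone [list $command $args]
}

wibble::handle /vars vars
wibble::handle / static root /srv/www
wibble::handle / notfound

# Each zone maps to a list of {command options} pairs, in registration order.
puts [dict get $wibble::zones /]   ;# {static {root /srv/www}} {notfound {}}
```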

Statically, the zone handlers form a stack or list. But dynamically (during program execution), the zone handlers can branch from a list into a tree, which is traversed in a breadth-first manner to search for a response to send to the client. The tree branches whenever [nexthandler] is given more than two arguments; each pair forms a new alternative handler stack operating on a modified request/response pair. When [nexthandler] is given zero arguments, the "node" is a leaf node, the tip of a dead branch; the request/response pair that was passed to the handler is removed from consideration.
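The branching can be modeled in a few lines. This is a toy, not Wibble's getresponse: the handlers branch and prune are hypothetical, and final responses (one-element outcomes) are not handled. Each handler is applied to every pending pair before the next handler runs, which gives the breadth-first order described above.

```tcl
# Stand-in copied from the Wibble source.
proc nexthandler {args} {return -level 2 $args}

# Hypothetical: try the request both with and without index.html appended.
proc branch {request response} {
    set alt $request
    dict append alt path index.html
    nexthandler $alt $response $request $response
}

# Hypothetical: kill (zero arguments) any branch whose path ends in a slash.
proc prune {request response} {
    if {[string index [dict get $request path] end] eq "/"} {
        nexthandler
    }
    nexthandler $request $response
}

# Start with one request/response pair and splice in each outcome.
set state [list {path /docs/} {}]
foreach handler {branch prune} {
    set newstate {}
    foreach {request response} $state {
        lappend newstate {*}[$handler $request $response]
    }
    set state $newstate
    puts "after ${handler}: [llength $state] element(s)"
}
```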

Request dictionary

  • socket: The name of the Tcl channel that is connected to the client.
  • peerhost: Network address of the client.
  • peerport: TCP port number of the client.
  • method: HTTP method (GET, PUT, POST, HEAD, etc.).
  • uri: HTTP URI, including query string.
  • path: Path of client-requested object, excluding query string.
  • protocol: HTTP/1.0 or HTTP/1.1, whatever the client sent.
  • header: Dictionary of HTTP header fields from the client.
  • rawheader: List of HTTP header lines.
  • query: Dictionary of query string elements.
  • rawquery: Query string in text form, including the leading "?".
  • content: Request body (POST and PUT requests only).
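Query decoding can be sketched standalone. unhex and the loop body are copied from the Wibble source (wibble::unhex and wibble::getrequest); parsequery is a hypothetical wrapper. Note that + is mapped to a space before %XX decoding, so an encoded %2B survives as a literal plus sign.

```tcl
# Copied from wibble::unhex: decode %XX escapes.
proc unhex {str} {
    set pos 0
    while {[regexp -indices -start $pos {%([[:xdigit:]]{2})} $str range code]} {
        set char [binary format H2 [string range $str {*}$code]]
        set str [string replace $str {*}$range $char]
        set pos [expr {[lindex $range 0] + 1}]
    }
    return $str
}

# Hypothetical wrapper around the query loop from wibble::getrequest.
proc parsequery {rawquery} {
    set query {}
    # rawquery includes the leading "?", so skip the first character.
    foreach elem [split [string range $rawquery 1 end] &] {
        # A missing "=value" leaves val set to the empty string.
        regexp {^([^=]*)(?:=(.*))?$} $elem _ key val
        dict set query [unhex [string map {+ " "} $key]]\
                       [unhex [string map {+ " "} $val]]
    }
    return $query
}

# Later like-named arguments replace earlier ones.
puts [parsequery {?q=a%2Bb+c&flag&q=last}]   ;# q last flag {}
```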

Extra parameters merged into request dictionary

  • prefix: Zone prefix name.
  • suffix: Client-requested object path, sans prefix and query string.
  • fspath: Object path with root option prepended. Only defined if root option is defined.
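A sketch of how the extra keys are derived for one handler call, following the corresponding lines of wibble::getresponse. The zone, root directory and path are assumptions chosen for the example.

```tcl
# Inputs: zone prefix, handler options, and the incoming request.
set prefix /
set options {root /srv/www}
set request {path /docs/intro.html}

# Same steps as wibble::getresponse.
set path [dict get $request path]
dict set request prefix $prefix
dict set request suffix [string range $path [string length $prefix] end]
if {[dict exists $options root]} {
    dict set request fspath [dict get $options root]/[dict get $request suffix]
}
# Configuration options are merged in last.
set request [dict merge $request $options]

puts [dict get $request fspath]   ;# /srv/www/docs/intro.html
```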

Configuration options merged into request dictionary

  • root: Filesystem directory corresponding to zone root directory.
  • indexfile: Name of "index.html" file to append to directory requests.

Support for configuration options varies between zone handlers. Zone handlers can also take positional configuration options by including them in the command argument to [wibble::handle], which is actually a list.
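A hypothetical handler with a positional option. Since getresponse calls {*}$command $request $response, extra words in the command list arrive before the two dictionaries. The redirect proc and the /old and /new zones are illustrative; sendresponse is a stand-in copied from the Wibble source.

```tcl
# Stand-in copied from the Wibble source.
proc sendresponse {response} {return -level 2 [list $response]}

# Hypothetical handler: "target" is a positional option from the command list.
proc redirect {target request response} {
    dict set response status 301
    dict set response header location $target[dict get $request suffix]
    sendresponse $response
}

# Simulate the call getresponse would make for a request to /old/page.html.
set command {redirect /new}
set outcome [{*}$command {path /old/page.html suffix /page.html} {}]
puts [dict get [lindex $outcome 0] header location]   ;# /new/page.html
```

It would be registered with something like wibble::handle /old {redirect /new}.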

Response dictionary

  • status: The numeric HTTP status to send. 200 is OK, 404 is Not Found, etc.
  • header: Dictionary of HTTP header fields to send to the client.
  • content: Message body to send to the client.
  • contentfile: Name of a file containing the message body to send to the client.
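Before sending, wibble::process turns the response into a Content-Length and, for Range requests, a Content-Range. The sketch below pulls that logic into a hypothetical byterange proc returning begin, end, status and content length. It tests the status before parsing the Range header, so non-200 responses are sent whole.

```tcl
# Hypothetical standalone version of the Range logic in wibble::process.
# size: body size in bytes; status: response status; range: Range header.
proc byterange {size status {range ""}} {
    set begin 0
    set end [expr {$size - 1}]
    # Only a 200 may become a 206.
    if {$status == 200 && $range ne ""
     && [regexp {^bytes=(\d*)-(\d*)$} $range _ begin end]} {
        set status 206
        # Fall back to the whole body when a bound is missing or invalid.
        if {$begin eq "" || $begin >= $size} {
            set begin 0
        }
        if {$end eq "" || $end >= $size || $end < $begin} {
            set end [expr {$size - 1}]
        }
    }
    # The last element is the Content-Length value.
    return [list $begin $end $status [expr {$end - $begin + 1}]]
}

puts [byterange 1000 200 bytes=100-199]   ;# 100 199 206 100
puts [byterange 1000 404 bytes=100-199]   ;# 0 999 404 1000
```

Like Wibble, this treats bytes=-500 as bytes 0 to 500; RFC 2616 defines that form as the final 500 bytes, so a stricter server would handle it separately.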

Predefined zone handlers

  • vars: Echo request dictionary plus any extra or optional arguments.
  • dirslash: Redirect directory requests lacking a trailing slash.
  • indexfile: Add indexfile to directory requests.
  • static: Serve static files (not directories).
  • template: Serve data generated from .tmpl files.
  • dirlist: Serve directory listings.
  • notfound: Send 404.

TODO/Wish list

Character encoding

Wibble currently doesn't support character encodings (neither Accept: nor Content-Type: charset); it's hard-coded to only use ISO8859-1. It should at least support UTF-8.

I think the zone handlers should be able to encode their responses however they like, so long as they correctly identify the encoding; a general facility can then convert encodings if necessary. Or I could standardize on UTF-8, but this becomes a problem when serving files from disk which may not be UTF-8. If the client will accept ISO8859-1 and the file is in ISO8859-1, it doesn't make sense to read the file, convert it to UTF-8, convert it back to ISO8859-1, then send it. Just use chan copy.
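A short illustration of why the encoding must be settled before Content-Length is computed (wibble::process measures the string after converting it): the same text has different byte lengths in ISO8859-1 and UTF-8. The sample word is arbitrary.

```tcl
# "café": the final character is U+00E9.
set text "caf\u00e9"
puts [string length $text]                                  ;# 4 characters
puts [string length [encoding convertto iso8859-1 $text]]   ;# 4 bytes
puts [string length [encoding convertto utf-8 $text]]       ;# 5 bytes
```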

By the way: I don't like how HTTP crams charset into the Content-Type: header; I'd rather it have a separate header, so that I can parse and update it more easily. There's a lot not to like about HTTP. :^)

Content type

The Wibble static file server doesn't send Content-Type:, so the client has to guess. (Internet Explorer second-guesses the text/plain Content-Type: anyway.) I think a contenttype zone handler might be able to fill the gap, but I'm not sure what heuristics it should use to detect the type.

makr (2009-11-19): A pure-Tcl solution could employ fumagic (fileutil::magic). tmag on the other hand is an extension built on top of libmagic (only available for Unix so far).

AMG: Sounds good, thanks for the pointer. So that's two possible approaches, and I can think of a third one: file extension matching. Since zones stack, it's possible (but untested) to override contenttype (or its arguments) for individual files, by declaring single-file zones. Anyway, I might try implementing all three, once I have made a separate Wiki page for extra zone handlers.
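A sketch of the file-extension approach as a zone handler. The proc name contenttype, its mapping table and its placement before static are assumptions; nexthandler is a stand-in copied from the Wibble source. It only annotates the tentative response and passes it on, so a later handler still sends the body.

```tcl
# Stand-in copied from the Wibble source.
proc nexthandler {args} {return -level 2 $args}

# Hypothetical handler: set content-type from the file extension, if known.
proc contenttype {request response} {
    set types {
        .css text/css  .gif image/gif  .html text/html  .jpg image/jpeg
        .js application/javascript  .png image/png  .txt text/plain
    }
    set ext [string tolower [file extension [dict get $request path]]]
    if {[dict exists $types $ext]} {
        dict set response header content-type [dict get $types $ext]
    }
    # Always pass along; static (or another handler) sends the content.
    nexthandler $request $response
}

# Registration might look like:  wibble::handle / contenttype
# placed before "wibble::handle / static root $root".
lassign [contenttype {path /img/Logo.PNG} {}] request response
puts [dict get $response header content-type]   ;# image/png
```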

Zone handlers

I should write more zone handlers for sessions, Content-Type:, etc. This page is getting fairly long, so they'll be collected on a separate page of this Wiki. Also the current zone handlers are very bare-bones, particularly dirlist, and maybe could stand to be improved. Originally I wrote them to be as short and simple as possible so that they clearly demonstrate the concept without being cluttered with features.

Caching

I'm not necessarily thinking of caching entire pages; I wonder if I can be a bit more general than that. Perhaps I cache zone handler outcomes, using the incoming request as part of the key. (See memoizing.)

One tricky part is identifying the rest of the key, that is to say, all other inputs that play a part in determining the outcome. A closely related issue is identifying the irrelevant parts of the request. For example, if the socket name affects the outcome, there's no point in caching, since that changes all the time. Except for debugging, I can't think of a reason why the socket name would matter. Likewise a dependency on the peer's network name would impede caching, and sometimes that dependency exists legitimately.
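A minimal memoizing sketch, assuming the handler's outcome depends only on the request (minus the per-connection socket and peerport fields) and the incoming response. cachekey, cachedcall and slowpage are hypothetical names, and the cache has no eviction, so this is illustrative rather than something to deploy.

```tcl
# Drop fields that change with every connection; keeping them would mean
# the cache never hits.
proc cachekey {request} {
    dict remove $request socket peerport
}

set cache {}
proc cachedcall {handler request response} {
    global cache
    set key [list $handler [cachekey $request] $response]
    if {![dict exists $cache $key]} {
        dict set cache $key [{*}$handler $request $response]
    }
    dict get $cache $key
}

# Trivial stand-in handler that counts how often it really runs.
set calls 0
proc slowpage {request response} {
    incr ::calls
    return [list [dict create status 200 content "page [dict get $request path]"]]
}

cachedcall slowpage {socket sock1 peerport 5001 path /a} {}
cachedcall slowpage {socket sock2 peerport 5002 path /a} {}
puts $calls   ;# 1: the second request hit the cache
```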

I have to be very careful with this one, because caching can incur overhead that exceeds the cost of recalculating from scratch every time. In my line of work (real-time simulation), this is nearly always the case. For that reason I don't want to experiment with caching until performance becomes a problem and other optimizations cease to bear fruit.

SSL

I really don't know how this is done...

File uploads

I'll be needing this for a project, but I haven't done any research on it yet.

Others?

Insert feature request here.


Sample index.html.tmpl

% dict set response header content-type text/html
<html><head><title>$uri</title></head><body>
% set rand [expr {rand()}]
% if {$rand > 0.5} {
random=[format %.3f $rand] &gt; 0.5<br/>
% } else {
random=[format %.3f $rand] &lt;= 0.5<br/>
% }
time/date=[clock format [clock seconds]]<br/>
milliseconds=[clock milliseconds]<br/>
clicks=[clock clicks]<br/>
% if {![dict exists $query noiframe]} {
<iframe src="/?noiframe" width="100%"></iframe>
% }
</body></html>
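The template mechanism can be tried outside the server. applytemplate below is copied from the Wibble source: lines starting with % are Tcl code, and every other run of text goes through [subst] and is handed to $command. The render proc and its template are illustrative.

```tcl
# Copied from wibble::applytemplate.
proc applytemplate {command template} {
    set script ""
    set pos 0
    foreach pair [regexp -line -all -inline -indices {^%.*$} $template] {
        lassign $pair from to
        set str [string range $template $pos [expr {$from - 2}]]
        if {$str ne ""} {
            append script "$command \[" [list subst $str\n] \]\n
        }
        append script [string range $template [expr {$from + 1}] $to]\n
        set pos [expr {$to + 2}]
    }
    set str [string range $template $pos end]
    if {$str ne ""} {
        append script "$command \[" [list subst $str] \]
    }
    uplevel 1 $script
}

# Illustrative: render a small template into the local variable "out".
proc render {} {
    set items {apple pear}
    set out ""
    set template {<ul>
% foreach item $items {
<li>$item</li>
% }
</ul>}
    applytemplate "append out" $template
    return $out
}
puts [render]
```

Since applytemplate evaluates the generated script with uplevel 1, the template sees the caller's variables (here items and out), which is how index.html.tmpl can refer to $uri and $query inside the dict with block of wibble::template.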

Wibble source code

#!/bin/sh
#
# Wibble - a pure-Tcl Web server.  http://wiki.tcl.tk/23626
# Copyright 2009 Andy Goth.  mailto:unununium/at/aircanopy/dot/net
# Available under the Tcl/Tk license.  http://tcl.tk/software/tcltk/license.html
#
# The next line restarts with tclsh.\
exec tclsh "$0" ${1+"$@"}

package require Tcl 8.6
package provide wibble 0.1

# Define the wibble namespace.
namespace eval wibble {
    variable zones {}
}

# Echo request dictionary.
proc wibble::vars {request response} {
    dict set response status 200
    dict set response header content-type text/html
    dict set response content {<html><body><table border="1">}
    dict for {key val} $request {
        if {$key in {header query}} {
            set newval ""
            dict for {subkey subval} $val {
                append newval "<b>[list $subkey]</b> [list $subval] "
            }
            set val $newval
        }
        dict append response content <tr><td><b>$key</b></td><td>$val</td></tr>
    }
    dict append response content </table></body></html>\n
    sendresponse $response
}

# Redirect when a directory is requested without a trailing slash.
proc wibble::dirslash {request response} {
    dict with request {
        if {[file isdirectory $fspath]
         && [string index $suffix end] ni {/ ""}} {
            dict set response status 301
            dict set response header location $path/$rawquery
            sendresponse $response
        } else {
            nexthandler $request $response
        }
    }
}

# Rewrite directory requests to search for an indexfile.
proc wibble::indexfile {request response} {
    dict with request {
        if {[file isdirectory $fspath]} {
            if {[string index $path end] ne "/"} {
                append path /
            }
            set newrequest $request
            dict set newrequest path $path$indexfile
            nexthandler $newrequest $response $request $response
        } else {
            nexthandler $request $response
        }
    }
}

# Generate directory listings.
proc wibble::dirlist {request response} {
    dict with request {
        if {![file isdirectory $fspath]} {
            # Pass if the requested object is not a directory or doesn't exist.
            nexthandler $request $response
        } elseif {[file readable $fspath]} {
            # If the directory is readable, generate a listing.
            dict set response status 200
            dict set response header content-type text/html
            dict set response content <html><body>
            foreach elem [concat [list ..]\
                    [lsort [glob -nocomplain -tails -directory $fspath *]]] {
                dict append response content "<a href=\"$elem\">$elem</a><br />"
            }
            dict append response content </body></html>\n
            sendresponse $response
        } else {
            # But if it isn't readable, generate a 403.
            dict set response status 403
            dict set response header content-type text/plain
            dict set response content Forbidden\n
            sendresponse $response
        }
    }
}

# Process templates.
proc wibble::template {request response} {
    dict with request {
        if {[file readable $fspath.tmpl]} {
            dict set response status 200
            dict set response header content-type text/plain
            dict set response content ""
            set chan [open $fspath.tmpl]
            applytemplate "dict append response content" [read $chan]
            chan close $chan
            sendresponse $response
        } else {
            nexthandler $request $response
        }
    }
}

# Send static files.
proc wibble::static {request response} {
    dict with request {
        if {![file isdirectory $fspath] && [file exists $fspath]} {
            dict set response status 200
            dict set response contentfile $fspath
            sendresponse $response
        } else {
            nexthandler $request $response
        }
    }
}

# Send a 404.
proc wibble::notfound {request response} {
    dict set response status 404
    dict set response header content-type text/plain
    dict set response content "can't find [dict get $request uri]\n"
    sendresponse $response
}

# Apply a template.
proc wibble::applytemplate {command template} {
    set script ""
    set pos 0
    foreach pair [regexp -line -all -inline -indices {^%.*$} $template] {
        lassign $pair from to
        set str [string range $template $pos [expr {$from - 2}]]
        if {$str ne ""} {
            append script "$command \[" [list subst $str\n] \]\n
        }
        append script [string range $template [expr {$from + 1}] $to]\n
        set pos [expr {$to + 2}]
    }
    set str [string range $template $pos end]
    if {$str ne ""} {
        append script "$command \[" [list subst $str] \]
    }
    uplevel 1 $script
}

# Get a line of data from a channel.
proc wibble::getline {chan} {
    while {1} {
        if {[chan gets $chan line] >= 0} {
            return $line
        } elseif {[chan pending input $chan] > 4096} {
            if {[chan gets $chan line] >= 0} {
                return $line
            } else {
                error "line length greater than 4096"
            }
        } elseif {[chan eof $chan]} {
            chan close $chan
            return -level [info level]
        } else {
            yield
        }
    }
}

# Get a block of data from a channel.
proc wibble::getblock {chan size} {
    while {1} {
        set chunklet [chan read $chan $size]
        set size [expr {$size - [string length $chunklet]}]
        append chunk $chunklet
        if {$size == 0} {
            return $chunk
        } elseif {[chan eof $chan]} {
            chan close $chan
            return -level [info level]
        } else {
            yield
        }
    }
}

# Decode hexadecimal URL encoding.
proc wibble::unhex {str} {
    set pos 0
    while {[regexp -indices -start $pos {%([[:xdigit:]]{2})} $str range code]} {
        set char [binary format H2 [string range $str {*}$code]]
        set str [string replace $str {*}$range $char]
        set pos [expr {[lindex $range 0] + 1}]
    }
    return $str
}

# Advance to the next zone handler using the specified request/response list.
proc wibble::nexthandler {args} {
    return -level 2 $args
}

# Send a response to the client.
proc wibble::sendresponse {response} {
    return -level 2 [list $response]
}

# Register a zone handler.
proc wibble::handle {zone command args} {
    variable zones
    dict lappend zones $zone [list $command $args]
}

# Get an HTTP request from a client.
proc wibble::getrequest {chan peerhost peerport} {
    # The HTTP header uses CR/LF line breaks.
    chan configure $chan -translation crlf

    # Parse the first line.
    regexp {^\s*(\S*)\s+(\S*)\s+(.*?)\s*$} [getline $chan] _ method uri protocol
    regexp {^([^?]*)(\?.*)?$} $uri _ path query
    set path [regsub -all {(?:/|^)\.(?=/|$)} [unhex $path] /]
    while {[regexp -indices {(?:/[^/]*/+|^[^/]*/+|^)\.\.(?=/|$)} $path range]} {
        set path [string replace $path {*}$range ""]
    }
    set path [regsub -all {//+} /$path /]

    # Start building the request structure.
    set request [dict create socket $chan peerhost $peerhost peerport\
        $peerport method $method uri $uri path $path protocol $protocol\
        header {} rawheader {} query {} rawquery $query]

    # Parse the headers.
    while {[set line [getline $chan]] ne ""} {
        dict lappend request rawheader $line
        if {[regexp {^\s*([^:]*)\s*:\s*(.*?)\s*$} $line _ key val]
         || ([info exists key] && [regexp {^\s*(.*?)\s*$} $line _ val])} {
            set key [string tolower $key]
            if {[dict exists $request header $key]} {
                set val [dict get $request header $key]\n$val
            }
            dict set request header $key $val
        }
    }

    # Parse the query string.
    foreach elem [split [string range $query 1 end] &] {
        regexp {^([^=]*)(?:=(.*))?$} $elem _ key val
        dict set request query [unhex [string map {+ " "} $key]]\
                               [unhex [string map {+ " "} $val]]
    }

    # Get the request body, if there is one.
    if {$method in {POST PUT}} {
        if {[dict exists $request header transfer-encoding]
         && [dict get $request header transfer-encoding] eq "chunked"} {
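            # Receive chunked request body.
            set data ""
            while {[scan [getline $chan] %x length] == 1 && $length > 0} {
                chan configure $chan -translation binary
                append data [getblock $chan $length]
                chan configure $chan -translation crlf
                # Discard the CRLF that ends each chunk's data.
                getline $chan
            }
            # Discard any trailer fields, up to and including the blank line.
            while {[getline $chan] ne ""} {}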
        } else {
            # Receive non-chunked request body.
            chan configure $chan -translation binary
            set data [getblock $chan [dict get $request header content-length]]
            chan configure $chan -translation crlf
        }
        dict set request content $data
    }

    return $request
}

# Get a response from the zone handlers.
proc wibble::getresponse {request} {
    variable zones
    set state [list $request [dict create status 500 content "Zone error\n"]]
    dict set fallback status 501
    dict set fallback content "not implemented: [dict get $request uri]\n"
    dict set fallback header content-type text/plain

    # Process all zones.
    dict for {prefix handlers} $zones {
        set match $prefix
        if {[string index $match end] ne "/"} {
            append match /
        }

        # Process all handlers in this zone.
        foreach handler $handlers {
            lassign $handler command options

            # Try all request/response pairs against this handler, collecting
            # the resulting pairs into the next state list.
            set newstate {}
            foreach {request response} $state {
                # Keep this pair as-is if it's not for the current zone.
                set path [dict get $request path]
                if {$path ne $prefix && ![string equal\
                        -length [string length $match] $match $path]} {
                    lappend newstate $request $response
                    continue
                }

                # Inject a few extra keys into the request dict.
                dict set request prefix $prefix
                dict set request suffix [string range $path\
                                        [string length $prefix] end]
                if {[dict exists $options root]} {
                    dict set request fspath\
                        [dict get $options root]/[dict get $request suffix]
                }
                set request [dict merge $request $options]

                # Invoke the handler and process its outcome.
                set outcome [{*}$command $request $response]
                if {[llength $outcome] == 1} {
                    # A response has been obtained.  Return it.
                    return [lindex $outcome 0]
                } elseif {[llength $outcome] % 2 == 0} {
                    # Filter out extra keys from the new request dicts.
                    for {set j 0} {$j < [llength $outcome]} {incr j 2} {
                        lset outcome $j [dict remove [lindex $outcome $j]\
                                prefix suffix fspath {*}[dict keys $options]]
                    }

                    # Add the outcome's pairs (maybe none) to the new state.
                    lappend newstate {*}$outcome
                } else {
                    error "invalid zone handler outcome"
                }
            }
            set state $newstate
        }
    }

    # Return 501 as default response.
    return $fallback
}

# Main connection processing loop.
proc wibble::process {socket peerhost peerport} {
    try {
        chan configure $socket -blocking 0
        while {1} {
            # Get request from client, then formulate a response to the request.
            set request [getrequest $socket $peerhost $peerport]
            set response [getresponse $request]

            # Get the content size.
            if {[dict exists $response contentfile]} {
                set size [file size [dict get $response contentfile]]
                if {[dict get $request method] ne "HEAD"} {
                    # Open the channel now, to catch errors early.
                    set file [open [dict get $response contentfile]]
                    chan configure $file -translation binary
                }
            } elseif {[dict exists $response content]} {
                dict set response content [encoding convertto iso8859-1\
                        [dict get $response content]]
                set size [string length [dict get $response content]]
            } else {
                set size 0
            }

            # Try to parse the Range request header if present.
            set begin 0
            set end [expr {$size - 1}]
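            # Only a 200 may become a 206.  Test the status first so that a
            # Range header cannot overwrite begin and end for other statuses.
            if {[dict get $response status] == 200
             && [dict exists $request header range]
             && [regexp {^bytes=(\d*)-(\d*)$} [dict get $request header range]\
                        _ begin end]} {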
                dict set response status 206
                if {$begin eq "" || $begin >= $size} {
                    set begin 0
                }
                if {$end eq "" || $end >= $size || $end < $begin} {
                    set end [expr {$size - 1}]
                }
            }

            # Add content-length and content-range response headers.
            dict set response header content-length [expr {$end - $begin + 1}]
            if {[dict get $response status] == 206} {
                dict set response header content-range "bytes $begin-$end/$size"
            }

            # Send the response header to the client.
            chan puts $socket "HTTP/1.1 [dict get $response status]"
            dict for {key val} [dict get $response header] {
                set normalizedkey [lsearch -exact -sorted -inline -nocase {
                    Accept-Ranges Age Allow Cache-Control Connection
                    Content-Disposition Content-Encoding Content-Language
                    Content-Length Content-Location Content-MD5 Content-Range
                    Content-Type Date ETag Expires Last-Modified Location Pragma
                    Proxy-Authenticate Retry-After Server Set-Cookie Trailer
                    Transfer-Encoding Upgrade Vary Via Warning WWW-Authenticate
                } $key]
                if {$normalizedkey ne ""} {
                    set key $normalizedkey
                }
                foreach line [split $val \n] {
                    chan puts $socket "$key: $line"
                }
            }
            chan puts $socket ""

            # If requested, send the response content to the client.
            if {[dict get $request method] ne "HEAD"} {
                chan configure $socket -translation binary
                if {[dict exists $response contentfile]} {
                    # Send response content from a file.
                    chan seek $file $begin
                    chan copy $file $socket -size [expr {$end - $begin + 1}]
                    chan close $file
                } elseif {[dict exists $response content]} {
                    # Send buffered response content.
                    chan puts -nonewline $socket [string range\
                            [dict get $response content] $begin $end]
                }
            }

            # Flush the outgoing buffer.
            chan flush $socket
        }
    } on error {"" options} {
        # Log errors and report them to the client, if possible.
        variable errorcount
        incr errorcount
        set message "*** INTERNAL SERVER ERROR (BEGIN #$errorcount) ***\n"
        append message "time: [clock format [clock seconds]]\n"
        append message "address: $peerhost\n"
        if {[info exists request]} {
            dict for {key val} $request {
                if {$key eq "content" && [string length $val] > 256} {
                    append message "request $key (len=[string length $val])\n"
                } elseif {$key in {header query}} {
                    dict for {subkey subval} $val {
                        append message "request $key $subkey: $subval\n"
                    }
                } else {
                    append message "request $key: $val\n"
                }
            }
        }
        append message "errorinfo: [dict get $options -errorinfo]\n"
        append message "*** INTERNAL SERVER ERROR (END #$errorcount) ***\n"
        log $message
        catch {
            set message [encoding convertto utf-8 $message]
            chan configure $socket -translation crlf
            chan puts $socket "HTTP/1.1 500 Internal Server Error"
            chan puts $socket "Content-Type: text/plain; charset=utf-8"
            chan puts $socket "Content-Length: [string length $message]"
            chan puts $socket "Connection: close"
            chan puts $socket ""
            chan configure $socket -translation binary
            chan puts -nonewline $socket $message
        }
    } finally {
        catch {chan close $socket}
    }
}

# Accept an incoming connection.
proc wibble::accept {socket peerhost peerport} {
    chan event $socket readable [namespace code $socket]
    coroutine $socket process $socket $peerhost $peerport
}

# Listen for incoming connections.
proc wibble::listen {port} {
    socket -server [namespace code accept] $port
}

# Log an error.  Feel free to replace this procedure as needed.
proc wibble::log {message} {
    chan puts -nonewline stderr $message
}

# Demonstrate Wibble if being run directly.
if {$argv0 eq [info script]} {
    # Guess the root directory.
    set root [file normalize [file dirname [info script]]]

    # Define zone handlers.
    wibble::handle /vars vars
    wibble::handle / dirslash root $root
    wibble::handle / indexfile root $root indexfile index.html
    wibble::handle / static root $root
    wibble::handle / template root $root
    wibble::handle / dirlist root $root
    wibble::handle / notfound

    # Start a server and enter the event loop.
    catch {
        wibble::listen 8080
        vwait forever
    }
}

# vim: set sts=4 sw=4 tw=80 et ft=tcl:
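The path clean-up in wibble::getrequest is compact enough to be worth testing in isolation. normalizepath is a hypothetical name; it repeats the three regexp/regsub steps from getrequest but omits the unhex call.

```tcl
# Hypothetical standalone copy of the path normalization in getrequest.
proc normalizepath {path} {
    # Turn "." segments into "/".
    set path [regsub -all {(?:/|^)\.(?=/|$)} $path /]
    # Repeatedly delete a ".." segment together with the segment before it.
    while {[regexp -indices {(?:/[^/]*/+|^[^/]*/+|^)\.\.(?=/|$)} $path range]} {
        set path [string replace $path {*}$range ""]
    }
    # Collapse repeated slashes and guarantee a leading slash.
    regsub -all {//+} /$path /
}

puts [normalizepath /a/./b/../c]      ;# /a/c
puts [normalizepath /../etc/passwd]   ;# /etc/passwd
```

Because each .. removes at most the segment before it, and a leading .. is simply dropped, a request path cannot climb above the zone root.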

Discussion

CMcC likes this, it's neat and minimal, but flexible and fully functional. A couple of observations as they arise: all the header keys have to be set to a standard case after you parse them out of the request stream, as the spec doesn't require a client to use a standard case.

AMG: HTTP allows case insensitivity? Damn. Case insensitivity will be the death of us all! HTTP specifications (or at least clients) probably require a specific case for the server, which unfortunately is neither all-lowercase nor all-uppercase. What a pain!

CMcC AFAIK you can return whatever you like (case-wise) from the server, so no ... no requirement. It's all case-insensitive for the field names.

AMG: Still, I must be able to deal with clients which assume HTTP is case sensitive and that it requires the case style shown in the examples. Most folks only read the examples, so they draw this conclusion: [L2 ]. Just look at Wibble itself! It fails when the client uses unexpected letter case in the request headers! I didn't spot where the specification allowed case independence, and none of the examples suggested this to me.

AMG: Update: I now force all request headers to lowercase and normalize all known response headers to the "standard" case observed at [L3 ]. Unrecognized response headers are left untouched.
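The normalization can be sketched as a standalone proc, normalizeheader (a hypothetical name), built on the same lsearch call and header table as wibble::process. lsearch -sorted does a binary search, so the table must stay sorted under the same case-insensitive comparison.

```tcl
# Hypothetical standalone version of the response-header normalization.
proc normalizeheader {key} {
    set known {
        Accept-Ranges Age Allow Cache-Control Connection
        Content-Disposition Content-Encoding Content-Language
        Content-Length Content-Location Content-MD5 Content-Range
        Content-Type Date ETag Expires Last-Modified Location Pragma
        Proxy-Authenticate Retry-After Server Set-Cookie Trailer
        Transfer-Encoding Upgrade Vary Via Warning WWW-Authenticate
    }
    set normalized [lsearch -exact -sorted -inline -nocase $known $key]
    # Unrecognized headers are passed through untouched.
    if {$normalized eq ""} {
        return $key
    }
    return $normalized
}

puts [normalizeheader content-type]   ;# Content-Type
puts [normalizeheader x-custom]       ;# x-custom
```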

CMcC: That's consistent with the networking principle (be tolerant of what you accept, consistent in what you provide.) Wub, FWIW, sends everything out in lowercase (IIRC) on the principle of 'screw 'em if they can't take a joke.'

CMcC: Also, I'm not sure what you do with multiple query args which have the same name, you have to be prepared for that, and do you handle query args without value? Unsure.

AMG: An earlier revision supported multiple query arguments with the same name, plus query arguments without values. Then I decided those two features weren't really important to me, that it was simpler to just require that sites made using Wibble wouldn't depend on those features. But if you can give me a compelling application for multiple like-named and argument-less queries, I'll re-add support. For now, later query arguments replace like-named earlier query arguments, and query arguments without value are taken as having empty string as the value. My earlier solution was for queries with arguments to be in a key-value list, then for all query argument names (even those without values) to be in a second list, sorted by the order in which they appeared in the URI.

CMcC yeah, it's much easier if you can ignore that requirement. Wibble's free to do that, Wub's not (sadly.)

AMG: How does Wub make use of argument-less query arguments and repeated query argument names?

CMcC: it's all set up in the Query module, a decoded query is parsed into a dict which contains a list of metadata and values per query element, accessors return the number of values, the metadata, and the specified element. Of course, most of the time in use I just flatten the dict into an a-list and ignore all the detail.

CMcC: Adding virtual host support is trivial as you've noted, you just need to combine the host with your zone regexp.

I note you don't fall back to HTTP/1.0 (not really necessary, I guess).

AMG: I have the beginnings of support for falling back to HTTP/1.0, in that I remember the protocol advertised by the client. In the future I can use that information.

CMcC I really wouldn't bother - there's no real need to support HTTP/1.0 IMHO - the only existing client still using it is the Tcl client (and that should be fixed soon.) Again, Wub doesn't have the option of taking the sensible path.

AMG: I'll have to check a couple streaming music players to see if they all grok HTTP/1.1. They would have to if they support seeking.

CMcC: nor do you stop processing input on POST/PUT as the spec requires (you ought to make sure this is done, as some things require it.) Your pipeline processing requires run-to-completion of each request processing, I think, but there are definitely cases where you would not want this (they're not common, but when you get such a requirement there's no way around it, I think) so that's a limitation, although not a show-stopper.

AMG: I don't have any experience with POST/PUT. I just put in the few rudiments I could figure out from the HTTP specification. I'll have to read up on POST/PUT in more detail.

CMcC the spec says you can't process any requests (and the client oughtn't to send any requests) on a pipeline until the POST/PUT is handled completely. It's subtle, but it's (just) conceivable that something could be screwed up by it. Since your approach is to do things which make sense for most apps, you could probably get away with it by just documenting the behaviour.

AMG: Wibble processes each request immediately after reading it. Its main loop is: get request, compute response, send response, repeat. Computing a response for a POST/PUT necessarily involves committing its side effects to whatever database is used. So subsequent requests remain unread, waiting in the receive buffer, until Wibble is completely finished with the POST/PUT, or any other type of request.

CMcC: it's more for situations like when you have asynchronous background type processing. Updating a DB, for example.

AMG: I don't think I'll be doing any asynchronous background processing within a single connection. Someday I do plan to process multiple connections in parallel as separate processes or threads. But that's a completely separate issue.

CMcC: I like the way zone handlers stack, but the necessity of returning a list is less good, IMHO - I prefer the default case to be easy. I'd consider using a [return -code] to modify the return behaviour, or perhaps using a pseudo response key element to specify further processing.

AMG: I haven't done much with [return -code], so it hadn't occurred to me. That's an interesting idea, thanks. I think I'll change it to return the operation as the return code and the operand as the return value.

CMcC yah, you might want to map the normal return codes from Tcl (including TCL_OK) to reasonable values (e.g. TCL_OK=>200, TCL_ERROR=>500)

AMG: I wound up using [return -opcode] (wrapped by the [operation] sugar proc), which puts a custom "-opcode" key in the -options dict, then I receive this opcode using catch. The purpose of [return -code] is already defined, and it requires integers or a predefined enumeration, so I decided not to mess with it. Also the purpose of this general mechanism is not to report errors or status, but rather to tell the response generator what operation to take next: modify request, send response, etc. I do map error to HTTP 500 using try {...} on error {...}, then I print the error options dictionary to both stderr and (if possible) to the client socket. On error, I always close the client socket, forcing tenacious clients to reconnect, which is almost like rebooting a computer following a crash.
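A minimal sketch of that mechanism, assuming Tcl 8.5 or later: [return] accepts extra -option value pairs and places them in the return options dictionary, which [catch] hands back. Only the [operation] name and the -opcode key come from the discussion; the body of operation, the demo handlers, and runhandler are illustrative guesses, not Wibble's code.

```tcl
# Sugar proc: return from the *calling* handler (-level 2), tagging the
# result with a custom -opcode entry in the return options dictionary.
# Hypothetical body; only the [operation] name comes from the discussion.
proc operation {opcode operand} {
    return -level 2 -opcode $opcode $operand
}

# Demo handlers (hypothetical).
proc hello {request} {
    operation sendresponse [dict create status 200 content "Hello"]
}
proc broken {request} {
    error "boom"
}
proc plain {request} {
    return $request
}

# Run a handler and report {opcode operand}.  Errors are mapped to HTTP 500.
proc runhandler {handler request} {
    set code [catch {{*}$handler $request} result options]
    if {$code == 1} {
        return [list sendresponse [dict create status 500 content $result]]
    }
    if {[dict exists $options -opcode]} {
        return [list [dict get $options -opcode] $result]
    }
    # No opcode: treat the plain return value as "keep going".
    return [list pass $result]
}
```

Because the opcode travels in the options dictionary rather than in -code, the integer return codes keep their usual meaning.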

CMcC: I think the idea of starting a new dictionary for response is pretty good (as it means you don't have to filter out the old stuff,) but I'm not sure that doing so retains enough data for fully processing the original request. Do you pass the dict to the zone handlers converted to a list? That's not so good, as it causes the dict to shimmer.

AMG: Both the request and response dictionaries are available everywhere in the code. They're just in separate variables. Yeah, I convert the dict to a list by means of {*} into args. If that causes shimmering, I'll change the zone handlers to accept a single normal argument. By the way, extra non-dict arguments can be passed to the zone handler by making the command name a list. This makes it possible to use namespace ensemble commands, etc. as zone handlers.
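A sketch of the "command name as a list" trick, assuming a dispatcher that expands the handler prefix with {*} and then appends the request and response dicts as two separate arguments. The names greet, tools, and callhandler are made up for illustration.

```tcl
# Hypothetical handler taking one extra leading argument.
proc greet {greeting request response} {
    dict set response content "$greeting, [dict get $request path]"
    return $response
}

# Hypothetical ensemble whose subcommands act as zone handlers.
namespace eval tools {
    namespace export banner
    namespace ensemble create
    proc banner {request response} {
        dict set response content "banner for [dict get $request path]"
        return $response
    }
}

# Expand the handler prefix, then pass the dicts as whole arguments
# (no {*} on the dicts themselves, so they are never flattened).
proc callhandler {handler request response} {
    {*}$handler $request $response
}
```

For example, `callhandler [list greet Hi] {path /x} {}` calls `greet Hi {path /x} {}`, and `[list tools banner]` dispatches through the ensemble.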

AMG: Update: I have made this change. The shimmering is eliminated.

CMcC: I'm not sure that the idea of jamming new requests into the pipeline is a good one.

AMG: It was the best solution I could think of for handling index.html in the face of the template generation. With the example setup, if there is a file called index.html in the directory being requested, static will serve it straight from disk. If not, template will try to make one from index.html.tmpl. And--- very important!--- if that doesn't work, dirlist will generate a listing. If indexfile simply replaced requests for a directory with requests for index.html, dirlist could never trigger. And if indexfile only did this replacement if index.html actually existed on disk, template would not be used to generate the index.html. I couldn't think of any other way to get all these different handlers to work together.

CMcC this is one of the subtle and intriguing differences between Wub and Wibble architectures - firstly you don't transform the request dict, you create a new one, and as a consequence you have to keep the original request around, and as a consequence of that you have to be able to rewrite the current request (if I understand it correctly.) Those are all (slightly) negative consequences of that architectural decision. The upside is that you don't have to keep track of protocol and meta-protocol elements of the response dict as tightly as Wub does - Wub has to cleanse the fields which make no sense in response, and that's time-consuming and unsightly - Wibble doesn't, and can also easily treat those elements using [dict with] which is a positive consequence of the decision.

AMG: Keeping the original request is easy and natural for me; all I had to do was use two variables: set response [getresponse $request]. To be honest, I didn't notice that Wub operated by transmuting the request dictionary into the response dictionary, so I didn't try to emulate that specific behavior. Instead it made sense to generate a new response from the request: from the perspective of a packet sniffer, that is what all web servers do. Also I enjoy the ability to not only rewrite requests, but also to create and delete alternative requests which are processed in a specific priority order. Branch both ways! The goal is to get a response, and handlers are able to suggest new requests which might succeed in eliciting a response. Or maybe they won't, but the original request will. Rewriters can leap without looking: they don't have to predict if the rewritten request will succeed. And in the indexfile/template/static/dirlist case, indexfile doesn't have the power to make this prediction.

CMcC: this took me a couple of read-throughs to get. You are expecting zone handlers which would otherwise fail to re-write the request to something they expect might succeed. It worries me that you may end up with more responses than requests (which would be disastrous) and I'm not sure what you do to prevent this (except you only ever have the latest request around, right? Because you don't model a pipeline directly, because you don't try to suspend responses?)

AMG: Yes, zone handlers can rewrite the request (or create a new request) that might succeed. It's not possible to get more responses than requests, since processing stops when the first valid response is obtained. The stacking order of zone handlers must be configured such that the first response is also the desired response. For example, putting dirlist before indexfile will prevent index.html from ever being served unless it is explicitly requested.

CMcC: One thing to bear in mind in the rewriting of requests: if you silently rewrite fred to fred/index.html in the server, next time the client requests fred, your server has to go through exactly the same process. Another way to do it is have the fred request result in a response which says that fred content has moved to fred/index.html. That way, the client and proxies can remember the relocation, and will ask for fred/index.html when they want fred, so the client does the work for you. So I'm not certain that your processing model is an unqualified good idea (nor am I certain it's not - the layering effect is powerful.)

AMG: This does not involve rewriting requests. To implement this behavior, the zone handler sends a Found or Moved or Whatever response to the client, which might then make an entirely new request unless it broke, lost interest, or found index.html in cache. It's up to the site administrator whether to rewrite requests within the server or to redirect the client. For an example of this kind of redirection, look at dirslash. Personally, I don't like instructing the client to make a new request for index.html, since I think it's ugly to have "index.html" at the end of the URL.
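A dirslash-style redirect might look roughly like this. The request keys path and fspath (the filesystem path of the requested object), and the convention of returning an empty string for "no response here", are assumptions for illustration, not Wibble's actual interface.

```tcl
# Redirect a request for a directory that lacks a trailing slash, so that
# relative links inside the directory resolve correctly.
proc dirslash {request} {
    set path [dict get $request path]
    set fspath [dict get $request fspath]
    if {[file isdirectory $fspath] && ![string match */ $path]} {
        return [dict create status 301 \
            header [dict create location $path/] \
            content ""]
    }
    # Not applicable: let the next handler try.
    return ""
}
```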

CMcC: You should probably add gzip as an output mode, if requested by the client, as it speeds things up.

AMG: I figured gzip can wait until later. It's more important for me to bang out a few sites using this. Also I need to look into multithreading so that clients don't have to wait on each other.

CMcC gzip's easy to add, and well worth adding. I should probably get around to adding Range to Wub, too.

AMG: Okay, I'll look into gzip and zlib deflate. Wibble never sends chunked data, so it should be as easy as you say. I'll just look at the response headers to see if I need to compress before sending.
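Because the body is never chunked, compression can happen in one step just before sending. A sketch assuming Tcl 8.6 (for [zlib gzip]) and a response whose content is already bytes; maybegzip is a hypothetical name, and the header/content keys are modeled on the reload example further down this page.

```tcl
# Gzip the response body when the client lists "gzip" in Accept-Encoding.
# Qvalues are ignored here, so "gzip;q=0" would wrongly count as acceptance.
# Assumes [dict get $response content] holds bytes, not arbitrary Unicode.
proc maybegzip {request response} {
    set accept ""
    if {[dict exists $request header accept-encoding]} {
        set accept [dict get $request header accept-encoding]
    }
    set codings {}
    foreach item [split $accept ,] {
        lappend codings [string tolower [string trim [lindex [split $item ";"] 0]]]
    }
    if {"gzip" ni $codings || [dict exists $response header content-encoding]} {
        return $response
    }
    dict set response content [zlib gzip [dict get $response content]]
    dict set response header content-encoding gzip
    # Caches must key on Accept-Encoding as well as the URI.
    dict set response header vary Accept-Encoding
    return $response
}
```

If Content-Length is computed afterwards, it has to be the length of the compressed body.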

Wibble doesn't support multipart ranges. I doubt any web servers do; it's complicated and it's worthless. Clients are better off making multiple pipelined individual range requests.
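For the single-range case, parsing is short. A sketch, not Wibble's code: parserange is a hypothetical name, it handles only bytes=first-last, bytes=first- and bytes=-suffix, and it returns an empty list whenever the header should simply be ignored. A stricter server would answer 416 (Requested Range Not Satisfiable) for an out-of-bounds range instead of ignoring it.

```tcl
# Parse a single "Range: bytes=..." value against an entity of $size bytes.
# Returns {first last} (inclusive offsets) or {} to mean "ignore the header"
# (multipart, malformed, or unsatisfiable ranges).
proc parserange {header size} {
    if {![regexp {^bytes=(\d*)-(\d*)$} [string trim $header] -> first last]} {
        return {}
    }
    # [scan %d] keeps Tcl from reading leading zeros as octal.
    if {$first ne ""} {scan $first %d first}
    if {$last ne ""} {scan $last %d last}
    if {$first eq ""} {
        if {$last eq ""} {return {}}
        # Suffix form "bytes=-N": the final N bytes.
        set first [expr {max(0, $size - $last)}]
        set last [expr {$size - 1}]
    } elseif {$last eq "" || $last >= $size} {
        set last [expr {$size - 1}]
    }
    if {$first >= $size || $first > $last} {
        return {}
    }
    return [list $first $last]
}
```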

AMG: Update: I'm not sure how encodings and ranges are supposed to interact. Content-Length gives the number of bytes being transmitted; that much is clear. What about Content-Range? Do its byte counts reflect the encoded or unencoded data? And the request Range--- surely its byte counts are for the unencoded data.

CMcC: I completely ignored Range stuff, so I don't know. Guessing they either say 'you can't encode a range with anything but none', but for all I know they give a harder-to-implement stricture.

AMG: I'll just make a few guesses then see if it works with Firefox.

AMG: I think I'll ignore the qvalues; they're tough to parse and kind of dumb. Why would a client advertise that it accepts gzip but prefers uncompressed? Or why would it give something a qvalue of 0 rather than simply not listing it?
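For comparison, honoring qvalues is mostly a matter of splitting on commas and semicolons. The sketch below (acceptable is a made-up name) returns the acceptable tokens in preference order and drops anything with q=0, which is how a client says "not this one" even when a wildcard would otherwise allow it. It does not give "*" any special meaning.

```tcl
# Parse an Accept-style header such as "gzip;q=0.5, identity, br;q=0".
# Returns tokens ordered by descending qvalue, omitting q=0 entries.
proc acceptable {header} {
    set entries {}
    foreach item [split $header ,] {
        set params [split $item ";"]
        set token [string tolower [string trim [lindex $params 0]]]
        if {$token eq ""} continue
        set q 1.0
        foreach param [lrange $params 1 end] {
            if {[regexp {^\s*q\s*=\s*([0-9.]+)\s*$} $param -> value]
                    && [string is double -strict $value]} {
                set q $value
            }
        }
        lappend entries [list $token $q]
    }
    # lsort is a stable merge sort, so equal qvalues keep the listed order.
    set result {}
    foreach entry [lsort -real -decreasing -index 1 $entries] {
        lassign $entry token q
        if {$q > 0} {
            lappend result $token
        }
    }
    return $result
}
```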

CMcC: yeah, you'd think the spec would just say 'list the things in the order you would prefer them' instead of the whole q= thing. I dunno, there are lots of anomalies. For example, a lot of clients claim to be able to accept content of type */*, but then users complain if you take 'em at their word :)

AMG: If I ever get "Accept-Encoding: *" then I will encode the response using lzip [L4 ]. It'll be awesome. :^)

CMcC: All in all, good fun. I still wish you'd applied this to Wub, and improved it, rather than forking it, but oh well. I wish you'd been around when I was developing Wub, as some of your ideas would have (and could still) contribute to Wub's improvement. I definitely like the simplicity of your processing loop, although I think that Wub Nub's method of generating a switch is faster and probably better (having said that, it's harder to get it to handle stacking.)

AMG: Yeah, I wish the same. I really haven't spent much time on Wibble. I wrote it in one afternoon, then a week later, spent a couple hours tidying it up for publication. I don't have a huge amount of time to spend on this sort of thing, so when I hack, I hack furiously! And I really couldn't wait for you and JDC to answer my Wub questions. Sorry! :^) I invite you to absorb as much as you like into Wub. If you give me direction and I have time, I'll gladly help.

Now that this code is written, I think it should stay up as a separate project. I think it fills a slightly different niche than Wub. Both are web servers, of course. But Wub is a large and complete server made up of many files, whereas Wibble is a smallish server hosted entirely on the Wiki. That makes it a lot more accessible and useful for educational and inspirational purposes, kind of like DustMote. Maybe think of it as Wub-lite, a gateway or gentle introduction to some of the concepts that undergird Wub.

Thank you for your comments.

CMcC You're welcome - as I say, this is interesting. It's interesting to see where you've taken the request- response- as dict paradigm, it's also interesting to see how you've used coroutines - very clean indeed. Wub has two coros per open connection, and a host of problems with keeping them synchronised. The idea was to keep protocol-syntax and semantics distinct, and therefore to make the server more responsive. I'm scratching my head, wondering whether to move to single-coro per pipeline, as Wibble does, but have to think through the implications.

It's good to see Wibble, because you started with dict and coroutine as given and evolved it with them in mind, whereas Wub evolved in mid-stream to take advantage of them. As a consequence, Wibble seems to make better-considered use of the facilities.

I would definitely advise keeping Wibble as a distinct project - it addresses the problem of a minimal server (like Dustmote et al,) but still tries to provide useful functionality (unlike Dustmote et al.)

I'd be interested to see what you make of Wub/TclHttpd's Direct domain functionality.

AMG: I started with Coronet, which you and MS wrote. I dropped everything I didn't need, merged [get] and [read], absorbed [terminate] and $maxline into [get], eliminated the initial yields in [get], renamed [do] to [process] and [init] to [accept], changed [accept] to use the socket name as the coroutine name, and defined the readability handler before creating the coroutine. I did that last step because the coroutine might close the socket and return without yielding. Yeah, for an instant I have the readability handler set to a command that doesn't yet exist (or, in case of return without yield, will never exist), but this is safe since I don't call update.
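The accept/process arrangement described above can be sketched roughly as follows, assuming Tcl 8.6 coroutines. This is not the actual Wibble code; process here only reads one line and stores it in a global for demonstration. The coroutine name is relative to the current namespace while [chan event] scripts run at global level, so code living in a namespace would need to qualify the name.

```tcl
# Name the coroutine after the channel and install the readable handler
# *before* the coroutine exists: the coroutine may close the channel and
# return without ever yielding.  Safe as long as nothing calls [update]
# in between.
proc accept {chan args} {
    chan configure $chan -blocking 0
    chan event $chan readable [list $chan]
    coroutine $chan process $chan
}

# Demo body: read one line, yielding until it is complete, then finish.
# Returning deletes the coroutine command; closing the channel removes
# its event handler.
proc process {chan} {
    while {[chan gets $chan line] < 0} {
        if {[chan eof $chan]} {
            chan close $chan
            return
        }
        yield
    }
    set ::lastline $line
    chan close $chan
}
```

With a listening socket, the same proc would be installed as `socket -server accept $port`, which passes the channel followed by the client address and port.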

CMcC: Noticed that small window, but you're right it can't ever be open. It's interesting to see the iterative evolution: Coronet was cribbed from Wub, you adapted and targeted it, now I'm considering cribbing your adaptation to simplify Wub. :)

AMG: I will try to look into Direct later today. I see Wub has directns and directoo; I am guessing that these map URIs to command invocations, possibly in the manner of Zope.

CMcC: not familiar with Zope, but yes Direct domain maps URLs to proc elements of a namespace or method elements of a TclOO object, and maps query elements to formal parameters by name. It's quite powerful.
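The URL-to-proc mapping can be illustrated in a few lines. This is a sketch of the idea only, not Wub's Direct implementation: directcall and the ::demo namespace are hypothetical, and a trailing args parameter is not handled.

```tcl
namespace eval ::demo {
    # Hypothetical direct proc: /hello?name=World -> ::demo::hello World
    proc hello {name {greeting Hello}} {
        return "$greeting, $name!"
    }
}

# Map a URL suffix to a proc in $ns and bind query elements to the proc's
# formal parameters by name, falling back to declared defaults.
proc directcall {ns suffix query} {
    # Refuse names containing colons so a URL cannot reach outside $ns.
    if {[string first : $suffix] >= 0} {
        return -code error "invalid direct name: $suffix"
    }
    set cmd ${ns}::$suffix
    if {[catch {info args $cmd} params]} {
        return -code error "no such direct proc: $suffix"
    }
    set argv {}
    foreach param $params {
        if {[dict exists $query $param]} {
            lappend argv [dict get $query $param]
        } elseif {[info default $cmd $param default]} {
            lappend argv $default
        } else {
            return -code error "missing query element: $param"
        }
    }
    return [$cmd {*}$argv]
}
```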

AMG: Then it's like Zope. Disclaimer: It's been a long time since I used Zope, and I never did anything more with it than play with the examples.

AMG: Hey, for a fun read, check out [L5 ]. Does Wub support individual headers split across multiple lines?

CMcC: Yes, Wub supports that. You sort of have to, for things like Cookies, which can (and do) span several lines of text. You just have to remember the last field you filled in, and if your key value regexp has no key, you append that to the immediately prior field. Not too hard.

AMG: I've been thinking about cookies. Multiple cookies get posted as multiple Cookie: or Set-Cookie: headers or lines, which the current Wibble code can't handle. I could fix headers in the general case to support multiple lines, or I could add special support for cookies. I think I'll go with your approach of appending; it sounds much easier than adding a special case.
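The "remember the last field and append" approach might look like this. A sketch with a hypothetical parseheaders proc: continuation lines (starting with space or tab) extend the previous field, and a repeated field name is joined with ", " as RFC 2616 section 4.2 allows for list-valued headers. Set-Cookie is an exception (its date values contain commas), so a real server would keep those values as a list rather than comma-joining them.

```tcl
# Parse header lines (line endings already stripped) into a dict keyed by
# lowercased field name.
proc parseheaders {lines} {
    set headers {}
    set last ""
    foreach line $lines {
        if {[regexp {^[ \t]+(.*)$} $line -> more]} {
            # Continuation of the previous field.
            if {$last eq ""} {
                error "continuation line before any header"
            }
            dict append headers $last " [string trim $more]"
        } elseif {[regexp {^([^:]+):[ \t]*(.*)$} $line -> key value]} {
            set last [string tolower $key]
            set value [string trim $value]
            if {[dict exists $headers $last]} {
                dict append headers $last ", $value"
            } else {
                dict set headers $last $value
            }
        } else {
            error "malformed header line: $line"
        }
    }
    return $headers
}
```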

CMcC: Having read the RFC walkthrough you posted, I'll have to amend my answer to 'Yes, Wub supports that, but not all the stupid variants possible.' If a client sends stuff that looks like chaff, I'm ok with returning next-to-nothing. There's a school of thought, btw, which holds that you can best identify malicious spiders and bots by their poor implementation of the RFC, and so being ruthless in applying the standard can help cut your malware load. It's mainly about not being permissive with things which don't send Host: etc. Anything that cuts spiders short is OK with me.

CMcC: is considering modifying Httpd to have a single coro per connection. It's split into two largely for historical reasons (Wub used to parse headers in one thread, process in another, but there seemed to be no good performance reason - it was optimising for the wrong, and uncommon, case.) You do need to be able to defer responses, and sometimes delay processing of subsequent requests pending completion; WubChain relies upon it, and it's a documented and reasonable requirement.

I'm also considering moving a lot of the request pre-processing stuff into a distinct proc. Things like Cache, Block and Honeypot are done in-line, which is marginally more efficient than processing the whole request before making decisions, but I suspect the gains in code cleanliness more than compensate for a few cycles sacrificed.

AMG: I'll gladly trade a few milliseconds for increased code accessibility. The more people are able to understand the code, the more people are able to suggest architectural improvements that will yield more substantial performance boosts.


AMG: Something I need to think about is allowing multiple zone handlers to contribute to a response. For example, the static handler doesn't set content-type, but maybe a contenttype handler can fill that part in by inferring from the request URI or the response content or contentfile. That is, if content-type wasn't already set by some other handler. Another example: a setcookie zone handler might want to inject a cookie or two into the response. The way to do this is by defining a new opcode in addition to "sendresponse". Perhaps "updateresponse"? Or maybe I split "sendresponse" into two separate operations: "setresponse" and "finish". Then I give zone handlers the ability to return multiple opcodes. Also zone handlers probably need to see the under-construction response so they can modify it without outright replacing it. That would simply be a second parameter.
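A contenttype handler that only fills a gap could look like this, assuming the handler receives the under-construction response as a second parameter and passes control onward by returning a request/response pair. The handler name, the extension table, and the key names are assumptions for this sketch.

```tcl
# Fill in Content-Type only when no earlier handler set it, guessing from
# the extension of the request path.
proc contenttype {request response} {
    if {![dict exists $response header content-type]} {
        set types {
            .html text/html .htm text/html .txt text/plain
            .css text/css .js application/javascript .png image/png
        }
        set ext [string tolower [file extension [dict get $request path]]]
        if {[dict exists $types $ext]} {
            dict set response header content-type [dict get $types $ext]
        }
    }
    # Pass the (possibly updated) pair on to the next handler.
    return [list $request $response]
}
```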

As for actually implementing multiple opcodes: I would need to drop the trick with the custom -opcode return options dictionary. I see now that it doesn't actually provide any benefit, since I already have [operation] to wrap around return. One possibility is for [operation] to return a list or dict of opcode/operand pairs. Another is to have some more fun with coroutines, where the zone handlers can yield as many opcode/operand pairs as needed before finally returning. Perhaps the latter approach can open the door to a more interesting interplay between zone handlers. I'll have to think about it.

CMcC Wub passes a kind of 'blackboard' dict around, it starts as a lightly annotated request dict, and ends up as a response dict. Each step of the way it's transformed. Wibble passes a request dict in, and expects a response dict back. Wub has a series of filters after response generation, in the Convert module, and makes good use of the transformation model. Wibble could do something similar by allowing modifications to the request dict (which it does) to be response fields. Then you could use a series of zone calls to progressively build up a response. I'm not recommending this, I'm merely suggesting it as a possible approach.

AMG: I did a little thinking about the computer science theory behind Wibble's quest for a response. The zone handlers form a tree which branches every time a prependrequest or replacerequest is done. deleterequest and sendresponse establish leaf nodes. Wibble performs a breadth-first search of this tree. When it encounters a sendresponse node, it terminates its search and sends the HTTP response proposed by the node. If there is no sendresponse node in the tree, it sends HTTP 501.

Now that I know what my own code is doing :^) , I can invent more flexible alternative implementations of the same algorithm. One thing comes to mind: each zone handler returns a list of request-response pairs. The length of this list indicates the branching behavior of the tree. If it's empty, the node is a leaf, like deleterequest. If there's one request and one response, the tree doesn't branch, like with replacerequest and pass. If there are multiple pairs, the tree branches, like with prependrequest; except that it's possible to branch to have more than two children, and the list order gives the BFS priority order. If there's one response but no request, the node is a leaf and it's time to send, like sendresponse.
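The list-of-pairs model can be sketched as a queue-driven breadth-first search. Everything here is hypothetical (findresponse, the three demo handlers, and the ::files list standing in for the filesystem); the point is the control flow: an empty list abandons a branch, a one-element list is the final response, and each request/response pair becomes a child to try with the next handler.

```tcl
# Breadth-first search for a response.  Each handler returns:
#   {}                       leaf: abandon this branch
#   {response}               leaf: send this response
#   {req1 resp1 req2 resp2}  children, in priority order
proc findresponse {handlers request} {
    set queue [list [list 0 $request {}]]
    while {[llength $queue]} {
        set queue [lassign $queue node]
        lassign $node index req resp
        if {$index >= [llength $handlers]} {
            continue
        }
        set result [{*}[lindex $handlers $index] $req $resp]
        if {[llength $result] == 1} {
            return [lindex $result 0]
        }
        foreach {nextreq nextresp} $result {
            lappend queue [list [expr {$index + 1}] $nextreq $nextresp]
        }
    }
    return [dict create status 501]
}

# Demo handlers: indexfile branches into "try index.html" and "try the
# directory itself"; static answers only for files in ::files; dirlist
# answers only for paths ending in a slash.
set ::files {/docs/index.html}
proc indexfile {req resp} {
    set path [dict get $req path]
    if {[string match */ $path]} {
        return [list [dict replace $req path ${path}index.html] $resp $req $resp]
    }
    return [list $req $resp]
}
proc static {req resp} {
    if {[dict get $req path] in $::files} {
        return [list [dict replace $resp status 200 content "file [dict get $req path]"]]
    }
    return [list $req $resp]
}
proc dirlist {req resp} {
    if {[string match */ [dict get $req path]]} {
        return [list [dict replace $resp status 200 content "listing [dict get $req path]"]]
    }
    return [list $req $resp]
}
```

With handlers {indexfile static dirlist}, /docs/ is answered from /docs/index.html, while /pics/ (no index file) falls through to the directory listing.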

AMG: Okay, done! This works.


AMG: Go have a look at the mini demo webserver page for an example of creating a reload zone handler that reloads the server source on command. Actually, I'll just post a copy right here:

proc reload {request response} {
    set wibble::zones {}
    foreach script [dict get $request scripts] {
        uplevel #0 [list source $script]
    }
    dict set response status 200
    dict set response header content-type text/plain
    dict set response content "Server reloaded.  Have a nice day.\n"
    wibble::sendresponse $response
}
wibble::handle /reload reload scripts [list [info script]]

APN Does the getline proc have a bug in its line limit check? It appears it will raise an error if the total buffered input exceeds maxline, as opposed to individual lines exceeding that limit. For example, if it receives 50 lines of 100 chars each in a single network chunk, it will raise an error when it should not.

AMG: I think you're right, but at the moment I don't have the means to test. If a client sends a bunch of short lines while the server is taking a nap, when the server wakes, it will act as if the client had sent one long line. For some reason I had thought [chan pending] would tell me how many characters would be in the next line returned by [chan gets], but here this is only the case when there's at most one line per packet. By "packet" I mean the batch of input characters which triggered the readable event.

What's the right way to handle this?

When I detect that the next [gets] may exceed $maxline, I could instead do a [chan read $chan [expr {$maxline + 1}]] and check for a newline. If there's a newline, everything up to the (first) newline is the next line to process, and the newline itself is discarded. But what do I do with the rest of the input? What has been read cannot be unread. Basically, I would have to move the input buffering facility from the Tcl I/O subsystem into my own script. Things get really nasty when the read contains both HTTP header and body text, because the header text needs CR/LF translation and the body text does not. I can't put CRs back into the body text, because I don't know if they were there originally. So I would not be able to use Tcl's CR/LF translation for the header; instead I'd need to strip CRs in the script.

I think the goal of [chan pending] was to avoid having to do this, because the approach I have described is the same as what is required in the absence of [chan pending].

APN How about doing the chan gets before the chan pending? Something like -

proc wibble::getline {chan} {
    while {1} {
        if {[chan gets $chan line] >= 0} {
            # Got a whole line, may be more than 4096, but that's ok
            return $line
        } elseif {[chan blocked $chan]} {
            # Partial line, see if max length exceeded
            if {[chan pending input $chan] > 4096} {
                # Be anal. Do a gets again since data might have come in
                # between the chan gets and the chan pending calls.
                # For example, there are 4000 bytes without crlf when
                # the chan gets was called. Then another 1000 bytes
                # arrive with a crlf at the front before the chan pending
                # call. The line length limit is not exceeded in that
                # case even though chan pending returns > 4096.
                if {[chan gets $chan line] >= 0} {
                    return $line
                } else {
                    error "line length greater than 4096"
                }
            } else {
                # Incomplete line, but not exceeding max length. Wait for more.
                yield
            }
        } else {
            # EOF
            chan close $chan
            return -level [info level]
        }
    }
}

APN Why is the encoding of the output channel set to ISO8859-1 ? Should it not be UTF-8 ?

AMG: I don't remember! Does anyone around here know why someone would want to use ISO8859-1 in this situation?

AMG: I finally figured it out. Character sets other than ISO8859-1 require a proper charset parameter in the Content-Type response header. Also, when there is an Accept-Charset request header, the server should try its best to honor it. I didn't want to deal with any of this, so I stuck with ISO8859-1. Support for other character sets is a possible future enhancement.

The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. [L6 ]
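One way to send non-Latin-1 text without changing the socket encoding: convert the body to UTF-8 bytes up front and label it. Since an ISO8859-1 channel writes characters 0 to 255 as single unchanged bytes, the UTF-8 bytes pass through intact. A sketch; utf8text and the response keys are hypothetical.

```tcl
# Build a response whose body is UTF-8 bytes, labeled with a charset
# parameter as RFC 2616 requires for non-ISO-8859-1 text.
proc utf8text {text} {
    set body [encoding convertto utf-8 $text]
    return [dict create status 200 \
        header [dict create \
            content-type "text/plain; charset=utf-8" \
            content-length [string length $body]] \
        content $body]
}
```

Converting first also makes content-length a byte count rather than a character count.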


MAKR (2009-10-12): I just stumbled over this nice little thing ... I'd like to know what the license of this code is? Would it be possible to categorize it under the same license as Tcl's?

AMG: Sure, no problem. Wibble uses the same license as Tcl.


jcw 2009-11-11: Thanks Colin for mentioning this page. Wow, Andy, this is a fascinating server - and a great way for me to dive into new 8.6 stuff...

I was wondering why you need to pass around both requests and responses separately. Couldn't each newly-created response dict have the request as one of its entries? The same request would then be shared by each tentative response you create. If responses are derived from other responses (not sure that happens), the original request would be in there, but buried a bit deeper.

AMG: Thanks for the words of encouragement! Tonight I will contemplate the request/response dichotomy. For now, I just noticed that wibble::filejoin probably doesn't work right when a directory name has a trailing dot. For example "foo./bar" might mutate into "foobar". I'll verify this as well.

AMG: Updates: I fixed wibble::filejoin by removing it--- it's actually not needed. I incorporated APN's wibble::getline. I put in some license information.

As for keeping request and response separate, I think this makes the zone handlers easier to write. They receive two arguments, the input request dict (which tells them what to do) and the output response dict (which they update). What benefit is there to merging the two arguments? I want to keep the dict structure as flat as I can, but I also want to avoid collisions between the request and response dicts. The simplest way to accomplish both is to make these two be separate variables.

APN: Updates: fixed non-chunked form handling (getline should have been getblock). I now use wibble in place of tclhttpd in BowWow. Smaller and more malleable.

AMG: I put that bug there to test your vision. :^) Actually I bet it crashed the first time you did a non-chunked post. I was so busy testing chunked transfers that I forgot to test non-chunked! Also, thanks for taking the leap; I would be delighted if you shared your experiences using Wibble. I haven't actually used it for anything yet, even though I had plans.


AMG: MS, I am curious about your edit (changing wibble::handle to use variable instead of a namespace qualifier). I had thought that using namespace qualifiers is faster when the variable only gets used once or twice. Is there something I'm missing?

MS Try sourcing wibble from a different namespace: namespace eval foo source /path/to/wibble. The new version works, the old one doesn't. Since 8.5, variable is byte-compiled and does not incur the old performance penalty.
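A small illustration of MS's point. The original qualified name isn't shown on this page, so this sketch uses an absolute ::wibble:: qualifier as a stand-in; handle and handlequalified are simplified hypothetical versions.

```tcl
# As if wibble had been sourced inside [namespace eval foo].
namespace eval foo {
    namespace eval wibble {
        variable zones {}

        # [variable] resolves in the proc's own namespace, wherever that
        # namespace ended up (here ::foo::wibble).
        proc handle {zone args} {
            variable zones
            lappend zones $zone $args
        }

        # A hard-coded qualifier assumes the code lives in ::wibble, which
        # does not exist here, so this fails.
        proc handlequalified {zone args} {
            lappend ::wibble::zones $zone $args
        }
    }
}
```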

AMG: Thanks! Good to know. :^)


jnc: In regard to multiple form elements of the same name, they come in very handy when dealing with lists of items. Two examples. First, you are editing a parent and you have a table below containing <input name="child_id" type="hidden"/> <input name="child_name"/> <input name="child_dob"/>. When you get the data in your web app, it is typically a list, so you can loop through the data easily. Second, another very helpful situation is list data again, this time with checkboxes to select elements. Say you have a list of emails from your POP3 box for which you are previewing the From/Subject. The idea is that you are going to delete these messages before downloading them into your mail program. Again, you have a list: <input type="checkbox" name="delete_email_ids" value="$email_id"/>. With checkboxes, the browser only submits the data if the item is checked. So, in the end, you get a list of only the email_ids that the user wants to delete. The programmer can easily do: foreach id $delete_email_ids { ... }

AMG: That's (part of) what rawquery is for. (I admit that rawquery isn't easy to pick apart.) Since I anticipate that queries will most frequently be accessed like a dict, the query element of the request dictionary is a dict. If it turns out that like-named form elements are frequently needed, I can make a listquery or similar element that is like query except constructed using [lappend] instead of [dict set]. Both would be compatible with [foreach] and [dict get]. Because of this compatibility, I could also make query itself be a key-value list instead of a dict. However, this would result in a copy of query shimmering to dict every time [dict get] is used to get a query variable. Suggestions?
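A sketch of the [lappend]-built alternative: a flat name/value list keeps every like-named element for [foreach], while [dict get] on the same list still works (the last duplicate wins, and the value shimmers to a dict as described above). querypairs and queryvalues are hypothetical helpers, and percent-decoding is omitted to keep the sketch short.

```tcl
# Split a query string into a flat name/value list, keeping duplicates.
# A real parser must also decode %XX escapes and "+".
proc querypairs {querystring} {
    set pairs {}
    foreach item [split $querystring &] {
        if {$item eq ""} continue
        set eq [string first = $item]
        if {$eq < 0} {
            lappend pairs $item ""
        } else {
            lappend pairs [string range $item 0 $eq-1] [string range $item $eq+1 end]
        }
    }
    return $pairs
}

# Collect every value submitted under one name, e.g. checked checkboxes.
proc queryvalues {pairs name} {
    set values {}
    foreach {key value} $pairs {
        if {$key eq $name} {
            lappend values $value
        }
    }
    return $values
}
```

With `delete_email_ids=3&delete_email_ids=7&folder=inbox`, [dict get] on the pairs returns 7 for delete_email_ids, while queryvalues returns both ids.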

jcw - Just one... fix dict so it keeps the data structure as a list, and stores the hash somewhere else ;)

AMG: I'm not entirely sure how that would work. Using [dict set] should overwrite existing elements, so I would have to use [lappend] in order to support multiple like-named elements. But later if I access query using [dict get], a temporary dict-type copy would be constructed. You're suggesting (tongue-in-cheek) that the hash table not be temporary but somehow get attached to the list-type Tcl_Obj for future use by [dict]. Cloverfield has ideas in that direction, but Tcl doesn't work that way. Unless... perhaps the list and dict types can be partially unified to reduce the cost of shimmering. It would be like a free trade treaty between neighboring nations. :^) Conceptually, that would make sense because list and dict are so similar, especially since dict was changed (in Tcl 8.5 alpha) to preserve element order. Practically, there are obstacles, for example changing or extending the C API.


APN: When run against Tcl built from CVS head as of 12/18/2009, wibble fails with the error "invalid command name process". The fix is to change the coroutine call in wibble::accept to pass in [namespace current]::process instead of process. The question is whether this change in how the coroutine command resolves names was intentional.

AMG: That's odd. [namespace current]::process works fine, but [namespace code process] does not. When I try the latter, I get "cannot yield: C stack busy".

MS (after today's fix, but it shouldn't matter)

    % proc a {} {yield; return hello}
    % coroutine foo {*}[namespace code a]
    % foo
    hello

APN The current coroutine docs state: "At the point when command is called, the current namespace will be the global namespace and there will be no stack frames above it (in the sense of upvar and uplevel). However, which command to call will be determined in the namespace that the coroutine command was called from." Based on this, I would presume the original code should have worked as well (without qualifiers), since the coroutine command should have resolved process to wibble::process based on the namespace it was called from. This behaviour seems to have changed very recently; perhaps the documentation has not been updated?

MS 2009-12-19 looks like [bug #2917627] [L7 ], fixed in head. APN Confirmed.

AMG: MS, thanks for your fix. I reverted my workaround, since it's no longer needed. New topic: Why does [namespace code] add a frame to the C stack?


AMG: For the last couple days, I've been working on parsing the HTTP headers into lists and dicts and such. This is actually very complicated because the HTTP header grammar is an unholy abomination, and that's only counting the part that's documented! I'll post my work soon, but it's not going to be pretty. Also I hope to add support for file uploads through POST (not just PUT), since that is required for my project. This will also add the complexity of MIME-style multipart messages, possibly nested. It all makes me want to vomit. HTTP works better as a human language than as a computer language, like it was optimized for producing deceptively simple examples, not genuinely simple code. To paraphrase Dan Bernstein [L8 ]: "There are two types of interfaces in the world of computing: good interfaces and user interfaces."

jcw 2010-02-24 - I'd like to try bringing TclOO into this thing, so that some of the state can be managed as object variables, and to simplify adding utility code which can be used in each request (also mixins, forwards, etc). My previous code was based on an httpd.tcl rewrite (see [L9 ] and [L10 ] - each sourced in its own namespace), but wibble makes a more powerful substrate to build on. My idea would be to create a separate TclOO class, with objects named the same as the coroutines to allow a simple mapping without having to change this clean core. Andy, have you considered this option? Any thoughts on how to best bring it together?

AMG: Sorry, I don't have time now. :^( But I am interested. I hadn't thought of using TclOO, since I've never used it in a project before, and I really would like to see how it can improve Wibble. First I want to finish the list+dict project we discussed in email (I have code now!!), then I'll get back to the HTTP header parsing (about half of that code is done). However, I am stupidly starting another art project, and I have a massive work schedule which will soon include another two-week "vacation" from both the Internet and my personal computers. Oh well, Wibble started as a project to benefit the person I'm doing the art for, so I guess it's okay. :^) Perhaps this weekend I will have time to review your code to see how TclOO is used.

jcw - No worries. I can save you some time though: the links I mentioned don't use TclOO, so you can ignore them :) - I just think by now it has become a good option to consider. I'll try some things out.


dzach 2010-05-12: I've encountered an odd problem with proc getblock, which hangs Wibble when a POST request is sent from IE8 on Windows XP: the append chunk $chunklet line seems to fail to initialize the variable chunk. IE8, in my configuration, sends the POST content in two packets, the first of which is empty (size 0). When this happens, Wibble hangs forever. Firefox sends the POST content in one packet, and the problem does not appear. Initializing chunk before the while loop with set chunk "" solves the problem. In the test setup, Wibble runs on Linux with Tcl 8.6b1.2.

AMG: That is very odd. I should get IE8 to test for myself, but I don't have it right this minute. Two-argument [append] always creates the variable if it doesn't exist, regardless of its second argument. In tclExecute.c, doStoreStk is part of the implementation of compiled append, and it calls TclObjLookupVarEx() with the "create" flags set. For the sake of argument, let's say that [append] is buggy and fails to create chunk when chunklet is empty. $size should be nonzero in this case, and [chan eof $chan] is false because IE8 hasn't closed the channel yet, so [yield] is executed. Nowhere in this code path does chunk's existence matter, so there's no reason why pre-creating chunk would have an effect. The little information available suggests there's a bug in Tcl, but I can't say for sure. Please put in some debug prints to stderr. In particular, I want prints before and after the yield, to see which branch of the [if] is being taken and whether the coroutine is ever resumed. Do this both without and with your set chunk "" workaround. Oh, another question: when Wibble hangs, does it take up 0% or 100% CPU time?
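To check the [append] claim in isolation, here is a minimal sketch that appends an empty string to a variable that has never been set inside a proc body (so the compiled append path is used, as in getblock) and then asks [info exists] whether the variable was created. The proc name appendCreates is hypothetical and not part of Wibble.

```tcl
# Hypothetical helper: checks whether two-argument [append] creates a
# variable that does not exist yet, even when the appended value is empty.
proc appendCreates {} {
    # Mimic an empty [chan read] result, like IE8's zero-sized first packet.
    set chunklet ""
    # chunk has never been set in this proc's frame before this line.
    append chunk $chunklet
    # Report whether append created chunk (1) or not (0).
    return [info exists chunk]
}

puts [appendCreates]   ;# prints 1 on a correct Tcl build
```

If this prints 1, [append] behaves as documented, which is consistent with the argument above that pre-creating chunk should make no difference by itself.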

dzach: The tests run with UTF-8 encoding. I noticed that when I insert a puts stderr $chunklet line, the problem does not appear; if I take that line out, the problem comes back. CPU load is 100%. Here are the tests:

 # test proc
 proc ::wibble::getblock {chan size} {
     puts stderr size=$size
     while {1} {
        set chunklet [chan read $chan $size]
        puts stderr "{$size - [string length $chunklet]}"
        set size [expr {$size - [string length $chunklet]}]
        append chunk $chunklet
        if {$size == 0} {
          puts stderr "return size==0"
          return $chunk
        } elseif {[chan eof $chan]} {
          chan close $chan
          puts stderr "return eof"
          return -level [info level]
        } else {
          puts stderr "before yield"
          yield
          puts stderr "after yield"
        }
     }
 }

 # output
 size=56
 {56 - 0}
 before yield
 after yield
 {56 - 56}

 # test proc (work-around)
 proc ::wibble::getblock {chan size} {
     puts stderr size=$size
     set chunk ""
     while {1} {
        set chunklet [chan read $chan $size]
        puts stderr "{$size - [string length $chunklet]}"
        puts stderr chunklet=$chunklet
        set size [expr {$size - [string length $chunklet]}]
        append chunk $chunklet
        if {$size == 0} {
          puts stderr "return size==0"
          return $chunk
        } elseif {[chan eof $chan]} {
          chan close $chan
          puts stderr "return eof"
          return -level [info level]
        } else {
          puts stderr "before yield"
          yield
          puts stderr "after yield"
        }
     }
 }

 # output (work-around)
 size=56
 {56 - 0}
 chunklet=
 before yield
 after yield
 {56 - 56}
 chunklet=name=&geometry=38.079226,23.930992&description=&dtstart=
 chunk=name=&geometry=38.079226,23.930992&description=&dtstart=
 return size==0

AMG: I created a simple method="post" form, installed IE8, and submitted data several times. IE8 always sent the full form data as a single packet, with no incomplete or zero-sized reads. Is there anything special about your form that's making IE8 do what it does? I'm running the Tcl CVS HEAD compiled earlier today (12 May 2010, ChangeLog revision 1.5108), on Slackware 13.0. How about you?

Also, please explain why you stated that you used UTF-8 encoding. At present, Wibble is only designed to work with ISO8859-1, because it doesn't have the smarts to figure out what encodings are acceptable to the client, and ISO8859-1 is the legal fallback in that case. (I have a project to fix this, but it's stalled due to lack of time.) How is it that you're using UTF-8?

If simply puts'ing the value of $chunklet makes the problem go away, most likely there is a Tcl bug. Please explore this a little more. What happens if you take the value of $chunklet and ignore it, for example with "list $chunklet"? If adding that makes the problem go away, something is definitely going wacky under the hood, and you should file a bug report. Or maybe it's the puts that's having an effect, and it doesn't matter what text is being printed. Again, that would be a Tcl problem, since the output channel is unrelated to network communication with the client.

100% CPU load suggests that [chan read] is returning empty string, yet the channel is still readable and [chan event] keeps calling into the coroutine. That doesn't make sense to me.

I see one more oddity I can't explain. Your debug print shows geometry to be "38.079226,23.930992", yet both Firefox 3.6.3 and IE8 encode this as "38.079226%2C23.930992". What's going on here? Why isn't your comma also encoded?

By the way, this page is getting too long for my liking. I wish some benevolent wikignomes would clean it up for me! :^)

dzach: In my tests, Wibble replaces a hand-crafted (for speed) web server and runs on Kubuntu (2.6.32-22 kernel) with Tcl from CVS HEAD (ChangeLog 1.5098, Sat May 1 11:20:12 2010). IE8 runs on the same machine, under Windows XP in Sun VirtualBox.

The POST is sent using a typical JavaScript AJAX XMLHttpRequest, with req.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8"). In my narrow-scope application, Wibble serves UTF-8-encoded (mostly dynamic) content in Greek, and user-POSTed submissions get decoded inside a modified wibble::process using encoding convertto UTF-8 [dict get $response content]. (This could just as well be done outside Wibble, since it all happens after getblock finishes reading the channel.) I suspect UTF-8 might have something to do with the problem.

The POST content sent by the browser using AJAX is encoded with JavaScript's encodeURI(), which leaves some characters (, / ? : @ & = + $ #) unencoded. I'll check whether something is missing or whether changing this makes a difference. I'll also try running IE8 on another machine with a native Windows XP installation.

(A little later): IE8 hangs the server from a native XP installation too. Changing encodeURI to encodeURIComponent (which also encodes the comma) has no effect on the result. Setting the UTF-8 issue aside, it seems that the problem is cured as soon as chunklet gets "stringified". I guess it would be worthwhile to test whether chan read returns a string type when it reads nothing. Having said that, initializing chunklet instead of chunk doesn't help, so my guess might be wrong.

AMG: dzach, thanks for your testing. Please keep us posted. If anyone has a clue what might be going on here, please jump in and give me a hint, 'cuz I'm mystified. :^)

dzach: I like Wibble's Zen, so testing it is fun, especially since a workaround for this problem exists. If I get the time, I'll try to write a simple AJAX test so that others can reproduce it.

dzach 2010-05-17: The problem also appears with an unmodified Wibble (ISO8859-1 encoding), ruling out UTF-8 as a factor. To reproduce the error, use the code in http://paste.tclers.tk/2087 with Internet Explorer 8.


See also