Version 38 of Wibble discussion

Updated 2011-01-01 18:44:39 by jblz

AMG: This page is for general discussion on the subject of the Wibble web server. For discussion of bugs, see Wibble bugs.


jbr 2010-11-24

I was looking at the main getrequest loop in wibble and see that the request dict constantly has a few zone-dependent values inserted into and removed from it as a request moves along the zone chain. I understand that you are trying to keep the nice clean "handler $request $response" API. This seems artificial. Why not just split the zone-dependent parts into a third dict and pass that to the handler as well? The code will be cleaner, and there is no need to "filter out", which is a clumsy operation.

Here is a little untested code:

    # Create a dict of zone options
    #
    dict set zoneopts prefix $prefix
    dict set zoneopts suffix [string range $path\
                            [string length $prefix] end]
    if {[dict exists $options root]} {
        dict set zoneopts fspath\
            [dict get $options root]/[dict get $zoneopts suffix]
    }
    set zoneopts [dict merge $zoneopts $options]

    # Invoke the handler and process its outcome.
    #
    try {
        {*}$command $request $response $zoneopts
    } on 5 outcome {
        # Update the state tree and continue processing.
        #
        set state [lreplace $state $i $i+1 {*}$outcome]
    } on 6 outcome {
        # A response has been obtained.  Return it.
        #
        return $outcome
    }

I might even go as far as creating the fspath value even if the root option doesn't exist if I could squish the code down to a completely functional style.

    dict set zoneopts [dict merge $options [list prefix $prefix suffix [string rang...] ...]

As an incentive for pursuing this, since the request dict no longer needs to be fixed up, it may now be possible to tailcall down the chain of zone handlers. I'll think about how to construct this.

AMG: Sounds like a good idea. I think I'll make this change. First, let me consider your other comments below...

AMG, 2010-12-18: Done! Instead of merging request and options, they are separate sub-dictionaries inside a unified state dict. I prefer this to having the three arguments, since it doesn't require me to remember what order everything's in. ;^)
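To illustrate, here's a minimal sketch of the layout (the request and option values here are made up, not taken from Wibble itself):

    # Hypothetical state dict: two sub-dictionaries carried in a single argument.
    set state {
        request {method GET path /index.html}
        options {prefix / root /var/www}
    }
    # A zone handler receives $state and pulls out whatever it needs:
    set path [dict get $state request path]      ;# /index.html
    set root [dict get $state options root]      ;# /var/www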


jbr 2010-11-25 (Happy Thanksgiving)

I've been thinking about the practical use of zone handlers and I have a few questions and ideas.

First - There are no example handlers that return multiple request/response pairs. Are there example use cases that require this feature?

AMG: Yes: [indexfile]. If a directory /dir is being requested, it forks the request tree to contain requests for both /dir and /dir/index.html . It does so without first checking that index.html exists, because it can't know if this is supposed to be a file on disk or the product of a later zone handler. Leaping without looking. :^)

jbr: Along this line - if I were to modify a request or generate a new one and return it, how can I reset the zone handler loop so that my new request is looked at by all the zones, not just those remaining downstream of the current zone?

AMG: You can't; otherwise the server would hang forever when your zone handler is reinvoked and once again decides to reset the loop. I couldn't think of any uses for this feature, so I didn't bother with it. You can use a zone handler more than once if you really need this.

jbr: Second - It seems that zone handlers are of at least 3 distinct types.

  • Guards
  • Content
  • Helpers

Guard handlers come early in the list and implement authentication, sessions, or protection of particular file types. I noticed that specifically asking for "file.tmpl" or "file.script" causes wibble to return application code! In a working system, perhaps a guard handler placed before the static zone would explicitly reject these file types.

AMG: Directly returning file.tmpl or file.script is helpful to me for testing and debugging, and I actually do open up those sources on a site I developed a few years before Wibble. If that's not what you want, you can easily restrict access using a zone handler. See [L1 ] for a simple one that might work for you.

jbr: Content handlers return inline content or static files to serve up.

Helpers can be used to fill in desirable but missing parts of the response. In the current architecture they must work closely with content zones and other helper zones. Example: mime types. If a mime type zone handler is included late in the chain, the static handler must not return the completed response; it fills in the file name but waits for the mime type zone to identify the content. I'm not sure this is a good thing. Perhaps it's better to just call the mime type proc from the static handler and be done. There are a lot of complex interactions that can take place here; best to avoid them?

Idea - Have several chains that a request/response must traverse. Just like iptables in linux networking:

  • Input - verify/modify request
  • Content - generate response
  • Output - fill in missing headers, fix up response

Idea - A zone handler might radically modify the request and want to restart zone processing. The current request needs to be aborted, and the new one inserted in the input queue. For example, I have dynamically generated .doc and .xls files. I currently call a tclhttpd template to create the content in a temporary directory and then send the browser a redirect to the new file. It might be nice to create the file, set the fspath request value and content-disposition response header and then reset the zone handler chain. This could work now by placing the dynamic zone in front of the static zone, but I'd like to eliminate the interactions between zones.

I'm thinking about how to write this code.

AMG: Again, the zone handlers can be sequenced however you like, and it's possible to use a single zone handler multiple times. Also, you had another idea about one zone handler calling another one directly. That might be a valid way to implement custom sequencing without changing the core. The outer zone handler procedure can intercept the [nextrequest] and [sendresponse] commands of the inner zone handlers by calling them inside a try block.
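Roughly like this (a hedged sketch; innerzone is a hypothetical inner handler, and codes 5 and 6 are the custom return codes handled in the try block near the top of this page):

    # Sketch only: an outer zone handler calls an inner one directly and
    # intercepts its outcome before passing it up to the core loop.
    proc wibble::outerzone {state} {
        try {
            innerzone $state
        } on 5 outcome {
            # Intercepted the inner handler's "continue" outcome: adjust the
            # replacement state here if desired, then pass it along unchanged.
            return -code 5 $outcome
        } on 6 outcome {
            # Intercepted the inner handler's finished response.
            return -code 6 $outcome
        }
    }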

Also please think about character encoding. I haven't done that properly yet. I understand that [chan copy] can convert character encoding according to the encodings of the input and output channels; I just haven't tried using this feature yet, mostly since being able to get the acceptable encodings is a new feature.
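For reference, here's a minimal sketch of that conversion (the file names and encodings are placeholders):

    # [chan copy] recodes data when the two channels use different encodings.
    set in  [open input-latin1.txt r]
    set out [open output-utf8.txt w]
    chan configure $in  -encoding iso8859-1
    chan configure $out -encoding utf-8
    chan copy $in $out    ;# bytes are decoded as iso8859-1 and re-encoded as utf-8
    chan close $in
    chan close $out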


jbr 2010-11-28

Another Sunday afternoon noodling about with code. I've begun to use wibble and add the stuff that might make it a more complete product. Here are a couple of discoveries that I've bumped up against.

  • POST queries are parsed into the request dict with an extra level of list to accommodate the "content-disposition" value used in file upload. This adds an annoying extra lindex or something for most other usage. When I've done this in the past (parsing mixed multipart content), I've set the keyword value to the file name (from content-disposition) and placed the content on disk in a staging directory. This isn't the best, but it's an idea. It does allow streaming the POST data from the socket to disk in chunks. I'd ditch the extra list level for most usage and only use it in the file upload case.
  • There is an extra list level in cookie parsing also; I don't know why yet.

AMG: For query parsing, this is to distinguish between a key existing and having a value. For request header and post parsing, it serves to separate a key's "main content" from its "attributes" or "metadata". I chose empty string for the element name because it can't possibly collide with any attributes. I wasn't entirely consistent about this: in the response dictionary I say "content" instead of empty string. But that's the response dictionary, so I can be different. ;^)

This extra level is inescapable: a dict element can either contain a "leaf" value or a child dictionary; it can't contain both, not without some means of encapsulation (e.g. a list). That's like having something on your disk that's both a file and a subdirectory. Instead, move the "file" into the subdirectory and give it a reserved name. Empty string is good, because no "real" file can have that name. ;^)

I don't much care for the idea of streaming all uploaded file data to disk. Definitely it makes sense for some cases, but I'm not sure how to distinguish. I'd rather just keep it all in memory and let the zone handlers write it wherever it wants, however it wants. If your concern is to prevent memory exhaustion by uploading huge files, that's already a problem without file uploads! Just send a header containing a neverending stream of lines shorter than 4096 bytes. No POST required. In other words, that's a problem I have to solve by more general means, not by special-casing file uploads.

[dict exists $query foo] checks if the foo key exists. [dict exists $query foo ""] checks if the foo key exists and has a value. [dict get $query foo ""] gets that value. I know, I could have made values default to empty string, but I decided against that, since that wouldn't solve my attribute problem. Basically I implement the concept of null using out-of-band signaling. It's not so odious in this context, since you're using the [dict] command anyway and it can easily chew through the extra level. I'm not sure what you would use [lindex] for.
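Putting that together, a small sketch of handling the three cases (the key name foo is just an example):

    # Distinguishing "absent", "present without value", and "present with value".
    if {[dict exists $query foo ""]} {
        set foo [dict get $query foo ""]   ;# e.g. ?foo=bar
    } elseif {[dict exists $query foo]} {
        set foo ""                         ;# e.g. ?foo (key exists, no value)
    } else {
        set foo "default"                  ;# key not supplied at all
    }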

The vars zone will help you sort this all out. This HTML snippet:

<form method="post" enctype="multipart/form-data" action="/vars">
  <input type="text" name="foo" />
  <input type="text" name="bar" />
  <input type="file" name="quux" />
  <input type="submit" value="Submit" />
</form>

Results in this vars output:

post foo content-disposition {} form-data
post foo content-disposition name foo
post foo {} value of foo
post bar content-disposition {} form-data
post bar content-disposition name bar
post bar {} value of bar
post quux content-disposition {} form-data
post quux content-disposition name quux
post quux content-disposition filename data.txt
post quux content-type {} application/octet-stream
post quux {} contents of data.txt
rawpost (len=585)

This shows that foo=value%20of%20foo and bar=value%20of%20bar. All three form inputs have content-disposition=form-data, but their content-disposition name attributes differ. content-disposition filename is available for quux, and there's a content-type too.

I know this all seems very complicated (it is), but it's actually a literal translation of HTTP. In other words: don't blame me, I'm just being faithful to the standards documents!

Testing out query strings is easier, since you don't need a <form>. Just go to http://localhost:8080/vars?foo=value+of+foo&bar=value+of+bar&key+with+no+value . By the way, here's a little patch for you: Edit [dumprequest] to change eq "post" to in {post query}. This makes display of query values consistent with display of post values.

query foo {} value of foo
query bar {} value of bar
query {key with no value}
rawquery ?foo=value+of+foo&bar=value+of+bar&key+with+no+value

Notice the presence of {} (a.k.a. "") at the end of the query keys with values and the absence of same at the end of query keys without values.

jbr - Ah. Thank you for this description. I was wondering why the "vars" output was listing the empty value on the "key" side of the display. When you just print out the value, you don't see the intended structure, and I was proffering lindex to get past the {} and onto the value. Of course, in practice I too used an extra level of dict, as you suggest.

jbr, cont.:

  • The zone handlers are first grouped by prefix and then executed in order. This is confusing.

AMG: Yup, you're totally right. That's my mistake. Just a few minutes ago I ran face-first into that same problem. Hard. I may have broken my nose.

You may be waffling about whether or not it's a bug, but I firmly believe it's a bug. :^) Now, how shall I fix it? I guess I should simply not group by prefix. I'll probably fix this at the same time I implement the separate zoneopts dict.

jbr, cont.: My first attempt at authentication was :

 wibble::handle / authenticate
 wibble::handle /vars
 wibble::handle / dirslash
 ...
 wibble::handle / notfound

I intended the first authenticate handler to force all server access to be authenticated, and then go down the list processing prefixes. This doesn't work because all of the "/" prefix handlers are tried first, then the "/vars" prefix handlers. Unfortunately, notfound is encountered at the end of the "/" list. I think that this is a bug.

After having dinner and considering it, I'll retract my statement that it's a bug and leave it that it needs better explanation/examples. It would still be nice to have some method of responding to a request by prefix without committing one's handler to a specific prefix handler list (that's a mouthful). Consider my current handler setup:

    wibble::handle /vars   authenticate
    wibble::handle /vars   authorize expr { $user ne guest }
    wibble::handle /vars   vars

    wibble::handle /data   authenticate
    wibble::handle /data   authorize expr { $user in [group data] }
    wibble::handle /data   dirslash root $root
    wibble::handle /data   static root $root
    wibble::handle /data   dirlist root $root
    wibble::handle /data   notfound

    wibble::handle /       authenticate
    wibble::handle /       authorize expr { [dotaccess $user] }
    wibble::handle /       dirslash root $root
    wibble::handle /       indexfile root $root indexfile index.html
    wibble::handle /       static root $root
    wibble::handle /       template root $root
    wibble::handle /       script root $root
    wibble::handle /       dirlist root $root
    wibble::handle /       notfound

But maybe this is pretty good.


jbr 2010-12-12

Here is a new zone registration function to fix the bug discussed above and create the data structure needed by getresponse.

 # Register a zone handler.
 proc wibble::handle {zone command args} {
     variable zones
     if {[lindex $zones end-1] eq $zone} {
         # Same zone as the most recent registration: extend its handler list.
         set handlers [lindex $zones end]
         lappend handlers [list $command $args]
         lset zones end $handlers
     } else {
         # Different zone: start a new zone/handler-list pair, preserving order.
         lappend zones $zone [list [list $command $args]]
     }
 }

AMG: I flattened out the zone handler list. Out-of-order zone handlers should no longer be a problem. Please give it a try. Note that you'll have to modify your custom zone handlers to work with the new version of Wibble.
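For illustration, a flattened registration could be as simple as this (a sketch, not necessarily the exact code now in Wibble):

    # Sketch only: with a flat list, handlers run in exactly the order they
    # were registered, regardless of prefix.
    proc wibble::handle {prefix command args} {
        variable zones
        lappend zones [list $prefix $command $args]
    }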


jblz 2010-12-18

Just wanted to mention that the latest ActiveTcl 8.6 (8.6.0.0b4) does not appear to suffer from the bug reported here, and is probably suitable for wibble. Also, you can find Tcl 8.6 tclkits built after 2010-09-15 here.


jblz 2010-12-18

Hi AMG. Having a lot of fun with wibble!

In each of the zone handlers there is a line:

  dict with state request {}; dict with state options {}

I believe that request & options are sub-dicts of the state dict, and I am familiar with the usage of dict with when there is a body defined, but I am totally clueless as to what "dict with state request {}" might accomplish.

AMG: Glad to hear it! I'm taking advantage of the fact that the variables created by [dict with] persist after it finishes executing. This basically dumps the contents of state request and state options into local variables for easier access. When the code isn't trying to modify the dict, it doesn't matter if it's inside the [dict with] body or after it. I put it after just to avoid indentation. ;^)
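Here's a minimal illustration of the effect (the dict contents are made up):

    set state {request {method GET path /index.html} options {root /var/www}}
    dict with state request {}    ;# creates local variables method and path
    dict with state options {}    ;# creates local variable root
    puts $path    ;# /index.html
    puts $root    ;# /var/www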

jbr - [dict with] also writes the local variables back into the dict variable after the body executes. I believe I remember discussion that this step is optimized away if the body is empty.

jblz: Very slick trick! Thanks for the quick response!


jbr 2010-12-20 I have updated the WebSocket page with a start on code for wibble.

AMG: Thanks. I should give Chrome a try one of these days. Right now JCW and I are focusing on AJAX instead, specifically because WebSocket isn't widely implemented in browsers.


jbr 2010-12-20 I notice that the example zone handlers always end with [nexthandler $state] even when they don't modify the state. Shouldn't they just return? Then they won't incur exception processing or the overhead of updating the state list in getresponse.

AMG: I doubt exception processing costs significantly more than any other kind of return, but you're right that it is a waste to modify the state list without really changing it. ;^) I'll make an update in a few minutes.
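For example, a sketch of the suggested pattern (checksomething and the "note" option are hypothetical):

    # Sketch only: return plainly when the state is untouched; only use
    # [nexthandler] when the state actually changes.
    proc wibble::maybemodify {state} {
        if {![checksomething $state]} {
            return                    ;# nothing changed: fall through to the next zone
        }
        dict set state options note "modified"
        nexthandler $state            ;# changed state: hand the new copy onward
    }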


OP - 2010-12-27 05:47:36

Hello. Long polling with Ajax is an interesting technique. Any idea on how to implement Comet-like server push? My idea would be to create a hidden iframe in my client HTML that gets refreshed by a push.script working with suspend/resume, pushing content if available and then sleeping a while. Can I just flush content parts to the client socket? What would be the Wibble way to achieve that? Thanks for your suggestions!

AMG: I must admit ignorance of Comet, but I believe I know what you mean by push. I've already written working test code that does it. I didn't use an <iframe>, although that's an option; I used JavaScript to replace the contents of <div>s, etc., in the page, then immediately started another XMLHttpRequest to poll for more data.

In my as-yet-unpublished development version, I have rewritten [suspend] and [resume] to form a notification system. As in the current version, they take arguments indicating what events to wait for or what event has happened, respectively. In addition, [suspend] takes an optional timeout, and [resume] takes the socket name(s) of the coroutine(s) to notify (can be "all" for all coroutines interested in the event). If [resume] is called from within a coroutine, it reschedules itself to be called from the event loop, then returns. Extra arguments can be passed to [resume]; [suspend] returns these arguments as a list, preceded by the event name. [suspend] also recognizes "readable" as an event name, and I already mentioned the optional timeout; these tie into the Tcl event loop and can wake without a [resume].

There's a [resumecmd] separate from [resume] that constructs a command prefix calling [resume], for use with [after], [chan copy], etc. [resumecmd] additionally recognizes the special coroutine name "self", which it translates to the real current coroutine name. Remember that the coroutine name is equal to the socket name. The only gotcha is that the coroutines are in the "wibble" namespace.
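Purely as a sketch of how this might read in a zone handler (the API is unpublished, so the argument order and the event name "newdata" here are guesses):

    # Guessed argument order; "newdata" and the payload are invented.
    # Schedule a wakeup for this coroutine in one second, carrying a payload.
    after 1000 [resumecmd self newdata "hello"]
    # Wait for the event, a readable socket, or a 5000 ms timeout.
    set result [suspend {newdata readable} 5000]
    if {[lindex $result 0] eq "newdata"} {
        set payload [lindex $result 1]   ;# "hello"
    }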

Content gets flushed after the zone handlers finish. Don't put a read/write loop inside a zone handler; that's already in Wibble, which calls the zone handlers between reading and writing. The zone handler system is designed to generate a response to a request, not to receive the request or send the response. Zone handlers can also suspend the coroutine until there's data to send.

I'll put up my new code soon, along with an example showing asynchronous push notifications. For now, look at the Wibble AJAX example [L2 ], which simply notifies a client when a timer expires.


OP - 2010-12-28 04:49:30

Playing with Wibble, coroutine and server push.

I took inspiration from this page: http://www.zeitoun.net/articles/comet_and_php/start . I just changed the server page reference so that it points to this Wibble script:

set socket [dict get $request socket]
set ns push_$socket
namespace eval $ns {
        variable socket 
        
        proc start { s } {
                variable socket $s
                chan puts $socket "HTTP/1.0 200 OK"
                chan puts $socket "Content-Type: text/html\n"
                chan puts $socket "<html><body>"
                coroutine [namespace current]::iter push
                iter
        }
        
        proc push {} {
                variable socket
                yield
                set i 0
                while { 1 } {
                        puts "iter $i"
                        if { [eof $socket] } { break }
                        chan puts $socket "<script type=\"text/javascript\">"
                        chan puts $socket "window.parent.comet.printServerTime('[clock format [clock seconds]] ($i)');"
                        chan puts $socket "</script>"
                        chan flush $socket
                        after 1000 [namespace current]::iter
                        yield $i
                        incr i
                }
        }
}

${ns}::start $socket

AMG, this is a basic test I made before reading your answer to my question (I am nowhere near understanding Wibble and coroutine magic).

Basically, it works (the client page displays updated time and counter information) and the server keeps running and responding to requests.

I am trying to avoid chaining Ajax requests or playing with long-polling requests because UI parts already heavily rely on Ajax calls. I have yet to discover what may be wrong with this straightforward approach (I have to fully understand your previous comments).

Thanks for your comments (by the way, thanks to all contributors to Wibble, it rocks!)

AMG: AJAX is an integral feature of Wibble (there's an inside joke in that statement if you know where to look), so you don't need to implement it in your zone handler. Here's how I would implement the same thing. It doesn't need to create its own coroutines, nor do its own I/O. Also it doesn't rely on Comet; it sends the client all the JavaScript it needs for AJAX. I put this in a file called index.html.tmpl:

% dict set response header content-type text/html
% if {[info exists query] && [dict exists $query update]} {
%   if {![dict exists $query now]} {
%     suspend {} 1000
%   }
[enhtml [clock format [clock seconds]]]
% } else {
<html><head><script type="text/javascript">
function update(initialize) {
  var xhr, span = document.getElementById("time");
  if (window.XMLHttpRequest) {
    xhr = new XMLHttpRequest();
  } else {
    xhr = new ActiveXObject("Microsoft.XMLHTTP");
  }
  xhr.onreadystatechange = function() {
    if (xhr.readyState == 4 && xhr.status == 200) {
      span.innerHTML = xhr.responseText;
      update(false);
    }
  };
  xhr.open("GET", (initialize ? "?update&now" : "?update"), true);
  xhr.send();
}
</script></head><body onload="update(true)">
  The current time is: <span id="time"></span>
</body></html>
% }

The [suspend] command in the currently published version of Wibble doesn't have the timeout feature, so for now you will have to replace the "suspend {} 1000" line with this:

%     after 1000 [resume timeout]
%     suspend timeout

htmlgen + wibble

UPDATE 1-1-11 (HAPPY NEW YEAR!)

I may move this to its own page if I end up using htmlgen in a serious way, but in the meantime, this is how to use htmlgen with wibble:

  1. Download xmlgen from sourceforge
  2. Decompress the directory, and place it in a directory listed on your $::tcl_pkgPath
  3. Because htmlgen creates a proc named "script" (for the "script" html tag, of course), you will need to rename the "script" zonehandler to something like "ascript"
  4. Create a file such as the following in your wibble $root directory, with a name like foo.html.script
# this is a reimplementation of the code in the /vars zonehandler
# adapted to using htmlgen

package require htmlgen
namespace import ::htmlgen::*

dict set response header content-type text/html

::htmlgen::buffer ::DOC {
    html ! {
        head ! {
            title - Playing Vars
            style type=text/css - {
                body {font-family: monospace}
                table {border-collapse: collapse; outline: 1px solid #000}
                th {white-space: nowrap; text-align: left}
                th, td {border: 1px solid #727772}
                tr:nth-child(odd) {background-color: #ded}
                tr:nth-child(even) {background-color: #eee}
                th.title {background-color: #8d958d; text-align: center}
            }
        }
        body ! {
            div id=wrapper ! {
                table ! {
                    foreach {dictname dictvalue} $state {
                        if {$dictname eq "request"} {
                            set dictvalue [dumprequest $dictvalue]
                        }
                        tr ! {
                            th class=title colspan=2 + {
                                [enhtml $dictname]
                            }
                        }
                        foreach {key value} $dictvalue {
                            tr ! {
                                th + {[enhtml $key]}
                                td + {[enhtml $value]}
                            }
                        }
                    }
                }
            }
        }
    }
}
    
dict append response content $::DOC

A few points worth mentioning

  • the default zonehandler sends .script files as text/plain, so make sure you change the "response header content-type" to text/html
  • because we have to send the finished page to the response dict, you'll need to use ::htmlgen::buffer to capture the output

Thanks to kbk in tcler's chat for helping me with wrangling htmlgen.


Cookie Convenience Procs

These procs are taken from the ncgi package and modified for use with wibble.

proc ::wibble::cookie {cookie} {
    upvar header cheader
    if {[dict exists $cheader cookie $cookie ""]} {
        return [dict get $cheader cookie $cookie ""]
    }
}

proc ::wibble::setcookie {args} {
    upvar response cresponse
    array set opt $args
    set line "$opt(-name)=$opt(-value) ;"
    foreach extra {path domain} {
        if {[info exists opt(-$extra)]} {
            append line " $extra=$opt(-$extra) ;"
        }
    }
    if {[info exists opt(-expires)]} {
        switch -glob -- $opt(-expires) {
            *GMT {
                set expires $opt(-expires)
            }
            default {
                set expires [clock format [clock scan $opt(-expires)] \
                    -format "%A, %d-%b-%Y %H:%M:%S GMT" -gmt 1]
            }
        }   
        append line " expires=$expires ;"
    }
    if {[info exists opt(-secure)]} {
        append line " secure "
    }
    dict set cresponse header Set-Cookie $line
}
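A hedged usage sketch (the cookie name and values are made up): inside a zone handler that has already run "dict with state request {}" (so a local header variable exists) and that keeps its response in a local response variable, the procs can be called directly:

    # Hypothetical cookie name and value; the GMT date avoids the clock scan branch.
    setcookie -name session -value abc123 -path / \
        -expires "Friday, 01-Jan-2038 00:00:00 GMT"
    # On a later request from the same browser:
    set session [cookie session]    ;# empty string if the browser sent no such cookie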

Right now I can't seem to add more than one cookie per request. Changing dict set cresponse header Set-Cookie $line to dict lappend cresponse header Set-Cookie $line does put two Set-Cookie entries in the response dict, but only the last one seems to be recognized by the browser. Any edits by an HTTP guru to support multiple cookies per request would be super :).

Come to think of it, this may be a limitation of using a dict: a dict keeps only one value per key, so only one Set-Cookie header may be possible this way.