Version 17 of Wibble discussion

Updated 2010-12-18 21:48:42 by AMG

AMG: This page is for general discussion on the subject of the Wibble web server. For discussion of bugs, see Wibble bugs.

jbr 2010-11-24

I was looking at the main getrequest loop in wibble and see that the request dict is constantly having a few zone-dependent values inserted into and removed from it as a request moves along the zone chain. I understand that you are trying to keep the "handler $request $response" API nice and clean, but this seems artificial. Why not just split the zone-dependent parts into a third dict and pass that in to the handler as well? The code will be cleaner, and there will be no need to "filter out", which is a clumsy operation.

Here is a little untested code:

    # Create a dict of zone options
    #
    dict set zoneopts prefix $prefix
    dict set zoneopts suffix [string range $path\
                            [string length $prefix] end]
    if {[dict exists $options root]} {
        dict set zoneopts fspath\
            [dict get $options root]/[dict get $zoneopts suffix]
    }
    set zoneopts [dict merge $zoneopts $options]

    # Invoke the handler and process its outcome.
    #
    try {
        {*}$command $request $response $zoneopts
    } on 5 outcome {
        # Update the state tree and continue processing.
        #
        set state [lreplace $state $i $i+1 {*}$outcome]
    } on 6 outcome {
        # A response has been obtained.  Return it.
        #
        return $outcome
    }

I might even go as far as creating the fspath value even if the root option doesn't exist if I could squish the code down to a completely functional style.

    set zoneopts [dict merge $options [list prefix $prefix suffix [string rang...] ...]

As an incentive for pursuing this, since the request dict no longer needs to be fixed up, it may now be possible to tailcall down the chain of zone handlers. I'll think about how to construct this.
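For anyone who wants to experiment with the idea, here is a minimal, self-contained sketch of building the zone options dict in one place. The proc name makezoneopts and the sample paths are hypothetical and not part of Wibble; the merge order follows the untested snippet above, so explicit zone options win over the computed values.

```tcl
# Hypothetical helper (not part of Wibble): build the per-zone options dict
# from the zone prefix, the request path, and the zone's registered options.
proc makezoneopts {prefix path options} {
    # The suffix is whatever follows the zone prefix in the request path.
    set suffix [string range $path [string length $prefix] end]
    set zoneopts [dict create prefix $prefix suffix $suffix]
    # fspath only makes sense when the zone has a filesystem root.
    if {[dict exists $options root]} {
        dict set zoneopts fspath [dict get $options root]/$suffix
    }
    # Explicit zone options take precedence over the computed values.
    return [dict merge $zoneopts $options]
}

# Example call with made-up values.
puts [makezoneopts /data/ /data/report.txt {root /srv/www}]
```

A handler would then receive this dict as its third argument, and the request dict would never need to be patched up or filtered.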

AMG: Sounds like a good idea. I think I'll make this change. First, let me consider your other comments below...

AMG, 2010-12-18: Done! Instead of merging request and options, they are separate sub-dictionaries inside a unified state dict. I prefer this to having the three arguments, since it doesn't require me to remember what order everything's in. ;^)


jbr 2010-11-25 (Happy Thanksgiving)

I've been thinking about the practical use of zone handlers and I have a few questions and ideas.

First - There are no example handlers that return multiple request/response pairs. Are there example use cases that require this feature?

AMG: Yes: [indexfile]. If a directory /dir is being requested, it forks the request tree to contain requests for both /dir and /dir/index.html . It does so without first checking that index.html exists, because it can't know if this is supposed to be a file on disk or the product of a later zone handler. Leaping without looking. :^)

jbr: Along this line - if I were to modify a request or generate a new one and return it, how can I reset the zone handler loop so that my new request is looked at by all the zones, not just those remaining downstream of the current zone?

AMG: You can't, otherwise the server would hang forever when your zone handler is reinvoked and once again decides to reset the loop. I couldn't think of any uses for this feature, so I didn't bother with it. You can use a zone handler more than once if you really need this.

jbr: Second - It seems that zone handlers are of at least 3 distinct types.

  • Guards
  • Content
  • Helpers

Guard handlers are early on the list and implement authentication, session or protection of particular file types. I noticed that specifically asking for "file.tmpl" or "file.script" causes wibble to return application code! In a working system perhaps a guard handler placed before the static zone would explicitly reject these file types.

AMG: Directly returning file.tmpl or file.script is helpful to me for testing and debugging, and I actually do open up those sources on a site I developed a few years before Wibble. If that's not what you want, you can easily restrict access using a zone handler. See [L1 ] for a simple one that might work for you.

jbr: Content handlers return inline content or static files to serve up.

Helpers can be used to fill in desirable but missing parts of the response. In the current architecture they must work closely with Content zones and other helper zones. Example - mime types. If a mime type zone handler is included late in the chain, the static handler must not return the completed response. It fills in the file name but waits for the mime type handler to identify the content. I'm not sure this is a good thing. Perhaps it's better to just call the mime type proc from the static handler and be done. There are a lot of complex interactions that can take place here; best to avoid them?

Idea - Have several chains that a request/response must traverse. Just like iptables in linux networking:

  • Input - verify/modify request
  • Content - generate response
  • Output - Fill in missing headers, fix up response

Idea - A zone handler might radically modify the request and want to restart zone processing. The current request needs to be aborted, and the new one inserted in the input queue. For example, I have dynamically generated .doc and .xls files. I currently call a tclhttpd template to create the content in a temporary directory and then send the browser a redirect to the new file. It might be nice to create the file, set the fspath request value and content-disposition response header and then reset the zone handler chain. This could work now by placing the dynamic zone in front of the static zone, but I'd like to eliminate the interactions between zones.

I'm thinking about how to write this code.

AMG: Again, the zone handlers can be sequenced however you like, and it's possible to use a single zone handler multiple times. Also, you had another idea about one zone handler calling another one directly. That might be a valid way to implement custom sequencing without changing the core. The outer zone handler procedure can intercept the [nextrequest] and [sendresponse] commands of the inner zone handlers by calling them inside a try block.
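Here is a rough sketch of that wrapping idea. It assumes, as in jbr's snippet earlier on this page, that the outcome of a handler is signalled with custom return codes 5 (next request) and 6 (send response). The nextrequest and sendresponse procs below are simplified stand-ins, not Wibble's real implementations, and the handler names inner and outer are made up.

```tcl
# Simplified stand-ins for Wibble's commands (assumption: custom return
# codes 5 and 6 carry the outcome up to whoever invoked the handler).
proc nextrequest {args} {return -code 5 $args}
proc sendresponse {response} {return -code 6 $response}

# Hypothetical inner zone handler: always produces a response.
proc inner {state} {
    sendresponse [dict create status 200 content "hello"]
}

# Hypothetical outer zone handler: calls the inner one directly and
# intercepts its outcome before passing it on.
proc outer {state} {
    try {
        inner $state
    } on 6 response {
        # The inner handler produced a response; decorate and re-send it.
        dict set response header x-wrapped yes
        sendresponse $response
    } on 5 outcome {
        # The inner handler wants processing to continue; pass that along.
        nextrequest {*}$outcome
    }
}

# Drive the outer handler the way a server loop might, using catch here
# only to make the return code visible.
set code [catch {outer {}} result]
puts "code=$code result=$result"
```

The outer handler sees exactly what the server loop would have seen, so it can rewrite, reorder, or retry the inner handlers without any change to the core.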

Also please think about character encoding. I haven't done that properly yet. I understand that [chan copy] can convert character encoding according to the encodings of the input and output channels; I just haven't tried using this feature yet, mostly since being able to get the acceptable encodings is a new feature.


jbr 2010-11-28

Another Sunday afternoon noodling about with code. I've begun to use wibble and add the stuff that might make it a more complete product. Here are a couple of discoveries that I've bumped up against.

  • post queries are parsed into the request dict with an extra level of list to accommodate the "content-disposition" value used in file upload. This adds an annoying extra lindex or something for almost all other usage. When I've done this in the past (parsed mixed-multipart) I've set the keyword value to the file name (content-disposition) and placed the content on disk in a staging directory. This isn't the best, but it's an idea. It does allow streaming the post data from the socket to disk in chunks. I'd ditch the extra list level for most usage and only use it in the file upload case.
  • There is an extra list level in cookie parsing also; I don't know why yet.

AMG: For query parsing, this is to distinguish between a key existing and having a value. For request header and post parsing, it serves to separate a key's "main content" from its "attributes" or "metadata". I chose empty string for the element name because it can't possibly collide with any attributes. I wasn't entirely consistent about this: in the response dictionary I say "content" instead of empty string. But that's the response dictionary, so I can be different. ;^)

This extra level is inescapable: a dict element can either contain a "leaf" value or a child dictionary; it can't contain both, not without some means of encapsulation (e.g. a list). That's like having something on your disk that's both a file and a subdirectory. Instead, move the "file" into the subdirectory and give it a reserved name. Empty string is good, because no "real" file can have that name. ;^)

I don't much care for the idea of streaming all uploaded file data to disk. Definitely it makes sense for some cases, but I'm not sure how to distinguish them. I'd rather just keep it all in memory and let the zone handlers write it wherever they want, however they want. If your concern is to prevent memory exhaustion by uploading huge files, that's already a problem without file uploads! Just send a header containing a neverending stream of lines shorter than 4096 bytes. No POST required. In other words, that's a problem I have to solve by more general means, not by special-casing file uploads.

[dict exists $query foo] checks if the foo key exists. [dict exists $query foo ""] checks if the foo key exists and has a value. [dict get $query foo ""] gets that value. I know, I could have made values default to empty string, but I decided against that, since that wouldn't solve my attribute problem. Basically I implement the concept of null using out-of-band signaling. It's not so odious in this context, since you're using the [dict] command anyway and it can easily chew through the extra level. I'm not sure what you would use [lindex] for.
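The three checks can be tried outside Wibble with a hand-built dict. This is only a sketch: the query dict below is typed in by hand to mimic the shape described above, not produced by Wibble's parser.

```tcl
# Hand-built stand-in for a parsed query: foo has a value stored under the
# empty-string key, while "key with no value" exists but holds nothing.
set query [dict create \
    foo [dict create "" "value of foo"] \
    "key with no value" {}]

puts [dict exists $query foo]                     ;# key exists
puts [dict exists $query foo ""]                  ;# key exists and has a value
puts [dict get $query foo ""]                     ;# the value itself
puts [dict exists $query "key with no value"]     ;# key exists...
puts [dict exists $query "key with no value" ""]  ;# ...but has no value
```

The first four commands print 1, 1, value of foo, and 1; the last prints 0, which is the out-of-band "null" described above.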

The vars zone will help you sort this all out. This HTML snippet:

    <form method="post" enctype="multipart/form-data" action="/vars">
      <input type="text" name="foo" />
      <input type="text" name="bar" />
      <input type="file" name="quux" />
      <input type="submit" value="Submit" />
    </form>

Results in this vars output:

    post foo content-disposition {} form-data
    post foo content-disposition name foo
    post foo {} value of foo
    post bar content-disposition {} form-data
    post bar content-disposition name bar
    post bar {} value of bar
    post quux content-disposition {} form-data
    post quux content-disposition name quux
    post quux content-disposition filename data.txt
    post quux content-type {} application/octet-stream
    post quux {} contents of data.txt
    rawpost (len=585)

This shows that foo=value%20of%20foo and bar=value%20of%20bar. All three form inputs have content-disposition=form-data, but their content-disposition name attributes differ. content-disposition filename is available for quux, and there's a content-type too.

I know this all seems very complicated (it is), but it's actually a literal translation of HTTP. In other words: don't blame me, I'm just being faithful to the standards documents!

Testing out query strings is easier, since you don't need a <form>. Just go to http://localhost:8080/vars?foo=value+of+foo&bar=value+of+bar&key+with+no+value . By the way, here's a little patch for you: Edit [dumprequest] to change eq "post" to in {post query}. This makes display of query values consistent with display of post values.

    query foo {} value of foo
    query bar {} value of bar
    query {key with no value}
    rawquery ?foo=value+of+foo&bar=value+of+bar&key+with+no+value

Notice the presence of {} (a.k.a. "") at the end of the query keys with values and the absence of same at the end of query keys without values.

jbr - Ah. Thank you for this description. I was wondering why the "vars" output was listing the empty value on the "key" side of the display. When you just print out the value you don't see the intended structure, and I was proffering lindex to get past the {} and on to the value. Of course, in practice I too used an extra level of dict as you suggest.

jbr, cont.:

  • The zone handlers are first grouped by prefix and then executed in order. This is confusing.

AMG: Yup, you're totally right. That's my mistake. Just a few minutes ago I ran face-first into that same problem. Hard. I may have broken my nose.

You may be waffling about whether or not it's a bug, but I firmly believe it's a bug. :^) Now, how shall I fix it? I guess I should simply not group by prefix. I'll probably fix this at the same time I implement the separate zoneopts dict.

jbr, cont.: My first attempt at authentication was :

 wibble::handle / authenticate
 wibble::handle /vars
 wibble::handle / dirslash
 ...
 wibble::handle / notfound

I intended the first authenticate handler to force all server access to be logged in and then go down the list processing prefixes. This doesn't work because all of the "/" prefix handlers are tried first, then the "/vars" prefixes. Unfortunately notfound is encountered at the end of the "/" list. I think that this is a bug.

After having dinner and considering, I'll retract my statement that it's a bug and leave it that it needs better explanation/examples. It would still be nice to have some method of responding to a request by prefix without committing one's handler to a specific prefix handler list (that's a mouthful). Consider my current handler setup:

    wibble::handle /vars   authenticate
    wibble::handle /vars   authorize expr { $user ne guest }
    wibble::handle /vars   vars

    wibble::handle /data   authenticate
    wibble::handle /data   authorize expr { $user in [group data] }
    wibble::handle /data   dirslash root $root
    wibble::handle /data   static root $root
    wibble::handle /data   dirlist root $root
    wibble::handle /data   notfound

    wibble::handle /       authenticate
    wibble::handle /       authorize expr { [dotaccess $user] }
    wibble::handle /       dirslash root $root
    wibble::handle /       indexfile root $root indexfile index.html
    wibble::handle /       static root $root
    wibble::handle /       template root $root
    wibble::handle /       script root $root
    wibble::handle /       dirlist root $root
    wibble::handle /       notfound

But maybe this is pretty good.


jbr 2010-12-12

Here is a new zone registration function to fix the bug discussed above and create the data structure needed by getresponse.

 # Register a zone handler.
 proc wibble::handle {zone command args} {
    variable zones
    if { [lindex $zones end-1] eq $zone } {
        set handlers [lindex $zones end]
        lappend handlers [list $command $args]
        lset zones end $handlers
    } else {
        lappend zones $zone [list [list $command $args]]
    }
 }

AMG: I flattened out the zone handler list. Out-of-order zone handlers should no longer be a problem. Please give it a try. Note that you'll have to modify your custom zone handlers to work with the new version of Wibble.
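To illustrate what a flattened list buys, here is a small sketch. It is not Wibble's actual code: the sketch namespace, its handle and matching procs, and the plain string-prefix match are all assumptions made for the example. Every registration becomes one entry in a flat list, and lookup walks that list strictly in registration order.

```tcl
# Illustrative only: a flat, ordered zone list kept in a made-up namespace.
namespace eval sketch {
    variable zones {}
}

# Register a handler; each call appends exactly one entry, so order is kept.
proc sketch::handle {prefix command args} {
    variable zones
    lappend zones [list $prefix $command $args]
}

# Return the handler commands whose prefix matches the path, in the order
# they were registered (no grouping by prefix).
proc sketch::matching {path} {
    variable zones
    set result {}
    foreach zone $zones {
        lassign $zone prefix command options
        if {[string equal -length [string length $prefix] $prefix $path]} {
            lappend result $command
        }
    }
    return $result
}

sketch::handle /     authenticate
sketch::handle /vars vars
sketch::handle /     notfound

puts [sketch::matching /vars]
puts [sketch::matching /index.html]
```

With this ordering a request for /vars passes through authenticate, then vars, and only then notfound, which is the behavior jbr's first attempt expected.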