Wibble discussion archive

AMG: I archive older discussion for Wibble on this page. For current discussion, see Wibble discussion.

Origin of Wibble

AMG: This discussion, moved from the Wub page, marks the origin of Wibble.

AMG: I am toying with the idea of putting together a site for serving a particular Web comic [L1 ]. Necessary features:

HTTP 1.0 (I doubt any 1.1 features are needed)
Serving static files from the filesystem
CGI-like functionality from templates in the "Templates and subst" format
Ability to access SQLite databases
User accounts authenticated with HTML forms and cookies, not HTTP authentication
Uploading of "avatar" images for user-posted comments

While it would be nice to support administration through the Web (e.g. using a browser to upload new pages or delete unwanted comments), I think it would also be acceptable to have a collection of scripts usable from an ssh login shell. (The comic artist may disagree; I haven't consulted with her yet.) For this reason I don't think SSL is necessary. Of course, features can be added in time.

Is Wub a good fit for this task, or is it overkill? Should I be looking at something simpler? Right now I'm simply hacking on DustMote.

The reason I'm contemplating this project is that the artist has wanted for years to set up ComicPress [L2 ] but cannot master MySQL's administrative complexity. I think she would be much better off with SQLite.

CMcC: As to requirements:

HTTP 1.1 - one of the features of 1.1 is pipelining which means significantly faster loading (because the client doesn't need to re-open a connection for each entity they want to fetch.) There's really no excuse for serving HTTP/1.0 anymore.
Wub's File and Mason domains handle static file serving
Wub's Mason allows files to be interpreted directly within tcl, no need for CGI (which is slow, gross and grossly slow.)
Anything running under tcl can access SQLite.
Wub's Login domain does this. ATM it uses Metakit, I'll coerce it to use SQLite if you like. It does Login/Auth using cookies (or HTTP auth) and also gives you per-user rows for storing stuff.
Uploading, yeah, easy enough. Wub's Form utility also enables you to do nice form handling in conjunction with its Direct domains. Wub's jQ domain lets you use jQuery for making it nice (multiple file uploads, etc.)

Additionally, Wub has support for site-wide page customisation.

As to 'overkill': serving pages is part of it. Serving pages efficiently is more of it (Wub does server side caching.) Serving pages in the presence of malicious bots and spiders is a lot more of it. While you can certainly use DustMote for the first part (serving pages) you're left with a lot more to do to achieve much of the rest of it. Wub's not particularly minimal, but I think it's about as heavyweight as the full task requires. If you do choose to use DustMote, or write your own, I'd encourage you to look at Wub's Utilities/ directory as it has a lot of functionality you're going to need anyway.

I'm more than happy to work with people who want to try this stuff out.

AMG: Okay, I have the Wub SVN installed, along with its dependencies. (I even have mk4tcl, which was a pain because the latest release doesn't compile with Tcl 8.6; the latest SVN works, with a lot of warnings and one test failure.) It runs. But I don't have any idea how to set it up. I don't know where to begin customizing it, configuring it, or even learning it. My impression is that it's bigger and more complicated than is warranted for my needs. But I could be wrong--- maybe my needs are more complex than I realize, and/or maybe Wub is simpler than I think it is.

So I will explain where I am at with DustMote. I added template [subst]'ification of *.tmpl files, added directory redirection (301's /dirname to /dirname/), and changed index.html defaulting to search for index.tmpl or index.html. I will need to bring in the utility procedures I previously wrote for my own Web site [L3 ] to handle quoting, etc., and I'll also bring in handling of HTTP headers and URL query parameters. But still, this would leave me without pipelining, uploads, cookies, sessions, or authentication. I'm sure these can be added. Server-side caching can be accomplished by remembering the [subst] scripts generated from the *.tmpl files, but I'd have to profile first to see if it's worth it. (Hooray for [time]!) I don't know how to deal with malicious bots or spiders.

jdc: Have a look at [L4 ]. I'm trying to put a collection of examples there as part of a tutorial on Wub. Each example introduces a Wub feature. This is still work-in-progress.

CMcC: let's do some problem analysis:

1. receiving, decoding, dispatching-upon and replying-to HTTP protocol streams is one part of the task of serving a web page. A solution to that can be found in [L5 ]. It's a pretty good solution, it is not as tiny as Dustmote because it does a lot more, only some of which is able to be removed.
2. interpreting received HTTP request URLs semantically and transforming them into responses is covered by all the stuff in [L6 ] where you will find several modules of code which perform generally useful transformations. You can choose to use whichever of them suits your needs, or none of them, or modify them as you wish.
3. interpreting various fields in the HTTP request and giving them semantics like 'cookie', 'query', 'user agent', 'url' etc, generating forms, html, report-form tables, etc are covered in [L7 ]

Is it possible to take Dustmote (which does a severely restricted subset of 1. above,) add 2. and 3. above to it, and get a solution? Sure. Is it possible to write brand new code to do 1., 2. and 3. above? Sure.

The only reasons I can imagine for doing that (rewriting 1. 2. and 3.) would be:

the provisions above are insufficient in some regard.
one just wants to learn what goes into doing it.
it's too hard to learn how to use the existing code

If the provisions are insufficient, I would like to know why, so I can improve them. If you just want to write it from scratch, that's fine too ... I will learn from the sub-problems you have had to solve and incorporate the solutions you arrive at (if they're better than mine.) If you just find it hard to learn to use existing code, part of that is my fault - there needs to be more and better documentation ('specially for the Utilities,) but I think that part of it is an underestimation of the relative difficulty of finding and solving the problems you will confront while trying to provide the facilities.

You should understand, I think, that I didn't set out to write a replacement for tclhttpd (and scads of tcllib stuff like ncgi) merely because I thought it'd be cool. I did it because I wanted to do stuff which neither tclhttpd nor things like ncgi did properly. I did it after I tried to work with tclhttpd authors to make it work better. I did it as a last resort, in effect.

It's been an interesting journey. I think you should at least consider the use of a map and compass before undertaking it.

AMG: For this project, I wrote Wibble. Perhaps some ideas from Wibble can be incorporated into Wub. Have a look.

CMcC: I've taken a radical simplification from Wibble, and reduced the number of coroutines per connection down to 1. It's still experimental right now, but as it's currently (3Jun09) running this wiki, it seems to be working fine.

CMcC design discussion

CMcC likes this, it's neat and minimal, but flexible and fully functional. A couple of observations as they arise: all the header keys have to be set to a standard case after you parse them out of the request stream, as the spec doesn't require a client to use a standard case.

AMG: HTTP allows case insensitivity? Damn. Case insensitivity will be the death of us all! HTTP specifications (or at least clients) probably require a specific case for the server, which unfortunately is neither all-lowercase nor all-uppercase. What a pain!

CMcC: AFAIK you can return whatever you like (case-wise) from the server, so no ... no requirement. It's all case-insensitive for the field names.

AMG: Still, I must be able to deal with clients which assume HTTP is case sensitive and that it requires the case style shown in the examples. Most folks only read the examples, so they draw this conclusion: [L8 ]. Just look at Wibble itself! It fails when the client uses unexpected letter case in the request headers! I didn't spot where the specification allowed case independence, and none of the examples suggested this to me.

AMG: Update: I now force all request headers to lowercase and normalize all known response headers to the "standard" case observed at [L9 ]. Unrecognized response headers are left untouched.
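
A minimal sketch of the normalization described above, assuming headers are held in a dict. The proc names and the short table of known response headers are hypothetical and are not taken from Wibble's source.

```tcl
# Hypothetical helper: force every request header name to lowercase.
proc normalizeRequestHeaders {headers} {
    set result [dict create]
    dict for {name value} $headers {
        dict set result [string tolower $name] $value
    }
    return $result
}

# Hypothetical helper: map known response header names to a conventional
# spelling and leave unrecognized names untouched.
proc normalizeResponseHeaders {headers} {
    # Illustrative subset only; a real table would be much longer.
    set known {
        content-type     Content-Type
        content-length   Content-Length
        etag             ETag
        www-authenticate WWW-Authenticate
    }
    set result [dict create]
    dict for {name value} $headers {
        set key [string tolower $name]
        if {[dict exists $known $key]} {
            set name [dict get $known $key]
        }
        dict set result $name $value
    }
    return $result
}
```

A lookup table is used rather than a capitalize-each-word rule because names such as ETag and WWW-Authenticate do not follow that pattern.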

CMcC: That's consistent with the networking principle (be tolerant of what you accept, consistent in what you provide.) Wub, FWIW, sends everything out in lowercase (IIRC) on the principle of 'screw 'em if they can't take a joke.'

CMcC: Also, I'm not sure what you do with multiple query args which have the same name, you have to be prepared for that, and do you handle query args without value? Unsure.

AMG: An earlier revision supported multiple query arguments with the same name, plus query arguments without values. Then I decided those two features weren't really important to me, that it was simpler to just require that sites made using Wibble wouldn't depend on those features. But if you can give me a compelling application for multiple like-named and argument-less queries, I'll re-add support. For now, later query arguments replace like-named earlier query arguments, and query arguments without value are taken as having empty string as the value. My earlier solution was for queries with arguments to be in a key-value list, then for all query argument names (even those without values) to be in a second list, sorted by the order in which they appeared in the URI.
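
The simplified behaviour described above (later arguments replace earlier like-named ones, and a missing value becomes the empty string) might look roughly like this. The proc name is hypothetical, and percent-decoding is left out to keep the sketch short.

```tcl
# Hypothetical parser for the part of a URI after the "?".
proc parseQuery {query} {
    set result [dict create]
    foreach pair [split $query &] {
        if {$pair eq ""} continue
        # Split on the first "=" only; a missing "=" yields an empty value.
        set eq [string first = $pair]
        if {$eq < 0} {
            dict set result $pair ""
        } else {
            # [dict set] overwrites, so a later argument replaces an earlier one.
            dict set result [string range $pair 0 [expr {$eq - 1}]] \
                    [string range $pair [expr {$eq + 1}] end]
        }
    }
    return $result
}
```

Under these assumptions, [parseQuery "a=1&b&a=2"] yields a dict in which a is 2 and b is the empty string.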

CMcC: Yeah, it's much easier if you can ignore that requirement. Wibble's free to do that, Wub's not (sadly.)

AMG: How does Wub make use of argument-less query arguments and repeated query argument names?

CMcC: It's all set up in the Query module, a decoded query is parsed into a dict which contains a list of metadata and values per query element, accessors return the number of values, the metadata, and the specified element. Of course, most of the time in use I just flatten the dict into an a-list and ignore all the detail.

CMcC: Adding virtual host support is trivial as you've noted, you just need to combine the host with your zone regexp.

I note you don't fall-back to HTTP/1.0 (not really necessary, I guess,)

AMG: I have the beginnings of support for falling back to HTTP/1.0, in that I remember the protocol advertised by the client. In the future I can use that information.

CMcC: I really wouldn't bother - there's no real need to support HTTP/1.0 IMHO - the only existing client still using it is the Tcl client (and that should be fixed soon.) Again, Wub doesn't have the option of taking the sensible path.

AMG: I'll have to check a couple streaming music players to see if they all grok HTTP/1.1. They would have to if they support seeking.

CMcC: nor do you stop processing input on POST/PUT as the spec requires (you ought to make sure this is done, as some things require it.) Your pipeline processing requires run-to-completion of each request processing, I think, but there are definitely cases where you would not want this (they're not common, but when you get such a requirement there's no way around it, I think) so that's a limitation, although not a show-stopper.

AMG: I don't have any experience with POST/PUT. I just put in the few rudiments I could figure out from the HTTP specification. I'll have to read up on POST/PUT in more detail.

CMcC: The spec says you can't process any requests (and the client oughtn't to send any requests) on a pipeline until the POST/PUT is handled completely. It's subtle, but it's (just) conceivable that something could be screwed up by it. Since your approach is to do things which make sense for most apps, you could probably get away with it by just documenting the behaviour.

AMG: Wibble processes each request immediately after reading it. Its main loop is: get request, compute response, send response, repeat. Computing a response for a POST/PUT necessarily involves committing its side effects to whatever database is used. So subsequent requests remain unread, waiting in the receive buffer, until Wibble is completely finished with the POST/PUT, or any other type of request.

CMcC: it's more for situations like when you have asynchronous background type processing. Updating a DB, for example.

AMG: I don't think I'll be doing any asynchronous background processing within a single connection. Someday I do plan to process multiple connections in parallel as separate processes or threads. But that's a completely separate issue.

CMcC: I like the way zone handlers stack, but the necessity of returning a list is less good, IMHO - I prefer the default case to be easy. I'd consider using a [return -code] to modify the return behaviour, or perhaps using a pseudo response key element to specify further processing.

AMG: I haven't done much with [return -code], so it hadn't occurred to me. That's an interesting idea, thanks. I think I'll change it to return the operation as the return code and the operand as the return value.

CMcC: Yah, you might want to map the normal error codes from Tcl (including Tcl_OK) to reasonable values (e.g. Tcl_OK=>200, Tcl_Error=>500).

AMG: I wound up using [return -opcode] (wrapped by the [operation] sugar proc), which puts a custom "-opcode" key in the -options dict, then I receive this opcode using catch. The purpose of [return -code] is already defined, and it requires integers or a predefined enumeration, so I decided not to mess with it. Also the purpose of this general mechanism is not to report errors or status, but rather to tell the response generator what operation to take next: modify request, send response, etc. I do map error to HTTP 500 using try {...} on error {...}, then I print the error options dictionary to both stderr and (if possible) to the client socket. On error, I always close the client socket, forcing tenacious clients to reconnect, which is almost like rebooting a computer following a crash.
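
As a rough illustration of the mechanism described above: [return] accepts arbitrary option/value pairs and stores them in the return options dictionary, which [catch] can then retrieve. The proc names, the "pass" default, and the shape of the response dict are assumptions made for this sketch; it is not Wibble's actual code.

```tcl
# Hypothetical zone handler: the custom -opcode entry rides along in the
# return options dictionary, and the return value is the operand.
proc hello {request} {
    return -opcode sendresponse [dict create status 200 content "hello"]
}

# Hypothetical dispatcher: run one handler and report {opcode operand}.
proc dispatch {handler request} {
    # [catch] stores the result in operand and the options dict in options.
    if {[catch {{*}$handler $request} operand options]} {
        # Any abnormal completion (typically an error) becomes HTTP 500.
        return [list sendresponse [dict create status 500 content $operand]]
    }
    if {![dict exists $options -opcode]} {
        # A handler that returns without an opcode is treated as "pass" here.
        return [list pass {}]
    }
    return [list [dict get $options -opcode] $operand]
}
```

Because -code is left at its default, the handler still completes normally; only the extra -opcode key distinguishes one operation from another.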

CMcC: I think the idea of starting a new dictionary for response is pretty good (as it means you don't have to filter out the old stuff,) but I'm not sure that doing so retains enough data for fully processing the original request. Do you pass the dict to the zone handlers converted to a list? That's not so good, as it causes the dict to shimmer.

AMG: Both the request and response dictionaries are available everywhere in the code. They're just in separate variables. Yeah, I convert the dict to a list by means of {*} into args. If that causes shimmering, I'll change the zone handlers to accept a single normal argument. By the way, extra non-dict arguments can be passed to the zone handler by making the command name a list. This makes it possible to use namespace ensemble commands, etc. as zone handlers.

AMG: Update: I have made this change. The shimmering is eliminated.
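
A small sketch of the command-prefix idea mentioned above, in which the handler is a list whose extra words are passed ahead of the request dict. The greeter ensemble and its arguments are invented for illustration.

```tcl
# Hypothetical namespace ensemble used as a zone handler.
namespace eval greeter {
    namespace export hello
    namespace ensemble create

    # greeting is an extra, non-dict argument; request arrives as one dict.
    proc hello {greeting request} {
        return "$greeting, [dict get $request uri]"
    }
}

# The handler is a command prefix: ensemble, subcommand, then extra arguments.
set handler {greeter hello Howdy}
set request [dict create uri /index.html]

# Expanding the prefix and appending the request as a single argument keeps
# the dict intact, so no dict-to-list conversion is forced on it.
set reply [{*}$handler $request]
```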

CMcC: I'm not sure that the idea of jamming new requests into the pipeline is a good one.

AMG: It was the best solution I could think of for handling index.html in the face of the template generation. With the example setup, if there is a file called index.html in the directory being requested, static will serve it straight from disk. If not, template will try to make one from index.html.tmpl. And--- very important!--- if that doesn't work, dirlist will generate a listing. If indexfile simply replaced requests for a directory with requests for index.html, dirlist could never trigger. And if indexfile only did this replacement if index.html actually existed on disk, template would not be used to generate the index.html. I couldn't think of any other way to get all these different handlers to work together.

CMcC: This is one of the subtle and intriguing differences between Wub and Wibble architectures - firstly you don't transform the request dict, you create a new one, and as a consequence you have to keep the original request around, and as a consequence of that you have to be able to rewrite the current request (if I understand it correctly.) Those are all (slightly) negative consequences of that architectural decision. The upside is that you don't have to keep track of protocol and meta-protocol elements of the response dict as tightly as Wub does - Wub has to cleanse the fields which make no sense in response, and that's time-consuming and unsightly - Wibble doesn't, and can also easily treat those elements using [dict with] which is a positive consequence of the decision.

AMG: Keeping the original request is easy and natural for me; all I had to do was use two variables: set response [getresponse $request]. To be honest, I didn't notice that Wub operated by transmuting the request dictionary into the response dictionary, so I didn't try to emulate that specific behavior. Instead it made sense to generate a new response from the request: from the perspective of a packet sniffer, that is what all web servers do. Also I enjoy the ability to not only rewrite requests, but also to create and delete alternative requests which are processed in a specific priority order. Branch both ways! The goal is to get a response, and handlers are able to suggest new requests which might succeed in eliciting a response. Or maybe they won't, but the original request will. Rewriters can leap without looking: they don't have to predict if the rewritten request will succeed. And in the indexfile/template/static/dirlist case, indexfile doesn't have the power to make this prediction.

CMcC: this took me a couple of read-throughs to get. You are expecting zone handlers which would otherwise fail to re-write the request to something they expect might succeed. It worries me that you may end up with more responses than requests (which would be disastrous) and I'm not sure what you do to prevent this (except you only ever have the latest request around, right? Because you don't model a pipeline directly, because you don't try to suspend responses?)

AMG: Yes, zone handlers can rewrite the request (or create a new request) that might succeed. It's not possible to get more responses than requests, since processing stops when the first valid response is obtained. The stacking order of zone handlers must be configured such that the first response is also the desired response. For example, putting dirlist before indexfile will prevent index.html from ever being served unless it is explicitly requested.

CMcC: One thing to bear in mind in the rewriting of requests: if you silently rewrite fred to fred/index.html in the server, next time the client requests fred, your server has to go through exactly the same process. Another way to do it is have the fred request result in a response which says that fred content has moved to fred/index.html. That way, the client and proxies can remember the relocation, and will ask for fred/index.html when they want fred, so the client does the work for you. So I'm not certain that your processing model is an unqualified good idea (nor am I certain it's not - the layering effect is powerful.)

AMG: This does not involve rewriting requests. To implement this behavior, the zone handler sends a Found or Moved or Whatever response to the client, which might then make an entirely new request unless it broke, lost interest, or found index.html in cache. It's up to the site administrator whether to rewrite requests within the server or to redirect the client. For an example of this kind of redirection, look at dirslash. Personally, I don't like instructing the client to make a new request for index.html, since I think it's ugly to have "index.html" at the end of the URL.

CMcC: You should probably add gzip as an output mode, if requested by the client, as it speeds things up.

AMG: I figured gzip can wait until later. It's more important for me to bang out a few sites using this. Also I need to look into multithreading so that clients don't have to wait on each other.

CMcC: gzip's easy to add, and well worth adding. I should probably get around to adding Range to Wub, too.

AMG: Okay, I'll look into gzip and zlib deflate. Wibble never sends chunked data, so it should be as easy as you say. I'll just look at the response headers to see if I need to compress before sending.
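
For non-chunked responses, compressing before sending could be sketched as below. This assumes Tcl 8.6, whose built-in zlib command provides [zlib gzip]. The proc name and the header/content layout of the response dict are hypothetical.

```tcl
# Hypothetical helper: gzip the body when the client's Accept-Encoding
# value mentions gzip. Requires Tcl 8.6 for the built-in zlib command.
proc maybeGzip {acceptEncoding response} {
    # Crude check that ignores qvalues. The content is assumed to be a
    # byte string already (i.e. text has been encoded beforehand).
    if {[string match -nocase *gzip* $acceptEncoding]} {
        set body [zlib gzip [dict get $response content]]
        dict set response content $body
        dict set response header content-encoding gzip
        # Content-Length must describe the bytes actually transmitted.
        dict set response header content-length [string length $body]
    }
    return $response
}
```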

Wibble doesn't support multipart ranges. I doubt any web servers do; it's complicated and it's worthless. Clients are better off making multiple pipelined individual range requests.

AMG: Update: I'm not sure how encodings and ranges are supposed to interact. Content-Length gives the number of bytes being transmitted; that much is clear. What about Content-Range? Do its byte counts reflect the encoded or unencoded data? And the request Range--- surely its byte counts are for the unencoded data.

CMcC: I completely ignored Range stuff, so I don't know. I'm guessing they say 'you can't encode a range with anything but none', but for all I know they give a harder-to-implement stricture.

AMG: I'll just make a few guesses then see if it works with Firefox.

AMG: I think I'll ignore the qvalues; they're tough to parse and kind of dumb. Why would a client advertise that it accepts gzip but prefers uncompressed? Or why would it give something a qvalue of 0 rather than simply not listing it?

CMcC: yeah, you'd think the spec would just say 'list the things in the order you would prefer them' instead of the whole q= thing. I dunno, there are lots of anomalies. For example, a lot of clients claim to be able to accept content of type */*, but then users complain if you take 'em at their word :)
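
Ignoring qvalues, as proposed above, reduces the check to a membership test. A sketch with a hypothetical proc name follows. Note that under this shortcut an entry such as "gzip;q=0" still counts as accepted, which is not what the HTTP specification intends.

```tcl
# Hypothetical helper: does an Accept-Encoding value list the encoding?
# Any ";q=..." parameter is discarded rather than interpreted.
proc acceptsEncoding {header encoding} {
    foreach item [split $header ,] {
        # Keep only the token before the first ";" and trim whitespace.
        set name [string trim [lindex [split $item ";"] 0]]
        if {[string equal -nocase $name $encoding]} {
            return 1
        }
    }
    return 0
}
```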

AMG: If I ever get "Accept-Encoding: *" then I will encode the response using lzip [L10 ]. It'll be awesome. :^)

CMcC: All in all, good fun. I still wish you'd applied this to Wub, and improved it, rather than forking it, but oh well. I wish you'd been around when I was developing Wub, as some of your ideas would have (and could still) contribute to Wub's improvement. I definitely like the simplicity of your processing loop, although I think that Wub Nub's method of generating a switch is faster and probably better (having said that, it's harder to get it to handle stacking.)

AMG: Yeah, I wish the same. I really haven't spent much time on Wibble. I wrote it in one afternoon, then a week later, spent a couple hours tidying it up for publication. I don't have a huge amount of time to spend on this sort of thing, so when I hack, I hack furiously! And I really couldn't wait for you and JDC to answer my Wub questions. Sorry! :^) I invite you to absorb as much as you like into Wub. If you give me direction and I have time, I'll gladly help.

Now that this code is written, I think it should stay up as a separate project. I think it fills a slightly different niche than Wub. Both are web servers, of course. But Wub is a large and complete server made up of many files, whereas Wibble is a smallish server hosted entirely on the Wiki. That makes it a lot more accessible and useful for educational and inspirational purposes, kind of like DustMote. Maybe think of it as Wub-lite, a gateway or gentle introduction to some of the concepts that undergird Wub.

Thank you for your comments.

CMcC: You're welcome - as I say, this is interesting. It's interesting to see where you've taken the request- response- as dict paradigm, it's also interesting to see how you've used coroutines - very clean indeed. Wub has two coros per open connection, and a host of problems with keeping them synchronised. The idea was to keep protocol-syntax and semantics distinct, and therefore to make the server more responsive. I'm scratching my head, wondering whether to move to single-coro per pipeline, as Wibble does, but have to think through the implications.

It's good to see Wibble, because you started with dict and coroutine as given, and evolved it with them in mind, whereas Wub evolved in mid-stream to take advantage of them. Wibble seems to make better-considered use of the facilities as a consequence.

I would definitely advise keeping Wibble as a distinct project - it addresses the problem of a minimal server (like Dustmote et al,) but still tries to provide useful functionality (unlike Dustmote et al.)

I'd be interested to see what you make of Wub/TclHttpd's Direct domain functionality.

AMG: I started with Coronet, which you and MS wrote. I dropped everything I didn't need, merged [get] and [read], absorbed [terminate] and $maxline into [get], eliminated the initial yields in [get], renamed [do] to [process] and [init] to [accept], changed [accept] to use the socket name as the coroutine name, and defined the readability handler before creating the coroutine. I did that last step because the coroutine might close the socket and return without yielding. Yeah, for an instant I have the readability handler set to a command that doesn't yet exist (or, in case of return without yield, will never exist), but this is safe since I don't call update.

CMcC: Noticed that small window, but you're right it can't ever be open. It's interesting to see the iterative evolution: Coronet was cribbed from Wub, you adapted and targeted it, now I'm considering cribbing your adaptation to simplify Wub. :)

AMG: I will try to look into Direct later today. I see Wub has directns and directoo; I am guessing that these map URIs to command invocations, possibly in the manner of Zope.

CMcC: not familiar with Zope, but yes Direct domain maps URLs to proc elements of a namespace or method elements of a TclOO object, and maps query elements to formal parameters by name. It's quite powerful.

AMG: Then it's like Zope. Disclaimer: It's been a long time since I used Zope, and I never did anything more with it than play with the examples.

AMG: Hey, for a fun read, check out [L11 ]. Does Wub support individual headers split across multiple lines?

CMcC: Yes, Wub supports that. You sort of have to, for things like Cookies, which can (and do) span several lines of text. You just have to remember the last field you filled in, and if your key value regexp has no key, you append that to the immediately prior field. Not too hard.
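
The append-to-the-prior-field approach described above can be sketched as follows. This is neither Wub's nor Wibble's code. The proc name is hypothetical, and the sketch assumes the header block has already been split into lines and that a continuation line begins with a space or tab.

```tcl
# Hypothetical parser: fold continuation lines into the preceding field.
proc parseHeaders {lines} {
    set headers [dict create]
    set last ""
    foreach line $lines {
        if {[regexp {^[ \t]+(.*)$} $line -> rest]} {
            # No key on this line: append to the most recent field, if any.
            if {$last ne ""} {
                dict set headers $last \
                        "[dict get $headers $last] [string trim $rest]"
            }
        } elseif {[regexp {^([^:]+):[ \t]*(.*)$} $line -> name value]} {
            # Remember the field just filled in for later continuations.
            set last [string tolower $name]
            dict set headers $last [string trim $value]
        }
    }
    return $headers
}
```

Lines that match neither pattern are silently dropped here. A stricter server might reject the request instead.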

AMG: I've been thinking about cookies. Multiple cookies get posted as multiple Cookie: or Set-Cookie: headers or lines, which the current Wibble code can't handle. I could fix headers in the general case to support multiple lines, or I could add special support for cookies. I think I'll go with your approach of appending; it sounds much easier than adding a special case.

CMcC: Having read the RFC walkthrough you posted, I'll have to amend my answer to 'Yes, Wub supports that, but not all the stupid variants possible.' If a client sends stuff that looks like chaff, I'm ok with returning next-to-nothing. There's a school of thought, btw, which holds that you can best identify malicious spiders and bots by their poor implementation of the RFC, and so being ruthless in applying the standard can help cut your malware load. It's mainly about not being permissive with things which don't send Host: etc. Anything that cuts spiders short is OK with me.

CMcC: is considering modifying Httpd to have a single coro per connection. It's split into two largely for historical reasons (Wub used to parse headers in one thread, process in another, but there seemed to be no good performance reason - it was optimising for the wrong, and uncommon, case.) You do need to be able to defer responses, and sometimes delay processing of subsequent requests pending completion, WubChain relies upon it, and it's a documented and reasonable requirement.

I'm also considering moving a lot of the request pre-processing stuff into a distinct proc. Things like Cache, Block and Honeypot are done in-line, which is marginally more efficient than processing the whole request before making decisions, but I suspect the gains in code cleanliness more than compensate for a few cycles sacrificed.

AMG: I'll gladly trade a few milliseconds for increased code accessibility. The more people are able to understand the code, the more people are able to suggest architectural improvements that will yield more substantial performance boosts.

Rethinking zone handlers

AMG: Something I need to think about is allowing multiple zone handlers to contribute to a response. For example, the static handler doesn't set content-type, but maybe a contenttype handler can fill that part in by inferring from the request URI or the response content or contentfile. That is, if content-type wasn't already set by some other handler. Another example: a setcookie zone handler might want to inject a cookie or two into the response. The way to do this is by defining a new opcode in addition to "sendresponse". Perhaps "updateresponse"? Or maybe I split "sendresponse" into two separate operations: "setresponse" and "finish". Then I give zone handlers the ability to return multiple opcodes. Also zone handlers probably need to see the under-construction response so they can modify it without outright replacing it. That would simply be a second parameter.

As for actually implementing multiple opcodes: I would need to drop the trick with the custom -opcode return options dictionary. I see now that it doesn't actually provide any benefit, since I already have [operation] to wrap around return. One possibility is for [operation] to return a list or dict of opcode/operand pairs. Another is to have some more fun with coroutines, where the zone handlers can yield as many opcode/operand pairs as needed before finally returning. Perhaps the latter approach can open the door to a more interesting interplay between zone handlers. I'll have to think about it.

CMcC: Wub passes a kind of 'blackboard' dict around, it starts as a lightly annotated request dict, and ends up as a response dict. Each step of the way it's transformed. Wibble passes a request dict in, and expects a response dict back. Wub has a series of filters after response generation, in the Convert module, and makes good use of the transformation model. Wibble could do something similar by allowing modifications to the request dict (which it does) to be response fields. Then you could use a series of zone calls to progressively build up a response. I'm not recommending this, I'm merely suggesting it as a possible approach.

AMG: I did a little thinking about the computer science theory behind Wibble's quest for a response. The zone handlers form a tree which branches every time a prependrequest or replacerequest is done. deleterequest and sendresponse establish leaf nodes. Wibble performs a breadth-first search of this tree. When it encounters a sendresponse node, it terminates its search and sends the HTTP response proposed by the node. If there is no sendresponse node in the tree, it sends HTTP 501.

Now that I know what my own code is doing :^) , I can invent more flexible alternative implementations of the same algorithm. One thing comes to mind: each zone handler returns a list of request-response pairs. The length of this list indicates the branching behavior of the tree. If it's empty, the node is a leaf, like deleterequest. If there's one request and one response, the tree doesn't branch, like with replacerequest and pass. If there are multiple pairs, the tree branches, like with prependrequest; except that it's possible to branch to more than two children, and the list order gives the BFS priority order. If there's one response but no request, the node is a leaf and it's time to send, like sendresponse.

AMG: Okay, done! This works.
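
The request-response pair scheme can be illustrated with a toy search. This is only a sketch under stated assumptions, not Wibble's implementation: every proc name is hypothetical, the "filesystem" is a dict, each handler returns a list of entries, a two-element entry {request response} continues the search, and a one-element entry {response} means send.

```tcl
# Hypothetical breadth-first search over the handler stack.
proc search {handlers request} {
    # Each queued state is {next-handler-index request response}.
    set queue [list [list 0 $request [dict create]]]
    while {[llength $queue]} {
        set queue [lassign $queue state]
        lassign $state index req resp
        if {$index >= [llength $handlers]} continue  ;# dead end
        foreach entry [{*}[lindex $handlers $index] $req $resp] {
            if {[llength $entry] == 1} {
                return [lindex $entry 0]             ;# first response wins
            }
            lappend queue [list [expr {$index + 1}] {*}$entry]
        }
    }
    return [dict create status 501]
}

# Branch: try the directory's index.html first, then the original request.
proc indexfile {req resp} {
    if {[string index [dict get $req path] end] eq "/"} {
        set alt $req
        dict append alt path index.html
        return [list [list $alt $resp] [list $req $resp]]
    }
    return [list [list $req $resp]]
}

# Send the file if it exists in the toy filesystem, else pass.
proc static {files req resp} {
    set path [dict get $req path]
    if {[dict exists $files $path]} {
        return [list [list [dict create status 200 content [dict get $files $path]]]]
    }
    return [list [list $req $resp]]
}

# Send a listing for directory requests, else pass.
proc dirlist {req resp} {
    set path [dict get $req path]
    if {[string index $path end] eq "/"} {
        return [list [list [dict create status 200 content "listing of $path"]]]
    }
    return [list [list $req $resp]]
}
```

With the stack ordered indexfile, static, dirlist, a request for "/" is answered from index.html when that file exists and falls back to the directory listing when it does not.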

ISO8859-1

APN: Why is the encoding of the output channel set to ISO8859-1? Should it not be UTF-8?

AMG: I don't remember! Does anyone around here know why someone would want to use ISO8859-1 in this situation?

AMG: I finally figured it out. Character sets other than ISO8859-1 require a proper charset parameter in the Content-Type response header. Also when there is an Accept request header, the server should try its best to honor it. I didn't want to deal with any of this, so I stuck with ISO8859-1. Support for other character sets is a possible future enhancement.

The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. [L12 ]

License

MAKR (2009-10-12): I just stumbled over this nice little thing ... I'd like to know what the license of this code is? Would it be possible to categorize it under the same license as Tcl's?

AMG: Sure, no problem. Wibble uses the same license as Tcl.

Separate request and response dicts

jcw 2009-11-11: Thanks Colin for mentioning this page. Wow, Andy, this is a fascinating server - and a great way for me to dive into new 8.6 stuff...

I was wondering why you need to pass around both requests and responses separately. Couldn't each newly-created response dict have the request as one of its entries? The same request would then be shared by each tentative response you create. If responses are derived from other responses (not sure that happens), the original request would be in there, but buried a bit deeper.

AMG: Thanks for the words of encouragement! Tonight I will contemplate the request/response dichotomy. For now, I just noticed that wibble::filejoin probably doesn't work right when a directory name has a trailing dot. For example "foo./bar" might mutate into "foobar". I'll verify this as well.

AMG: Updates: I fixed wibble::filejoin by removing it; it's actually not needed. I incorporated APN's wibble::getline. I put in some license information.

As for keeping request and response separate, I think this makes the zone handlers easier to write. They receive two arguments, the input request dict (which tells them what to do) and the output response dict (which they update). What benefit is there to merging the two arguments? I want to keep the dict structure as flat as I can, but I also want to avoid collisions between the request and response dicts. The simplest way to accomplish both is to make these two be separate variables.

APN bug fixes, use in BowWow

APN: Updates: fixed non-chunked form handling (getline should have been getblock). I now use wibble in place of tclhttpd in BowWow. Smaller and more malleable.

AMG: I put that bug there to test your vision. :^) Actually I bet it crashed the first time you did a non-chunked post. I was so busy testing chunked transfers that I forgot to test non-chunked! Also, thanks for taking the leap; I would be delighted if you shared your experiences using Wibble. I haven't actually used it for anything yet, even though I had plans.

MS adding variable command

AMG: MS, I am curious about your edit (changing wibble::handle to use variable instead of a namespace qualifier). I had thought that using namespace qualifiers is faster when the variable only gets used once or twice. Is there something I'm missing?

MS: Try sourcing wibble from a different namespace: namespace eval foo source /path/to/wibble. The new version works, the old one doesn't. Since 8.5, variable is byte-compiled (bcc'ed) and does not incur the old perf penalty.

AMG: Thanks! Good to know. :^)
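
A small sketch of the difference MS describes. It assumes the old code reached its variable through a hard-coded ::wibble:: qualifier; the procs handle_qualified and handle_variable are hypothetical stand-ins, not Wibble's real code.

```tcl
# Simulate "namespace eval foo source /path/to/wibble": everything the
# script defines ends up under ::foo::wibble rather than ::wibble.
namespace eval foo {
    namespace eval wibble {
        variable zones {}

        # Assumed old style: an absolute qualifier that only works when
        # the code lives directly in ::wibble.
        proc handle_qualified {zone command} {
            lappend ::wibble::zones $zone $command
        }

        # New style: [variable] links to the variable in whatever
        # namespace the proc was actually defined in.
        proc handle_variable {zone command} {
            variable zones
            lappend zones $zone $command
        }
    }
}

foo::wibble::handle_variable / static
puts $foo::wibble::zones

if {[catch {foo::wibble::handle_qualified / static} msg]} {
    puts "qualified form failed: $msg"
}
```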

Multiple form elements with the same name

jnc: In regards to multiple form elements of the same name, it is very handy when dealing with lists of items. Two examples: you are editing a parent and you have a table below containing <input name="child_id" type="hidden"/> <input name="child_name"/> <input name="child_dob"/>. When you get the data in your web app, it typically is a list. Therefore you can loop through the data easily. Now, another very helpful situation is when using list data again, and this time with checkboxes to select elements. Say you have a list of emails from your pop3 box that you are previewing the From/Subject for. The idea is you are going to delete these messages before downloading them into your mail program. Again, you have a list: <input type="checkbox" name="delete_email_ids" value="$email_id"/>. Now, with checkboxes, you only get the data submitted via the web browser if the item is checked. So, in the end, you get a list of only email_id's that the user wants to delete. The programmer can easily do: foreach id $delete_email_ids { ... }

AMG: That's (part of) what rawquery is for. (I admit that rawquery isn't easy to pick apart.) Since I anticipate that queries will most frequently be accessed like a dict, the query element of the request dictionary is a dict. If it turns out that like-named form elements are frequently needed, I can make a listquery or similar element that is like query except constructed using [lappend] instead of [dict set]. Both would be compatible with [foreach] and [dict get]. Because of this compatibility, I could also make query itself be a key-value list instead of a dict. However, this would result in a copy of query shimmering to dict every time [dict get] is used to get a query variable. Suggestions?

jcw: Just one... fix dict so it keeps the data structure as a list, and stores the hash somewhere else ;)

AMG: I'm not entirely sure how that would work. Using [dict set] should overwrite existing elements, so I would have to use [lappend] in order to support multiple like-named elements. But later if I access query using [dict get], a temporary dict-type copy would be constructed. You're suggesting (tongue-in-cheek) that the hash table not be temporary but somehow get attached to the list-type Tcl_Obj for future use by [dict]. Cloverfield has ideas in that direction, but Tcl doesn't work that way. Unless... perhaps the list and dict types can be partially unified to reduce the cost of shimmering. It would be like a free trade treaty between neighboring nations. :^) Conceptually, that would make sense because list and dict are so similar, especially since dict was changed (in Tcl 8.5 alpha) to preserve element order. Practically, there are obstacles, for example changing or extending the C API.
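
To make the [dict set] versus [lappend] trade-off concrete, here is a hedged sketch. The listquery name comes from the discussion above; the queryvalues helper and the sample data are made up for illustration.

```tcl
# Decoded form fields in submission order; delete_email_ids appears twice.
set pairs {delete_email_ids 3 delete_email_ids 7 subject hello}

set query {}
set listquery {}
foreach {key val} $pairs {
    dict set query $key $val        ;# later values overwrite earlier ones
    lappend listquery $key $val     ;# every occurrence is kept
}

# Hypothetical helper: collect all values stored under one name.
proc queryvalues {listquery name} {
    set values {}
    foreach {key val} $listquery {
        if {$key eq $name} {
            lappend values $val
        }
    }
    return $values
}

puts [dict get $query delete_email_ids]
puts [queryvalues $listquery delete_email_ids]
```

The dict form only remembers the last delete_email_ids value, while the key-value list still works with [foreach] and keeps both.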

Making Wibble actually parse HTTP headers

AMG: For the last couple days, I've been working on parsing the HTTP headers into lists and dicts and such. This is actually very complicated because the HTTP header grammar is an unholy abomination, and that's only counting the part that's documented! I'll post my work soon, but it's not going to be pretty. Also I hope to add support for file uploads through POST (not just PUT), since that is required for my project. This will also add the complexity of MIME-style multipart messages, possibly nested. It all makes me want to vomit. HTTP works better as a human language than as a computer language, like it was optimized for producing deceptively simple examples, not genuinely simple code. To paraphrase Dan Bernstein [L13 ]: "There are two types of interfaces in the world of computing: good interfaces and user interfaces."

TclOO and Wibble

jcw 2010-02-24: I'd like to try bringing TclOO into this thing, so that some of the state can be managed as object variables, and to simplify adding utility code which can be used in each request (also mixins, forwards, etc). My previous code was based on an httpd.tcl rewrite (see [L14 ] and [L15 ] - each sourced in its own namespace), but wibble makes a more powerful substrate to build on. My idea would be to create a separate TclOO class, with objects named the same as the coroutines to allow a simple mapping without having to change this clean core. Andy, have you considered this option? Any thoughts on how to best bring it together?

AMG: Sorry, I don't have time now. :^( But I am interested. I hadn't thought of using TclOO, since I've never used TclOO in a project before. I really would like to see how it can improve Wibble. First I want to finish the list+dict project we discussed in email (I have code now!!), then I'll get back to the HTTP header parsing (I have about half of that code done). However I am stupidly starting another art project, plus I have a massive work schedule which will soon include another two weeks "vacation" from both the Internet and my personal computers. Oh well, Wibble started as a project to benefit the person I'm doing the art for, so I guess it's okay. :^) Perhaps this weekend I will have time to review your code to see how TclOO is used.

jcw: No worries. I can save you some time though: the links I mentioned don't use TclOO, so you can ignore them :) - I just think by now it has become a good option to consider. I'll try some things out.

JCW uses Wibble in JeeMon

AMG: I found this project by JCW which uses Wibble: [L16 ] [L17 ]

WebSocket and other news

AMG: Let's see, what's news... AGB wrote WebSocket, which is an add-on to Wibble. I'd like to research it more thoroughly and see if I can better integrate it. Last Christmas or so I wrote a [deheader] proc that attempts to not only split the header into lists but also break the elements into lists and dicts. The main thing holding me back from posting it is its complexity; I'd like to find a way to trim it down. At the moment it's 87 lines replacing what used to be 9 lines. [deheader] was the first step in my quest to support file uploads; I need to slay the multipart MIME dragon before I can declare victory. Maybe PUT too. Also I need to do some work with content-type: and accept:, including mapping between the encoding names used by HTTP and Tcl. All that qvalue garbage will be interesting to sort out. APN has made some customizations to Wibble in BowWow, which I should consider for inclusion. Earlier today I changed the way [nexthandler] and [sendresponse] work; instead of [return -level], they do [return -code] with custom codes caught by [try]. That works better. Hmm, JCW earlier suggested I research TclOO; I did some reading on that front. For now I think I'll just keep everything as lists and dicts, rather than making objects.

bch: AMG -- put up a fossil repo and let others help share the load.

AMG: Maybe someday, but not while I'm in the middle of a major edit.

I'm making good progress on parsing headers. The regexps are quite hairy due to handling quotes and backslashes, but they pass all my tests so far. Again, I'd really like to find a way to unify the headers; right now there's a fair bit of duplication between the routines that handle each of the following:

Cache-Control, Pragma: key=val,key=val list
Connection, Content-Encoding, Content-Language, If-Match, If-None-Match, Trailer, Upgrade, Vary, Via, Warning: elem,elem list
Accept, Accept-Charset, Accept-Encoding, Accept-Language, Content-Disposition, Content-Type, Expect, TE, Transfer-Encoding: elem;param=val;param=val,elem;param=val;param=val list
Cookie: list that allows both , and ; as separators, weirdness with $
everything else: single element

I suspect I can't unify them, since HTTP is quite irregular at heart.

I got the qvalues taken care of. It took a whopping eighteen lines, which I'm not happy about. Basically I sort the Accept-* stuff by decreasing qvalue. Why oh why doesn't HTTP simply do that directly!?
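
For readers unfamiliar with qvalues, a simplified sketch of sorting an Accept-style header by decreasing qvalue follows. It is not the code mentioned above: sortaccept is a hypothetical name, and plain [split] is used, so quoted strings containing commas or semicolons are not handled.

```tcl
# Split an Accept-style header into {name qvalue} pairs and sort them by
# decreasing qvalue. A missing q parameter is treated as 1.0.
proc sortaccept {header} {
    set elems {}
    foreach elem [split $header ","] {
        # First field is the element name, the rest are parameters.
        set params [lassign [split $elem ";"] name]
        set q 1.0
        foreach param $params {
            lassign [split [string trim $param] "="] key val
            if {$key eq "q"} {
                set q $val
            }
        }
        lappend elems [list [string trim $name] $q]
    }
    return [lsort -real -decreasing -index 1 $elems]
}

puts [sortaccept "text/html;q=0.8, application/json, */*;q=0.1"]
```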

Improving [getrequest]

jbr 2010-11-24:

I was looking at the main getrequest loop in wibble and see that the request dict is constantly having a few zone dependent values inserted/removed from it as a request moves along the zone chain. I understand that you are trying to keep the "handler $request $response" API nice and clean, but this seems artificial. Why not just split the zone dependent parts into a third dict and pass that in to the handler also? The code will be cleaner, with no need to "filter out", which is a clumsy operation.

Here is a little untested code:

# Create a dict of zone options
#
dict set zoneopts prefix $prefix
dict set zoneopts suffix [string range $path [
    string length $prefix] end]
if {[dict exists $options root]} {
    dict set zoneopts fspath [
        dict get $options root]/[dict get $zoneopts suffix]
}
set zoneopts [dict merge $zoneopts $options]

# Invoke the handler and process its outcome.
#
try {
    {*}$command $request $response $zoneopts
} on 5 outcome {
    # Update the state tree and continue processing.
    #
    set state [lreplace $state $i $i+1 {*}$outcome]
} on 6 outcome {
    # A response has been obtained.  Return it.
    #
    return $outcome
}

I might even go as far as creating the fspath value even if the root option doesn't exist if I could squish the code down to a completely functional style.

    set zoneopts [dict merge $options [list prefix $prefix suffix [string rang...] ...]

As an incentive for pursuing this, since the request dict no longer needs to be fixed up, it may now be possible to tailcall down the chain of zone handlers. I'll think about how to construct this.

AMG: Sounds like a good idea. I think I'll make this change. First, let me consider your other comments below...

AMG: 2010-12-18: Done! Instead of merging request and options, they are separate sub-dictionaries inside a unified state dict. I prefer this to having the three arguments, since it doesn't require me to remember what order everything's in. ;^)

Other ways of looking at zone handlers

jbr 2010-11-25 (Happy Thanksgiving):

I've been thinking about the practical use of zone handlers and I have some questions and ideas.

First - There are no example handlers that return multiple request/response pairs. Are there example use cases that require this feature?

AMG: Yes: [indexfile]. If a directory /dir is being requested, it forks the request tree to contain requests for both /dir and /dir/index.html . It does so without first checking that index.html exists, because it can't know if this is supposed to be a file on disk or the product of a later zone handler. Leaping without looking. :^)

jbr: Along this line - if I were to modify a request or generate a new one and return it, how can I reset the zone handler loop so that my new request is looked at by all the zones, not just those remaining down stream of the current zone?

AMG: You can't, otherwise the server would hang forever when your zone handler is reinvoked and once again decides to reset the loop. I couldn't think of any uses for this feature, so I didn't bother with it. You can use a zone handler more than once if you really need this.

jbr: Second - It seems that zone handlers are of at least 3 distinct types.

Guards
Content
Helpers

Guard handlers are early on the list and implement authentication, session or protection of particular file types. I noticed that specifically asking for "file.tmpl" or "file.script" causes wibble to return application code! In a working system perhaps a guard handler placed before the static zone would explicitly reject these file types.

AMG: Directly returning file.tmpl or file.script is helpful to me for testing and debugging, and I actually do open up those sources on a site I developed a few years before Wibble. If that's not what you want, you can easily restrict access using a zone handler. See [L18 ] for a simple one that might work for you.

jbr: Content handlers return in line content or static files to serve up.

Helpers can be used to fill in desirable but missing parts of the response. In the current architecture they must work closely with Content zones and other helper zones. Example - mime types. If a mime type zone handler is included late in the chain, the static handler must not return the completed response. It fills in the file name but waits for the mime type to identify the content. I'm not sure this is a good thing. Perhaps it's better to just call the mime type proc from the static handler and be done. There are a lot of complex interactions that can take place here; best to avoid them?

Idea - Have several chains that a request/response must traverse. Just like iptables in linux networking:

Input - verify/modify request
Content - generate response
Output - Fill in missing headers, fix up response

Idea - A zone handler might radically modify the request and want to restart zone processing. The current request needs to be aborted, and the new one inserted in the input queue. For example, I have dynamically generated .doc and .xls files. I currently call a tclhttpd template to create the content in a temporary directory and then send the browser a redirect to the new file. It might be nice to create the file, set the fspath request value and content-disposition response header and then reset the zone handler chain. This could work now by placing the dynamic zone in front of the static zone, but I'd like to eliminate the interactions between zones.

I'm thinking about how to write this code.

AMG: Again, the zone handlers can be sequenced however you like, and it's possible to use a single zone handler multiple times. Also, you had another idea about one zone handler calling another one directly. That might be a valid way to implement custom sequencing without changing the core. The outer zone handler procedure can intercept the [nexthandler] and [sendresponse] commands of the inner zone handlers by calling them inside a try block.
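
A rough sketch of that interception, under stated assumptions: the stand-in nexthandler and sendresponse procs below use return codes 5 and 6 as in jbr's untested snippet earlier on this page, and inner and outer are hypothetical handlers, not part of Wibble.

```tcl
# Stand-ins for Wibble's commands (assumed behavior): each returns to its
# caller with a custom return code that propagates up through procs.
proc nexthandler {args} {
    return -code 5 $args
}
proc sendresponse {response} {
    return -code 6 $response
}

# Hypothetical inner handler that proposes a response.
proc inner {state} {
    sendresponse [dict create status 200 content "hello"]
}

# Hypothetical outer handler: run the inner one, catch its outcome with
# [try], adjust it, and re-raise it for the server loop to see.
proc outer {state} {
    try {
        inner $state
    } on 6 response {
        dict set response header x-wrapped yes
        sendresponse $response
    } on 5 outcome {
        nexthandler {*}$outcome
    }
}

set code [catch {outer {}} result]
puts "code=$code result=$result"
```

Because the custom codes are ordinary Tcl return codes, the outer handler needs no cooperation from the core to see what the inner handler decided.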

Also please think about character encoding. I haven't done that properly yet. I understand that [chan copy] can convert character encoding according to the encodings of the input and output channels; I just haven't tried using this feature yet, mostly since being able to get the acceptable encodings is a new feature.

Request dict encoding

jbr 2010-11-28

Another Sunday afternoon noodling about with code. I've begun to use wibble and add the stuff that might make it a more complete product. Here are a couple of discoveries that I've bumped up against.

post queries are parsed into the request dict with an extra level of list to accommodate the "content-disposition" value used in file upload. This adds an annoying extra lindex or something for almost all other usage. When I've done this in the past (parsed mixed-multipart) I've set the keyword value to the file name (content-disposition) and placed the content on disk in a staging directory. This isn't the best but it's an idea. It does allow streaming the post data from the socket to disk in chunks. I'd ditch the extra list level for most usage and only use it in the file upload case.
There is an extra list level in cookie parsing also, don't know why yet.

AMG: For query parsing, this is to distinguish between a key existing and having a value. For request header and post parsing, it serves to separate a key's "main content" from its "attributes" or "metadata". I chose empty string for the element name because it can't possibly collide with any attributes. I wasn't entirely consistent about this: in the response dictionary I say "content" instead of empty string. But that's the response dictionary, so I can be different. ;^)

This extra level is inescapable: a dict element can either contain a "leaf" value or a child dictionary; it can't contain both, not without some means of encapsulation (e.g. a list). That's like having something on your disk that's both a file and a subdirectory. Instead, move the "file" into the subdirectory and give it a reserved name. Empty string is good, because no "real" file can have that name. ;^)

I don't much care for the idea of streaming all uploaded file data to disk. Definitely it makes sense for some cases, but I'm not sure how to distinguish. I'd rather just keep it all in memory and let the zone handlers write it wherever it wants, however it wants. If your concern is to prevent memory exhaustion by uploading huge files, that's already a problem without file uploads! Just send a header containing a neverending stream of lines shorter than 4096 bytes. No POST required. In other words, that's a problem I have to solve by more general means, not by special-casing file uploads.

[dict exists $query foo] checks if the foo key exists. [dict exists $query foo ""] checks if the foo key exists and has a value. [dict get $query foo ""] gets that value. I know, I could have made values default to empty string, but I decided against that, since that wouldn't solve my attribute problem. Basically I implement the concept of null using out-of-band signaling. It's not so odious in this context, since you're using the [dict] command anyway and it can easily chew through the extra level. I'm not sure what you would use [lindex] for.
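
The three [dict] calls above can be tried on a hand-built query dict. This is only a sketch: the sample data mirrors the /vars example further down, and queryvalue is a hypothetical convenience helper, not something Wibble provides.

```tcl
# A query dict shaped the way the text describes: each key maps to a
# sub-dict, and the value (if any) lives under the empty-string key.
set query [dict create \
    foo [dict create "" "value of foo"] \
    bar [dict create "" "value of bar"] \
    "key with no value" {}]

# Hypothetical helper: fetch a value, or a default when the key is
# missing or present without a value.
proc queryvalue {query key {default ""}} {
    if {[dict exists $query $key ""]} {
        return [dict get $query $key ""]
    }
    return $default
}

puts [dict exists $query foo]                    ;# key exists
puts [dict exists $query "key with no value"]    ;# key exists
puts [dict exists $query "key with no value" ""] ;# but it has no value
puts [queryvalue $query foo]
puts [queryvalue $query missing "(none)"]
```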

The vars zone will help you sort this all out. This HTML snippet:

<form method="post" enctype="multipart/form-data" action="/vars">
  <input type="text" name="foo" />
  <input type="text" name="bar" />
  <input type="file" name="quux" />
  <input type="submit" value="Submit" />
</form>

Results in this vars output:

post foo content-disposition {}	form-data
post foo content-disposition name	foo
post foo {}	value of foo
post bar content-disposition {}	form-data
post bar content-disposition name	bar
post bar {}	value of bar
post quux content-disposition {}	form-data
post quux content-disposition name	quux
post quux content-disposition filename	data.txt
post quux content-type {}	application/octet-stream
post quux {}	contents of data.txt
rawpost	(len=585)

This shows that foo=value%20of%20foo and bar=value%20of%20bar. All three form inputs have content-disposition=form-data, but their content-disposition name attributes differ. content-disposition filename is available for quux, and there's a content-type too.

I know this all seems very complicated (it is), but it's actually a literal translation of HTTP. In other words: don't blame me, I'm just being faithful to the standards documents!

Testing out query strings is easier, since you don't need a <form>. Just go to http://localhost:8080/vars?foo=value+of+foo&bar=value+of+bar&key+with+no+value . By the way, here's a little patch for you: Edit [dumprequest] to change eq "post" to in {post query}. This makes display of query values consistent with display of post values.

query foo {}	value of foo
query bar {}	value of bar
query {key with no value}
rawquery	?foo=value+of+foo&bar=value+of+bar&key+with+no+value

Notice the presence of {} (a.k.a. "") at the end of the query keys with values and the absence of same at the end of query keys without values.

jbr: Ah. Thank you for this description. I was wondering why the "vars" output was listing the empty value on the "key" side of the display. When you just print out the value you don't see the intended structure, and I was proffering lindex to get past the {} and onto the value. Of course in practice I too used an extra level of dict as you suggest.

jbr, cont.:

The zone handlers are first grouped by prefix and then executed in order. This is confusing.

AMG: Yup, you're totally right. That's my mistake. Just a few minutes ago I ran face-first into that same problem. Hard. I may have broken my nose.

You may be waffling about whether or not it's a bug, but I firmly believe it's a bug. :^) Now, how shall I fix it? I guess I should simply not group by prefix. I'll probably fix this at the same time I implement the separate zoneopts dict.

jbr, cont.: My first attempt at authentication was :

 wibble::handle / authenticate
 wibble::handle /vars
 wibble::handle / dirslash
 ...
 wibble::handle / notfound

I intended the first authenticate handler to force all server access to be logged in and then go down the list processing prefixes. This doesn't work because all of the "/" prefix handlers are tried first, then the "/vars" prefixes. Unfortunately notfound is encountered at the end of the "/" list. I think that this is a bug.

After having dinner and considering, I'll retract my statement that it's a bug and leave it that it needs better explanation/examples. It would still be nice to have some method of responding to a request by prefix without committing one's handler to a specific prefix handler list (that's a mouthful). Consider my current handler setup:

    wibble::handle /vars   authenticate
    wibble::handle /vars   authorize expr { $user ne guest }
    wibble::handle /vars   vars

    wibble::handle /data   authenticate
    wibble::handle /data   authorize expr { $user in [group data] }
    wibble::handle /data   dirslash root $root
    wibble::handle /data   static root $root
    wibble::handle /data   dirlist root $root
    wibble::handle /data   notfound

    wibble::handle /       authenticate
    wibble::handle /       authorize expr { [dotaccess $user] }
    wibble::handle /       dirslash root $root
    wibble::handle /       indexfile root $root indexfile index.html
    wibble::handle /       static root $root
    wibble::handle /       template root $root
    wibble::handle /       script root $root
    wibble::handle /       dirlist root $root
    wibble::handle /       notfound

But maybe this is pretty good.

Altered zone handler registration proc

jbr 2010-12-12

Here is a new zone registration function to fix the bug discussed above and create the data structure needed by getresponse.

# Register a zone handler.
proc wibble::handle {zone command args} {
    variable zones
    if {[lindex $zones end-1] eq $zone} {
        set handlers [lindex $zones end]
        lappend handlers [list $command $args]
        lset zones end $handlers
    } else {
        lappend zones $zone [list [list $command $args]]
    }
}

AMG: I flattened out the zone handler list. Out-of-order zone handlers should no longer be a problem. Please give it a try. Note that you'll have to modify your custom zone handlers to work with the new version of Wibble.

Compatibility with ActiveTcl 8.6.0.0b4

jblz 2010-12-18: Just wanted to mention that the latest ActiveTcl 8.6 (8.6.0.0b4) does not appear to suffer from the bug reported here, and is probably suitable for wibble. Also, you can find Tcl 8.6 tclkits built after 2010-09-15 here.

Exploiting [dict with]

jblz 2010-12-18:

Hi AMG, Having a lot of fun with wibble!

In each of the zone handlers there is a line:

dict with state request {}; dict with state options {}

I believe that request & options are sub-dicts of the state dict, and i am familiar with the usage of dict with if there is a body defined, but i am totally clueless as to what: "dict with state request {}" might accomplish.

AMG: Glad to hear it! I'm taking advantage of the fact that the variables created by [dict with] persist after it finishes executing. This basically dumps the contents of state request and state options into local variables for easier access. When the code isn't trying to modify the dict, it doesn't matter if it's inside the [dict with] body or after it. I put it after just to avoid indentation. ;^)

jbr: dict with also reloads the dict variables back into the dict variable after the body executes. I believe that I remember discussion that this step is optimized away if body is empty.

jblz: Very slick trick! Thanks for the quick response!
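
A standalone illustration of the empty-body [dict with] idiom discussed above. The showrequest proc and the keys method, path, and root are chosen for the example only and are not taken from Wibble's handlers.

```tcl
# The empty body does nothing, but the variables [dict with] creates from
# the sub-dict keys remain available in the proc afterwards.
proc showrequest {state} {
    dict with state request {}; dict with state options {}
    return "$method $path (root: $root)"
}

set state [dict create \
    request {method GET path /index.html} \
    options {root /var/www}]

puts [showrequest $state]
```

This prints "GET /index.html (root: /var/www)". Since the variables are plain locals, changing them after the [dict with] does not write back into state.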

WebSocket again

jbr 2010-12-20 I have updated the WebSocket with a start on code for wibble.

AMG: Thanks. I should give Chrome a try one of these days. Right now JCW and I are focusing on AJAX instead, specifically because WebSocket isn't widely implemented in browsers.

Removing useless [nexthandler]

jbr 2010-12-20: I notice that the example zone handlers always end with [nexthandler $state] even when they don't modify the state. Shouldn't they just return? Then they won't incur exception processing and the overhead to update the state list in getresponse?

AMG: I doubt exception processing costs significantly more than any other kind of return, but you're right that it is a waste to modify the state list without really changing it. ;^) I'll make an update in a few minutes.

Long polling

OP - 2010-12-27 05:47:36

Hello. Long polling with Ajax is an interesting technique. Any idea on how to implement Comet like server push ? My idea would be to create a hidden iframe in my client HTML that gets refreshed by a push.script working with suspend/resume, pushing content if available then sleeping a while. Can i just flush content parts to the client socket ? What would be the Wibble way to achieve that ? Thanks for your suggestions !

AMG: I must admit ignorance of Comet, but I believe I know what you mean by push. I've already written working test code that does it. I didn't use an <iframe>, although that's an option; I used JavaScript to replace the contents of <div>s, etc., in the page, then immediately start another XMLHttpRequest to poll for more data.

In my as-yet-unpublished development version, I have rewritten [suspend] and [resume] to form a notification system. As in the current version, they take arguments indicating what events to wait for or what event has happened, respectively. In addition, [suspend] takes an optional timeout, and [resume] takes the socket name(s) of the coroutine(s) to notify (can be "all" for all coroutines interested in the event). If [resume] is called from within a coroutine, it reschedules itself to be called from the event loop, then returns. Extra arguments can be passed to [resume]; [suspend] returns these arguments as a list, preceded by the event name. [suspend] also recognizes "readable" as an event name, and I already mentioned the optional timeout; these tie into the Tcl event loop and can wake without a [resume].

There's a [resumecmd] separate from [resume] that constructs a command prefix calling [resume] for use with [after], [chan copy], etc. [resumecmd] additionally recognizes the special coroutine name "self", which it translates to the real current coroutine name. Remember that the coroutine name is equal to the socket name. The only gotcha is that the coroutines are in the "wibble" namespace.

Content gets flushed after the zone handlers finish. Don't put a read/write loop inside a zone handler; that's already in Wibble, which calls the zone handlers between reading and writing. The zone handler system is designed to generate a response to a request, not to get the request or to send the response. Zone handlers can also suspend the coroutine until there's data to send.

I'll put up my new code soon, along with an example showing asynchronous push notifications. For now, look at the Wibble AJAX example [L19 ], which simply notifies a client when a timer expires.

AJAX

OP - 2010-12-28 04:49:30

Playing with Wibble, coroutine and server push.

I took inspiration from this page: http://www.zeitoun.net/articles/comet_and_php/start and just changed the server page reference so that it points to this Wibble script:

set socket [dict get $request socket]
set ns push_$socket
namespace eval $ns {
    variable socket

    proc start s {
        variable socket $s
        chan puts $socket "HTTP/1.0 200 OK"
        chan puts $socket "Content-Type: text/html\n"
        chan puts $socket "<html><body>"
        coroutine [namespace current]::iter push
        iter
    }

    proc push {} {
        variable socket
        yield
        set i 0
        while 1 {
            puts "iter $i"
            if { [eof $socket] } { break }
            chan puts $socket "<script type=\"text/javascript\">"
            chan puts $socket "window.parent.comet.printServerTime('[clock format [clock seconds]] ($i)');"
            chan puts $socket "</script>"
            chan flush $socket
            after 1000 [namespace current]::iter
            yield $i
            incr i
        }
    }
}

${ns}::start $socket

AMG, this is a basic test i made before reading your answer to my question (i am nowhere in understanding Wibble and coroutine magic)

Basically, it works (the client page displays updated time and counter information) and the server keeps running and responding to requests.

I am trying to avoid chaining Ajax requests or playing with long polling requests because UI parts already rely heavily on Ajax calls. I have yet to discover what may be wrong with this straightforward approach (I still have to fully understand your previous comments).

Thanks for your comments (by the way thanks to all contributors to Wibble, it rocks !)

AMG: AJAX is an integral feature of Wibble (there's an inside joke in that statement if you know where to look), so you don't need to implement it in your zone handler. Here's how I would implement the same thing. It doesn't need to create its own coroutines, nor do its own I/O. Also it doesn't rely on Comet; it sends the client all the JavaScript it needs for AJAX. I put this in a file called index.html.tmpl:

% dict set response header content-type text/html
% if {[info exists query] && [dict exists $query update]} {
%   if {![dict exists $query now]} {
%     suspend {} 1000
%   }
[enhtml [clock format [clock seconds]]]
% } else {
<html><head><script type="text/javascript">
function update(initialize) {
  var xhr, span = document.getElementById("time");
  if (window.XMLHttpRequest) {
    xhr = new XMLHttpRequest();
  } else {
    xhr = new ActiveXObject("Microsoft.XMLHTTP");
  }
  xhr.onreadystatechange = function() {
    if (xhr.readyState == 4 && xhr.status == 200) {
      span.innerHTML = xhr.responseText;
      update(false);
    }
  };
  xhr.open("GET", (initialize ? "?update&now" : "?update"), true);
  xhr.send();
}
</script></head><body onload="update(true)">
  The current time is: <span id="time"></span>
</body></html>
% }

The [suspend] command in the currently published version of Wibble doesn't have the timeout feature, so for now you will have to replace the "suspend {} 1000" line with this:

% after 1000 [resume timeout]
% suspend timeout
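
For readers new to coroutines, the pattern behind that two-line workaround can be sketched in plain Tcl 8.6 without Wibble. This is only an illustration of the suspend/resume idea, not Wibble's implementation; the names ticker, tick1, ::ticks, and ::done are made up for the example.

```tcl
# Assumption: plain Tcl 8.6, no Wibble. All names here are hypothetical.
proc ticker {count} {
    for {set i 0} {$i < $count} {incr i} {
        # Schedule our own resumption, then hand control back to the event loop.
        after 100 [info coroutine]
        yield
        lappend ::ticks $i
    }
    set ::done 1
}

set ticks {}
coroutine tick1 ticker 3   ;# runs until the first yield, then returns
vwait ::done               ;# the event loop is free to service other events
puts $ticks
```

While the coroutine is suspended, nothing blocks: the event loop keeps running, which is why a suspended handler does not stall other connections.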

Multiple cookies

ls6 2011-07-04:

Well, I needed a solution to the multiple cookies problem, so here's a small modification of the wibble code.

The idea behind this modification is as follows:

A dictionary can hold only one entry per key, while HTTP allows multiple Set-Cookie header fields. So we have to pack more than one cookie into the single set-cookie entry of Wibble's 'header' dictionary (separated by '|', for example) and then split them apart for sending.

Basically I've added one 'if' statement, and the old code is still in the 'else' clause.

P.S. I've included a snippet larger than necessary to make it easier to locate the change.

# Send the response header to the client.
chan puts $socket "HTTP/1.1 [dict get $response status]"
foreach {key val} [dict get $response header] {
    set normalizedkey [lsearch -exact -sorted -inline -nocase {
        Accept-Ranges Age Allow Cache-Control Connection
        Content-Disposition Content-Encoding Content-Language
        Content-Length Content-Location Content-MD5 Content-Range
        Content-Type Date ETag Expires Last-Modified Location Pragma
        Proxy-Authenticate Retry-After Server Set-Cookie Trailer
        Transfer-Encoding Upgrade Vary Via Warning WWW-Authenticate
    } $key]
    if {$normalizedkey ne ""} {
        set key $normalizedkey
    }
    #--modification starts here
    #allow multiple cookies (separated by |)
    if {$normalizedkey eq {Set-Cookie}} {
        foreach line [split $val |] {
            chan puts $socket "$key: $line"
        }
    } else {
        foreach line [split $val \n] {
            chan puts $socket "$key: $line"
        }
    }
    #--modification ends here
}
chan puts $socket ""

# If requested, send the response content to the client.

Setting multiple cookies looks like this:

dict set response header set-cookie {one=1|two=2|three=3}
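
The underlying limitation is easy to reproduce in plain Tcl. The following is a minimal sketch that does not touch Wibble itself; the variable names header and lines are only for illustration.

```tcl
# A dict keeps one value per key, so a second cookie overwrites the first.
set header {}
dict set header set-cookie one=1
dict set header set-cookie two=2
set overwritten [dict get $header set-cookie]

# Packing the cookies into one value and splitting on | restores all of them.
dict set header set-cookie [join {one=1 two=2 three=3} |]
set lines {}
foreach cookie [split [dict get $header set-cookie] |] {
    lappend lines "Set-Cookie: $cookie"
}
puts [join $lines \n]
```

One caveat of this scheme: a cookie value that itself contains '|' would be split in the wrong place.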

AMG: I have the multiple cookie problem resolved as of today: 2011-11-24 [L20 ]. Thanks for your interim solution. My approach was far more complicated and invasive: change the way response headers are formatted. However, I think it's the right approach for long-term maintainability, and it was part of my original ambition, as shown by the fact that its structure mirrors that of the request headers.

Cookie convenience procs

jblz: These procs are taken from the ncgi package, and modified for use with wibble.

proc ::wibble::cookie cookie {
    upvar header cheader
    if {[dict exists $cheader cookie $cookie ""]} {
       return [dict get $cheader cookie $cookie ""]
    }
}

proc ::wibble::setcookie args {
    upvar response cresponse
    array set opt $args
    dict set cresponse header set-cookie $opt(-name) "" $opt(-value)
    # Attributes are stored under the cookie's own name, next to its "" (value) entry.
    foreach extra {path domain} {
        if {[info exists opt(-$extra)]} {
            dict set cresponse header set-cookie $opt(-name) $extra $opt(-$extra)
        }
    }
    if {[info exists opt(-expires)]} {
        dict set cresponse header set-cookie $opt(-name) expires [list abstime $opt(-expires)]
    }
    if {[info exists opt(-secure)]} {
        dict set cresponse header set-cookie $opt(-name) secure ""
    }
}

AMG: I updated this code for the new header formatting style, though I admit I didn't test. This should fix the problem you were having with not being able to set more than one cookie per request. Personally, I wouldn't use these wrapper procs; I'd just directly read and manipulate the request and response dicts; it's easy now. ;^)

[dehex] performance

SEH 2012-04-01: I'm studying wibble in hopes of replacing tclhttpd with it. Overall it looks great, but I still may want to carry over some of the best of tclhttpd's code to my project. As a small example, tclhttpd has a shorter, seemingly tighter proc for url-decoding strings as compared to wibble's dehex. To wit:

proc ::wibble::dehex str {
    set pos 0
    while {[regexp -indices -start $pos {%([[:xdigit:]]{2})} $str range code]} {
        set char [binary format H2 [string range $str {*}$code]]
        set str [string replace $str {*}$range $char]
        set pos [expr {[lindex $range 0] + 1}]
    }
    return $str
}

compare to:

proc UrlDecodeData data {
    regsub -all {([][$\\])} $data {\\\1} data
    regsub -all {%([0-9a-fA-F][0-9a-fA-F])} $data  {[format %c 0x\1]} data
    return [subst $data]
}

for good measure, here's ncgi's decode:

proc ::ncgi::decode str {
    # rewrite "+" back to space
    # protect \ from quoting another '\'
    set str [string map [list + { } "\\" "\\\\"] $str]

    # prepare to process all %-escapes
    regsub -all -- {%([A-Fa-f0-9][A-Fa-f0-9])} $str {\\u00\1} str

    # process \u unicode mapped chars
    return [subst -novar -nocommand $str]
}

Is there any evident reason to prefer one of these approaches over the others (leaving aside ncgi's handling of "+")? If one wishes wibble's performance to scale, small refinements may become crucial.

AMG: Thanks for the input! I wrote Wibble without looking at the source for tclhttpd or any other server, so it's very likely that there's plenty of room to tactically requisition good ideas. :^) Refinements are definitely welcome. To the best of my knowledge, Wibble has never been benchmarked, aside from a note that it's somehow incompatible with siege [L21 ]. Regarding [UrlDecodeData], there's some magic in there. Let me work it out:

regsub -all {([][$\\])} $data {\\\1} data: prefix brackets, dollar signs, and backslashes with backslashes
regsub -all {%([0-9a-fA-F][0-9a-fA-F])} $data {[format %c 0x\1]} data: replace % followed by two hexadecimal digits with a string that will later be interpreted as a command substitution
return [subst $data]: perform all the command substitutions inserted in the previous step, and remove all the extra backslashes inserted in the first step
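
The three steps can be traced on a small input. This sketch just runs the same commands as [UrlDecodeData] one at a time; the sample string and the variable names step1, step2, and result are made up for the illustration.

```tcl
set data {a%20b[c]$d}

# Step 1: protect brackets, dollar signs, and backslashes.
regsub -all {([][$\\])} $data {\\\1} step1
puts $step1    ;# a%20b\[c\]\$d

# Step 2: turn each %XX into a [format] command substitution.
regsub -all {%([0-9a-fA-F][0-9a-fA-F])} $step1 {[format %c 0x\1]} step2
puts $step2    ;# a[format %c 0x20]b\[c\]\$d

# Step 3: subst runs the inserted commands and strips the protective backslashes.
set result [subst $step2]
puts $result   ;# a b[c]$d
```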

[::ncgi::decode] does very nearly the same thing, though it trusts subst's -novariables and -nocommands options rather than internally putting backslashes in front of dollar signs and brackets. Whenever I see [subst] given options, I immediately think "BUG", but I think this combination is actually safe. The bug, of course, is that [subst] is powerless to stop substitutions performed inside command substitution. It also can't stop command and backslash substitution when used to form the array element name in a variable substitution. But I don't see anything wrong with this exact combination of options. Also, the other difference in [::ncgi::decode] is that it uses \u instead of command substitution. Seems that should be faster.
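
A short experiment shows both sides of that argument. It is a sketch only; the variable names a, r1, and r2 are arbitrary. The first call shows how a single option leaks, and the second shows that the combination used by [::ncgi::decode] leaves only backslash substitution active.

```tcl
set a 44

# -novariables alone: the bare $a survives, but the command substitution
# still runs, and the $a inside it is substituted while running the command.
set r1 [subst -novariables {$a [format $a]}]
puts $r1    ;# $a 44

# Both options together: brackets and dollar signs pass through untouched,
# and only the backslash escape is processed.
set r2 [subst -novariables -nocommands {[exit] $a \u0041}]
puts $r2    ;# [exit] $a A
```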

A little commentary. What I do in Wibble is a straightforward translation of the logic required to do the conversion. The other two procedures employ trickery to get what they want in a roundabout way. They are also likely to be much faster. I've found that when a script performs less looping and manipulation and instead leaves more of the work to a command that's implemented in C, it runs faster, sometimes much faster, and it's usually worth massaging the data a little bit to fit the conventions imposed by that command.

Benchmark time!

[andy@toaster|~]$ cat /proc/cpuinfo
model name      : Pentium II (Deschutes)
cpu MHz         : 348.184
[andy@toaster|~]$ tclsh
% info patchlevel
8.6b2
% set str {[exit]%5bexit%5d$tcl_version%24tcl_version%555test%7e}
% dehex $str
[exit][exit]$tcl_version$tcl_versionU5test~
% UrlDecodeData $str
[exit][exit]$tcl_version$tcl_versionU5test~
% decode $str
[exit][exit]$tcl_version$tcl_versionU5test~
% time {dehex $str} 1000
1679.468 microseconds per iteration
% time {UrlDecodeData $str} 1000
1570.29 microseconds per iteration
% time {decode $str} 1000
795.562 microseconds per iteration

As expected, [UrlDecodeData] is faster because all the looping and processing is done in C rather than Tcl, and [::ncgi::decode] is much faster because it lets [subst] do the work internally rather than have to chain to [format].

So, here's the new version of [::wibble::dehex]:

proc ::wibble::dehex {str} {
    subst -novariables -nocommands\
        [regsub -all {%([[:xdigit:]]{2})} [string map {\\ \\\\} $str] {\\u00\1}]
}

This is [::ncgi::decode], translated to use the pithy trainwreck coding style used throughout Wibble. Performance:

% time {dehex $str} 1000
785.424 microseconds per iteration

That's 213.8% as fast as the original code! Also, if these timings seem high to you, keep in mind that they're for a 350MHz Pentium II.
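
A few spot checks of the new version, as a sketch. The proc body is copied from above under the plain name dehex so the snippet runs without Wibble loaded, and the sample inputs are made up.

```tcl
proc dehex {str} {
    subst -novariables -nocommands\
        [regsub -all {%([[:xdigit:]]{2})} [string map {\\ \\\\} $str] {\\u00\1}]
}

puts [dehex {a%20b}]          ;# a b
puts [dehex {%5bexit%5d}]     ;# [exit]  (returned as text, never executed)
puts [dehex {C:\new%7e}]      ;# C:\new~ (the literal backslash is preserved)
puts [dehex {a+b}]            ;# a+b     (unlike ncgi, + is left alone)
```

The third case is the reason for the [string map] step: without it, [subst] would read the \n in the input as a newline escape.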

Category Wibble