Version 78 of Wibble discussion

Updated 2012-09-21 17:01:51 by SEH

AMG: This page is for design discussion on the subject of the Wibble web server. For questions about how to use Wibble, see Wibble help. For discussion of bugs, see Wibble bugs. Older discussion can be found in the Wibble discussion archive.


SEH -- On the implementation page I've taken Andy up on his invitation to make code changes by submitting a version I've labeled 0.4.1.

AMG: Thank you! I was wondering when someone would take me up on this.

SEH: The changes are primarily directed toward performance. I'm excited about putting Wibble to use, but I want to make sure it scales as effectively as possible before trying to use it in heavy-duty environments. Behavior is unchanged by default in this version; new features are activated by setting namespace variables, as described below:

AMG: Performance was not one of my design goals, but it's definitely very appreciated, and I should set it as a new goal.

  • SEH: I've eliminated many redundant function calls by storing the result of the first call in a variable and using the variable subsequently. For example, every connection makes several calls to clock seconds. At low loads it hardly matters, but if you're handling dozens or hundreds of connections per second, that's a lot of redundant calls. In this case I created a namespace variable called clock_seconds which is updated by a proc that calls itself every quarter second. All procs in Wibble read that variable rather than call clock seconds directly.

AMG: Regarding clock_seconds: On this system, [time] shows that getting the time from the variable takes only 60% as long as calling [clock seconds]. That's a good improvement. The question is whether quarter second timing is the best resolution. Perhaps it would be better to update the time whenever being called from the event loop, i.e. at the beginning of [process] and in [icc::get] immediately following [yield].
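
The quarter-second cache under discussion might look something like this minimal sketch. Only the variable name clock_seconds comes from the patch; the proc name updateclock and the exact scheduling are my assumptions:

```tcl
# Sketch of the clock_seconds cache: one self-rescheduling [after]
# callback refreshes a namespace variable, and everything else reads
# the variable instead of calling [clock seconds] directly.
namespace eval ::wibble {
    variable clock_seconds [clock seconds]
    proc updateclock {} {
        variable clock_seconds
        set clock_seconds [clock seconds]
        # Reschedule ourselves; 250 ms gives quarter-second resolution.
        after 250 [namespace code updateclock]
    }
    updateclock
}
```

Handlers then read $::wibble::clock_seconds wherever they previously called [clock seconds].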

Regarding info_coroutine: Thank you for drawing my attention to this. [info coroutine] takes surprisingly long to complete. [time] on a coroutine that repeatedly yields its name takes 2.96 times as long when the coroutine calls [info coroutine] each time versus returning the value of a local variable.

Regarding [decode]: The name is rather generic; I'd prefer something more descriptive so that the name gives some clue as to when it is useful. However, I can't think of any place it would need to be used outside of [dequery], so maybe it shouldn't be its own proc at all. One option is to embed it as a lambda inside [dequery], but that still leaves the issue of code duplication with [dehex]. To deal with all these issues at the same time, I suggest dropping [decode] in favor of adding a second "$premap" argument to [dehex] with a default value of {\\ \\\\}. [dequery] can call [dehex] with the second argument instead set to {+ " " \\ \\\\}, and [dehex] would in turn pass this argument to [string map]. Also, update the comment for [dehex] to mention that $premap, if specified, must include {\\ \\\\} elements. A perhaps slower alternative would be for $premap to default empty and for [decode] to always [lappend] these two elements. SEH -- Any of these would work fine I'm sure. I had a secret agenda in doing it this way, that being that I could overload the proc entirely and replace it with an equivalent call from websh, which, being coded in C, yields a substantial speedup.
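
For concreteness, the $premap variant might look like this sketch, modeled on the 0.4-era [dehex] (which handles single-byte %XX escapes); treat the exact body as an assumption:

```tcl
# Sketch of [dehex] with the proposed $premap argument.  The default map
# doubles backslashes so they survive [subst]; [dequery] would instead
# pass {+ " " \\ \\\\} to also turn plus signs into spaces.
proc dehex {str {premap {\\ \\\\}}} {
    subst -nocommands -novariables \
        [regsub -all {%([[:xdigit:]]{2})} [string map $premap $str] {\\u00\1}]
}
```

For example, [dehex %7etcl] gives ~tcl, and [dehex a+b%2fc {+ " " \\ \\\\}] gives "a b/c".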

  • SEH: Right now Wibble loops through every handler looking for path matches for every connection. OK when the number of handlers is small, but I envision usages of Wibble where there are dozens or hundreds of handlers. The new version creates a hierarchically-structured dict in parallel with the list of handlers. Default behavior is unchanged, but if the namespace variable prequalify_headers is set to 1, only handlers that match the request path will be extracted from the dict and evaluated. Thus irrelevant handlers are excluded from the loop before it starts.

AMG: How does this interact with rewriting the request path? I only see one call to [get_handlers], so I'm concerned that a stale handler list may be called after rewriting the request path. SEH -- Yes, quite so, which is why I made the default behavior unchanged. In my version, I envision that if a path is rewritten by a handler, the handler returns with retryrequest, which results in a recursive call to getrequest with a fresh set of handlers generated by get_handlers.
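
A rough sketch of the prefix-qualified lookup being described (the details are guesses, not SEH's actual code):

```tcl
# Handlers are filed in a dict keyed by zone prefix; a request collects
# only the handlers whose prefix lies along its own path, so unrelated
# zones never enter the matching loop.
proc get_handlers {handlertree path} {
    set result {}
    if {[dict exists $handlertree /]} {
        lappend result {*}[dict get $handlertree /]
    }
    set prefix ""
    foreach part [split [string trim $path /] /] {
        append prefix /$part
        if {[dict exists $handlertree $prefix]} {
            lappend result {*}[dict get $handlertree $prefix]
        }
    }
    return $result
}
```

Note this sketch orders handlers by path depth rather than by registration order, which the real handler loop relies on; reconciling the two is the interesting part.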

  • SEH: Wibble stores post information in the request dict and passes it as part of the dict to handlers to be processed as desired. But post data, e.g. in the case of multiple file uploads, is potentially huge. Copying it multiple times down the call stack poses the threat of taking up a lot of time and memory. I added the option to store post data in local variables and put only the call stack level in the request dict, so a custom handler can use upvar to access the data. The option is activated by setting the namespace variable conserve_post_memory to 1.

AMG: I'm not convinced this actually conserves any memory, since the post dictionary value ought to be shared. Is there a reason why it wouldn't be? If so, I'd rather fix that reason than make handlers have to use [upvar] to find the post data. SEH -- Definitely would like to follow up on this. I don't know enough about Tcl internals to be certain; I assume that when a variable is passed to a proc in its args list, a copy is made of the variable in the new scope and the original value is pushed onto the stack. The copy can be edited without affecting the original value, and then is discarded when the proc returns. Is that accurate? I was hoping to save having to push multiple copies of the post data onto the stack as the call tree is traversed. If I'm wrong on this I'm more than happy to rip the code out.
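
On the Tcl internals question: proc arguments are not deep-copied on the way in. Tcl values are shared, and a private copy is made only when some holder actually writes to the value (copy-on-write). A quick demonstration, not from Wibble:

```tcl
# A proc argument shares the caller's value until (unless) the proc
# writes to it.
proc reader {x} {
    # Reading is cheap: no copy of the multi-megabyte string is made.
    string length $x
}
proc writer {x} {
    # Appending forces a private copy, since $x is shared with the caller.
    append x !
    string length $x
}
set big [string repeat abc 1000000]
puts [time {reader $big} 100]  ;# fast: the 3 MB value is shared
puts [time {writer $big} 100]  ;# slower: each call copies 3 MB first
```

So passing the post data down the call stack is only expensive if some callee modifies its copy; otherwise the dict value stays shared.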

  • SEH: One non-performance addition I made was to resolve a problem I had with the basic architecture of how Wibble handles requests. A handler is free to rewrite the path of a request and thus cause it to match on subsequent handlers it otherwise wouldn't have. But Wibble only makes one pass through the list of handlers. What if the handler you wanted the new path value to match on has already gone by in the loop? I added a new proc that a handler can use to return with, analogous to sendresponse and nexthandler, called resetrequest. This leads to a recursive call to getrequest with a new request state dict, which causes the new path value to be matched against all handlers once again. This works especially well with the new prequalify_headers feature, which ensures that only relevant handlers are tried on each getrequest attempt.

AMG: My great preference would be to redo the zone handler system such that the sequence of zone handlers is actually an executable Tcl script. This may provide the feature you request in a cleaner, more flexible fashion. As it stands, the zone handler list is effectively a script that's interpreted by [getresponse] rather than Tcl, and the script language provides no looping or branching (except exiting), but it does have the ability to fork such that subsequent handlers are executed in parallel, as part of a breadth-first search. Per MS, it would be prohibitively expensive to implement coroutine cloning, so I don't think I can have this feature. I might have to just suck it up and add line labels and gotos. :^)

I'd be interested to see a use case for restarting the zone handler search. What situations can you think of in which this would be helpful? SEH -- The use case is when the path value is radically rewritten, for example if I wanted to alias one path to another completely different one. As it stands with W0.4, I would have to balance carefully the handler aliasing with the ordering within the zonehandlers var to make sure the aliased path always hit its intended handler. And if I tried to do this with multiple paths I might hit a tangle I couldn't resolve. My solution cuts the Gordian knot by restarting the getrequest procedure, which starts the handler processing loop again with a refreshed handler list, thus guaranteeing that the aliased path will hit its intended handler. In this case I am willing to take a performance hit for the sake of conceptual completeness.


Charset, gzip, and expanding the zone handler system

AMG: While pondering the charset problem, in particular trying to figure out how to define the HTTP charset/Tcl encoding map, it occurred to me that this is very similar to the content-type problem, which I solved using a zone handler. The big difference between setting the content-type via zone handler and setting the charset via zone handler is that the latter is impossible. :^) The reason it's impossible is that the content-type can be set (guessed) in advance, whereas applying the charset can only be done after the content is generated. The current Wibble architecture terminates all zone handlers when [sendresponse] is called, so by then it's too late to run a charset zone handler. Before calling [sendresponse], each zone handler could call a charset helper proc to apply the encoding to the response content and to add the charset attribute to the content-type response header.
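
Such a helper might be as small as this sketch; the proc name and calling convention are hypothetical:

```tcl
# Hypothetical charset helper: encode the finished content and advertise
# the charset in the content-type response header.  A zone handler would
# call this just before [sendresponse].
proc applycharset {response charset tclencoding} {
    dict set response content \
        [encoding convertto $tclencoding [dict get $response content]]
    set type [dict get $response header content-type]
    dict set response header content-type "$type; charset=$charset"
    return $response
}
```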

But then I thought, this is again very similar to what needs to be done for gzip. It's a processing phase between content generation and sending the content to the client. Now I have two cases suggesting that customizable response postprocessing could be useful after [sendresponse] but before sending to the client. The custom sendcommand key could do that, but it would also have to do all the other send tasks (or chain to [defaultsend]), plus the sendcommand would have to be put in the response dict by each zone handler.

I wonder if this postprocessor system should be made as flexible as the zone handler system... probably not: I can't see a reason why it would need to be available only for parts of the site. But then again, perhaps it could be used to implement stuff beyond what I listed above. For instance, the site may have common headers and footers, which the postprocessor could prepend and append to the content, but only to certain regions of the site.

Here's an idea. Expand the zone handler system to permit multiple phases of processing. Whenever [sendresponse] is called, the current phase terminates and the next begins. The last phase, which may be hard-coded, is to call the sendcommand. Something like this, perhaps:

    ::wibble::handle /vars vars
    ::wibble::handle / dirslash root $root
    ::wibble::handle / indexfile root $root indexfile index.html
    ::wibble::handle / contenttype typetable {...}
    ::wibble::handle / staticfile root $root
    ::wibble::handle / scriptfile root $root
    ::wibble::handle / templatefile root $root
    ::wibble::handle / dirlist root $root
    ::wibble::handle / notfound
    ::wibble::phasebarrier
    ::wibble::handle / applycharset
    ::wibble::handle / compressgzip

One open question is whether or not multiple, alternative states would survive the phase barrier. Let's say that a second state is produced somewhere in the first phase, the first state results in a [sendresponse] such that the second phase is entered, but the second phase itself fails to call [sendresponse]. Should the second state be pursued, or should the default 501 error be sent? I think I know how to do the multiple state thing, but I will need to be convinced that it's a good idea before I do it. It would have the drawback of both attempting to serve index.html and generate a directory listing whenever a directory is requested. Only the first response would actually be sent to the client, but even so the second one would be generated.

Here's another way of looking at things. Each phase is basically a meta-zone handler, which recursively enters [getresponse] or a functional equivalent. To make this work, I can change [getresponse] to take its configuration from an argument rather than a global variable, and I can make it follow the same conventions as a normal zone handler (takes state dict argument, calls [sendresponse] or [nexthandler] to return). Also I would have to move the 501 error generation to [process], which is fine.

If I do as I describe, the zone handler system would branch into a tree which can:

  • Work as it currently does, i.e. one phase.
  • Work using multiple phases, executed one after the other.
  • Work using a hierarchy of phases, if you're crazy enough to create such a thing.

All sorts of things would be possible. The hard part is coming up with uses for the new flexibility. :^) There have been requests for more powerful forms of zone handling. Maybe this will help.

So, how would this be configured? The current system is nice because it makes it easy to substitute variables into the list of zone handlers. Having a linear sequence of phases would not present a problem, since a single [phasebarrier] command would be all that's necessary to signal that it's time to start putting handlers into a new phase. But if the zones are in a hierarchy, it will likely be necessary to employ nesting, which generally means brace quoting, which conflicts with variable substitution. Right about now I really wish I had parentheses as a shortcut for [list]...

JBR 2011-12-03: I like this but think that the syntax outlined above is clunky. You could add an arg to wibble::handle to indicate the phase to apply this zone handler to, or add a new command wibble::phase and use it like this:

    wibble::phase generation
    wibble::handle / ...
    wibble::handle / ...

    wibble::phase delivery
    wibble::handle / ...
    ...

Just some ideas. Thanks, John

AMG: That's cool, thanks. Another option would be to include the phase name in the argument to [handle], but I think I prefer your approach, since it's more convenient to write, and this part of the code is intended to resemble a configuration file.

I need to spend some time thinking about how best to implement this. I have a couple options in mind, which I will be evaluating.

By the way, in an earlier draft of the above I wrote "stage" instead of "phase", but I changed it because the word "stage" appeared far too similar to "state", to the point where I was misreading what I had written.


2011 11 25 Andy - Thanks for the new wibble code dump. I agree that we need to move to a real source control system. Please just make a choice and go there. I thought that I mailed you feedback but sorry if I didn't. I posted new working WebSocket code to the wiki. I still have several zone handlers that I haven't shared yet though.

The chunked encoding exists so that the server can start sending content without knowing the content size.

JBR

AMG: Thanks for the report. See [L1 ] for my thoughts on chunked encoding from server to client. At present, Wibble doesn't try to send chunked, only receive. A custom sendcommand can use chunked if it likes, but I really ought to put chunked into [defaultsend] for the case of contentchan with no contentsize.
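
Chunked framing itself is simple (RFC 2616 §3.6.1): the chunk size in hex, CRLF, the payload, CRLF, with a zero-length chunk terminating the stream. A hypothetical pair of helpers that a chunked-capable [defaultsend] might use:

```tcl
# Encode one HTTP/1.1 chunk.  The length must count bytes, so the payload
# should already be encoded/binary by this point.
proc chunk {data} {
    format "%x\r\n%s\r\n" [string length $data] $data
}
# The terminating zero-length chunk (with no trailers).
proc lastchunk {} {
    return "0\r\n\r\n"
}
```

For example, [chunk Wibble] yields "6\r\nWibble\r\n".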

As for the source control system, I don't want to officially make the move until I find a way to drag along all the documentation, including page revision history.


htmlgen + wibble

jblz: UPDATE 1-1-11 (HAPPY NEW YEAR!)

I may move this to its own page if i end up using htmlgen in a serious way, but for the meantime, this is how to use htmlgen with wibble:

  1. Download xmlgen from sourceforge
  2. Decompress the directory, and place it in a directory listed in your $::auto_path
  3. Because htmlgen creates a proc named "script" (for the "script" html tag, of course), you will need to rename the "script" zonehandler to something like "ascript"
  4. Create a file such as the following in your wibble $root directory, with a name like foo.html.script
# this is a reimplementation of the code in the /vars zonehandler
# adapted to using htmlgen

package require htmlgen
namespace import ::htmlgen::*

dict set response header content-type text/html

::htmlgen::buffer ::DOC {
    html ! {
        head ! {
            title - Playing Vars
            style type=text/css - {
                body {font-family: monospace}
                table {border-collapse: collapse; outline: 1px solid #000}
                th {white-space: nowrap; text-align: left}
                th, td {border: 1px solid #727772}
                tr:nth-child(odd) {background-color: #ded}
                tr:nth-child(even) {background-color: #eee}
                th.title {background-color: #8d958d; text-align: center}
            }
        }
        body ! {
            div id=wrapper ! {
                table ! {
                    foreach {dictname dictvalue} $state {
                        if {$dictname in {request response}} {
                            set dictvalue [dumpstate $dictvalue]
                        }
                        tr ! {
                            th class=title colspan=2 + {
                                [enhtml $dictname]
                            }
                        }
                        foreach {key value} $dictvalue {
                            tr ! {
                                th + {[enhtml $key]}
                                td + {[enhtml $value]}
                            }
                        }
                    }
                }
            }
        }
    }
}
    
dict append response content $::DOC

A few points worth mentioning:

  • the default zonehandler sends .script files as text/plain, so make sure you change the "response header content-type" to text/html
  • because we have to send the finished page to the response dict, you'll need to use ::htmlgen::buffer to capture the output

Thanks to kbk in tcler's chat for helping me with wrangling htmlgen.

AMG: I updated the above code for the change from [dumprequest] to [dumpstate]. I don't have htmlgen, so I'd appreciate it if you could test for me.


ISO8859-1 limitation

dzach: What is the purpose of this line in ::wibble::process?

 } elseif {[dict exists $response content]} {
   dict set response content [encoding convertto iso8859-1 [dict get $response content]]

Content that is already in iso8859-1 is not converted. Content that has already been converted to utf-8 by a user-defined zone handler cannot be converted back to iso8859-1 with this code.

AMG: The problem is that Wibble doesn't officially support anything other than ISO8859-1, since that is HTTP's assumed character set when nothing else is explicitly specified. It's quite possible to add that feature, but I just haven't done so. Actually, it's now more possible than ever due to the vastly improved header parsing code.

AMG: I put a little bit of thought into encodings. It seems to me that it's a mistake to have the zone handler advertise the encoding for outgoing data, since at that point it's just a Tcl string. Instead, the send procedure should just convert to the client's favorite encoding, disregarding those not supported by Tcl, then add the charset to the content-type. I'll have to make a table mapping from HTTP encoding names to Tcl encoding names. Has someone already made this table?
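
I don't know of a ready-made table; a partial starting point might look like this sketch (names lowercased for lookup, entries limited to the common cases; a real table needs many more aliases from the IANA charset registry):

```tcl
# Partial HTTP charset name -> Tcl encoding name map.
set charsetmap {
    iso-8859-1   iso8859-1
    us-ascii     ascii
    utf-8        utf-8
    shift_jis    shiftjis
    euc-jp       euc-jp
    windows-1252 cp1252
}
# Resolve a charset from an Accept-Charset or content-type header.
proc toencoding {charset} {
    global charsetmap
    dict get $charsetmap [string tolower $charset]
}
```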

I'm not entirely sure what to do with data streamed from disk or other channel. Unlike the string case, I don't positively know the source encoding. Neither can I convert it to the client's favorite encoding, nor can I correctly advertise the existing charset. For the moment I stream it unmolested and hope the browser guesses right. I guess a zone handler could be written to inject a charset according to user-supplied configuration options, then the send code can convert if the charset isn't among those accepted by the client.

Another issue: What do I do about characters not in the target charset? Better question: how do I know which characters aren't in the target charset? They get translated to "?", but so do real "?"'s, and thanks to variable-width encodings (e.g. utf-8), I wouldn't be able to reliably map from the encoded string back to the source, short of encoding one character at a time. Also, is "?" present in every charset? For HTML and maybe XML, I should be able to encode the foreign characters as HTML entities, if only I knew which ones needed to be encoded. It looks like I'm at the mercy of the [encoding] command here. Dear everybody, Please use Unicode! Love, Andy.

SEH 20120403 -- There's a version of wibble 0.1 in github [L2 ]. It looks like dzach edited a line in wibble::getrequest [L3 ], which in ver. 0.3 would be equivalent to changing:

    regsub -all {(?:/|^)\.(?=/|$)} [dehex $path] / path 

to:

    regsub -all {(?:/|^)\.(?=/|$)} [encoding convertfrom utf-8 [dehex $path]] / path 

The note on the edit is: "UTF-8 characters are not decoded properly. Converting from utf-8 does it, while it does not change iso8859-1 chars." Does this achieve anything of value, and should it be incorporated into the latest wibble version?

AMG: It might work for one particular browser in one particular configuration, but it's nonstandard. The HTML <form> is supposed to have an accept-charset attribute to specify the encoding. See my discussion with dzach: [L4 ]. I may end up incorporating that change anyway, but I'd also have to require the users to always use accept-charset="utf-8" in all <form>s. As far as I can tell, that's the only way to get all major, current browsers to behave alike. The linked page includes my browser compatibility survey.


Fossil

AMG: Several folks have suggested that Wibble has outgrown its current ad-hoc revision control and issue management system, which would be this wiki. I mostly agree. The current system requires a lot of manual labor, and I don't often have the time. Also I would like to expand the discussion, examples, documentation, etc. without feeling guilty that I'm hijacking somebody else's wiki. But I'm conflicted; there's a lot that I like about hosting Wibble on this wiki. In particular, it's very open; anyone can make changes and comments without getting an account or using a special client or anything like that. It's popular, it's maintained, it looks nice, it has a couple spiffy features, there are many great local pages to link to, other pages link back to Wibble, you can search page contents as well as titles, etc.

Fossil comes closest to the current system, and I am seriously considering it. In fact, I've already made a couple abortive attempts to import Wibble into a Fossil repository. The main thing that's stopping me is Fossil's markup interpretation for commit and ticket messages. If I could get a decoration mode instead of a markup mode, I'd be happy, but the current system requires a lot of work in order to get correct formatting (e.g. not losing square brackets, which all by itself is a deal-breaker). Also those workarounds break the command-line interface in order to get the common web interface pages right, and I will have to undo them again later when I finally get the decoration mode I need. (Tcl/Tk's own Fossil import has the same issues that Wibble does.)

Fossil has a built-in wiki which is nice for hosting discussion. I just played with the Fossil wiki a bit. It works, but it's as minimal as can be. However, it does support most HTML, so tables and such are possible. All in all, I'd prefer a few extra features, mostly to ease importing the pages and history from this wiki into Fossil. For instance, italic text and bold text are bracketed with <i>...</i> and <b>...</b> instead of double and triple apostrophes, respectively. Likely I will have to write a script to convert the markup. Also there's no double-bracket rule, so inserting brackets verbatim (which will be a very common task) requires that the brackets be spelled "&\#91;" and "&\#93;" (without the \'s; I had to insert those to dodge a bug in our wiki [L5 ]), or "<nowiki>[</nowiki>" and "<nowiki>]</nowiki>", which is even uglier. In fact, I think the lack of a double-bracket rule is also a deal-breaker, the same as the message markup problem described above.

Fossil also has a versioned documentation system, which makes it easy to get the documentation that matches any particular version of the code. That's a definite plus over the ad-hoc system Wibble currently has, where you have to do some sleuthing to find the right version of the documentation pages. Of course, I haven't been good about updating the documentation anyway. ;^) But if I could version the documentation alongside the code, I would definitely make the effort to keep them in sync.

The Fossil wiki versioning is separate from the code/documentation versioning. They're on the same timeline, which makes correlation possible but not automatic. But they're still essentially separate. This creates a problem. I'd like for the documentation, examples, etc. pages to be in the wiki, open to editing and discussion. But for me to version them with the code, they have to be files. The only thing I can figure is to have manually synchronized parallel copies, both as files and as wiki pages. The wiki pages will only be regarded as documenting the latest version of Wibble, whereas the files are authoritative for whichever version of Wibble they came with. This seems like I'm trading one problem for another. Right now I have to tie all the pages together manually, but that's easier than manually copying everything between the wiki and files. I don't have to change the formatting of the files, but I might still want to, since if they're wiki-formatted then the user will need Fossil to properly read them.

I think what I need is a tool to extract a snapshot of the wiki, which I can commit as files. Maybe this tool can also prerender everything into HTML. If I am to open up the official documentation pages for editing and commenting, there will be discussion interspersed with normative verbiage. Most of the time when reading documentation you will want to skip the discussion, but it can be valuable on occasion. So it makes sense to be able to specially bracket the discussion and show/hide it using JavaScript/CSS trickery or simply by exporting two copies of each file, the official version and the annotated version.

Suggestions? Comments?

escargo 2011-03-19 - OK. These comments are from somebody sitting on the sidelines, but maybe they will be helpful, or at least thought-provoking.

Some of the problems you are having sound like features missing from Fossil. Perhaps they can be added. Where you say, "I will have to write a script...," that sounds like where a feature needs to be added to Fossil. Alternatively, maybe there needs to be a feature added to this wiki that makes exporting easier for other programs to import, an exchange format almost.

Where you mention, "manually synchronized parallel copies," that sounds like a need for a function that can link file and wiki page contents. (If I recall correctly, Fossil is really storing things in a data base, and probably exporting them or serving them as files or pages respectively as required. That would mean making Fossil understand some embedded query string that tells it to provide some tagged version of data as an exported file or current wiki page, if I understand correctly.) Perhaps the way to proceed is to enhance the underlying software so that moving to Fossil is less painful and that Fossil would better support how you need it to work. Your requirements might not be unique.


git

jnc: I don't know how you feel about Git, but I imported Wibble into a git repo: http://github.com/jcowgar/wibble . When working with GitHub, you have a decent bug tracker, a wiki and a **very** easy means to have users contribute changes/bug fixes.

dzach: Good to see that jnc. I added some code that I had hanging around for some time to the GitHub repo: gzip support, Cross-Origin Resource Sharing HTTP response headers [L6 ] , and a few more things. Hopefully this will be the new home for wibble's code from now on. AMG ?

AMG: Thanks for the show of support, but I still don't feel positively about officially hosting Wibble in git. Today I took another stab at getting Wibble into Fossil. I seem to have overcome the wikification problem for ticket titles and comments, but I still have to take a look at check-in comments. Also I need to find a way to semi-automate importing this wiki, or portions thereof, into the Fossil database. This is because this wiki serves as Wibble's documentation, and the documentation should be versioned and delivered alongside the source. Opening the versioned wiki (or a branch thereof) to editing via the web interface also remains an open issue.

jnc (Nov 25 2011): Just happy to see Wibble progressing and moving into any SCM system. Fossil is a good choice. About documentation, I am not sure what you mean. You wish to maintain documentation here on wiki.tcl.tk? Why not just make a doc/ directory in fossil and put the contents of the wiki pages into the doc/ directory? Update on the wiki, then when making a release, just wget/curl the wiki source pages you wish to include in the fossil repo?

AMG: I consider all this discussion to be part of the documentation. Yes, I plan to make a directory like you describe, but this wiki has a different format than that of Fossil. I need a conversion script, but none exists at the moment. This script needs to do more than simply convert formats; it must also be able to differentiate between internal links (pointers to other pages included in the Wibble archive) and external links (pointers to pages here in the Tcl Wiki). But yes, once I have that script, I plan to do just as you say.

jnc: I'd see maintaining documentation in two different formats and a conversion script as problematic. The fossil wiki format is very limited, making this conversion more difficult. Then each time either format changed or allowed new syntax, the script would be broken. I would choose just one format and stick with that. With fossil you do not have to use the wiki. You can disable access to the whole thing, to everyone. Or you can create a wiki page that simply points to the wiki here. It'd be a shame to have wibble progress waiting on a conversion script between incompatible wiki formats.

AMG: While Fossil's wiki may be limited, it does also support HTML, which will definitely do the job. There already exists a script to convert from the Tcl wiki markup to HTML; you're looking at its output right now. ;^) I could just use wget to harvest the generated HTML from this wiki and put it in Fossil, though it would be appropriate to postprocess away the page framing and fix up the internal/external links.


HTTP status codes

AMG: Here's a useful pictorial guide to HTTP status codes: [L7 ] [L8 ].