Wibble discussion

AMG: This page is for design discussion on the subject of the Wibble web server. For questions about how to use Wibble, see Wibble help. For discussion of bugs, see Wibble bugs. Older discussion can be found in the Wibble discussion archive.

SEH -- On the implementation page I've taken Andy up on his invitation to make code changes by submitting a version I've labeled 0.4.1.

AMG: Thank you! I was wondering when someone would take me up on this.

SEH: The changes are primarily directed toward performance. I'm excited about putting Wibble to use, but I want to make sure it scales as effectively as possible before trying to use it in heavy-duty environments. Behavior is unchanged by default in this version; new features are activated by setting namespace variables, as described below:

AMG: Performance was not one of my design goals, but it's definitely very appreciated, and I should set it as a new goal.

  • SEH: I've eliminated many redundant function calls by storing the result of the first call in a variable and using the variable subsequently. For example, every connection makes several calls to [clock seconds]. At low loads it hardly matters, but if you're handling dozens or hundreds of connections per second, that's a lot of redundant calls. In this case I created a namespace variable called clock_seconds which is updated by a proc that calls itself every quarter second (sketched below). All procs in Wibble read that variable rather than call [clock seconds] directly.
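
A minimal sketch of the scheme; the proc body is illustrative (names per SEH's description, not necessarily the actual 0.4.1 code):

    namespace eval ::wibble {
        variable clock_seconds [clock seconds]
        proc updateclock {} {
            variable clock_seconds [clock seconds]
            after 250 ::wibble::updateclock   ;# re-arm every quarter second
        }
        updateclock
    }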

AMG: Regarding clock_seconds: On this system, [time] shows that getting the time from the variable takes only 60% as long as calling [clock seconds]. That's a good improvement. The question is whether quarter-second timing is the best resolution. Perhaps it would be better to update the time whenever Wibble is called from the event loop, i.e. at the beginning of [process] and in [icc::get] immediately following [yield].

Regarding info_coroutine: Thank you for drawing my attention to this. [info coroutine] takes surprisingly long to complete. [time] on a coroutine that repeatedly yields its name takes 2.96 times as long when the coroutine calls [info coroutine] each time versus returning the value of a local variable.
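
The caching pattern in miniature (a toy example, not Wibble code):

    proc pingpong {} {
        set self [info coroutine]   ;# look up the coroutine name once
        while 1 {
            yield $self             ;# cheap: just returns a local variable
        }
    }
    coroutine pp pingpong           ;# runs the body up to the first [yield]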

Regarding [decode]: The name is rather generic; I'd prefer something more descriptive so that the name gives some clue as to when it is useful. However, I can't think of any place it would need to be used outside of [dequery], so maybe it shouldn't be its own proc at all. One option is to embed it as a lambda inside [dequery], but that still leaves the issue of code duplication with [dehex]. To deal with all these issues at the same time, I suggest dropping [decode] in favor of adding a second "$premap" argument to [dehex] with a default value of {\\ \\\\}. [dequery] can call [dehex] with the second argument instead set to {+ " " \\ \\\\}, and [dehex] would in turn pass this argument to [string map]. Also, update the comment for [dehex] to mention that $premap, if specified, must include the {\\ \\\\} elements. A perhaps slower alternative would be for $premap to default to empty and for [decode] to always [lappend] these two elements.
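
A hedged sketch of the suggested interface; the %HH decoding shown is the usual [regsub]/[subst] trick, and the real [dehex] may differ in detail:

    proc dehex {str {premap {\\ \\\\}}} {
        # $premap, if specified, must include the {\\ \\\\} elements so
        # that literal backslashes survive the [subst] below.
        subst -nocommands -novariables [regsub -all {%([[:xdigit:]]{2})}\
                [string map $premap $str] {\\u00\1}]
    }
    # [dequery] would then call: dehex $str {+ " " \\ \\\\}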

SEH -- Any of these would work fine, I'm sure. I had a secret agenda in doing it this way: I could overload the proc entirely and replace it with an equivalent call from websh, which, being coded in C, results in a substantial speedup.

AMG: I take it this websh command does exactly the same thing as your [decode]. Does that mean it's generally useful? The processing it does is replace %HH with the corresponding byte (I wonder how that's supposed to interact with Unicode) and + with space. I believe this is called URL encoding. Hex encoding ([enhex]/[dehex]) works the same except spaces map to %20.

  • SEH: Right now Wibble loops through every handler looking for path matches on every connection. That's OK when the number of handlers is small, but I envision usages of Wibble where there are dozens or hundreds of handlers. The new version creates a hierarchically-structured dict in parallel with the list of handlers (sketched below). Default behavior is unchanged, but if the namespace variable prequalify_handlers is set to 1, only handlers that match the request path will be extracted from the dict and evaluated. Thus irrelevant handlers are excluded from the loop before it starts.
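
A hypothetical illustration of the prequalification idea (the names and dict layout are invented here, not necessarily SEH's code): handlers are indexed by path segment in a nested dict, and lookup walks only the request path.

    proc get_handlers {path} {
        variable handlerindex   ;# nested dict built as handlers register
        set matched {}
        if {[dict exists $handlerindex handlers]} {
            lappend matched {*}[dict get $handlerindex handlers]  ;# "/" zone
        }
        set key {}
        foreach segment [split [string trim $path /] /] {
            lappend key $segment
            if {[dict exists $handlerindex {*}$key handlers]} {
                lappend matched {*}[dict get $handlerindex {*}$key handlers]
            }
        }
        return $matched
    }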

AMG: How does this interact with rewriting the request path? I only see one call to [get_handlers], so I'm concerned that a stale handler list may be called after rewriting the request path.

SEH -- Yes, quite so, which is why I made the default behavior unchanged. In my version, I envision that if a path is rewritten by a handler, the handler returns with resetrequest, which results in a recursive call to getresponse with a fresh set of handlers generated by get_handlers.

AMG: I actually did not want rewriting to involve restarting, since that could result in a loop which would lock up the server. Instead, for the cases I could envision, I found it sufficient to sort the zone handler list such that rewriting is done sooner rather than later. For anything else, there's [redirect], and an HTTP 307 command could be added.

  • SEH: Wibble stores post information in the request dict and passes it as part of the dict to handlers to be processed as desired. But post data, e.g. in the case of multiple file uploads, is potentially huge. Copying it multiple times down the call stack poses the threat of taking up a lot of time and memory. I added the option to store post data in local variables and put only the call stack level in the request dict, so a custom handler can use upvar to access the data. The option is activated by setting the namespace variable conserve_post_memory to 1.

AMG: I'm not convinced this actually conserves any memory, since the post dictionary value ought to be shared. Is there a reason why it wouldn't be? If so, I'd rather fix that reason than make handlers have to use [upvar] to find the post data.

SEH -- Definitely would like to follow up on this. I don't know enough about Tcl internals to be certain; I assume that when a variable is passed to a proc in its args list, a copy is made of the variable in the new scope and the original value is pushed onto the stack. The copy can be edited without affecting the original value, and then is discarded when the proc returns. Is that accurate? I was hoping to save having to push multiple copies of the post data onto the stack as the call tree is traversed. If I'm wrong on this I'm more than happy to rip the code out.

AMG: Tcl uses copy-on-write for its values (a.k.a. Tcl_Objs). So long as writing (i.e. modification) isn't done on values that have more than one reference, no copying occurs. Building and passing around data structures of any size is simply an exercise in passing references.
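
A quick way to see this from a Tcl 8.6 shell, using the [tcl::unsupported::representation] debugging aid (it reports a value's object pointer and refcount):

    set post [string repeat x 1000000]       ;# stand-in for bulky post data
    set request [dict create post $post]     ;# no copy: just a new reference
    puts [tcl::unsupported::representation $post]
    puts [tcl::unsupported::representation [dict get $request post]]
    # Both lines report the same object pointer; a copy would only happen
    # if one of the references were written through (copy-on-write).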

  • SEH: One non-performance addition I made was to resolve a problem I had with the basic architecture of how Wibble handles requests. A handler is free to rewrite the path of a request and thus cause it to match subsequent handlers it otherwise wouldn't have. But Wibble only makes one pass through the list of handlers. What if the handler you wanted the new path value to match has already gone by in the loop? I added a new proc, analogous to sendresponse and nexthandler, that a handler can use to return with: resetrequest (sketched below). This leads to a recursive call to getresponse with a new request state dict, which causes the new path value to be matched against all handlers once again. This works especially well with the new prequalify_handlers feature, which ensures that only relevant handlers are tried on each getresponse attempt.
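
A hedged guess at the mechanics (illustrative only; SEH's actual code may differ), patterned after how [sendresponse] and [nexthandler] unwind the handler with [return -level]:

    proc resetrequest {state} {
        # Unwind out of the zone handler with a custom return code that
        # getresponse recognizes and turns into a recursive call.
        return -level 2 -code 5 $state
    }
    # ...and in getresponse, around each handler invocation, something like:
    #   if {[catch {invokehandler $handler $state} state] == 5} {
    #       return [getresponse $state]   ;# fresh pass over all handlers
    #   }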

AMG: My great preference would be to redo the zone handler system such that the sequence of zone handlers is actually an executable Tcl script. This may provide the feature you request in a cleaner, more flexible fashion. As it stands, the zone handler list is effectively a script that's interpreted by [getresponse] rather than Tcl, and the script language provides no looping or branching (except exiting), but it does have the ability to fork such that subsequent handlers are executed in parallel, as part of a breadth-first search. Per MS, it would be prohibitively expensive to implement coroutine cloning, so I don't think I can have this feature. I might have to just suck it up and add line labels and gotos. :^)

I'd be interested to see a use case for restarting the zone handler search. What situations can you think of in which this would be helpful?

SEH -- The use case is when the path value is radically rewritten, for example if I wanted to alias one path to another completely different one. As it stands with W0.4, I would have to balance carefully the handler aliasing with the ordering within the zonehandlers var to make sure the aliased path always hit its intended handler. And if I tried to do this with multiple paths I might hit a tangle I couldn't resolve. My solution cuts the Gordian knot by restarting the getresponse procedure, which starts the handler processing loop again with a refreshed handler list, thus guaranteeing that the aliased path will hit its intended handler. In this case I am willing to take a performance hit for the sake of conceptual completeness.

AMG: As I mention above, use [redirect] (HTTP 301) or HTTP 307. Is there some reason why these are inadequate or unwelcome for your application?

Also, I would like to hear your thoughts on what I said regarding recasting the zone handler configuration as an executable script. There are several important features I'd like to add to Wibble (virtual hosts, character encoding, compression) which I think would be best expressed using this approach.

SEH -- (non-threaded plenary response to preserve readability:)

Re [decode] - Link to websh doc on decoding/encoding utilities: [L1 ]. ::web::uridecode takes about 10% of the time ::wibble::decode requires.

Re [redirect] - HTTP redirect is the traditional approach, but it is increasingly being discouraged for sites aspiring to high performance (See [L2 ]). My goal is to remove obstacles to scalability so I can use Wibble in high-demand commercial environments. What is [redirect] after all but a means of refreshing [getresponse] by getting the whole internet involved? Let's eliminate the middleman. The infinite looping dangers are the same as far as I can see regardless.

Re post data - Thanks for the insights (forehead slap). I'll have to re-evaluate my changes. I still worry about the point in [getresponse] where you composite the request and response dicts into a system dict. Is there a copy there, or does Tcl remember that the system dict is a composite of two separate origin dicts? And there are handlers which rewrite the request part of the state dict. Presumably a copy happens there?

Re recasting of zone handler processing - My gut reaction is that it would be a shame to lose what I think is one of Wibble's best aspects: the sweet spot you've found between simplicity and power in the zone handler approach. My initial suggestion would be not to touch it, but to add a post-processing hook in ::wibble::process that works analogously to the zone handlers; let the user register post-processing handlers to which the state dict will be passed, then let the handler code decide what to do with the response data.

# end plenary response.

AMG: I can certainly add [enuri] and [deuri] to behave like [enhex] and [dehex] but with space/plus transposition. The names may look weird, but they're consistent. "en" and "de" are short for "encode" and "decode", and "uri" is the name of the coding scheme. These commands will be used by [enquery] and [dequery]. I'm tempted to implement [enuri] and [deuri] in terms of [enhex] and [dehex], and I will if performance doesn't suffer too much. Either way, I'll be satisfying your goal of providing a hook for inserting [web::uriencode] and [web::uridecode] into Wibble.
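
If [dehex] grows the $premap argument discussed earlier, the new commands could be thin wrappers; a speculative sketch, consistent with the behavior described above (spaces map to %20 in [enhex] and to + in URL encoding):

    proc deuri {str} {
        dehex $str {+ " " \\ \\\\}      ;# also map + back to space
    }
    proc enuri {str} {
        string map {%20 +} [enhex $str] ;# then shorten encoded spaces to +
    }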

I'm still looking for a use case where it's not possible to perform redirect-type decisions very early in the zone handler sequence. Presumably, the server's not going to want to waste any time generating page content until it's sure it's not going to have to start over with a different URI. The one place where stock Wibble uses redirection ([dirslash]) actually happens first (*) in the list, and it could easily do it without the [redirect] command; the only reason it doesn't is to avoid content duplication in caches. (* Actually it's second, only because [vars] is there for demonstration and debugging.)

Regarding looping: Modern browsers already detect redirect loops. Adding this code to Wibble will only slow it down, and doing it properly would involve maintaining a history of state dictionaries and repeatedly scanning them to see if any old ones have been revisited, while ignoring fields which can differ without changing outcomes; this would be a lot like "solving" the Halting Problem. I say, let the browser do its job, and keep the server simple enough that it can't get stuck in an infinite loop, at least until there's a solid use case demonstrating the need for loops.

SEH -- I understand where you're coming from. In the end it will be easy enough for me to customize my own implementations to boost performance. Besides saving HTTP round trips, the use case for me becomes obvious when using the prequalify_handlers feature. In that scenario all but a handful of handlers will be excluded from consideration before the loop even starts. In that case it will be almost certainly necessary to refresh the handler list and start over if a handler rewrites the path more than trivially. As I said I'm looking forward to when there are hundreds of handlers and it will be impractical to loop through them all on every connection.

AMG: Dicts share data as easily as lists or variables or arguments. The internal representation of a dict is nothing more than a doubly linked list of key/value Tcl_Obj pointer pairs, indexed by a hash table on the keys' string representations. Putting one dict inside another simply increments the first dict's reference count. No copy is done unless that child dict is later modified while its reference count is greater than one.

SEH -- Excellent. Life is good. Will rip out more code.

AK Regarding Tcl's copy-on-write, etc., and evaluating if/when it happens: you should always distinguish between the data structure itself (list, dict) and the elements of said data structure. Many operations act only on the data structure, and while that may (see K) require copying the structure to amend it, the elements themselves are not changed, and thus not copied, but simply referenced from the new structure. This is true for all list operations. For dict there are some exceptions: 'dict append/incr/lappend' operate on an element, so that element must be copied, beyond copying the structure. Still, the other elements are unchanged and reused by reference.

AMG: My concept for zone handlers was to turn the traditional concept of features inside-out. Web servers (most applications, really) usually have more features than desired, and you have to track them down and turn them off, plus you have to hope some vestige doesn't unexpectedly assert itself. I instead thought to create a framework in which a web server could be implemented by having the user add the appropriate features, such as the ability to send a file when requested by name. Whatever's left out is not merely disabled; it's actually not even part of the server.

I don't think changing the zone handler list to a script would negatively impact its ease of use. Let's do an example:

    if {[zonematch /vars]} {
        vars
    }
    dirslash root $root
    indexfile root $root indexfile index.html
    contenttype typetable {...}
    staticfile root $root
    scriptfile root $root
    templatefile root $root
    dirlist root $root
    notfound

Doesn't seem all that complicated to me. Plus, this optimizes zone handler matching, and it opens the door to more advanced or custom matching, such as virtual hosting. Moreover, the pieces can be organized into [proc]s in order to keep things manageable in a large site with many virtual hosts. If [getresponse] is reduced to a wrapper around [try], it can be nested to enable intercepting intermediate responses in order to apply further processing, such as compression, character encoding, and common site styling.

The only reason I can't do this Right Now is because of the state system concept, wherein there can be more than one extant state dictionary. Each state dictionary in the system must be processed in parallel, basically a breadth-first search. [getresponse] tries each handler against each state before it's willing to advance to the next handler. This behavior is analogous to a process forking and its clones synchronizing their execution to run in lockstep. Sadly, Tcl cannot fork a coroutine in progress. But perhaps I can find another paradigm.

Why do I even want more than one state dictionary? One sample Wibble zone handler uses this feature: [indexfile]. [indexfile] can't know in advance whether the server will be able to satisfy requests for the directory or the index file. [dirlist] may not be in the zone handler list, or the index file might not exist; then again, maybe the index file doesn't exist but [templatefile] knows how to produce it anyway. The solution I chose was to make [indexfile] not commit to anything, but instead propose two possible state dictionaries, either of which might result in a satisfactory result.

Your suggestion about a post-processing hook appears to be very similar to my notion of dividing the zone handler list into phases. When [sendresponse] (or a renamed version thereof) is called, processing fast-forwards to the next phase barrier, at which point more stuff can be done. One way to look at it: phase barriers are line labels, and [sendresponse] is goto. If the line labels are named and are addressed by the goto, more logic is possible, though I hesitate to permit backward jumps due to infinite loop concerns.

Okay, moving on from zone handlers...

When is [promote_handler] useful? I always assumed the zone handler list would be a one-time configuration thing and wouldn't change throughout execution of the server.

SEH -- I envision a configuration process analogous to tclhttpd, where arbitrary scripts may be loaded at startup. I may want to make each such script responsible for loading and positioning its handlers, rather than having a single centralized handler-setting action.

AMG: I'm looking at the [chan pending input] call for contentchan. Does that really work? I don't think this will reliably indicate how much can be read from the channel, only how much can be read before Tcl is forced to replenish any I/O buffers. By the way, the use case I had in mind for contentchan was serving SQLite incrblobs. (Due to an SQLite bug, this doesn't actually work: [L3 ].)

SEH -- In the broadest sense you're right. But I believe a sensibly-written handler which returns a contentchan should not return until all data is known to have been packed into the channel. The chan manual page indicates that chan write actions will block until all data to be written is guaranteed accepted by the OS. Thus I believe the only scenarios under which that line won't work will be perverse ones, with problems beyond not reporting an accurate data size. I could be wrong here too.

AMG: The line "return [getresponse]" looks like it could be optimized: tailcall getresponse. Also, what's the [unset] immediately before it for?

SEH -- A slip-up. Already cleaned up.


Charset, gzip, and expanding the zone handler system

AMG: While pondering the charset problem, in particular trying to figure out how to define the HTTP charset/Tcl encoding map, it occurred to me that this is very similar to the content-type problem, which I solved using a zone handler. The big difference between setting the content-type via zone handler and setting the charset via zone handler is that the latter is impossible. :^) The reason it's impossible is that the content-type can be set (guessed) in advance, whereas applying the charset can only be done after the content is generated. The current Wibble architecture terminates all zone handlers when [sendresponse] is called, so by then it's too late to run a charset zone handler. Before calling [sendresponse], each zone handler could call a charset helper proc to apply the encoding to the response content and to add the charset attribute to the content-type response header.

But then I thought, this is again very similar to what needs to be done for gzip. It's a processing phase between content generation and sending the content to the client. Now I have two cases suggesting that customizable response postprocessing could be useful after [sendresponse] but before sending to the client. The custom sendcommand key could do that, but it would also have to do all the other send tasks (or chain to [defaultsend]), plus the sendcommand would have to be put in the response dict by each zone handler.

I wonder if this postprocessor system should be made as flexible as the zone handler system... probably not; I can't see a reason why it would need to be available only for parts of the site. But then again, perhaps it could be used to implement stuff beyond what I listed above. For instance, the site may have common headers and footers, which the postprocessor could prepend and append to the content, but only for certain regions of the site.

Here's an idea. Expand the zone handler system to permit multiple phases of processing. Whenever [sendresponse] is called, the current phase terminates and the next begins. The last phase, which may be hard-coded, is to call the sendcommand. Something like this, perhaps:

    ::wibble::handle /vars vars
    ::wibble::handle / dirslash root $root
    ::wibble::handle / indexfile root $root indexfile index.html
    ::wibble::handle / contenttype typetable {...}
    ::wibble::handle / staticfile root $root
    ::wibble::handle / scriptfile root $root
    ::wibble::handle / templatefile root $root
    ::wibble::handle / dirlist root $root
    ::wibble::handle / notfound
    ::wibble::phasebarrier
    ::wibble::handle / applycharset
    ::wibble::handle / compressgzip

One open question is whether or not multiple, alternative states would survive the phase barrier. Let's say that a second state is produced somewhere in the first phase, the first state results in a [sendresponse] such that the second phase is entered, but the second phase itself fails to call [sendresponse]. Should the second state be pursued, or should the default 501 error be sent? I think I know how to do the multiple state thing, but I will need to be convinced that it's a good idea before I do it. It would have the drawback of attempting both to serve index.html and to generate a directory listing whenever a directory is requested. Only the first response would actually be sent to the client, but even so the second one would be generated.

Here's another way of looking at things. Each phase is basically a meta-zone handler, which recursively enters [getresponse] or a functional equivalent. To make this work, I can change [getresponse] to take its configuration from an argument rather than a global variable, and I can make it follow the same conventions as a normal zone handler (takes state dict argument, calls [sendresponse] or [nexthandler] to return). Also I would have to move the 501 error generation to [process], which is fine.

If I do as I describe, the zone handler system would branch into a tree which can:

  • Work as it currently does, i.e. one phase.
  • Work using multiple phases, executed one after the other.
  • Work using a hierarchy of phases, if you're crazy enough to create such a thing.

All sorts of things would be possible. The hard part is coming up with uses for the new flexibility. :^) There have been requests for more powerful forms of zone handling. Maybe this will help.

So, how would this be configured? The current system is nice because it makes it easy to substitute variables into the list of zone handlers. Having a linear sequence of phases would not present a problem, since a single [phasebarrier] command would be all that's necessary to signal that it's time to start putting handlers into a new phase. But if the zones are in a hierarchy, it will likely be necessary to employ nesting, which generally means brace quoting, which conflicts with variable substitution. Right about now I really wish I had parentheses as a shortcut for [list]...

JBR 2011-12-03 I like this, but I think the syntax outlined above is clunky. You could add an arg to wibble::handle to indicate the phase to apply the zone handler to, or add a new command wibble::phase and use it like this:

wibble::phase generation
wibble::handle / ...
wibble::handle / ...

wibble::phase delivery
wibble::handle / ...
...

Just some ideas. Thanks, John

AMG: That's cool, thanks. Another option would be to include the phase name in the argument to [handle], but I think I prefer your approach, since it's more convenient to write, and this part of the code is intended to resemble a configuration file.

I need to spend some time thinking about how best to implement this. I have a couple options in mind, which I will be evaluating.

By the way, in an earlier draft of the above I wrote "stage" instead of "phase", but I changed it because the word "stage" appeared far too similar to "state", to the point where I was misreading what I had written.


2011-11-25 Andy - Thanks for the new wibble code dump. I agree that we need to move to a real source control system. Please just make a choice and go there. I thought that I mailed you feedback, but sorry if I didn't. I posted new working WebSocket code to the wiki. I still have several zone handlers that I haven't shared yet, though.

The chunked encoding exists so that the server can start sending content without knowing the content size.

JBR

AMG: Thanks for the report. See [L4 ] for my thoughts on chunked encoding from server to client. At present, Wibble doesn't try to send chunked, only receive. A custom sendcommand can use chunked if it likes, but I really ought to put chunked into [defaultsend] for the case of contentchan with no contentsize.
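
For reference, a hedged sketch of what chunked output might look like in [defaultsend] for a contentchan with no contentsize; it assumes a "Transfer-Encoding: chunked" header has already been sent and that both channels are managed elsewhere:

    proc sendchunked {socket contentchan} {
        chan configure $socket -translation binary
        chan configure $contentchan -translation binary
        while {![chan eof $contentchan]} {
            set data [chan read $contentchan 8192]
            if {[string length $data]} {
                # Each chunk: size in hex, CRLF, the data itself, CRLF.
                chan puts -nonewline $socket \
                        "[format %x [string length $data]]\r\n$data\r\n"
            }
        }
        chan puts -nonewline $socket "0\r\n\r\n"  ;# zero-size chunk ends body
    }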

As for the source control system, I don't want to officially make the move until I find a way to drag along all the documentation, including page revision history.


htmlgen + wibble

jblz: UPDATE 1-1-11 (HAPPY NEW YEAR!)

I may move this to its own page if I end up using htmlgen in a serious way, but in the meantime, this is how to use htmlgen with wibble:

  1. Download xmlgen from SourceForge
  2. Decompress the archive and place the resulting directory in a directory listed in your $::auto_path
  3. Because htmlgen creates a proc named "script" (for the "script" html tag, of course), you will need to rename the "script" zonehandler to something like "ascript"
  4. Create a file such as the following in your wibble $root directory, with a name like foo.html.script
# this is a reimplementation of the code in the /vars zonehandler
# adapted to using htmlgen

package require htmlgen
namespace import ::htmlgen::*

dict set response header content-type text/html

::htmlgen::buffer ::DOC {
    html ! {
        head ! {
            title - Playing Vars
            style type=text/css - {
                body {font-family: monospace}
                table {border-collapse: collapse; outline: 1px solid #000}
                th {white-space: nowrap; text-align: left}
                th, td {border: 1px solid #727772}
                tr:nth-child(odd) {background-color: #ded}
                tr:nth-child(even) {background-color: #eee}
                th.title {background-color: #8d958d; text-align: center}
            }
        }
        body ! {
            div id=wrapper ! {
                table ! {
                    foreach {dictname dictvalue} $state {
                        if {$dictname in {request response}} {
                            set dictvalue [dumpstate $dictvalue]
                        }
                        tr ! {
                            th class=title colspan=2 + {
                                [enhtml $dictname]
                            }
                        }
                        foreach {key value} $dictvalue {
                            tr ! {
                                th + {[enhtml $key]}
                                td + {[enhtml $value]}
                            }
                        }
                    }
                }
            }
        }
    }
}
    
dict append response content $::DOC

A few points worth mentioning:

  • the default zonehandler sends .script files as text/plain, so make sure you change the "response header content-type" to text/html
  • because we have to send the finished page to the response dict, you'll need to use ::htmlgen::buffer to capture the output

Thanks to kbk in tcler's chat for helping me with wrangling htmlgen.

AMG: I updated the above code for the change from [dumprequest] to [dumpstate]. I don't have htmlgen, so I'd appreciate it if you could test for me.


ISO8859-1 limitation

dzach: What is the purpose of this line in ::wibble::process?

 } elseif {[dict exists $response content]} {
   dict set response content [encoding convertto iso8859-1 [dict get $response content]]

Content that is already in iso8859-1 is not converted. Content that has been already converted to utf-8 by a user defined zone handler cannot be converted back to iso8859-1 with this code.

AMG: The problem is that Wibble doesn't officially support anything other than ISO8859-1, since that is HTTP's assumed character set when nothing else is explicitly specified. It's quite possible to add that feature, but I just haven't done so. Actually, it's now more possible than ever due to the vastly improved header parsing code.

AMG: I put a little bit of thought into encodings. It seems to me that it's a mistake to have the zone handler advertise the encoding for outgoing data, since at that point it's just a Tcl string. Instead, the send procedure should just convert to the client's favorite encoding, disregarding those not supported by Tcl, then add the charset to the content-type. I'll have to make a table mapping from HTTP encoding names to Tcl encoding names. Has someone already made this table?
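
In the meantime, a starter table might look like this (spot-checked against [encoding names] in Tcl 8.6; HTTP charset names are case-insensitive, so normalize with [string tolower] before lookup):

    variable charsets {
        iso-8859-1   iso8859-1
        utf-8        utf-8
        us-ascii     ascii
        shift_jis    shiftjis
        euc-jp       euc-jp
        koi8-r       koi8-r
        windows-1252 cp1252
    }
    # e.g.: set enc [dict get $charsets [string tolower $httpcharset]]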

I'm not entirely sure what to do with data streamed from disk or other channel. Unlike the string case, I don't positively know the source encoding. Neither can I convert it to the client's favorite encoding, nor can I correctly advertise the existing charset. For the moment I stream it unmolested and hope the browser guesses right. I guess a zone handler could be written to inject a charset according to user-supplied configuration options, then the send code can convert if the charset isn't among those accepted by the client.

Another issue: What do I do about characters not in the target charset? Better question: how do I know which characters aren't in the target charset? They get translated to "?", but so do real "?"'s, and thanks to variable-width encodings (e.g. utf-8), I wouldn't be able to reliably map from the encoded string back to the source, short of encoding one character at a time. Also, is "?" present in every charset? For HTML and maybe XML, I should be able to encode the foreign characters as HTML entities, if only I knew which ones needed to be encoded. It looks like I'm at the mercy of the [encoding] command here. Dear everybody, Please use Unicode! Love, Andy.
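
For what it's worth, the encode-one-character-at-a-time fallback mentioned above might look like this (a slow, hedged sketch with an invented proc name, leaning on the "?" substitution behavior of [encoding convertto]):

    proc entityfallback {str enc} {
        set out ""
        foreach ch [split $str ""] {
            if {$ch ne "?" && [encoding convertto $enc $ch] eq "?"} {
                append out "&#[scan $ch %c];"  ;# not in charset: HTML entity
            } else {
                append out $ch                 ;# representable; convert later
            }
        }
        return $out
    }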

SEH 2012-04-03 -- There's a version of wibble 0.1 on GitHub [L5 ]. It looks like dzach edited a line in wibble::getrequest [L6 ], which in ver. 0.3 would be equivalent to changing:

    regsub -all {(?:/|^)\.(?=/|$)} [dehex $path] / path 

to:

    regsub -all {(?:/|^)\.(?=/|$)} [encoding convertfrom utf-8 [dehex $path]] / path 

The note on the edit is: "UTF-8 characters are not decoded properly. Converting from utf-8 does it, while it does not change iso8859-1 chars." Does this achieve anything of value, and should it be incorporated into the latest wibble version?

AMG: It might work for one particular browser in one particular configuration, but it's nonstandard. The HTML <form> is supposed to have an accept-charset attribute to specify the encoding. See my discussion with dzach: [L7 ]. I may end up incorporating that change anyway, but I'd also have to require the users to always use accept-charset="utf-8" in all <form>s. As far as I can tell, that's the only way to get all major, current browsers to behave alike. The linked page includes my browser compatibility survey.


Fossil

AMG: Several folks have suggested that Wibble has outgrown its current ad-hoc revision control and issue management system, which would be this wiki. I mostly agree. The current system requires a lot of manual labor, and I don't often have the time. Also I would like to expand the discussion, examples, documentation, etc. without feeling guilty that I'm hijacking somebody else's wiki. But I'm conflicted; there's a lot that I like about hosting Wibble on this wiki. In particular, it's very open; anyone can make changes and comments without getting an account or using a special client or anything like that. It's popular, it's maintained, it looks nice, it has a couple spiffy features, there are many great local pages to link to, other pages link back to Wibble, you can search page contents as well as titles, etc.

Fossil comes closest to the current system, and I am seriously considering it. In fact, I've already made a couple abortive attempts to import Wibble into a Fossil repository. The main thing that's stopping me is Fossil's markup interpretation for commit and ticket messages. If I could get a decoration mode instead of a markup mode, I'd be happy, but the current system requires a lot of work in order to get correct formatting (e.g. not losing square brackets, which all by itself is a deal-breaker). Also those workarounds break the command-line interface in order to get the common web interface pages right, and I will have to undo them again later when I finally get the decoration mode I need. (Tcl/Tk's own Fossil import has the same issues that Wibble does.)

Fossil has a built-in wiki which is nice for hosting discussion. I just played with the Fossil wiki a bit. It works, but it's as minimal as can be. However, it does support most HTML, so tables and such are possible. All in all, I'd prefer a few extra features, mostly to ease importing the pages and history from this wiki into Fossil. For instance, italic text and bold text are bracketed with <i>...</i> and <b>...</b> instead of double and triple apostrophes, respectively. Likely I will have to write a script to convert the markup. Also there's no double-bracket rule, so inserting brackets verbatim (which will be a very common task) requires that the brackets be spelled "&\#91;" and "&\#93;" (without the \'s; I had to insert those to dodge a bug in our wiki [L8 ]), or "<nowiki>[</nowiki>" and "<nowiki>]</nowiki>", which is even uglier. In fact, I think the lack of a double-bracket rule is also a deal-breaker, the same as the message markup problem described above.

Fossil also has a versioned documentation system, which makes it easy to get the documentation that matches any particular version of the code. That's a definite plus over the current ad-hoc system Wibble currently has, where you have to do some sleuthing to find the right version of the documentation pages. Of course, I haven't been good about updating the documentation anyway. ;^) But if I could version the documentation alongside the code, I would definitely make the effort to keep them in sync.

The Fossil wiki versioning is separate from the code/documentation versioning. They're on the same timeline, which makes correlation possible but not automatic. But they're still essentially separate. This creates a problem. I'd like for the documentation, examples, etc. pages to be in the wiki, open to editing and discussion. But for me to version them with the code, they have to be files. The only thing I can figure is to have manually synchronized parallel copies, both as files and as wiki pages. The wiki pages will only be regarded as documenting the latest version of Wibble, whereas the files are authoritative for whichever version of Wibble they came with. This seems like I'm trading one problem for another. Right now I have to tie all the pages together manually, but that's easier than manually copying everything between the wiki and files. I don't have to change the formatting of the files, but I might still want to, since if they're wiki-formatted then the user will need Fossil to properly read them.

I think what I need is a tool to extract a snapshot of the wiki, which I can commit as files. Maybe this tool can also prerender everything into HTML. If I am to open up the official documentation pages for editing and commenting, there will be discussion interspersed with normative verbiage. Most of the time when reading documentation you will want to skip the discussion, but it can be valuable on occasion. So it makes sense to be able to specially bracket the discussion and show/hide it using JavaScript/CSS trickery or simply by exporting two copies of each file, the official version and the annotated version.

Suggestions? Comments?

escargo 2011-03-19 - OK. These comments are from somebody sitting on the sidelines, but maybe they will be helpful, or at least thought-provoking.

Some of the problems you are having sound like features missing from Fossil. Perhaps they can be added. Where you say, "I will have to write a script...," that sounds like where a feature needs to be added to Fossil. Alternatively, maybe there needs to be a feature added to this wiki that makes exporting easier for other programs to import, an exchange format almost.

Where you mention, "manually synchronized parallel copies," that sounds like a need for a function that can link file and wiki page contents. (If I recall correctly, Fossil is really storing things in a data base, and probably exporting them or serving them as files or pages respectively as required. That would mean making Fossil understand some embedded query string that tells it to provide some tagged version of data as an exported file or current wiki page, if I understand correctly.) Perhaps the way to proceed is to enhance the underlying software so that moving to Fossil is less painful and that Fossil would better support how you need it to work. Your requirements might not be unique.


git

jnc: I don't know how you feel about Git, but I imported Wibble into a git repo: http://github.com/jcowgar/wibble . When working with GitHub, you have a decent bug tracker, a wiki, and a **very** easy means of having users contribute changes/bug fixes.

dzach: Good to see that, jnc. I added some code that I had hanging around for some time to the GitHub repo: gzip support, Cross-Origin Resource Sharing HTTP response headers [L9 ], and a few more things. Hopefully this will be the new home for wibble's code from now on. AMG?

AMG: Thanks for the show of support, but I still don't feel positively about officially hosting Wibble in git. Today I took another stab at getting Wibble into Fossil. I seem to have overcome the wikification problem for ticket titles and comments, but I still have to take a look at check-in comments. Also I need to find a way to semi-automate importing this wiki, or portions thereof, into the Fossil database. This is because this wiki serves as Wibble's documentation, and the documentation should be versioned and delivered alongside the source. Opening the versioned wiki (or a branch thereof) to editing via the web interface also remains an open issue.

jnc (Nov 25 2011): Just happy to see Wibble progressing and moving into any SCM system. Fossil is a good choice. About documentation, I am not sure what you mean. You wish to maintain documentation here on wiki.tcl.tk? Why not just make a doc/ directory in fossil and put the contents of the wiki pages into the doc/ directory? Update on the wiki, then when making a release, just wget/curl the wiki source pages you wish to include in the fossil repo?

AMG: I consider all this discussion to be part of the documentation. Yes, I plan to make a directory like you describe, but this wiki has a different format than that of Fossil. I need a conversion script, but none exists at the moment. This script needs to do more than simply convert formats; it must also be able to differentiate between internal links (pointers to other pages included in the Wibble archive) and external links (pointers to pages here in the Tcl Wiki). But yes, once I have that script, I plan to do just as you say.

jnc: I'd see maintaining documentation in two different formats and a conversion script as problematic. The fossil wiki format is very limited, making this conversion more difficult. Then each time either format changed or allowed new syntax, the script would be broken. I would choose just one format and stick with that. With fossil you do not have to use the wiki. You can disable access to the whole thing, to everyone. Or you can create a wiki page that simply points to the wiki here. It'd be a shame to have wibble progress waiting on a conversion script between incompatible wiki formats.

AMG: While Fossil's wiki may be limited, it does also support HTML, which will definitely do the job. There already exists a script to convert from the Tcl wiki markup to HTML; you're looking at its output right now. ;^) I could just use wget to harvest the generated HTML from this wiki and put it in Fossil, though it would be appropriate to postprocess away the page framing and fix up the internal/external links.


HTTP status codes

AMG: Here's a useful pictorial guide to HTTP status codes: [L10 ] [L11 ].