Version 58 of Literate programming in a wiki

Updated 2007-12-16 10:53:23 by dkf

19feb03 jcw - Here's a thought... Could we extend/use the wiki model to do Literate Programming?

The idea of Literate Programming is by Donald E. Knuth (see also [L1 ]):

  • Donald E. Knuth. "Literate Programming (1984)" in Literate Programming. CSLI, 1992, pg. 99:
        Let us change our traditional attitude to the construction of
        programs: Instead of imagining that our main task is to instruct
        a computer what to do, let us concentrate rather on explaining
        to human beings what we want a computer to do.

Wikit (any wiki in fact) has some striking similarities:

  • it can easily mix free text and source code
  • it has a way to cross-reference / hyperlink pages
  • page titles make natural chapter/section names

Would there be value in bringing these things together? I think so:

  • a single wikit document can be an entire self-documenting project (or package)
  • think of collaborative work, by altering the wiki through the web
  • there is a local mode, which may become good enough to edit with

Just rattling the cage a bit...

RS joins in rattling - you've been reading my mail!

KBK comes to the cage door to see who's been rattling it. I think that a number of us have been approaching literate programming on the Wiki obliquely for some time. There are pages (credit RS with the idea) that are both Wiki pages and fully functional Tcl scripts; my own Solving cryptograms is a good example. You can simply cut and paste such a page (or, more often, a portion of the page between rules) out of the Wiki into your Tcl interpreter and go. Yet the page looks like a typical Wiki page, not like a mass of source code.

KBK Clearly, others have been thinking along the same lines as well. Michael Cleverly's wiki-reaper (or escargo's wish-reaper based on it) goes in a different direction - assuming that all preformatted text in the page is code, and discarding the rest. It's convenient, in that people editing a page don't need to insert # at the start of paragraphs or set off lengthy discourse with if 0 { ... }. That's also convenient to the reader of the page. While I use # at start of paragraph as the best of the available options, we probably need some sort of postprocessor, perhaps a modified source command.
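
A minimal sketch of the "all preformatted text is code" idea, assuming wikit's convention that a line starting with a space or tab is preformatted. The name reapCode is made up for illustration and this is not wiki-reaper's actual code. Real pages contain other indented markup, such as list items, which this sketch does not try to skip.

```tcl
# Hypothetical helper: keep only the preformatted lines of a wiki page.
# Assumes a line beginning with a space or tab is preformatted text.
proc reapCode {pageText} {
    set code {}
    foreach line [split $pageText \n] {
        if {[regexp {^[ \t]} $line]} {
            # Drop the single leading character that marks the line as preformatted.
            lappend code [string range $line 1 end]
        }
    }
    return [join $code \n]
}

# A tiny made-up page: prose, three preformatted lines, more prose.
set page "Some prose about the greeting.\n proc hello {} {\n     return \"hello, world\"\n }\nMore prose."
eval [reapCode $page]
puts [hello]
```

The prose lines are simply discarded, so nobody editing the page has to comment them out.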

KBK We also need a better way to maintain the use-mention distinction. Commentary needs a way of talking about the code - perhaps with examples - without those examples getting executed. I've seen Richard accomplish this task by putting if 0 { ... } in places, but that's a pretty dreadful approach. Perhaps we could identify "code to be executed" in another way from "other preformatted text?"

KBK As we develop the scheme, we'll also need supporting tools (starting with an Emacs mode, but continuing to integrate our new notation into the development environment). One of the problems I always had with Knuth's web system was the amount of time a developer had to spend running utilities like weave and tangle to produce documentation and code, and the difficulty of tying the generated code back to the literate source. Surely with the glue that Tcl provides, we can do better.

KBK In short, jcw, you've once again put your finger on the pulse of the community.


RS replies: Yes, the dual-use files (valid Tcl; valid WikiML) make developing, testing, documenting terribly easy - and look much better than long # Courier comments... or carry images etc. No wonder I'm talking of "fun projects" ;-) But if 0 {...} does not appear dreadful to me - it is the recommended way for block comments, and even supported by the byte-code compiler. With auto-wrapping or non-wrapping editors, overlong # lines are less attractive - and the Wiki pages I write mostly start as text files that can fully be Tcl-sourced, so I can test and document on a single file. Only after I put them on the Wiki, they sometimes get edited to lose that dual-use capability.


KBK speaks for CL. JCW innovates beyond the latter's ability to keep up.


AM has some experience using Will Duquette's "expand" for the purpose of intermixing test code with explanatory descriptions. The idea was elegant (IMHO), but requires more elaboration. (Wondering if this will turn into pet project #53 or not ;)


NEM has been thinking about other uses of Metakit databases and Tcl, which might fit in with these thoughts somehow. The idea in particular being to move away from packages and files and just have procs which are in a MK database and loaded as needed. With the wiki approach, it would be nice to brainstorm ideas in a wiki, which results in some procs being created which can then automatically be used in any other wiki page. You could perhaps have a sort of package mechanism built on this:

 page require 8411

or

 page require "Literate programming in a wiki"

would include any code on this page, for instance.
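
How such a command might behave can be sketched as follows. Everything here is an assumption for illustration: the in-memory dictionary stands in for the Metakit database, the stored code is taken to be already extracted from the page, and lpGreeting is an invented procedure.

```tcl
# Hypothetical in-memory page store: id -> {title code}.
# A real version would read the wikit Metakit database instead.
set ::pageStore [dict create \
    8411 [dict create \
        title "Literate programming in a wiki" \
        code  {proc lpGreeting {} {return "code from page 8411"}}]]
set ::pagesLoaded [dict create]

proc page {subcommand ref} {
    if {$subcommand ne "require"} {
        error "unknown subcommand \"$subcommand\": must be require"
    }
    # Accept either a page number or a page title.
    set id ""
    dict for {candidate entry} $::pageStore {
        if {$ref eq $candidate || $ref eq [dict get $entry title]} {
            set id $candidate
            break
        }
    }
    if {$id eq ""} {
        error "no such page \"$ref\""
    }
    # Like [package require], evaluate each page at most once.
    if {![dict exists $::pagesLoaded $id]} {
        uplevel #0 [dict get $::pageStore $id code]
        dict set ::pagesLoaded $id 1
    }
    return $id
}

page require 8411
page require "Literate programming in a wiki"   ;# already loaded, not evaluated again
puts [lpGreeting]
```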


escargo One problem with Literate Programming versus a Wiki is that the intended output from Literate Programming is so much different. In a literate programming source file you have program documentation and the program, but not necessarily in any particular order. You run the tangle and weave processes to produce finished documentation and finished source code. The processes satisfy any constraints imposed by the document production software (such as TeX or LaTeX) and the compiler (such as Pascal, which requires declarations to be in specific places).

In some sense, the Wiki is also like Source Code In Database (SCID). (I think there is a quotation from Kent Beck, contrasting Smalltalk code developed in its native IDE to Java code developed in an ordinary editor, something along the lines of, "Source code in files, how quaint.")

It would be wonderful to find a way to marry Literate Programming and the wiki, but this is similar to trying to combine a linear narrative with the wiki. The two problems are related.


AM What about this for a solution: the "Bauersachs braces" (*) are a very good start, but we could replace them with a more explicit construct:

   proc comment { text } { }

   comment {

This is a descriptive text that can be mixed in with any code

   } 

The Wiki formatter can remove the literal text "comment {" (and the accompanying close-brace) but restore them whenever the reader presses the "Download" button (which would be new to the page layout, next to "Edit ...").

Mind you: random thoughts in the hours before a complete sunrise.

(*) Richard Suchenwirth has a double surname, and I could not think of a proper word starting with R or S to achieve the alliteration I felt was appropriate. Hence this solution :)


RS ponders this variation:

 interp alias {} NB. {} if 0

Somehow "NB." feels more natural (and shorter) to me...
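
A small usage sketch of that alias in a dual-use file. The procedure double is invented for the example. The only requirement on the prose is that its braces balance.

```tcl
interp alias {} NB. {} if 0

NB. {
This paragraph is prose. Tcl only needs its braces to balance;
the text is never evaluated, so even [error "boom"] is harmless here.
}

proc double {x} { expr {2 * $x} }

NB. {The call below shows that the code around the prose still runs.}

puts [double 21]
```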


jcw - Arjen, Richard, you're assuming pages are code, and then need to find a way to insert prose.

How about the other way around - i.e. the more web-like (in the Lit Prog sense) approach of extracting code? All it takes is a bit of string manip, perhaps a modified "source" command, as was also mentioned. For example, it could look for lines starting as "<whitespace> @ {name} {body}" and keep going as long as the code brackets are not balanced:

This is text in wiki-markup style...

  @ {A simple demo} {
    while {$i > 0} {
      puts hello
      @ {First part of the demo loop}
      puts world
      @ {Second part of the demo loop}
    }
  }

And this is more text.

Where "@" would be defined as lookup-and-eval when given one arg, and as define-and-store when given two?
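
A rough sketch of both halves, under stated assumptions: snippets live in a global array, a one-argument call evaluates the stored body in the caller's scope, and extraction uses [info complete] to decide when the brackets balance. The names ::snippet and extractSnippets are invented here, and this is not the implementation RS mentions further down.

```tcl
# Hypothetical snippet store: name -> body.
array set ::snippet {}

# One argument: look the snippet up and evaluate it in the caller's scope
# (macro-like). Two arguments: store the body under the given name.
proc @ {name args} {
    switch -- [llength $args] {
        0 {
            if {![info exists ::snippet($name)]} {
                error "unknown snippet \"$name\""
            }
            return [uplevel 1 $::snippet($name)]
        }
        1 {
            set ::snippet($name) [lindex $args 0]
            return
        }
        default {
            error "usage: @ name ?body?"
        }
    }
}

# Collect the "@ ..." commands from a page: start at an indented line whose
# first non-blank character is "@", then keep appending lines until
# [info complete] reports that braces and brackets are balanced.
proc extractSnippets {pageText} {
    set commands {}
    set pending ""
    foreach line [split $pageText \n] {
        if {$pending eq "" && ![regexp {^\s+@\s} $line]} continue
        append pending $line \n
        if {[info complete $pending]} {
            lappend commands [string trim $pending]
            set pending ""
        }
    }
    return $commands
}

set page {
This is text in wiki-markup style...

  @ {First part of the demo loop} { lappend out hello }
  @ {Second part of the demo loop} { lappend out world }
  @ {A simple demo} {
    set out {}
    for {set i 2} {$i > 0} {incr i -1} {
      @ {First part of the demo loop}
      @ {Second part of the demo loop}
    }
    set out
  }

And this is more text.
}

foreach cmd [extractSnippets $page] { eval $cmd }
puts [@ {A simple demo}]
```

Because the bodies run via uplevel, the inner snippets see the same variable out as the outer one, which is what makes them behave like macros rather than procs.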

The fun begins if you make the lookup interactive when the lookup fails - i.e. you could write incomplete code, and when an unknown part is needed, suspend, go back to wiki-edit mode, and resume?

Or what about code coverage: run tests (all within this same framework), and report which snippets were run, and which weren't.

Or profiling: track eval counts, even time spent...
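
Coverage and profiling could hang off the same command. The sketch below is only an assumption about how that might look: the arrays ::evalCount and ::evalMicros and the proc coverageReport are invented names, and the timings are inclusive, so time spent in nested snippets is also charged to the snippet that invoked them.

```tcl
# Hypothetical stores: snippet bodies, evaluation counts, microseconds spent.
array set ::snippet {}
array set ::evalCount {}
array set ::evalMicros {}

proc @ {name args} {
    if {[llength $args] == 1} {
        # Define-and-store; a new snippet starts with zero evaluations.
        set ::snippet($name) [lindex $args 0]
        set ::evalCount($name) 0
        set ::evalMicros($name) 0
        return
    }
    if {[llength $args] > 1} {
        error "usage: @ name ?body?"
    }
    if {![info exists ::snippet($name)]} {
        error "unknown snippet \"$name\""
    }
    incr ::evalCount($name)
    set start [clock microseconds]
    # Record the elapsed time even when the snippet raises an error.
    catch {uplevel 1 $::snippet($name)} result options
    incr ::evalMicros($name) [expr {[clock microseconds] - $start}]
    # Pass the snippet's result (or error) on to the caller unchanged.
    return -options $options $result
}

# Names of the snippets that were defined but never evaluated.
proc coverageReport {} {
    set unused {}
    foreach name [lsort [array names ::evalCount]] {
        if {$::evalCount($name) == 0} {
            lappend unused $name
        }
    }
    return $unused
}

@ {say hello} { set greeting hello }
@ {never used} { set greeting unused }
@ {say hello}
@ {say hello}

set name {say hello}
puts "$name: $::evalCount($name) evaluations, $::evalMicros($name) microseconds"
puts "not yet run: [coverageReport]"
```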

Each wiki page would become a point-of-entry. Run that page, and something specific is activated. It would provide a way to set up documented test suites. But also to automate at a higher level: a page which when "run" generates the plain code, for packaging and distribution? Another page "generates" basic docs. Yet another fetches code from a repository, or even submits this code? In some scenarios, "running a page" could become the actual usage interface - i.e. distribute a tool as a wiki, and make the docs be the launchpad for certain tasks?

Note that due to the design of wikit, it is feasible to have a wikit.tkd file which represents such a "web", and to make it generate normal runnable scripts inside itself as VFS-based starkit. Or a separate starkit or a more traditional set of files, of course.


DKF - Fascinating page so far, but one question. What modes of usage are we envisioning here? Are we talking about ways of having pages that can be viewed as both Tcl scripts and HTML documents? Or ways of embedding code to be executed as part of preparing a page? Or some interesting combination of the two? Or something else that I've not had enough coffee yet to think of?

jcw - The former, I think. This is not code to drive a fancy "active wiki", but a wiki to represent a real-world app/tool/project. IOW, if this thing existed, I'd want to develop with it. Make it as active as possible (i.e. a short code-edit-debug cycle), but keep deployment in mind - a release would consist of activating the whole thing in a certain way, which produces a final app or package. Development docs removed, even bytecoded. IOW, my interest in this is to see whether such an approach could be a complete IDE.

Dreaming a bit more... - It could be useful to see these "@ {name} { ... }" Tcl snippets colorized properly. Maybe even hyperlinked, and/or popping up the page name on mouse-over? And to support C/C++ code with critcl. And to be able to edit just a single snippet, preferably also with an external editor. And to be able to "register" snippet collections, so they can have optional attribution and licensing, even authentication perhaps.

Note that these snippets are not procs, btw - they are really macros. Procs fit in like any other bit of Tcl code, but the whole point of Literate Programming as I read D. Knuth's concept is to structure at the conceptual/algorithmic level - with tags which are independent of scoping rules. Procs are for code re-use, modularity, and to define an API.

DKF - I still don't understand, but it sounds interesting anyway. :^)

Are you saying that these labels have no meaning in the code that you'd get out of a page, and would in effect be elided from it or just present as comments or something like that? If so, that's no problem with me. :^)

RS: At the end of Macro facility for Tcl I have added a tentative implementation of @ which appears to work in simple tests.


escargo I have developed some Tcl code of my own using Noweb[L2 ]. One of the valuable pieces of output is the cross-reference that lists all the variables and procedures. When HTML is output, you get links that take you to the definitions and uses.

Leo: Literate Editor with Outlines[L3 ] uses Python and Tkinter to implement a portable literate programming tool.

This kind of raises some issues of presentation with the wiki. As implemented, the wiki does not have much of a hierarchy; it is (apparently) a self-contained set of pages.

Is having a hierarchical presentation a key feature of a literate programming version of the wiki? (This relates to the linear presentation issue I mentioned earlier.)

Is having the automatic production of tables of contents and cross-references something that fits easily into the current wiki framework?

One of the reasons I used literate programming is so that I could easily put my source code files under configuration control with the documentation and code synchronized together. Can that be done with the wiki?


Lars H: I'd like to point out that the WEB family of literate programming systems probably isn't the best place to start if you're going to think about literate programming for Tcl. Many aspects of WEB were included because they are useful for languages in the Algol family, but they make less sense for languages like Tcl where program = data = string.

A more relevant style of literate programming is probably that which is used for LaTeX -- the docstrip system. This is much more lightweight than the WEB system and its descendants, since the heart of it is a simple mark-up distinguishing "codelines" from "comments", and the process of "docstripping" a file is in the simplest case merely that of copying those lines that are codelines to a new file. Unlike the case with WEB, the code is generally not reordered, so there is no need for a tangle program to put everything back in place. Nor is there any separate weaving step involved, because the comments are so formatted that LaTeX can typeset them directly (if the standard doc package is loaded).

The strongest argument for taking this system seriously is however that it is the most successful literate programming system around! I'd estimate around 90% of all LaTeX packages one encounters today use the doc/docstrip system of literate programming. WEB is nowhere near that, not even in the branch of TeX extensions (pdf-TeX, Omega, ...).

Docstrip is also nice in that it allows one to mix different but related pieces of code in different languages. I've got one project (to appear) that uses Tcl, MetaPost, MetaFont, and PostScript. Since I'm using docstrip to generate the various input files from a .dtx source file, I can e.g. put the MetaPost code for outputting a piece of data next to the Tcl code that parses that piece of output from MetaPost. This makes it much simpler to ensure that the different components stay compatible.

I should also mention that I've written a LaTeX (class and) package tclldoc for better supporting docstrip-style literate programming in Tcl. You can get that from here [L4 ] (WWW browsing of CVS archive, I'm afraid; it isn't the proper home for it). Two examples of what the typeset output can look like are [L5 ] (PDF, 300K) and [L6 ] (PDF, 98K), but I should really clean things up and make proper releases of these things.

And I digress, because the topic of this page is literate programming in a Wiki rather than LaTeX. Naturally this implies that one should use Wiki (not LaTeX) markup for the text parts, but I believe that the code part of docstrip would be useful to look at. The main problem posed above was that of how to distinguish "code part of the program" from "code part of the comments". One thing one can do here is to insert guard lines that act as markup making this distinction. With docstrip these guard lines begin with %<, but for Wiki I suspect it might be more useful to build on the ---- markup for section separators. What about

 The following procedure computes something useful
 ----*script
   proc foo {d n} {
      set alpha [expr {atan(1)*8/$n}]
 alpha is now the midpoint angle of a side in a regular n-gon.
      set s [expr {sin($alpha/2)}]
      set r2 [expr {$d*$d/(2*$s*$s)}]
 ----/script
 we don't want to do that as
      set r2 [expr {$d*$d/(1-cos($alpha))}]
 because cancellation makes that much less accurate for large n.
 ----*script
      expr {$r2*sin($d)}
   }

which should be formatted something like

(BEGIN EXAMPLE)

The following procedure computes something useful

  proc foo {d n} {
     set alpha [expr {atan(1)*8/$n}]

alpha is now the midpoint angle of a side in a regular n-gon.

     set s [expr {sin($alpha/2)}]
     set r2 [expr {$d*$d/(2*$s*$s)}]

we don't want to do that as

     set r2 [expr {$d*$d/(1-cos($alpha))}]

because cancellation makes that much less accurate for large n.

     expr {$r2*sin($d)}
  }

(END EXAMPLE)

i.e., these special ---- that are guards rather than section separators should not be visible in the formatted page. They can however be used as directions by programs that extract the "script" from a page. The * and / are derived from docstrip syntax. * signifies a "begin guard" whereas / signifies an "end guard". Thus the script part of the above would be the code lines between an * and a /:

  proc foo {d n} {
     set alpha [expr {atan(1)*8/$n}]
     set s [expr {sin($alpha/2)}]
     set r2 [expr {$d*$d/(2*$s*$s)}]
     expr {$r2*sin($d)}
  }

The actual "extraction of the script" could be a function of the Wiki -- pages that have an extractable script could have a "download script" link at the bottom!
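
One way the extraction step could work is sketched below, assuming the guard lines appear exactly as "----*script" and "----/script" at the start of a line, and that indented lines inside a guarded region are code while unindented ones are commentary. The name extractGuarded is made up, and the sample page uses the example above with its braces balanced.

```tcl
# Hypothetical extractor for the guard-line convention proposed above.
# Lines "----*NAME" and "----/NAME" open and close a guarded region; inside
# it, indented lines are code and unindented lines are commentary.
proc extractGuarded {pageText {name script}} {
    set inside 0
    set code {}
    foreach line [split $pageText \n] {
        if {$line eq "----*$name"} {
            set inside 1
        } elseif {$line eq "----/$name"} {
            set inside 0
        } elseif {$inside && [regexp {^[ \t]+\S} $line]} {
            lappend code $line
        }
    }
    return [join $code \n]
}

set page {
The following procedure computes something useful
----*script
  proc foo {d n} {
     set alpha [expr {atan(1)*8/$n}]
alpha is now the midpoint angle of a side in a regular n-gon.
     set s [expr {sin($alpha/2)}]
     set r2 [expr {$d*$d/(2*$s*$s)}]
----/script
we don't want to do that as
     set r2 [expr {$d*$d/(1-cos($alpha))}]
because cancellation makes that much less accurate for large n.
----*script
     expr {$r2*sin($d)}
  }
}

set script [extractGuarded $page]
puts $script
eval $script
puts [foo 1.0 6]
```

The rejected cos() variant sits outside the guards, so it is displayed on the page as preformatted text but never reaches the extracted script.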

jcw - Thx for this info, lots of new things to digest. I am now wondering how the practice compares - web/weave/tangle appears to promote non-linear-coding, which sort of really appeals to me. How does that compare with docstrip's code-as-is-with-text-in-between approach? Isn't part of literate programming about being able to leave out detail, putting a descriptive phrase in instead? Let me add that I have no experience with either style, btw.

escargo - I am not sure that docstrip[L7 ] (but apparently not [L8 ]) counts as literate programming[L9 ]. Given the quality of output I saw at the web site, I have my doubts about its fidelity. (To me it appears as if the generated HTML is missing something, but it's hard to say if that is because of bad input or bad processing.)

I also could not find any mention of docstrip in the Literate Programming FAQ or in the Literate Programming web ring[L10 ].

It certainly may be a documentation tool, but again I don't think it's for literate programming (which is not the same as embedded documentation).

Embedded Documentation

While the goals of literate programming and embedded documentation are similar, the processes for them are much different. In literate programming, the documentation and program are mixed in the same file or files.

In embedded documentation, the documentation is embedded in the source of the program.

In literate programming, the generated documentation can explain the code (or expound on the code) in any order the writer feels to be natural.

I'll leave off for now and let somebody else have a turn.

Lars H The key difference between literate programming and embedded documentation is that in the former it is the exposition that is primary, whereas the code comes second. WEB's features for rearranging small pieces of code so that you can say things like

  if res<0 then @<deal with missing data error>;

have their advantages, but their original inclusion in the system was simply motivated by restrictions in the Pascal standard (first one must define all types, then all variables, then all procedures, etc.) that must not be imposed in a literate programming environment. Tcl (and LaTeX) are much more relaxed languages in this respect (just think of what can be done with uplevel and info), so such a feature wouldn't be that useful in this case. If most of your procedures are "snippet-length" to begin with, then there is no need for a feature that would allow you to cut them up further.


AM I took this page and a conversation with jcw to work out a dual idea: programmed story telling. Have a look - far from complete, but fun anyway.


RS tries out the possibilities of "literate explanations", and screenshots close to the code that produced them, in iFile: a little file system browser.


escargo 28 Feb 2003 - I had an idea strike me while I was commuting this morning: What if a wiki instance could be treated as a community-based IDE? The pages would hold the code. (I can even see where there might be a way of holding requirements and testing data in some of the pages.)

The key thing might be that the unknown procedure would need to have a clever way of finding the right definition for a procedure inside wiki pages.
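
A minimal sketch of such an unknown handler, assuming it runs in a plain tclsh where the standard unknown procedure exists. The index ::procIndex is a made-up stand-in for "finding the right definition inside wiki pages", and greet is an invented procedure; a real system would have to build the index by scanning the pages.

```tcl
# Hypothetical index from procedure name to the code found on a wiki page.
# A real system would build this by scanning the pages themselves.
set ::procIndex [dict create \
    greet {proc greet {who} {return "hello, $who"}}]

# Keep Tcl's standard handler so auto-loading and other fallbacks still work.
rename unknown _standard_unknown

proc unknown {args} {
    set cmd [lindex $args 0]
    if {[dict exists $::procIndex $cmd]} {
        # Define the procedure from its page, then retry the original call.
        # Removing the entry first avoids looping if the page fails to define it.
        set code [dict get $::procIndex $cmd]
        dict unset ::procIndex $cmd
        uplevel #0 $code
        return [uplevel 1 $args]
    }
    uplevel 1 [list _standard_unknown {*}$args]
}

puts [greet wiki]
```

After the first call greet is an ordinary procedure, so the handler is not consulted for it again.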

There would also need to be a way to specify the start of an application. (There might even be more than one application in the wiki at a time, or even multiple versions.)

Thus the wiki not only documents an application, it records design discussions, design decisions, potentially testing requirements and other requirements, and just about anything else.

Perhaps toggling a wiki between editable and noneditable makes the wiki executable as an application (or not).

What do you think?


01mar06 jcw - Three years have passed. Literate programming in a wiki is now available - see http://literateprograms.org/ and http://literateprograms.org/LiteratePrograms:How_to_write_an_article ... it looks interesting!

escargo 2 Mar 2006 - Thank you for providing that link. I see that the noweb [L16 ] literate programming system is the basis for the literate programming part of the wiki.

It's also interesting to see how wikis converged with testing: Fit: Framework for Integrated Test [L17 ] (reviewed by CL here [L18 ]) and FitNesse [L19 ] (reviewed here [L20 ]).

It surely seems like the approaches could be combined.


2006 March 8 dcoetzee - Hey all, I'm the founder of literateprograms.org. I'm rather embarrassed now that I didn't realise discussions of this concept had been going on and even been tried out for years before I thought of it (and I didn't even give credit to your excellent probing of this idea). I hope you'll excuse my ignorance. I think I've addressed some of your concerns, but as of this moment I have no intention of using a wiki to host a large programming project, as there are some difficult fragility issues to address related to this. It'd be great if some of you could come down and give contributing at my wiki a try - there's plenty of unexplored territory, and thanks to some excellent advertising on Lambda the Ultimate, at this moment we're up to 61 articles and 37 registered users. Hope to see you there.

escargo - How did you find this reference to your wiki? (Maybe something about 'referrer' in clicks to your site?)

And I chanced upon another reference to it here, from a web site I hadn't known about: http://lambda-the-ultimate.org/node/1336

dcoetzee - escargo: I found the page just by searching Google for "literate programs wiki" (without quotes). I was trying to see if they'd crawled me yet and how high up I would be. I really should have tried that search sooner. Regarding the URL, that's the Lambda the Ultimate reference I was talking about; I was up on their front page for about a day, which got me a lot of attention.

escargo - Sorry; I didn't put 2 + 2 together to get 4 (i.e., I didn't realize that "Lambda the Ultimate" was a site or blog). By the time I came back to this page, I'd forgotten that you had mentioned it.

Did you check out the Fit and FitNesse links above? You could incorporate unit tests and literate programs into the same framework.

dcoetzee - That's a great link, escargo. Unfortunately that system makes an assumption of trusted users that I can't afford to make. I'd like to incorporate unit tests, but first I'll need to set up a secure jail for running code on the server. I like their concept of input-output tables and colouring them up right on the page - this could help to discover quickly when users are unwittingly introducing bugs, which is a big problem with code listings on Wikipedia.

escargo 9 Mar 2006 - Note this security flaw in noweb: http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2005-3342