Version 25 of A wiki-like markup language for the text widget

Updated 2005-10-20 19:52:09

March 5 2002

One of the things I think would be great for tklib, or even built into the text widget itself, is some sort of mark-up language that makes it easier to display rich text.

For example, sometimes I want to write a little splash screen, or maybe a simple "About..." box, or a tiny help file to go with an application. The only choice at the moment to display rich text (bullets, italics, etc.) is to hand-apply tags to ranges of text.

What I would like is a way to do something like this:

 set miniHelp {
  Thank you for trying this program. Here are the most
  important features:

  * feature 1
  * feature 2
    ...
 }

 toplevel .help
 pack [text .help.text]
 .help.text render $miniHelp

(there appears to be a bug in this wiki -- my preformatted text above is showing bullets when rendered, rather than the asterisks I intended to show up)

It might be convenient to just use the tcler's wiki rendering code, but I find that to be a bit cumbersome, and rather difficult to understand. Plus, it's rather intertwined with wikit, and a tad buggy. But still a marvel to behold.

So..... I figured I'd brainstorm on this a bit and see how much interest there is in building a library to handle this. Mainly I'm just rambling to myself, so bear with me.

FWIW, I have code that implements a lot of this already, but have never quite found the time to package it all up.


What if there were an HTML widget in the core? Then the example above would become:

    set miniHelp {
        <p>Thank you for trying this program. Here are the most
        important features:</p>

        <ul>
        <li>feature 1</li>
        <li>feature 2</li>
        </ul>
    }

    toplevel .help
    pack [html .help.html]
    .help.html render $miniHelp

Wouldn't this be better than a wiki extension?


Point of discussion 1: How bad would it be to deviate from the format used by the tcler's wiki?

A couple years ago I implemented my own wiki (alas, lost to the corporate giants that laid me off a while ago...). In it I changed a few things from the way the tcler's wiki works. For example, bullets in my wiki were done without leading spaces.

Here's an example:

  • this is a level 1 bullet (no leading spaces)
    • this is level two; this item is spread across
       three physical lines in the source, and will 
       automagically be combined in the rendered output
  • another level 1 bullet

I also used sequences of "#" for numbered lists:

    # this will show up with a leading "1."
    ## this will show up as "1." (or a.) 
    ## this will show up as "2." (or b.) 
    # this will be "2."
    # this will be "3.", and so on...

I stole that idea from the meatball wiki [L1 ]. Also stolen from the meatball wiki are definition lists: ";term:definition". Unlike that wiki, my indented text has leading ">"'s instead of colons.

    ;concept1:definition1
    ;concept2:definition2

    > indented one level
    >> indented two levels

Inline emphasis I'm still waffling on. For example, do we stick with sequences of single quotes, or do we go with common usage on Usenet, such as *bold* and _underline_? I see advantages to both.
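To make the Usenet-style option concrete, here is a minimal sketch of how *bold* and _underline_ could be turned into arguments for the text widget's "insert" subcommand. The proc name emphasisSegments and the tag names "bold" and "underline" are made up for illustration, and the matching is deliberately naive: it would also trip on things like snake_case_names. It assumes Tcl 8.5 or later (for lassign and {*}).

```tcl
# Split a line into alternating "chars tagList" arguments suitable for
# the text widget's "insert" subcommand. Tag names are illustrative and
# must be configured on the widget separately.
proc emphasisSegments {line} {
    set segments {}
    set pos 0
    # Match *text* or _text_ with no nested markers of the same kind.
    set re {\*([^*]+)\*|_([^_]+)_}
    while {[regexp -indices -start $pos -- $re $line whole bold under]} {
        lassign $whole from to
        # Plain text before the match gets an empty tag list.
        if {$from > $pos} {
            lappend segments [string range $line $pos [expr {$from - 1}]] {}
        }
        # An unmatched group is reported as "-1 -1".
        if {[lindex $bold 0] >= 0} {
            lappend segments [string range $line {*}$bold] bold
        } else {
            lappend segments [string range $line {*}$under] underline
        }
        set pos [expr {$to + 1}]
    }
    # Whatever is left after the last match is plain text.
    if {$pos < [string length $line]} {
        lappend segments [string range $line $pos end] {}
    }
    return $segments
}
```

With a text widget at hand, something like ".help.text insert end {*}[emphasisSegments $line]" would then apply the tags, after a ".help.text tag configure underline -underline 1" and a similar font setting for "bold".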


Point of discussion 2: "paragraph" versus "line" parsing

In most wikis today, lines are tackled one at a time, and the format of each line is guessed from its leading characters. When I wrote my wiki code a while back, I found this difficult to handle. Well-behaved text was easy enough, but it left some annoying edge cases.

I propose a paragraph-oriented solution. The source text would be split into paragraphs (separated by \n\n). The first N characters of the first line define the format for the whole paragraph. The advantage is that, within a paragraph, having one logical line broken into several physical lines is not a problem. Look at the bullet example above. Because we are in a "bulleted paragraph", lines that don't have bullets can be concatenated to the previous lines. This makes it easy to continue lines without having to resort to backslashes at the end of the lines.
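A rough sketch of what that could look like, just to show the idea and not a finished design. All the proc names are invented here, and it assumes "*" is the bullet marker and that repeated markers ("**", "##") give the nesting level, in the spirit of the examples above. It needs Tcl 8.5 or later for lassign.

```tcl
# Split source text into paragraphs separated by blank lines.
# Each paragraph is returned as a list of trimmed physical lines.
proc splitParagraphs {text} {
    set paragraphs {}
    set current {}
    foreach line [split $text \n] {
        if {[string trim $line] eq ""} {
            if {[llength $current] > 0} {
                lappend paragraphs $current
                set current {}
            }
        } else {
            lappend current [string trim $line]
        }
    }
    if {[llength $current] > 0} {
        lappend paragraphs $current
    }
    return $paragraphs
}

# Classify a whole paragraph by the leading characters of its first line.
proc paragraphType {lines} {
    set first [lindex $lines 0]
    switch -regexp -- $first {
        {^\*+\s}  { return bullet }
        {^#+\s}   { return numbered }
        {^;}      { return definition }
        {^>+\s}   { return indent }
        default   { return plain }
    }
}

# Inside a bullet or numbered paragraph, a line without its own marker
# is a continuation and is appended to the previous item.
# Returns a list of {level text} pairs.
proc joinItems {lines} {
    set items {}
    foreach line $lines {
        if {[regexp {^([*#]+)\s+(.*)$} $line -> marker body]} {
            lappend items [list [string length $marker] $body]
        } elseif {[llength $items] > 0} {
            lassign [lindex $items end] level body
            lset items end [list $level "$body $line"]
        } else {
            lappend items [list 1 $line]
        }
    }
    return $items
}
```

Note that no backslashes are needed in the source: the second bullet in a paragraph can wrap over as many physical lines as it likes, and joinItems folds them back into one logical item.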

In addition, this can give added control to the user. For example, if someone wants bulleted items to not have blank lines between them, they can all be put into a single paragraph. If you want spaces between them, make each bullet a separate paragraph. For numbered items, the numbers can start over for each paragraph. So you can have blocks of sequential numbers, a space, and then start all over again. (I'm probably not describing that very well)
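Maybe code says it better. Here is a small sketch of per-paragraph numbering, assuming each paragraph has already been reduced to a list of {level text} pairs (that shape, and the proc name numberItems, are my own invention for this example). Odd levels get "1.", "2.", ... and even levels get "a.", "b.", ...; it makes no attempt to handle more than 26 lettered items. Because the counters live inside the proc, every paragraph starts over at 1.

```tcl
# items: list of {level text} pairs taken from ONE paragraph.
# Returns a list of {level label text} triples.
proc numberItems {items} {
    set counters {}   ;# element i holds the running count for level i+1
    set out {}
    foreach item $items {
        lassign $item level text
        # Forget counters deeper than this level so nested lists restart.
        set counters [lrange $counters 0 [expr {$level - 1}]]
        # Grow the list if this item is deeper than anything seen so far.
        while {[llength $counters] < $level} {
            lappend counters 0
        }
        set n [expr {[lindex $counters end] + 1}]
        lset counters end $n
        if {$level % 2} {
            set label "$n."
        } else {
            # 97 is the character code for "a".
            set label "[format %c [expr {96 + $n}]]."
        }
        lappend out [list $level $label $text]
    }
    return $out
}
```

Fed the five items from the "#" example earlier on this page, the labels come out as 1., a., b., 2., 3. Calling numberItems again for the next paragraph starts back at 1.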

I also find this conceptually easier to understand. Just say each paragraph is treated as a unit, and the first few characters describe how that block of data is handled.

Finally, by parsing along paragraph lines, it limits "bad" markup to a single paragraph. For example, if you have a dangling ] or something like that, it only affects one paragraph.


Point of discussion 3: "pluggable formats"

I've wondered about the ability to mimic Unix's shebang method to describe how to process the contents of a wiki page. For example, say we wanted a wiki page to display tabular data. It might be marked up like this (using #? instead of #! to avoid confusion):

    #? tabular
    row1 column1 | column2 | column3
    row2 column1 | column2 | column3

By default, wiki pages would be, well, wiki pages. But if you have a unique page you want to display in a unique format, that could be supported. We could have formats for address books, conference schedules, code listings, etc.

Another feature of this would be for forward-compatibility. Assume for the sake of argument we build a tklib module that groks the new format. We could use this feature within wikit to allow pages to be migrated. Existing pages could be preprocessed to include "#?old-style-wiki" (or something to that effect), and the renderer could know how to render that. As pages are hand-converted (or automatically converted) to the new format, those leading lines could be removed.
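As a sketch of how the dispatch might work: look at the first line, pull out the format name if there is a "#?", and hand the rest of the page to whatever renderer is registered under that name. Everything here (registerFormat, detectFormat, renderPage, renderTabular, and the default name "wiki") is hypothetical, and the "renderers" just return data rather than drawing anything. It assumes Tcl 8.5 or later.

```tcl
# Register a rendering command under a format name (hypothetical registry).
proc registerFormat {name command} {
    set ::formatHandlers($name) $command
}

# Return a two-element list: format name and page body.
# Pages whose first line does not start with "#?" are treated as "wiki".
proc detectFormat {page} {
    set nl [string first \n $page]
    if {$nl < 0} {
        set first $page
        set rest ""
    } else {
        set first [string range $page 0 [expr {$nl - 1}]]
        set rest [string range $page [expr {$nl + 1}] end]
    }
    if {[regexp {^#\?\s*(\S+)} $first -> name]} {
        return [list $name $rest]
    }
    return [list wiki $page]
}

# Dispatch a page to the renderer registered for its format.
proc renderPage {page} {
    lassign [detectFormat $page] name body
    if {![info exists ::formatHandlers($name)]} {
        error "no renderer registered for format \"$name\""
    }
    return [{*}$::formatHandlers($name) $body]
}

# Example handler: turn "a | b | c" rows into lists of trimmed cells.
proc renderTabular {body} {
    set rows {}
    foreach line [split [string trim $body] \n] {
        set cells {}
        foreach cell [split $line |] {
            lappend cells [string trim $cell]
        }
        lappend rows $cells
    }
    return $rows
}

registerFormat tabular renderTabular
registerFormat wiki    {string trim}   ;# stand-in for a real wiki renderer
```

Migration would then amount to registering one more name, say "old-style-wiki", that points at the existing rendering code.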


FWIW, there's a new standalone and quick rendering engine for wiki markup (95% of what this wiki does) inside Kiwi, see [L2 ]. Look for the file "wikihtml.tcl"; it's intended to be used for a wikit successor (sample output in [L3 ]). Perhaps it can be retargeted at the text widget? -jcw


Hmmm. That certainly looks easier to understand than the existing wiki rendering engine. My specific goal is to get something that renders in a tk text widget with a minimum of fuss.


See also http://tkoutline.sourceforge.net/wiki/41.html and http://tkoutline.sourceforge.net/wiki/39.html for information on the WYSIWYG wiki syntax highlighting library being developed for the tkoutline application.


There have been several markup conventions for rich text. At least one made it to an RFC [insert reference here] and one has been popular in the Mac community for a number of years [insert working URL for setext here], and then there are the Formatting Rules for the wiki

This thing you're talking about is called setext, and you should read this if you're interested in it: http://docutils.sourceforge.net/mirror/setext/setext_concepts_Aug92.etx.txt --ro


There is also the markup in Almost Free Text [L4 ] --escargo (10/25/2002)


The Iwidgets scrolledhtml widget is extremely easy to use, and renders basic HTML nicely.


See htext and Structured teXt for similar ideas.


Category GUI