Version 31 of A wiki-like markup language for the text widget

Updated 2010-06-11 19:25:02 by LarrySmith

ZhangWeiwu This page should have been renamed to "lightweight markup language". When this page was first created, the concept of a lightweight markup language was not as ubiquitous as it is now, so referring to it as an "idea" was okay. Now it should be referred to as an existing concept. For the same reason I removed the description of the idea. Will the wikimaster agree to the renaming and change this page's name? If so, please remove this very paragraph too.

LV In general, pages don't get renamed. Instead, someone who strongly believes that a page should be named something else copies the text from one page to another and deletes the original. Hopefully, that doesn't trigger a series of responses from others moving things back and forth between two pages. In my opinion, most of the original topic from this page could indeed be moved to a page called light-weight markup languages. I however also see a purpose for this page, which would focus on creating a library to provide a markup language, specifically wiki-like, which could be somehow plugged into a text widget.

There exist many lightweight markup languages for easy creation or display of rich content. Some have many features, like asciidoc and the language of this very wiki; others have only a few, like that of the Thunderbird email client, which interprets email messages following a convention.

This page's purpose is to collect and categorize libraries that offer some lightweight markup language feature. Sometimes it is difficult to tell a lightweight markup language from a simplified use of a heavier markup language, like HTML. Thus the reader of this page might be interested in the page HTML widgets too. To be clear on scope, this page doesn't address the support of HTML.

Larry Smith I can't help but feel that any kind of markup is just so...20th century. A wiki really should use a WYSIWYG editor that understands enough HTML markup for the wiki to function. CKEditor (formerly FCKEditor) is one such tool, TinyMCE is another. Such tools obviate the need for dealing with the ugly, slimy innards of HTML markup while also eliminating the need to learn the markup, which inevitably differs from that of other, otherwise similar wikis. The differences in markup between (for example) tiddlywiki and this wiki are enough to drive me to the man pages nearly every time I post.

Library   | markup language | Latest release | support images | support hyperlink | as widget     | as CGI
tkoutline | n.a. *          | 2008           | No             | No                | Edit and View | No
kiwi      | kiwi            | 2001?          | Yes            | No                | Edit and View |

* Developers who make use of it must define their own markup language. See also http://tkoutline.sourceforge.net/wiki/41.html and http://tkoutline.sourceforge.net/wiki/39.html for information on a WYSIWYG wiki syntax highlighting library being developed for the tkoutline application.


Point of discussion 1: How bad would it be to deviate from the format used by the tcler's wiki?

A couple years ago I implemented my own wiki (alas, lost to the corporate giants that laid me off a while ago...). In it I changed a few things from the way the tcler's wiki works. For example, bullets in my wiki were done without leading spaces.

Here's an example:

  • this is a level 1 bullet (no leading spaces)
    • this is level two; this item is spread across
       three physical lines in the source, and will 
       automagically be combined in the rendered output
  • another level 1 bullet

I also used sequences of "#" for numbered lists:

    # this will show up with a leading "1."
    ## this will show up as "1." (or a.) 
    ## this will show up as "2." (or b.) 
    # this will be "2."
    # this will be "3.", and so on...

I stole that idea from the meatball wiki [L1 ]. Also stolen from the meatball wiki are definition lists: ";term:definition". Unlike that wiki, my indented text has leading ">"'s instead of colons.

    ;concept1:definition1
    ;concept2:definition2

    > indented one level
    >> indented two levels
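
The line conventions above can be recognized with a few regular expressions. What follows is only a minimal sketch in Python, not the lost implementation: the function name classify and the returned tuple shape are made up for illustration, and leading whitespace is stripped on the assumption that it carries no meaning.

```python
import re

def classify(line):
    """Return a (kind, level, payload) tuple for one source line.

    Hypothetical helper: the kinds and the tuple layout are assumptions.
    """
    text = line.strip()
    # A run of "#" marks a numbered item; the run length is the nesting level.
    m = re.match(r"(#+)\s*(.*)", text)
    if m:
        return ("numbered", len(m.group(1)), m.group(2))
    # A run of ">" marks indented text; the run length is the indent depth.
    m = re.match(r"(>+)\s*(.*)", text)
    if m:
        return ("indent", len(m.group(1)), m.group(2))
    # ";term:definition" splits at the first colon.
    m = re.match(r";([^:]*):(.*)", text)
    if m:
        return ("definition", 1, (m.group(1).strip(), m.group(2).strip()))
    return ("plain", 0, text)
```

A renderer could then pick a text widget tag from the kind and the level.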

I'm still waffling on inline emphasis. For example, do we stick with sequences of single quotes, or do we go with common usage on usenet, such as *bold* and _underline_? I see advantages to both.
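
For the usenet-style option, a rough sketch of how a line could be cut into tagged chunks is given here in Python. The function name emphasis_spans and the tag names "bold" and "underline" are assumptions, and the pattern is deliberately naive: it would also treat the middle of an identifier like snake_case_name as underlined.

```python
import re

# Assumed markers: *bold* and _underline_. The marked text must start and
# end with a non-space character, so "2 * 3 * 4" is left alone.
EMPHASIS = re.compile(r"\*(\S(?:.*?\S)?)\*|_(\S(?:.*?\S)?)_")

def emphasis_spans(text):
    """Split text into (chunk, tag) pairs; tag is None, "bold" or "underline"."""
    spans = []
    pos = 0
    for m in EMPHASIS.finditer(text):
        if m.start() > pos:
            # Unmarked text between two emphasized runs.
            spans.append((text[pos:m.start()], None))
        if m.group(1) is not None:
            spans.append((m.group(1), "bold"))
        else:
            spans.append((m.group(2), "underline"))
        pos = m.end()
    if pos < len(text):
        spans.append((text[pos:], None))
    return spans
```

Each pair maps naturally onto a text widget insert, with the tag name (if any) passed along with the chunk.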


Point of discussion 2: "paragraph" versus "line" parsing

In most wikis today, lines are tackled one at a time, and the format of each line is guessed from its leading characters. When I wrote my wiki code a while back, I found this difficult to handle. Well-behaved text was easy enough, but the approach left some annoying edge cases.

I propose a paragraph-oriented solution. The source text would be split into paragraphs (separated by \n\n). The first N characters of the first line define the format for the whole paragraph. The advantage is that, within a paragraph, having one logical line broken into several physical lines is not a problem. Look at the bullet example above. Because we are in a "bulleted paragraph", lines that don't have bullets can be concatenated to the previous lines. This makes it easy to continue lines without having to resort to backslashes at the end of the lines.
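
As a rough illustration of the proposal, here is a Python sketch that splits the source on blank lines, lets the first line pick the format, and folds unmarked lines into the previous item. The names parse_paragraphs and MARKER are invented for this sketch, and the "*" bullet marker is an assumption, since the example above only shows rendered bullets.

```python
import re

# Assumed markers: "*" runs for bullets, "#" runs for numbered items.
MARKER = re.compile(r"([*#]+)\s+(.*)")

def parse_paragraphs(source):
    """Split source on blank lines; return a list of (format, items) blocks."""
    blocks = []
    for para in re.split(r"\n\s*\n", source.strip()):
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        if not lines:
            continue
        first = MARKER.match(lines[0])
        if not first:
            # Plain paragraph: all physical lines form one logical line.
            blocks.append(("text", [" ".join(lines)]))
            continue
        kind = first.group(1)[0]
        fmt = "bullet" if kind == "*" else "numbered"
        items = []
        for ln in lines:
            m = MARKER.match(ln)
            if m and m.group(1)[0] == kind:
                # New item; the marker run length is the nesting level.
                items.append((len(m.group(1)), m.group(2)))
            else:
                # No marker of this paragraph's kind: continue the previous item.
                level, text = items[-1]
                items[-1] = (level, text + " " + ln)
        blocks.append((fmt, items))
    return blocks
```

Because the first line always carries the marker in a list paragraph, a continuation line always has a previous item to attach to, and no trailing backslashes are needed.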

In addition, this can give added control to the user. For example, if someone wants bulleted items to not have blank lines between them, they can all be put into a single paragraph. If you want spaces between them, make each bullet a separate paragraph. For numbered items, the numbers can start over for each paragraph. So you can have blocks of sequential numbers, a space, and then start all over again. (I'm probably not describing that very well)
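
The numbering behaviour may be easier to see in a small example. This Python sketch handles a single nesting level only, and number_items is a made-up name: the counter is reset for every blank-line-separated paragraph, so two blocks each start at 1.

```python
def number_items(source):
    """Number "#" items, restarting the count in each paragraph."""
    rendered = []
    for para in source.strip().split("\n\n"):
        count = 0  # reset for every paragraph
        out = []
        for line in para.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("#"):
                count += 1
                out.append(f"{count}. {line.lstrip('#').strip()}")
            elif out:
                # Unmarked line: continuation of the previous item.
                out[-1] += " " + line
            else:
                out.append(line)
        rendered.append("\n".join(out))
    return "\n\n".join(rendered)
```

Given two paragraphs of two "#" items each, the result is "1." and "2.", a blank line, then "1." and "2." again.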

I also find this conceptually easier to understand. Just say each paragraph is treated as a unit, and the first few characters describe how that block of data is handled.

Finally, by parsing along paragraph lines, it limits "bad" markup to a single paragraph. For example, if you have a dangling ] or something like that, it only affects one paragraph.


Point of discussion 3: "pluggable formats"

I've wondered about mimicking Unix's shebang convention to describe how to process the contents of a wiki page. For example, say we wanted a wiki page to display tabular data. It might be marked up like this (using #? instead of #! to avoid confusion):

    #? tabular
    row1 column1 | column2 | column3
    row2 column1 | column2 | column3

By default, wiki pages would be, well, wiki pages. But if you have a unique page you want to display in a unique format, that could be supported. We could have formats for address books, conference schedules, code listings, etc.

Another use for this would be forward compatibility. Assume for the sake of argument we build a tklib module that groks the new format. We could use this feature within a wikit to allow pages to be migrated. Existing pages could be preprocessed to include "#?old-style-wiki" (or something to that effect), and the renderer would know how to render that. As pages are hand-converted (or automatically converted) to the new format, those leading lines could be removed.
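
The dispatch itself could be quite small. The Python sketch below is only an illustration under stated assumptions: the RENDERERS registry, the function names, and the choice to raise an error on an unknown format are all invented here, and the default wiki renderer is a placeholder that returns the text unchanged.

```python
def render_tabular(body):
    """Turn "a | b | c" rows into lists of trimmed cells."""
    return [
        [cell.strip() for cell in row.split("|")]
        for row in body.splitlines()
        if row.strip()
    ]

def render_wiki(body):
    """Placeholder for the default renderer: returns the body unchanged."""
    return body

# Hypothetical registry; "old-style-wiki" reuses the placeholder here.
RENDERERS = {"tabular": render_tabular, "old-style-wiki": render_wiki}

def render_page(source):
    """Pick a renderer from a leading "#?" line, defaulting to wiki markup."""
    first, _, rest = source.partition("\n")
    if first.startswith("#?"):
        name = first[2:].strip()
        renderer = RENDERERS.get(name)
        if renderer is None:
            raise ValueError(f"unknown page format: {name!r}")
        return renderer(rest)
    return render_wiki(source)
```

Pages without a "#?" line fall through to the default, so migrated pages need no marker at all.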


Hmmm. That certainly looks easier to understand than the existing wiki rendering engine. My specific goal is to get something that renders in a tk text widget with a minimum of fuss.


There have been several markup conventions for rich text. At least one made it to an RFC (are you thinking of [L2 ]?), one has been popular in the Mac community for a number of years [L3 ], and then there are the Formatting Rules for the wiki.

This thing you're talking about is called setext, and you should read this if you're interested in it: http://docutils.sourceforge.net/mirror/setext/setext_concepts_Aug92.etx.txt --ro


There is also the markup in Almost Free Text [L4 ] --escargo (10/25/2002)


See htext and Structured teXt for similar ideas.


See also Notebook App's markup.


GRIDPLUS has a text widget markup facility [L5 ].


LV From a purely markup point of view, AK's work in doctools allows one to mark up in doctools, then generate wiki, html, and other markups.