[Lars H], 2010-02-03:
Some months ago, while working on [OpenMath] things, I grew tired of the [XML] syntax; it just felt cluttered and clumsy with all those `</>="'` characters. I wanted something more lightweight (for me as author), so where to turn, if not to [Tcl]? First for sketching things out, and later as a format that programs would actually process, I began to use something I dubbed TDL (for Tcl Data Language, or perhaps, in analogy with Tool Command Language and [Tool Protocol Language], Tool Data Language).

First, an example to illustrate the idea. A small fragment of [OpenMath] is
    <OMA>
        <OMS cd="symocat1" name="label"/>
        <OMS cd="Hopf-algebra" name="mult"/>
        <OMA>
            <OMS cd="list1" name="list"/>
            <OMV name="a"/>
        </OMA>
        <OMA>
            <OMS cd="list1" name="list"/>
            <OMV name="b"/>
            <OMV name="c"/>
        </OMA>
    </OMA>
This is rather cluttered. The same thing as TDL could be
    OMA {
        /OMS symocat1 label
        /OMS Hopf-algebra mult
        OMA {/OMS list1 list; /OMV a}
        OMA {/OMS list1 list; /OMV b; /OMV c}
    }
or, if the element-specific positional shorthand is not used,
    OMA {
        OMS cd symocat1 name label
        OMS cd Hopf-algebra name mult
        OMA {OMS cd list1 name list; OMV name a}
        OMA {
            OMS cd list1 name list
            OMV name b
            OMV name c
        }
    }

** Basic principles **

1. A general XML element is transformed to a Tcl command with the 
element name as command name, and the contents as a final "body" 
argument. Other arguments must occur in pairs and encode the 
attributes in the usual key–value style. Hence

   <element attr1="val1" attr2="val2"></element>

is equivalent to

   element attr1 val1 attr2 val2 {}

and also to

   element attr1 val1 attr2 val2

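since the parity of the argument count reveals whether a body argument is present: an odd count means that the last argument is the body, an even count means that there is none.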

2. A sequence of XML elements is transformed to a sequence of commands, 
i.e., to a script. Hence

    <element1/><element2 attr2="val2"/>

is equivalent to

    element1; element2 attr2 val2

and also to
   
    element1
    element2 attr2 val2

3. Command names which do not fit the XML syntax for element names
are used to express things other than elements and can have other 
syntaxes, e.g. arguments identified by position. In particular, 
the "/" command is used to encode character data in a sequence of 
elements. Thus

    <OMI>3</OMI>

can be encoded as

    OMI {/ 3}

"/" can have any number of arguments, which are concatenated in the manner of [append].

4. Elements can sometimes be given an alternative encoding, in the form
of a positional command. These commands will usually have a "/" 
prepended to the element name. The syntax is usually that required 
attributes (if any) and character data contents (if any) are 
mandatory arguments, but no general rule defining such commands 
exists; each must be defined explicitly.

Example: The OM elements
    <OMS cd="arith1" name="times"/>
    <OMV name="x"/>
    <OMI>5</OMI>
could be expressed as
    /OMS arith1 times
    /OMV x
    /OMI 5
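but the meaning of such commands would be context-dependent, since each of them has to be defined explicitly for the XML vocabulary at hand.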
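
Principles 1 to 3 can be tried out with a small TDL-to-XML converter. The following is only a minimal sketch, not part of the code on this page: the namespace `tdl2xml` and its procedures `convert`, `element`, `chardata` and `escape` are hypothetical names chosen for illustration. It assumes that the general element encoding is used throughout; the positional commands of principle 4 are rejected, since each of them would have to be defined separately. Commands are evaluated in an emptied safe interpreter, so that element names cannot clash with ordinary Tcl commands.

======
# Hypothetical namespace, for illustration only.
namespace eval tdl2xml {
   variable out ""
   # A safe slave interpreter, emptied of all ordinary commands.
   variable slave [interp create -safe]
   $slave hide namespace
   $slave invokehidden namespace delete ::
   # Every unknown command is taken to be an element, "/" is character data.
   $slave alias unknown [namespace current]::element
   $slave alias / [namespace current]::chardata
}

# Escape the characters that are special in XML text and attribute values.
proc tdl2xml::escape {text} {
   string map {& &amp; < &lt; > &gt; \" &quot;} $text
}

# Convert a TDL script (principle 2: a sequence of commands) to XML text.
proc tdl2xml::convert {script} {
   variable out
   variable slave
   set saved $out
   set out ""
   set code [catch {$slave eval $script} msg opts]
   set result $out
   set out $saved
   if {$code} {return -options $opts $msg}
   return $result
}

# Principle 1: attribute pairs, then an optional body recognised by parity.
proc tdl2xml::element {name args} {
   variable out
   if {![regexp {^[[:alpha:]_:][[:alnum:]_:.-]*$} $name]} {
      error "no XML encoding known for command \"$name\""
   }
   if {[llength $args] % 2} {
      set body [lindex $args end]
      set args [lrange $args 0 end-1]
   } else {
      set body ""
   }
   set tag "<$name"
   foreach {attr val} $args {
      append tag " $attr=\"[escape $val]\""
   }
   set inner [convert $body]
   if {$inner eq ""} {
      append out $tag "/>"
   } else {
      append out $tag ">" $inner "</$name>"
   }
}

# Principle 3: the arguments of "/" are concatenated as character data.
proc tdl2xml::chardata {args} {
   variable out
   append out [escape [join $args ""]]
}

puts [tdl2xml::convert {
   OMA {
      OMS cd arith1 name times
      OMV name x
      OMI {/ 5}
   }
}]
======

With these definitions, the final command should print `<OMA><OMS cd="arith1" name="times"/><OMV name="x"/><OMI>5</OMI></OMA>` on a single line. An element with an empty or missing body comes out in the self-closing form, which XML treats as equivalent to an explicit start and end tag pair.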


** Downside **

Some probably think it should be a criminal offense to invent a new syntax for what is almost XML, and there are no doubt interoperability problems lurking. On the other hand, if it allows me to be more productive, then I mostly think it is a good thing. 


** Basic operations **

For robust and informative handling of syntax errors, it would probably be necessary to use something like [parsetcl] to parse TDL, but if we're content with parsing valid TDL and throwing an error on the rest, then things are much easier. The trick is to parse TDL data by evaluating it in an empty slave interpreter.

======
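namespace eval prettyTDL {
   # Create a safe slave interpreter, accessed as prettyTDL::theinterp.
   interp create -safe [list [namespace current]::theinterp]
   # Empty it: hide [namespace] so that it can still be invoked, then use it
   # to delete everything in the global namespace of the slave.
   theinterp hide namespace
   theinterp invokehidden namespace delete ::
}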
======
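
To see what the emptying achieves, here is a small sketch under the same assumptions. The helper `makeEmptyInterp` and the procedure `record` are hypothetical names used only for this illustration. After the global namespace of the slave has been deleted, even [set] is gone, so data cannot accidentally run ordinary Tcl commands; only the names that are explicitly aliased mean anything.

======
# Hypothetical helper: create a safe interpreter and remove all its commands.
proc makeEmptyInterp {} {
   set i [interp create -safe]
   $i hide namespace
   $i invokehidden namespace delete ::
   return $i
}

set i [makeEmptyInterp]

# [set] no longer exists in the slave, so this evaluation fails.
set failed [catch {$i eval {set x 1}} msg]
puts "failed=$failed: $msg"

# An alias gives a name exactly the meaning chosen in the master.
set seen {}
proc record {args} {lappend ::seen $args}
$i alias OMV record OMV
$i eval {OMV name a; OMV name b}
puts $seen

interp delete $i
======

The first evaluation should fail with an "invalid command name" error, whereas the second collects the two `OMV` commands, name and arguments, as data in the master.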

The first operation on TDL code will be to prettyprint it. 
The central command point for prettyprinting is the '''prettyprint'''
procedure, which has the call syntax
    :   '''prettyTDL::prettyprint''' ''script'' ?''option'' ''value'' ...?
and returns the prettyprinting of the ''script''. The supported ''option''s are:
   -indent:   Basic indent string for the code block. Defaults to the empty string.
   -step:   Indent step, as a string to append to the -indent. Defaults to three spaces.

======
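proc prettyTDL::prettyprint {script args} {
   # res collects the output; the aliased commands reach it through upvar.
   set res ""
   # Option defaults, overridden by any option-value pairs in args.
   array set O {-indent "" -step {   }}
   array set O $args
   # Evaluating the script makes each command append its prettyprinted form.
   theinterp eval $script
   return $res
}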
======
The way the prettyprinting works is that each command appends the prettyprinted form of itself, preceded by the appropriate indentation and followed by a newline, to the local variable `res` in this procedure. The local array `O` has two entries `-indent` and `-step` which contain the current values of these parameters.

In most cases, that appending is taken care of by the [unknown] command in the slave interpreter, which is an alias to the following procedure in the master interpreter.
======
prettyTDL::theinterp alias unknown [namespace current]::prettyTDL::unknown
proc prettyTDL::unknown {name args} {
   upvar 1 res res O O
   switch -regexp -- $name {
      {^[[:alpha:]_:][[:alnum:]_:.-]*$} {
         if {[llength $args] % 2} then {
            set body [lindex $args end]
            set args [lreplace $args end end]
         } else {
            set body ""
         }
         append res $O(-indent) [linsert $args 0 $name]
         if {[regexp {\S} $body]} then {
            append res " \{\n" [
               prettyprint $body {*}[array get O]\
                 -indent $O(-indent)$O(-step)
            ] $O(-indent) \}
         }
         append res \n
      }
      default {
         append res $O(-indent) [linsert $args 0 $name] \n
      }
   }
}
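proc prettyTDL::slash {args} {
   upvar 1 res res O(-indent) indent
   # L is the "/" command being built for the current output line.
   set L [list /]
   for {set i 0} {$i < [llength $args]} {incr i} {
      set n [string first \n [lindex $args $i]]
      if {$n<0} then {
         # No newline in this argument: keep it on the current line.
         lappend L [lindex $args $i]
         continue
      }
      if {$n>0} then {
         lappend L [string range [lindex $args $i] 0 $n-1]
      }
      # Emit the line, with a literal \n as last word to encode the newline.
      append res $indent $L { \n} \n
      set L [list /]
      # Drop the text up to and including the newline, then revisit the rest.
      lset args $i [string replace [lindex $args $i] 0 $n]
      incr i -1
   }
   if {[llength $L] > 1} then {
      append res $indent $L \n
   }
}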
prettyTDL::theinterp alias / [namespace current]::prettyTDL::slash
======

For example:
 % prettyTDL::prettyprint {OMA {/OMS arith1 plus; /OMV a; /OMV b}}
 OMA {
    /OMS arith1 plus
    /OMV a
    /OMV b
 }


To be continued…


<<categories>> Category Data Serialization Format | Category XML