Version 0 of vertcl

Updated 2014-08-02 15:58:08 by pooryorick

vrtcl, by PYK, is an extensible Tcl notation for hierarchical data. It is a work in progress, and its current state is embryonic.

Description

vrtcl was conceived in the course of pondering JSON, which bills itself as a "fat-free alternative to XML". Although JSON is lean, Tcl syntax is leaner and cleaner. Other Tcl formats such as huddle exist, but vrtcl is different in some key characteristics. The goal of vrtcl is to be more readable than other formats without sacrificing expressivity, and also to be extensible.

The path to vrtcl began with a comparison between a JSON example and one plausible Tcl equivalent:

JSON:

{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}

Tcl:

title {Example Schema}
type object
properties {
    firstName {
        type string
    }
    lastName {
        type string
    }
    age {
        description {Age in years}
        type integer
        minimum 0
    }
}
required {firstName lastName}

Here is another example from json.org, in XML, JSON, and Tcl:

XML:

<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
    <glossary><title>example glossary</title>
        <GlossDiv><title>S</title>
            <GlossList>
                <GlossEntry ID="SGML" SortAs="SGML">
                    <GlossTerm>Standard Generalized Markup Language</GlossTerm>
                    <Acronym>SGML</Acronym>
                    <Abbrev>ISO 8879:1986</Abbrev>
                    <GlossDef>
                        <para>A meta-markup language, used to create markup
languages such as DocBook.</para>
                        <GlossSeeAlso OtherTerm="GML">
                        <GlossSeeAlso OtherTerm="XML">
                    </GlossDef>
                    <GlossSee OtherTerm="markup">
                </GlossEntry>
            </GlossList>
        </GlossDiv>
    </glossary>

JSON:

{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

Tcl:

glossary {
    title {example glossary}
    GlossDiv {
        title S
        GlossList {
            GlossEntry {
                ID SGML
                SortAs SGML
                GlossTerm {Standard Generalized Markup Language}
                Acronym SGML
                Abbrev {ISO 8879:1986}
                GlossDef {
                    para {
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }
                    GlossSeeAlso {GML XML}
                }
                GlossSee markup
            }
        }
    }
}

In the Tcl examples above, the format is that of a script where each command takes exactly one argument and knows the type of that argument. Even though JSON expresses types in its syntax, JSON data is usually also consumed by a script which already knows what type to interpret the data as, since it conforms to a schema that is usually documented as part of a service API. In most cases, the additional type syntax of JSON is not needed.
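As a minimal sketch of this point, assuming the consumer already knows which keys hold nested records, the schema example can be navigated with nothing more than Tcl's built-in dict command. The variable names here are illustrative only and are not part of vrtcl.

```tcl
# The schema example from above, held as an ordinary Tcl value.
set schema {
    title {Example Schema}
    type object
    properties {
        firstName {
            type string
        }
        lastName {
            type string
        }
        age {
            description {Age in years}
            type integer
            minimum 0
        }
    }
    required {firstName lastName}
}

# The consumer knows that "properties" and "age" hold nested records,
# so no type syntax in the data is needed to reach the leaf values.
set title    [dict get $schema title]
set minimum  [dict get $schema properties age minimum]
set required [dict get $schema required]

puts "title: $title"
puts "minimum age: $minimum"
puts "required fields: [join $required {, }]"
```

The knowledge that required is a list while title is a single string lives in the consuming script, not in the data.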

The first type information that comes to mind for the Tcl format is the ability to distinguish between a word and a list. One approach in the spirit of Tcl would be to interpret all values as lists, making sure that where a single value is intended, it is formatted as a list containing one item:

{
    title {{Example Schema}}
    type object
    properties {
        firstName {
            type string
        }
        lastName {
            type string
        }
        age {
            description {{Age in years}}
            type integer
            minimum 0
        }
    }
    required {firstName lastName}
}

glossary {
    title {{example glossary}}
    GlossDiv {
        title S
        GlossList {
            GlossEntry {
                ID SGML
                SortAs SGML
                GlossTerm {{Standard Generalized Markup Language}}
                Acronym SGML
                Abbrev {{ISO 8879:1986}}
                GlossDef {
                    para {{
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }}
                    GlossSeeAlso {GML XML}
                }
                GlossSee markup
            }
        }
    }
}
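The effect of the everything-is-a-list convention can be checked with the standard list commands. This is only an illustration of the convention using values from the examples above; the variable names are made up for the purpose.

```tcl
# Values written under the "every value is a list" convention.
set title    {{Example Schema}}
set type     object
set required {firstName lastName}

# A single value is a list of exactly one item.
puts [llength $title]       ;# 1
puts [lindex $title 0]      ;# Example Schema
puts [llength $type]        ;# 1

# A genuine list keeps all of its items.
puts [llength $required]    ;# 2

# Without the extra braces, the title would read as two items.
puts [llength {Example Schema}]    ;# 2
```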

This is already a reasonable format to work with. A more full-featured notation would support type information and other metadata in an extensible way. In JSON, it is possible to differentiate between the string "3.14159" and the number 3.14159, but all other numeric types are inferred from the format of the number. This is a hint that explicit typing is often not needed. In JSON, this numeric type inference stems from the origins of JSON in JavaScript. Given that JSON data is typically described in the documentation for an API, the explicit double-quote syntax to differentiate numbers and strings is in practice probably always redundant with the documentation, and therefore not really necessary. Where type information is used by the consuming program, it would probably always be trivial (excepting of course political issues such as backwards compatibility and organizational inertia) to add a field to the API to specify the type of the value, rather than relying on the syntax of the serialization format.

TODO: Find real-world examples of programs that actually rely on this distinction

The Tcl examples above are valid scripts in which each "command" takes exactly one argument. Where explicit typing is desired, a variation could be employed in which a command given only one argument treats it as its list value, and when it is given two arguments, treats the first argument as metadata, and the second as the value:

glossary {
    title {{example glossary}}
    GlossDiv record {
        title S
        GlossList record {
            GlossEntry record {
                ID SGML
                SortAs SGML
                GlossTerm {{Standard Generalized Markup Language}}
                Acronym SGML
                Abbrev {{ISO 8879:1986}}
                GlossDef record {
                    para {{
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }}
                    GlossSeeAlso list {GML XML}
                }
                GlossSee markup
            }
        }
    }
}

Specification

The vrtcl data format, which is a syntactically-valid Tcl script, is specified as follows:

Unless otherwise noted, terminology defined in the dodekalogue has the syntax described there, but sometimes with redefined semantics.

A document is an independent record. A record is a script composed of fields. Each field is a command whose semantics depend on the number of words in the command:

1
The field is unnamed. Its value is the first word of the field.
2
The first word is the name of the field, and the second word is its value.
3
The first word is the name of the field, the second word is the metadata for the value, and the third word is the value, the type of which depends on the metadata.
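The field rule can be sketched in a few lines of Tcl: a lone word is an unnamed value, two words are a name and a value, and three words are a name, metadata, and a value. The proc name vrtcl_fields and the name/meta/value dict layout are hypothetical, chosen for illustration, and the sketch assumes that fields are separated by newlines only (no ";") and that words are bare or brace-quoted. It is not a reference implementation.

```tcl
# Split a record into fields and classify each one by its word count.
# Assumptions (not from the specification): one field per complete
# command, newline-separated, and no ";" separators.
proc vrtcl_fields {record} {
    set fields {}
    set pending {}
    foreach line [split $record \n] {
        # Join continuation lines until all braces are balanced.
        append pending $line \n
        if {![info complete $pending]} continue
        set command [string trim $pending]
        set pending {}
        # Skip blank lines and comments.
        if {$command eq {} || [string index $command 0] eq {#}} continue
        switch -- [llength $command] {
            1 {
                lassign $command value
                lappend fields [dict create name {} meta {} value $value]
            }
            2 {
                lassign $command name value
                lappend fields [dict create name $name meta {} value $value]
            }
            3 {
                lassign $command name meta value
                lappend fields [dict create name $name meta $meta value $value]
            }
            default {
                error "field has too many words: $command"
            }
        }
    }
    return $fields
}

set fields [vrtcl_fields {
    ID SGML
    GlossSeeAlso list {GML XML}
    {one two}
}]

foreach field $fields {
    dict with field {
        puts "name=<$name> meta=<$meta> value=<$value>"
    }
}
```

Parsing the text directly, rather than evaluating it as a script, avoids accidentally running built-in commands such as time or list when they appear as field names. One simplification to be aware of: the sketch does not distinguish absent metadata from empty metadata.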

The type of the metadata is record. If the first field of the metadata contains only one word, the name of the field is type, and the value of the field is that word. If the second field contains only one word, the name of the field is itype, and the value of the field is that word.

The built-in metadata fields are:

type
the type of the value. The default is list
itype
the default type of item values in containers like record and list. The default itype is field
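The positional shorthand for type and itype can be sketched as follows. The proc name vrtcl_meta is hypothetical, and the sketch assumes that metadata fields are separated by ";" or newlines and that no field value itself contains those characters. One-word fields beyond the second are silently ignored here, which is a choice of this sketch rather than something the specification states.

```tcl
# Expand a metadata record such as {list; word} into a dict.
# Assumption: splitting on ";" and newline is enough to find the fields.
proc vrtcl_meta {meta} {
    set positional {type itype}
    set result {}
    set index 0
    foreach field [split $meta ";\n"] {
        set field [string trim $field]
        if {$field eq {}} continue
        if {[llength $field] == 1} {
            # One-word field: its name is implied by its position.
            if {$index < [llength $positional]} {
                dict set result [lindex $positional $index] [lindex $field 0]
            }
        } else {
            # Two-word field: explicit name and value.
            lassign $field name value
            dict set result $name $value
        }
        incr index
    }
    return $result
}

puts [vrtcl_meta {list; word}]                 ;# type list itype word
puts [vrtcl_meta record]                       ;# type record
puts [vrtcl_meta {type record; itype array}]   ;# type record itype array
```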

The built-in types are:

record or rec or r
The default itype is field.
field or fld or f
A command composed of at most three elements. The type of the value of the field is specified by the type field of the metadata field. The default itype is record.
word or w
a word.
array or arr or a
A record in which the fields are not named and a field is composed of at most two words. In the two-word case, the first word is the metadata, and the second is the value.
list or lst or l
A Tcl list. The default itype is list.
null
A null type has no value.
number or num or n
a number in any format understood by expr, except octal

null is written like this:

field1 null {}

or, in an array:

null {}

When the name of the first field of a document is meta, the field is the metadata for the document.

The built-in document metadata fields are:

types
a record in which each field describes the types of values that may be encountered in the document. The name of each field is the name of the type
elements
a record in which each field describes the fields that may appear in the document.

The built-in fields of a metadata types record are:

defaults
the default values for the metadata fields of the type.

The built-in fields of a metadata elements record are:

contains
a record describing fields that may appear within the field

Examples

#! /bin/env tclsh

#this is a vrtcl document

transaction {
    location 83391
    customer 17611
    date {2014 05 13}
    time 07:23:00
    items list {eggs milk}
    #also valid:
    #items {{eggs milk}}
    amount 8.32
}

transaction {
    location 16912
    customer 17611
    date {2014 07 13}
    time 18:47:17
    items list {donuts {ice cream}}
    #also valid:
    #items {{donuts {ice cream}}}
    amount 14.71 
}

#! /bin/env tclsh

#This is a vrtcl document

meta {
    types {
        #change the `itype` of record to `array`
        record {
            defaults {
                itype array
            }
        }

        #change the `itype` for list to field
        list {defaults {itype field}}

    }
}

#the default itype for a record is now "array", so there are no field names

{one two {{three four}}}
#if the itype of list had not been changed to field, it would have been:
#{one two {three four}}
record {
    name Capaneus
    son Sthenelus
}

One advantage of vrtcl over huddle is that there is no need for a keyword like HUDDLE in the data. Values that look like vrtcl data can be given the word or list types, and those values will not be misinterpreted as part of the structure of the vrtcl record. vrtcl records can be composed of other vrtcl records.

In huddle the type of each value is explicit in the notation. vrtcl relies on its rules to obviate the need for explicit type notation in the common cases.

Here are some comparisons with huddle:

#huddle
HUDDLE {D {
    a {s b} c {s d}}}

#vrtcl
{} {a b; c d}


#huddle
HUDDLE {
    L {{s e} {s f} {s g} {s h}}}

#vertcl
{e f g h}

#vertcl, declaring the elements as `type` word
{} {list; word} {e f g h}

#vertcl, when in an array
list {e f g h}

#vertcl, each element having its own type
{} record {
    {} word e
    {} number 3.14159
    {} list {one two {three four}}
    #the same as above
    {} {one two {three four}}
    #the same as above
    {} {list; field} {one two {{three four}}}
}

#vertcl, each element having its own type, `array` syntax

{} array {
    word e
    number 3.14159
    list {one two {{three four}}}
    #the same as above
    {one two {three four}}
    #the same as above
    {list; field} {one two {{three four}}}
}



#huddle
HUDDLE {D {bb {D {a {s b} c {s d}}} cc {L {{s e} {s f} {s g} {s h}}}}}

#vertcl
{} {bb {a b; c d}; cc list {e f g h}}

#huddle
HUDDLE {L {
    {D {
        bb {
            D {
                a {s b}
                c {s d}}}
        cc {L {
                {s e} {s f} {s g} {s h}}}}}
    {s p}
    {L {{s q} {s r}}}
    {s s}}}

#vertcl
{} array {
    record {
        bb {
            a b
            c d
        }
        cc list {e f g h}
    }
    p
    {q r}
    s
}

#huddle
HUDDLE {D {a {L {{D {c {s 1}}} {D {d {L {{s 2} {s 2} {s 2}}} e {s 3}}}}} b {L {{D {f {s 4} g {s 5}}}}}}}

#vertcl
{} {
    a array {
        record {
            c 1
        }
        record {
            d {2 2 2}
            e 3
        }
    }
    b array {
        record {
            f 4
            g 5
        }
    }
}

#vrtcl, using lists instead of arrays
{} {
    a {list; record} {
        {c 1}
        {d {2 2 2}; e 3}
    }
    b {list; record} {
        { f 4; g 5 }
    }
}

Page Authors

PYK