
'''vrtcl''', by [PYK], is an extensible Tcl notation for hierarchical data. It
is a work in progress, and its current state is '''embryonic'''. 


** Description **

'''vrtcl''' was conceived in the course of pondering [JSON], which bills itself
as a "[http://www.json.org/fatfree.html%|%fat-free alternative to] [XML]".
Although JSON is lean, Tcl syntax is leaner and cleaner.  Other Tcl formats
such as [huddle] exist, but vrtcl is different in some key characteristics.
The goal of vrtcl is to be more readable than other formats without sacrificing
expressivity, and also to be extensible.

The path to vrtcl began with a comparison between a JSON example and one
plausible Tcl equivalent:

'''JSON''':

======none
{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}
======


'''Tcl''':

======none
title {Example Schema}
type object
properties {
    firstName {
        type string
    }
    lastName {
        type string
    }
    age {
        description {Age in years}
        type integer
        minimum 0
    }
}
required {firstName lastName}
======
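
Because every field in this example is a name followed by exactly one value, the text also happens to be a valid Tcl [dict].  The following is a minimal sketch of reading it back, assuming the data contains no comments or semicolons.  The variable name `schema` is only illustrative.

======
set schema {
    title {Example Schema}
    type object
    properties {
        firstName {
            type string
        }
        lastName {
            type string
        }
        age {
            description {Age in years}
            type integer
            minimum 0
        }
    }
    required {firstName lastName}
}

# Nested keys are followed by passing them in order to dict get.
puts [dict get $schema title]
puts [dict get $schema properties age minimum]

# The consumer, not the notation, decides that "required" is a list.
puts [llength [dict get $schema required]]
======

This prints `Example Schema`, `0` and `2`.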

Here is another [http://www.json.org/example.html%|%example] from
json.org, in [XML], [JSON], and [Tcl]:

======none
<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
    <glossary><title>example glossary</title>
        <GlossDiv><title>S</title>
            <GlossList>
                <GlossEntry ID="SGML" SortAs="SGML">
                    <GlossTerm>Standard Generalized Markup Language</GlossTerm>
                    <Acronym>SGML</Acronym>
                    <Abbrev>ISO 8879:1986</Abbrev>
                    <GlossDef>
                        <para>A meta-markup language, used to create markup
languages such as DocBook.</para>
                        <GlossSeeAlso OtherTerm="GML">
                        <GlossSeeAlso OtherTerm="XML">
                    </GlossDef>
                    <GlossSee OtherTerm="markup">
                </GlossEntry>
            </GlossList>
        </GlossDiv>
    </glossary>
======

======none
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
======

======none
glossary {
    title {example glossary}
    GlossDiv {
        title S
        GlossList {
            GlossEntry {
                ID SGML
                SortAs SGML
                GlossTerm {Standard Generalized Markup Language}
                Acronym SGML
                Abbrev {ISO 8879:1986}
                GlossDef {
                    para {
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }
                    GlossSeeAlso {GML XML}
                }
                GlossSee markup
            }
        }
    }
}
======


In the Tcl examples above, the format is that of a [script] where each
[command] takes exactly one argument and knows the [type] of that argument.
Even though [JSON] expresses types in its syntax, JSON data is usually consumed
by a program that already knows how to interpret each value, since the data
conforms to a schema that is usually documented as part of a service API. In
most cases, the additional type syntax of [JSON] is not needed.
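
One way to picture this is a consumer that splits the script into commands and then reads each value the way its own schema dictates.  The sketch below is only an illustration: the helper `fields` is hypothetical, it performs no substitution, and it does not cope with comments that contain unbalanced braces.

======
# Split a script into its commands without evaluating anything.
proc fields {script} {
    set fields {}
    set chunk {}
    foreach line [split $script \n] {
        append chunk $line
        if {![info complete $chunk]} {
            # A braced word is still open, so keep collecting lines.
            append chunk \n
            continue
        }
        set cmd {}
        foreach part [split $chunk \;] {
            append cmd $part
            if {![info complete $cmd]} {
                # The semicolon was inside braces, so put it back.
                append cmd \;
                continue
            }
            set cmd [string trim $cmd]
            # A comment runs to the end of the line.
            if {[string index $cmd 0] eq "#"} break
            if {$cmd ne {}} {lappend fields $cmd}
            set cmd {}
        }
        set chunk {}
    }
    return $fields
}

set doc {
    # the consumer already knows the schema
    title {Example Schema}
    type object
    required {firstName lastName}
}

set record [dict create]
foreach field [fields $doc] {
    lassign $field name value
    dict set record $name $value
}

# The same notation is read two ways, depending on what the consumer expects.
puts "title: [dict get $record title]"
puts "required fields: [llength [dict get $record required]]"
======

Nothing in the data marks `title` as a string or `required` as a list.  That knowledge lives in the consuming code.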

The first type information that comes to mind for the Tcl format is the ability
to distinguish between a [dodekalogue%|%word] and a `[list]`.  One approach in
the spirit of Tcl would be to interpret all values as lists, making sure that
where a single value is intended, it is formatted as a [list] containing one
item:

======none
{
    title {{Example Schema}}
    type object
    properties {
        firstName {
            type string
        }
        lastName {
            type string
        }
        age {
            description {{Age in years}}
            type integer
            minimum 0
        }
    }
    required {firstName lastName}
}
======

======none
glossary {
    title {{example glossary}}
    GlossDiv {
        title S
        GlossList {
            GlossEntry {
                ID SGML
                SortAs SGML
                GlossTerm {{Standard Generalized Markup Language}}
                Acronym SGML
                Abbrev {{ISO 8879:1986}}
                GlossDef {
                    para {{
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }}
                    GlossSeeAlso {GML XML}
                }
                GlossSee markup
            }
        }
    }
}
======
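
The effect of the extra level of braces can be checked directly with the standard list commands.  This is a small sketch, and the variable names are illustrative only.

======
set plain {Example Schema}
set encoded {{Example Schema}}

# Read as a list, the plain form has two items.
puts [llength $plain]

# The list-encoded form has exactly one item, which is the intended value.
puts [llength $encoded]
puts [lindex $encoded 0]

# A value that really is a list needs no extra encoding.
puts [llength {firstName lastName}]
======

This prints `2`, `1`, `Example Schema` and `2`.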

This is already a reasonable format to work with.  A more full-featured
notation would support type information and other metadata in an extensible
way.  In JSON, it is possible to differentiate between the string "3.14159" and
the number 3.14159, but all other numeric types are inferred from the format of
the number.  This is a hint that explicit typing is often not needed.  In JSON,
this numeric type inference stems from the origins of JSON in
[Javascript].  Given that JSON data is typically described in the documentation
for an API, the explicit double-quote syntax to differentiate numbers and
strings is in practice probably always redundant with the documentation, and
therefore not really necessary.  Where type information is used by the
consuming program, it would probably always be trivial (excepting of course
political issues such as backwards compatibility and organizational inertia) to
add a field to the API to specify the type of the value, rather than relying on
the syntax of the serialization format.
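
For comparison, Tcl itself draws no syntactic line between the string and the number.  The sketch below, with an illustrative variable name, shows one value being used both ways, with the consumer choosing the interpretation.

======
set value 3.14159

# Used as a string.
puts [string length $value]

# Used as a number.
puts [expr {$value * 2}]

# A consumer that needs a number can validate the value when it reads it.
puts [string is double -strict $value]
puts [string is double -strict pi]
======

The two validation calls print `1` and `0`.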

[[TODO: Find real-world examples of programs that actually rely on this
distinction]]


The Tcl examples above are valid scripts in which each "[command]" takes
exactly one argument.  Where explicit typing is desired, a variation could be
employed in which a command given only one argument treats it as its list 
value, and when it is given two arguments, treats the first argument as
metadata, and the second as the value:


======none
glossary {
    title {{example glossary}}
    GlossDiv record {
        title S
        GlossList record {
            GlossEntry record {
                ID SGML
                SortAs SGML
                GlossTerm {{Standard Generalized Markup Language}}
                Acronym SGML
                Abbrev {{ISO 8879:1986}}
                GlossDef record {
                    para {{
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }}
                    GlossSeeAlso list {GML XML}
                }
                GlossSee markup
            }
        }
    }
}
======


** Specification **

The '''vrtcl''' [data format], which is a [dodekalogue%|%syntactically-valid]
Tcl [script], is specified as follows:

Unless otherwise noted, terminology defined in the [dodekalogue] has the syntax
described there, but sometimes with redefined semantics.

A '''document''' is an independent record.  A record is a [script] composed of
fields.  Each field is a [command] whose semantics depend on the number of
words in the command:

   0:   The field is unnamed. Its '''value''' is the first word of the field.

   1:   The first word is the '''name''' of the field, and the second word is its value.

   2:   The first word is the name of the field, the second word is the '''metadata''' for the value, and the third word is the value, the type of which depends on the metadata.
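
The three cases above correspond to commands of one, two and three words in total.  A minimal sketch of that dispatch follows, assuming a field has already been isolated as a Tcl list.  The proc name `field_parts` and the shape of the returned dict are hypothetical, and the `array` variant described later is not covered.

======
proc field_parts {field} {
    switch -- [llength $field] {
        1 {
            # Unnamed field: the only word is the value.
            return [dict create name {} meta {} value [lindex $field 0]]
        }
        2 {
            # Name and value.
            lassign $field name value
            return [dict create name $name meta {} value $value]
        }
        3 {
            # Name, metadata and value.
            lassign $field name meta value
            return [dict create name $name meta $meta value $value]
        }
        default {
            error "a field has one to three words, got [llength $field]"
        }
    }
}

puts [field_parts {title S}]
puts [field_parts {GlossSeeAlso list {GML XML}}]
======

The first call prints `name title meta {} value S`.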

The type of the metadata is `record`. If the first field of the metadata
contains only one word, the name of the field is `type`, and the value of the
field is the first word. If the second field contains only one word, the name of
the field is `itype`, and the value of the field is the first word.
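
The positional shorthand can be sketched as follows.  This is an illustration under stated assumptions rather than a reference implementation: `parse_meta` is a hypothetical name, fields are split naively on semicolons and newlines, and one-word fields beyond the second are ignored because the rules above do not cover them.

======
proc parse_meta {meta} {
    # Implied names for the first and second one-word fields.
    set names {type itype}
    set result [dict create]
    set index 0
    foreach part [split $meta ";\n"] {
        set part [string trim $part]
        if {$part eq {}} continue
        if {[llength $part] == 1} {
            # One word: the name is implied by the position of the field.
            if {$index < [llength $names]} {
                dict set result [lindex $names $index] [lindex $part 0]
            }
        } else {
            # Two words: an explicit name and value.
            dict set result [lindex $part 0] [lindex $part 1]
        }
        incr index
    }
    return $result
}

puts [parse_meta record]
puts [parse_meta {list; word}]
======

These calls print `type record` and `type list itype word`.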

The '''built-in''' metadata fields are: 

   '''`type`''':   the type of the value.  The default is `list`.

   '''`itype`''':   the default `type` of item values in containers like `record` and `list`.  The default `itype` is `field`.
   

The built-in types are:

   '''`record`''' or '''`rec`''' or '''`r`''':   The default itype is `field`.

   '''`field`''' or '''`fld`''' or '''`f`''':   A command composed of at most three words. The type of the value of the field is specified by the `type` field of its metadata.  The default `itype` is `record`.
   
   '''`word`''' or '''`w`''':   a word.

   '''`array`''' or '''`arr`''' or '''`a`''':   A record in which the fields are not named and a field is composed of at most two words.  In the two-word case, the first word is the metadata, and the second is the value.
    
   '''`[list]`''' or '''`lst`''' or '''`l`''':   A Tcl list.  The default itype is `list`.

   '''`null`''':   A `null` type has no value.

   '''`number`''' or '''`num`''' or '''`n`''':   a number in any format understood by `[expr]`, except octal.


`null` is written like this:

======
field1 null {}
======

or, in an `array`:

======
null {}
======
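
The short and long spellings of the built-in types listed above could be resolved with a simple lookup.  The sketch below is hypothetical: the proc name `canonical_type` is not part of the specification, and mapping an empty name to `list` is an assumption based on `list` being the stated default type.

======
proc canonical_type {name} {
    # Each alias is followed by its canonical type name.
    set aliases {
        record record   rec record   r record
        field  field    fld field    f field
        word   word     w   word
        array  array    arr array    a array
        list   list     lst list     l list
        null   null
        number number   num number   n number
    }
    # Assumption: an empty type name means the default type.
    if {$name eq {}} {
        return list
    }
    if {![dict exists $aliases $name]} {
        error "unknown vrtcl type: $name"
    }
    return [dict get $aliases $name]
}

puts [canonical_type rec]
puts [canonical_type n]
puts [canonical_type {}]
======

This prints `record`, `number` and `list`.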

When the name of the first field of a document is '''`meta`''', the field is
the metadata for the document.

The '''built-in''' document metadata fields are:

   `types`:   a record in which each field describes the types of values that may be encountered in the document.  The name of each field is the name of the `type`

   `elements`:   a record in which each field describes the fields that may appear in the document.
   

The built-in fields of a metadata `types` record are:

   `defaults`:   the default values for the metadata fields of the type.


The built-in fields of a metadata `elements` record are:

   `contains`:   a record describing fields that may appear with the field


** Examples **

======
#! /bin/env tclsh

#this is a vrtcl document

transaction {
    location 83391
    customer 17611
    date {2014 05 13}
    time 07:23:00
    items list {eggs milk}
    #also valid:
    #items {{eggs milk}}
    amount 8.32
}

transaction {
    location 16912
    customer 17611
    date {2014 07 13}
    time 18:47:17
    items list {donuts {ice cream}}
    #also valid:
    #items {{donuts {ice cream}}}
    amount 14.71 
}
======


======none
#! /bin/env tclsh

#This is a vrtcl document

META {
    types {
        #change the `itype` of record to `array`
        record {
            defaults {
                itype array
            }
        }

        #change the `itype` for list to command
        list {defaults {itype command}}

    }
}

#the default itype for a record is now "array", so there are no record names 

{one two {{three four}}}
#if the itype of list had not been changed to command, it would have been:
#{one two {three four}}
record {
    name Capaneus
    son Sthenelus
}

======


----


One advantage of vrtcl over [huddle] is that there is no need for a keyword
like `HUDDLE` in the data.  Values that look like `vrtcl` data can be given the
`word` or `list` types, and those values will not be misinterpreted as part of
the structure of the vrtcl record.  vrtcl records can be composed of other
vrtcl records.

In [huddle] the type of each value is explicit in the notation.  vrtcl relies
on its rules to obviate the need for explicit type notation in the common
cases. 

Here are some comparisons with [huddle]:

======
#huddle
HUDDLE {D {
    a {s b} c {s d}}}

#vrtcl
{} {a b; c d}


#huddle
HUDDLE {
    L {{s e} {s f} {s g} {s h}}}

#vertcl
{e f g h}

#vertcl, declaring the elements as `type` word
{} {list; word} {e f g h}

#vertcl, when in an array
list {e f g h}

#vertcl, each element having its own type
{} record {
    {} word e
    {} number 3.14159
    {} list {one two {three four}}
    #the same as above
    {} {one two {three four}}
    #the same as above
    {} {list; field} {one two {{three four}}}
}

#vertcl, each element having its own type, `array` syntax

{} array {
    word e
    number 3.14159
    list {one two {{three four}}}
    #the same as above
    {one two {three four}}
    #the same as above
    {list; field} {one two {{three four}}}
}


#huddle
HUDDLE {D {bb {D {a {s b} c {s d}}} cc {L {{s e} {s f} {s g} {s h}}}}}

#vertcl
{} {bb {a b; c d}; cc list {e f g h}}

#huddle
HUDDLE {L {
    {D {
        bb {
            D {
                a {s b}
                c {s d}}}
        cc {L {
                {s e} {s f} {s g} {s h}}}}}
    {s p}
    {L {{s q} {s r}}}
    {s s}}}

#vertcl
{} array {
    record {
        bb {
            a b
            c d
        }
        cc list {e f g h}
    }
    p
    {q r}
    s
}

#huddle
HUDDLE {D {a {L {{D {c {s 1}}} {D {d {L {{s 2} {s 2} {s 2}}} e {s 3}}}}} b {L {{D {f {s 4} g {s 5}}}}}}}

#vertcl
{} {
    a array {
        record {
            c 1
        }
        record {
            d {2 2 2}
            e 3
        }
    }
    b array {
        record {
            f 4
            g 5
        }
    }
}

#vrtcl, using lists instead of arrays
#vertcl
{} {
    a {list; record} {
        {c 1}
        {d {2 2 2}; e 3}
    }
    b {list; record} {
        { f 4; g 5 }
    }
}
======


** Questions and Comments **

[AMG]: Is this called vrtcl or vertcl?  Both spellings appear on this page.  Do these two terms refer to distinct yet related concepts?

----
[AMG]: ''"Each field is a command whose semantics depend on the number of words in the command:"''  Customary Tcl terminology counts the name of the command as the first word of the command.  Here, you probably mean to say arguments instead of words.

----
[AMG]: How does this compare to [TDL]?  See also the configuration file format I derived from TDL and implemented in [Config file using slave interp].

I'll answer my own question.  TDL and my derived format target [XML] (including attributes), whereas you target JSON (which lacks XML-like attributes).  When attributes are not used, my format matches your basic specification, though not your explicit type tagging and extra list encoding variations.

----
[AMG]: I have no JSON experience, so I would very much like to know whether it's possible for a program to make decisions based on the ''type'' of a value.  This is essentially the same question as whether or not explicit type tagging has any value.  You argue that it does not ([EIAS] and all that), but then proceed to present a way to incorporate type tagging, with only a TODO note to research whether or not there's any reason to do so.  Perhaps someone else could fill us in.

Like you, I argue that inline type tagging is not useful because the consumer already knows what types to expect.  In other words, the schema is embedded in the program itself.

That's not to say explicit schemas aren't useful.  They are good documentation, they make it possible for a generic program to validate a document, and they may assist compression.  But merely embedding types in the document does not constitute a true schema.  It cannot protect against misspelled field names, and it fails to document data relationships and unused features.  So explicit schemas really ought to be external documents.  In short, I see no benefit to embedding type tags in this application.

Well, actually, I do see one benefit, and maybe this is what you had in mind.  Since other formats require type tags, and since it's generally impossible to unambiguously identify the "type" of a Tcl word, compatibility with other formats requires type tagging.  That type tagging can go inline or in an explicit schema.

As you may guess from what I said about schemas, I prefer the latter approach, though I freely admit that the former is much easier to implement.  There is a third approach, which is to write a program that implicitly embeds a schema and whose purpose is conversion.  This surely works, but I think of it as a one-off sort of solution, whereas this page is dedicated to describing a general approach, so again I think an explicit schema would be more appropriate.

----
[AMG]: What is the point of applying an extra level of list encoding to each command's argument?  It's already a single word on account of being one argument.  I don't see how this helps in disambiguating its type, if that's even a goal (see above).

----
[AMG]: The format you present is script-oriented, meaning that you use newlines as field separators.  Do you also allow semicolons?  How about substitutions?  Comments?  Double quotes instead of braces?  Backslashes?  Blank lines?  Is whitespace significant, e.g. indenting?  What about non-data commands such as looping and conditionals, such as those I demonstrate in [Config file using slave interp]?

Except for those showing type tagging, all the examples you present could instead be viewed as [dict]s, since newlines work as list/dict element separators just as well as any other form of whitespace.  What benefit does your approach have over nested dicts?

** Page Authors **

   [PYK]:   


<<categories>> JSON | data format