vrtcl

'''vrtcl''', by [PYK], is an extensible Tcl notation for representing an
[tree%|%ordered tree] of [value%|%values].  It is a work in progress, and its
current state is '''embryonic'''.



** Description **
'''vrtcl''' is a record-centric [data format] that seamlessly integrates Tcl
values and their various interpretations.  It supports type indicators for
values, but makes it easy to avoid them when they aren't wanted or needed.  The
goal of vrtcl is to be more readable and concise than other formats without
sacrificing expressivity, and also to be extensible.

vrtcl was conceived as an alternative to [JSON], which bills itself as a
"[http://www.json.org/fatfree.html%|%fat-free alternative to] [XML]".  Although
JSON is lean, Tcl syntax is leaner and cleaner.  Other Tcl formats such as
[huddle] and [TDL] exist, but vrtcl is different in some key characteristics.

One of the principal ideas behind this format is that while container formats
such as lists or records need to be specified somehow, data types for scalar
values do not.  For example, JSON provides syntax to differentiate between
strings and numbers, but the programs that consume the values are already
written in a type-aware manner, so the extra syntax ends up as a useless
artifact of the tight relationship between JSON and Javascript.



** Specification **

Unless otherwise noted, terminology defined in the [dodekalogue%|%Tcl rules]
carries the meaning defined there. 


*** Scripts ***

A document in the '''vrtcl''' [data format] is a [dodekalogue%|%syntactically
valid] Tcl [script], and is called a '''`record`'''.  Each command in the
record is interpreted according to its type, which by default is '''`field`'''.

A comment in which the first non-whitespace character after the `#` is `!` is
a directive.  The following directives are built-in:

   '''`PUSH`''':   The next command is interpreted as an information record that applies to the remainder of the record.

   '''`POP`''':   Revert to the previous information record.
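Because a record is a Tcl script, a reader can find command boundaries with `info complete`, the same way an interactive shell decides when a command has been fully typed.  The following is a minimal sketch, not part of any published vrtcl implementation; the namespace `vrtcl` and the proc `vrtcl::commands` are hypothetical.  It assumes records contain no substitutions and no backslash-newline continuations, and it reads each command's words with Tcl's list parser, which agrees with Tcl word parsing for the braced and bare words used on this page.

======
namespace eval vrtcl {}

# Split RECORD, a vrtcl record in the form of a Tcl script, into commands.
# Returns a list whose items are either {directive NAME} for a "#!" comment
# or {cmd WORDS} for an ordinary command.  Plain comments are dropped.
proc vrtcl::commands {record} {
    set result {}
    set buf {}
    # The extra newline guarantees that the last command gets flushed.
    foreach ch [split $record\n {}] {
        set trimmed [string trimleft $buf]
        if {[string index $trimmed 0] eq "#"} {
            # A comment runs to the end of its line.
            if {$ch ne "\n"} {
                append buf $ch
                continue
            }
            set body [string trimleft [string range $trimmed 1 end]]
            if {[string index $body 0] eq "!"} {
                lappend result [list directive \
                    [string trim [string range $body 1 end]]]
            }
            set buf {}
        } elseif {($ch eq "\n" || $ch eq ";") && [info complete $buf]} {
            # A newline or semicolon outside any braces ends the command.
            if {[string trim $buf] ne {}} {
                lappend result [list cmd [lrange $buf 0 end]]
            }
            set buf {}
        } else {
            append buf $ch
        }
    }
    if {[string trim $buf] ne {}} {
        error "incomplete command at end of record: $buf"
    }
    return $result
}
======

A semicolon inside braces, as in `c {d; e}`, leaves the command incomplete, so it stays part of the word rather than ending the command.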



*** Built-in Command Types ***

   '''`field`''' or '''`fld`''' or '''`f`''':   A command composed of at most three words, as described in '''Fields'''.
   
   '''`cell`''' or '''`cel`''' or '''`c`''':   A command composed of one word which is the '''value''', or composed of two words which are the information record and the value, respectively.



*** Fields ***

The number of words in a field determines the interpretation of the words:

   '''one''':   The word is the '''value'''.

   '''two''':   The first word is the '''name''' of the field, and the second word is its value.

   '''three''':   The first word is the name of the field, the second word is the '''information record''' for the field, and the third word is the value.  The information record is the second word rather than the third so that a field can be parsed incrementally from a stream without first reading the entire value.
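A sketch of the word-count rule, with hypothetical names: `vrtcl::field` takes the words of one field and returns a dict containing only the parts that are present, so that a one-word field stays distinguishable from a field whose name is the empty string, as in the `{} list {e f g h}` examples further down.

======
namespace eval vrtcl {}

# Interpret the words of a field according to how many there are.  Returns a
# dict holding only the parts that are present: value, plus name for two or
# more words, plus info for three words.
proc vrtcl::field {words} {
    switch -- [llength $words] {
        1 {
            return [dict create value [lindex $words 0]]
        }
        2 {
            lassign $words name value
            return [dict create name $name value $value]
        }
        3 {
            lassign $words name info value
            return [dict create name $name info $info value $value]
        }
        default {
            error "a field has one to three words, got [llength $words]"
        }
    }
}
======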


*** Values ***

The type of a value may be specified by an applicable information record. If
there is no type indicated by an applicable information record, the value is
not interpreted.

The built-in types available for use in information records are:

   '''`number`''' or '''`num`''' or '''`n`''':   A number, interpreted as per `[expr]`.

   '''`value`''' or '''`val`''' or '''`v`''':   A value with no interpretation.

   '''`[list]`''' or '''`lst`''' or '''`l`''':   A Tcl list.

   '''`[dict]`''':   A Tcl dictionary.

   '''`null`''' or '''`nul`''' or '''`z`''':   A `null` type has no value.  See '''Null''' below.

   '''`record`''' or '''`rec`''':   An embedded vrtcl record.

   '''`reference`''' or '''`ref`''':   A reference to another command.

   '''`computed`''':   A computed value.  The given value is used by the system to compute the final value.
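One way a reader might apply the built-in types is sketched below.  `vrtcl::interpret` is hypothetical; it approximates "a number as understood by `expr`" with `string is double -strict`, leaves `record` values unparsed, and does not handle `reference` or `computed`.  It gives `dict` no aliases, since `lst` and `l` are the aliases of `list`.

======
namespace eval vrtcl {}

# Interpret VALUE according to a built-in type name or alias.  Returns a
# two-item list: the canonical type name and the interpreted value.
proc vrtcl::interpret {type value} {
    switch -- $type {
        number - num - n {
            # Approximates "a number as understood by expr".
            if {![string is double -strict $value]} {
                error "not a number: $value"
            }
            return [list number $value]
        }
        value - val - v {
            return [list value $value]
        }
        list - lst - l {
            if {![string is list $value]} {
                error "not a list: $value"
            }
            # lrange yields the canonical list form of the value.
            return [list list [lrange $value 0 end]]
        }
        dict {
            if {[catch {dict size $value}]} {
                error "not a dictionary: $value"
            }
            return [list dict $value]
        }
        null - nul - z {
            # A null has no value, so whatever was given is discarded.
            return [list null {}]
        }
        record - rec {
            # Left as text; a reader would parse it as a nested record.
            return [list record $value]
        }
        default {
            error "type not handled by this sketch: $type"
        }
    }
}
======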


*** References ***

A reference is a record that locates another command or sequence of commands in
the record hierarchy.  If a reference is a script substitution, the system
executes the script to resolve the reference. Otherwise, the value is a system
identifier for another command.


*** Information Record ***

An information record takes effect before any value it accompanies is
processed, and specifies, among other things, the interpretation of the value.
An information record that accompanies a PUSH directive is in effect for the
remaining commands in the record.

If the first command of the information record contains only one word and
that word is a number as understood by `[expr]`, or if the entire information
record consists of only one field whose value is not a type known by the
system, that value is a system identifier for the value that the information
record accompanies.  Otherwise, if the first command of the information record
contains only one word, it is a field whose name is `type` and whose value is
that word.  If the second command contains only one word, it is a field whose
name is `ctype` and whose value is that word.
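These rules can be sketched as follows, under two assumptions: "only one field" is read as a single one-word command (otherwise an information record such as `types {...}` would be mistaken for an identifier), and the hypothetical `vrtcl::knownTypes` holds just the built-in type names and aliases from this page.  Each command of the information record is passed in as a list of words.

======
namespace eval vrtcl {
    # Hypothetical: the type names and aliases the system knows about.
    variable knownTypes {
        number num n value val v list lst l dict null nul z
        record rec reference ref computed field fld f cell cel c
    }
}

# Normalize an information record into a dict of information fields.
# COMMANDS is the information record, one list of words per command.
proc vrtcl::normalizeInfo {commands} {
    variable knownTypes
    set first [lindex $commands 0]
    if {[llength $first] == 1} {
        set word [lindex $first 0]
        # A lone number, or a lone word that is not a known type, is an id.
        if {[string is double -strict $word]
                || ([llength $commands] == 1 && $word ni $knownTypes)} {
            return [dict create id $word]
        }
    }
    set info {}
    set position 0
    foreach cmd $commands {
        if {[llength $cmd] == 1 && $position == 0} {
            dict set info type [lindex $cmd 0]
        } elseif {[llength $cmd] == 1 && $position == 1} {
            dict set info ctype [lindex $cmd 0]
        } elseif {[llength $cmd] == 2} {
            dict set info {*}$cmd
        } else {
            error "not handled by this sketch: $cmd"
        }
        incr position
    }
    return $info
}
======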

The '''built-in''' information fields are: 
   

The built-in fields of an information `type` field are:

   '''`ctype`''':   The default type for commands in the record.

   '''`contains`''':   A record in which fields identify which elements can occur in the value, along with constraints on their occurrence.

The built-in fields of an information `elements` record are:

   '''`type`''':   



*** Default information record ***

What follows is the default information record for a record.  A vrtcl
interpreter is initialized with this information.  The dictionary indicates
the names of fields that may appear in the information record, and what they
signify. 

======
dictionary {
    ctype {
        description {
            The type of the items in a container such as a record.
        }
    }

    type {
        description {
            The type of any value the information record is applied to.
        }
    }

    types {
        description {
            A record in which each field describes the type of values of items
            in the record.  The name of each field is the name of the described
            `type`.
        }
    }

    id {
        description {
            A system identifier for the field this record is part of.
        }
    }

    ref {
        description {
            A reference to another record, which is to be considered the actual
            value.  In this case, the value for the current record is simply a
            presentational label for the referenced value.
        }
    }

    triples {
        description {
            If true, all commands in values of type "record" must contain
            exactly three words, which allows the parser to process records
            in a stream efficiently.
        }
    }

}

type record
ctype field

# Except for record, whose default type is "field", all other values are by
# default implicit.


directives {
    PUSH {
    }
    POP {
    }
}

types {
    cell {
        type record
        alias c
    }
    field {
        type record
        alias f
    }
    list {
        alias l
    }
    number {
        alias {num n}
    }
    null {
        alias {nul u}
    }
    record {
        alias {rec r}
    }
    reference {
        alias ref
    }
    value {
        alias {val v}
    }
}
======
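Since the `types` section above is also a valid Tcl dictionary (it contains no comments), a reader could build a lookup table from every alias to its canonical type name.  `vrtcl::aliasMap` is a hypothetical helper sketched for illustration, not part of any published vrtcl code.

======
namespace eval vrtcl {}

# Given the `types` section of an information record as a Tcl dict, build a
# dict mapping each type name and each of its aliases to the type name.
proc vrtcl::aliasMap {types} {
    set map {}
    dict for {name spec} $types {
        dict set map $name $name
        if {[dict exists $spec alias]} {
            foreach alias [dict get $spec alias] {
                dict set map $alias $name
            }
        }
    }
    return $map
}
======

A type named in a field can then be resolved with `dict get $map $typeName`.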



*** Null ***

A null value is represented like this:

======
field1 null {this value is discarded}
======

or, for a `cell`:

======
null {this value is discarded}
======



** Examples **

======
#! /bin/env tclsh

#this is a vrtcl document

transaction rec {
    location 83391
    customer 17611
    date list {2014 05 13}

    # alternatively, a value that is not a dictionary, and thus can't be
    # interpreted as a sub-record:
    date {{2014 05 13}}

    time 07:23:00
    items list {eggs milk}

    # also valid, but since it isn't explicit that the value is a list, its
    # interpretation is open:
    items {{eggs milk}}
    amount 8.32
}

transaction rec {
    location 16912
    customer 17611
    date list {2014 07 13}
    time 18:47:17
    items list {donuts {ice cream}}

    # also valid, but since it isn't explicit that the value is a list, its
    # interpretation is open:
    items {{donuts {ice cream}}}
    amount 14.71 
}
======
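A record like the one above can be read without writing a parser at all, by evaluating it in a safe child interpreter in which every command is hidden, so that each field, even one named `time` or `list`, falls through to an `unknown` alias that records its words.  This is a sketch under stated assumptions: the names in the `vrtcl` namespace are hypothetical, only named fields and the `rec`/`record` and `list` type indicators are handled, and records must not contain command substitutions, since those would be swallowed by the `unknown` alias.

======
namespace eval vrtcl {
    # Commands gathered from each child interpreter, keyed by its name.
    variable collected {}
}

# Target of the `unknown` alias: records the words of one command.
proc vrtcl::Collect {child args} {
    variable collected
    dict lappend collected $child $args
    return
}

# Evaluate RECORD in a safe child interpreter whose commands are all hidden,
# so that every command, even one named "time" or "list", falls through to
# `unknown`.  Returns the commands of RECORD, each as a list of words.
proc vrtcl::Commands {record} {
    variable collected
    set child [interp create -safe]
    try {
        foreach cmd [$child eval {info commands}] {
            interp hide $child $cmd
        }
        interp alias $child unknown {} ::vrtcl::Collect $child
        $child eval $record
        if {[dict exists $collected $child]} {
            return [dict get $collected $child]
        }
        return {}
    } finally {
        dict unset collected $child
        interp delete $child
    }
}

# Convert RECORD to a nested name/value list.  Only named fields are
# accepted, and only the rec/record and list type indicators are honored.
proc vrtcl::toDict {record} {
    set result {}
    foreach words [Commands $record] {
        switch -- [llength $words] {
            2 {
                lassign $words name value
            }
            3 {
                lassign $words name type value
                if {$type in {rec record}} {
                    set value [toDict $value]
                } elseif {$type ni {list lst l}} {
                    error "type not handled by this sketch: $type"
                }
            }
            default {
                error "expected a named field: $words"
            }
        }
        lappend result $name $value
    }
    return $result
}
======

The result is a name/value list rather than a deduplicated dict, so both `transaction` records survive; `dict get` on it returns the last one, because later duplicates win.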


======none
#! /bin/env tclsh

#! PUSH

#This is a vrtcl document

{
    types {
        #change the `ctype` of record to `cell`
        record {
            ctype cell
        }

    }
}

#the ctype for a record is now "cell", so commands carry no field names

list {one two {three four}}

#also valid, but it isn't explicit that the value is a list:
{one two {three four}}

rec {
    name Capaneus
    son Sthenelus
}

======
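The `PUSH` and `POP` directives amount to a stack of information records.  Below is a minimal sketch, assuming the record has already been split into items of the hypothetical form `{directive NAME}` or `{cmd WORDS}`, and that the command following `PUSH` is a single braced word, as in the example above.  A pushed record simply replaces the one below it; merging information records is left out.

======
namespace eval vrtcl {}

# ITEMS is a record already split into items of the form {directive NAME}
# or {cmd WORDS}.  Returns a list of {INFO WORDS} pairs: each ordinary
# command paired with the information record in effect for it.
proc vrtcl::applyDirectives {items} {
    # The bottom entry stands in for the default information record.
    set stack [list {}]
    set pushPending 0
    set out {}
    foreach item $items {
        lassign $item kind data
        if {$kind eq "directive"} {
            switch -- $data {
                PUSH {set pushPending 1}
                POP {
                    if {[llength $stack] < 2} {
                        error "POP without a matching PUSH"
                    }
                    set stack [lrange $stack 0 end-1]
                }
                default {error "unknown directive: $data"}
            }
        } elseif {$pushPending} {
            # The command after PUSH is itself the new information record.
            lappend stack [lindex $data 0]
            set pushPending 0
        } else {
            lappend out [list [lindex $stack end] $data]
        }
    }
    return $out
}
======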


----


One advantage of vrtcl over [huddle] is that there is no need for a keyword
like `HUDDLE` in the data.  vrtcl records can be composed of other
vrtcl records.  Values that look like `vrtcl` data can be given the
`value` or `list` types, so that they are not interpreted as part of
the structure of the vrtcl record.

In [huddle], the type of each value is explicit in the notation.  vrtcl relies
on its rules to obviate the need for explicit type notation in the common
cases. 

Here are some comparisons with [huddle]:

======
# huddle:
HUDDLE {D {
    a {s b} c {s d}}}

# vrtcl:
a b; c d

# vrtcl, where the rules dictate that the value is a record:
{my data} {a b; c d}

# vrtcl, explicitly indicating that the value is a record:
{my data} rec {a b; c d}


#Or change the default type of value to "list":

#! PUSH
{
    types {
        field {
            type list
        }
    }
}
# Since default type is now list, this value contains four items, and the last
# character of the second item is a semicolon:
{my data} {a b; c d}

### next example ###


#huddle
HUDDLE {
    L {{s e} {s f} {s g} {s h}}}

#vrtcl
{} list {e f g h}

#vrtcl, if the record ctype is "cell"
list {e f g h}

#vrtcl, an uninterpreted value
{{e f g h}}


#vrtcl, each element having its own type, with the empty string as its name
{} {
    {} word e
    {} number 3.14159

    # An explicit list:
    {} list {one two {three four}}

    #the same as above, but the list is implicit
    {} {{one two {three four}}}

    # A value with no name, that isn't a sub-record, and is open to
    # interpretation:
    {{one two {three four}}}

}

#or, if the record ctype is "cell": 

{} {ctype cell} {
    word e
    number 3.14159

    list {one two {three four}}

    # or, with no interpretation
    {{one two {three four}}}

}

### next example ###


# huddle
HUDDLE {D {bb {D {a {s b} c {s d}}} cc {L {{s e} {s f} {s g} {s h}}}}}

# vrtcl:
bb {a b; c d}; cc {{e f g h}}

# vrtcl, explicit value types:
bb rec {a b; c d}; cc list {e f g h}

## next example ##

# huddle:
HUDDLE {L {
    {D {
        bb {
            D {
                a {s b}
                c {s d}}}
        cc {L {
                {s e} {s f} {s g} {s h}}}}}
    {s p}
    {L {{s q} {s r}}}
    {s s}}}


# vrtcl:
{
    bb {
        a b
        c d
    }
    cc list {e f g h}
}
p
{{q r}}
s

# vrtcl, more explicit:
#! PUSH
{
    ctype cell
}
rec {
    bb {
        a b
        c d
    }
    cc list {e f g h}
}
p
list {q r}
s

## Next example ##

HUDDLE {D {a {L {{D {c {s 1}}} {D {d {L {{s 2} {s 2} {s 2}}} e {s 3}}}}} b {L {{D {f {s 4} g {s 5}}}}}}}

#vrtcl
a {c 1; {d {{2 2 2}}; e 3}}; b {f 4; g 5}


#vrtcl, same as above, with newlines and indentation
a {
    c 1
    {d {{2 2 2}}; e 3}
}
b {f 4; g 5}
======
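The comment about `{my data} {a b; c d}` under a `list` default can be checked with Tcl's own list commands: read as a list rather than as a script, the semicolon is just another character of a word.

======
set value {a b; c d}

# As a list, the value has four items ...
puts [llength $value]    ;# 4

# ... and the semicolon is the last character of the second one.
puts [lindex $value 1]   ;# b;
======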



** Development **

'' '''Note:'''  The examples in this section do not conform to the current
specification of vrtcl, but describe what came up in the course of its
design. ''

The path to vrtcl began with a comparison between a JSON example and one
plausible Tcl equivalent:

'''JSON''':

======none
{
    "title": "Example Schema",
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string"
        },
        "lastName": {
            "type": "string"
        },
        "age": {
            "description": "Age in years",
            "type": "integer",
            "minimum": 0
        }
    },
    "required": ["firstName", "lastName"]
}
======


'''Tcl''':

======none
title {Example Schema}
type object
properties {
    firstName {
        type string
    }
    lastName {
        type string
    }
    age {
        description {Age in years}
        type integer
        minimum 0
    }
}
required {firstName lastName}
======

Here is another [http://www.json.org/example.html%|%example] from
json.org, in [XML], [JSON], and [Tcl]:

======none
<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
    <glossary><title>example glossary</title>
        <GlossDiv><title>S</title>
            <GlossList>
                <GlossEntry ID="SGML" SortAs="SGML">
                    <GlossTerm>Standard Generalized Markup Language</GlossTerm>
                    <Acronym>SGML</Acronym>
                    <Abbrev>ISO 8879:1986</Abbrev>
                    <GlossDef>
                        <para>A meta-markup language, used to create markup
languages such as DocBook.</para>
                        <GlossSeeAlso OtherTerm="GML">
                        <GlossSeeAlso OtherTerm="XML">
                    </GlossDef>
                    <GlossSee OtherTerm="markup">
                </GlossEntry>
            </GlossList>
        </GlossDiv>
    </glossary>
======

======none
{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}
======

======none
glossary {
    title {example glossary}
    GlossDiv {
        title S
        GlossList {
            GlossEntry {
                ID SGML
                SortAs SGML
                GlossTerm {Standard Generalized Markup Language}
                Acronym SGML
                Abbrev {ISO 8879:1986}
                GlossDef {
                    para {
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }
                    GlossSeeAlso {GML XML}
                }
                GlossSee markup
            }
        }
    }
}
======


In the Tcl examples above, the format is that of a [script] where each
[command] takes exactly one argument and knows the [type] of that argument.
Even though [JSON] expresses types in its syntax, JSON data is usually
consumed by a script that already knows what type to interpret the data as,
since the data conforms to a schema that is usually documented as part of a
service API.  In most cases, the additional type syntax of [JSON] is not needed.
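Because every value in the Tcl form is a single word, the glossary is also a nested Tcl dictionary, and a consumer that already knows the schema can read it with `dict get` alone.  This sketch uses an abridged copy of the Tcl glossary above; nothing in the data says that `GlossSeeAlso` is a list, the program simply treats it as one.

======
# An abridged copy of the Tcl form of the glossary.
set doc {
    glossary {
        title {example glossary}
        GlossDiv {
            title S
            GlossList {
                GlossEntry {
                    ID SGML
                    GlossTerm {Standard Generalized Markup Language}
                    GlossDef {
                        GlossSeeAlso {GML XML}
                    }
                }
            }
        }
    }
}

# The program, not the data, knows which values are strings and which are
# lists.
set entry [dict get $doc glossary GlossDiv GlossList GlossEntry]
puts [dict get $entry GlossTerm]                         ;# Standard Generalized Markup Language
puts [lindex [dict get $entry GlossDef GlossSeeAlso] 1]  ;# XML
======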

The first type information that comes to mind for the Tcl format is the ability
to distinguish between a [dodekalogue%|%word] and a `[list]`.  One approach in
the spirit of Tcl would be to interpret all values as lists, making sure that
where a single value is intended, it is formatted as a [list] containing one
item:

======none
{
    title {{Example Schema}}
    type object
    properties {
        firstName {
            type string
        }
        lastName {
            type string
        }
        age {
            description {{Age in years}}
            type integer
            minimum 0
        }
    }
    required {firstName lastName}
}
======

======none
glossary {
    title {{example glossary}}
    GlossDiv {
        title S
        GlossList {
            GlossEntry {
                ID SGML
                SortAs SGML
                GlossTerm {{Standard Generalized Markup Language}}
                Acronym SGML
                Abbrev {{ISO 8879:1986}}
                GlossDef {
                    para {{
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }}
                    GlossSeeAlso {GML XML}
                }
                GlossSee markup
            }
        }
    }
}
======
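The effect of the extra braces can be seen directly: with the wrapping, a value such as `title` is a one-item list whose only item is the whole title, while the unwrapped text would be read as a two-item list.

======
# A fragment of the list-wrapped schema shown above.
set schema {
    title {{Example Schema}}
    type object
}

set title [dict get $schema title]
puts [llength $title]            ;# 1
puts [lindex $title 0]           ;# Example Schema

# Without the extra braces, the same text is a two-item list.
puts [llength {Example Schema}]  ;# 2
======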

This is already a reasonable format to work with.  A more full-featured
notation would support type information and other metadata in an extensible
way.  In JSON, it is possible to differentiate between the string "3.14159" and
the number 3.14159, but there is only one kind of number, so any finer
distinction, such as integer versus floating point, is inferred from the format
of the number.  This is a hint that explicit typing is often not needed.  The
single numeric type stems from the origins of JSON in [Javascript], which only
provides one numeric type.  Given that JSON data is typically described in the
documentation for an API, the explicit double-quote syntax to differentiate
numbers and strings is in practice probably always redundant with the
documentation, and therefore not really necessary.  Where type information is
used by the consuming program, it would probably always be trivial (excepting
of course political issues such as backwards compatibility and organizational
inertia) to add a field to the API to specify the type of the value, rather
than relying on the syntax of the serialization format.
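The suggestion to carry type information in an ordinary field, rather than in the syntax, can be sketched in Tcl.  The message layout and the field name `valueType` are hypothetical, chosen only for illustration; the consumer validates the value according to that field, and the value itself stays a plain string.

======
# Hypothetical message format: the type of `value` travels in an ordinary
# sibling field, `valueType`, instead of in the serialization syntax.
proc readValue {msg} {
    set value [dict get $msg value]
    switch -- [dict get $msg valueType] {
        number {
            if {![string is double -strict $value]} {
                error "not a number: $value"
            }
        }
        string {}
        default {
            error "unknown valueType: [dict get $msg valueType]"
        }
    }
    return $value
}
======

Both `readValue {value 3.14159 valueType number}` and `readValue {value 3.14159 valueType string}` return `3.14159`; only the validation differs.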

[[TODO: Find real-world examples of programs that actually rely on this
distinction]]



The Tcl examples above are valid scripts in which each "[command]" takes
exactly one argument.  Where explicit typing is desired, a variation could be
employed in which a command given only one argument treats it as its list 
value, and when it is given two arguments, treats the first argument as
metadata, and the second as the value:


======none
glossary {
    title {{example glossary}}
    GlossDiv record {
        title S
        GlossList record {
            GlossEntry record {
                ID SGML
                SortAs SGML
                GlossTerm {{Standard Generalized Markup Language}}
                Acronym SGML
                Abbrev {ISO 8879:1986}
                GlossDef record {
                    para {{
                        A meta-markup language, used to create markup
                        languages such as DocBook.
                    }}
                    GlossSeeAlso list {GML XML}
                }
                GlossSee markup
            }
        }
    }
}
======



** Questions and Comments **

[AMG]: Is this called vrtcl or vertcl?  Both spellings appear on this page.  Do these two terms refer to distinct yet related concepts?

[PYK] 2014-08-03: I was playing with both names, and I guess my mind hadn't
settled on one.  I'm going to go with "vrtcl", which will entail changing
the name of this page.  I just changed all spellings on this page to "vrtcl".

----

[AMG]: ''"Each field is a command whose semantics depend on the number of words in the command:"''  Customary Tcl terminology counts the name of the command as the first word of the command.  Here, you probably mean to say arguments instead of words.

[PYK] 2014-08-03: the vrtcl specification avoids the use of "arguments", using
"words" instead, and "first word" refers to the word in the position that
normally names a command.  This aligns with [dodekalogue%|%rule 2], which
defines the name of the command as the first word in the command.  I note
though that rule 2 is ambiguous or even contradictory, saying that the first
word is used to locate a [command procedure], but then immediately stating that all
words of the command are passed to the procedure, which is of course not true
if the command name is considered one of the words of the command.  Perhaps
rule 2 should be amended to state that "all of the remaining words are
passed to the command procedure".

----

[AMG]: How does this compare to [TDL]?  See also the configuration file format I derived from TDL and implemented in [Config file using slave interp].

I'll answer my own question.  TDL and my derived format target [XML] (including attributes), whereas you target JSON (which lacks XML-like attributes).  When attributes are not used, my format matches your basic specification, though not your explicit type tagging and extra list encoding variations.

[PYK]: Yes. TDL riffs on [XML], supporting attributes as its metadata
feature, but vrtcl provides a full isomorphic metadata mechanism.

----

[AMG]: I have no JSON experience, so I would very much like to know whether it's possible for a program to make decisions based on the ''type'' of a value.  This is essentially the same question as whether or not explicit type tagging has any value.  You argue that it does not ([EIAS] and all that), but then proceed to present a way to incorporate type tagging, with only a TODO note to research whether or not there's any reason to do so.  Perhaps someone else could fill us in.

Like you, I argue that inline type tagging is not useful because the consumer already knows what types to expect.  In other words, the schema is embedded in the program itself.

That's not to say explicit schemas aren't useful.  They are good documentation, they make it possible for a generic program to validate a document, and they may assist compression.  But merely embedding types in the document does not constitute a true schema.  It cannot protect against misspelled field names, and it fails to document data relationships and unused features.  So explicit schemas really ought to be external documents.  In short, I see no benefit to embedding type tags in this application.

Well, actually, I do see one benefit, and maybe this is what you had in mind.  Since other formats require type tags, and since it's generally impossible to unambiguously detect the "type" of a Tcl word, compatibility with other formats requires type tagging.  That type tagging can go inline or in an explicit schema.

As you may guess from what I said about schemas, I prefer the latter approach, though I freely admit that the former is much easier to implement.  There is a third approach, which is to write a program that implicitly embeds a schema and whose purpose is conversion.  This surely works, but I think of it as a one-off sort of solution, whereas this page is dedicated to describing a general approach, so again I think an explicit schema would be more appropriate.

... I hate to say it, but there is a fourth way to go.  PLEASE do not use this.  [[[tcl::unsupported::representation]]] tells you what [C]-based type the script most recently wanted any given value to be.  Again, don't do this, since it will result in a very brittle and unpredictable program, requiring the end user to do anti-Tcl contortions to get the desired results, and even then with surprises.  The only reason I mention this at all (aside from taking the opportunity to warn against it) is because a real-world Tcl extension does it: [tcom].  And it is sometimes very shocking [http://wiki.tcl.tk/1821#pagetoc962b3244].

----

[AMG]: What is the point of applying an extra level of list encoding to each command's argument?  It's already a single word on account of being one argument.  I don't see how this helps in disambiguating its type, if that's even a goal (see above).

[PYK] 2014-10-15: the default types have been tuned since AMG made this
comment, so the extra level of braces doesn't show up as much.  It still can,
depending what types one sets for a value.  More detail below, and through the
spec and examples.

[PYK] 2015-06-07:  At this point, the syntax and semantics of vrtcl have
changed a bit, and the extra levels of list encoding have all dropped out of
the examples.

----

[AMG]: The format you present is script-oriented, meaning that you use newlines as field separators.  Do you also allow semicolons?  How about substitutions?  Comments?  Double quotes instead of braces?  Backslashes?  Blank lines?  Is whitespace significant, e.g. indenting?  What about non-data commands such as looping and conditionals, such as those I demonstrate in [Config file using slave interp]?

Except for those showing type tagging, all the examples you present could instead be viewed as [dict]s, since newlines work as list/dict element separators just as well as any other form of whitespace.  What benefit does your approach have over nested dicts?

----

[PYK] 2014-08-02:  Replying to all [AMG%|%AMG's] comments in one swoop:  

Some of the rules and examples have changed overnight, making some of AMG's
comments appear to be non sequiturs.  Sorry about that.

The
[http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html%|%Amazon
DynamoDB API Reference] selects a comparison mode based on the type of
''AttributeValueList'', so it's certainly possible for a program to react to
the type of a value, but it would also be trivial to communicate the type as
the value of an additional field, perhaps ''AttributeValueListType'', which
obviates the need to have type indicators built into the syntax.  vrtcl is very
much about clean syntax, and some of the complexity of the ruleset arises in
order to minimize syntax and maximize conciseness and legibility.

vrtcl supports type indicators in order to support lossless transformations
to/from other formats, such as [XML] and [JSON], and also because even though
type indicators are often not needed, maybe sometimes they are, and maybe if
vrtcl supports them, they will be put to good, judicious, and even ingenious
use.

vrtcl aspires to be extensible like [XML] is, and [JSON] is not.  If the type
tagging is removed from the lexical level but supported in the grammar,
additional types can more easily be added.  The information record is also wide
open to other novel use.

I fully agree with your analysis of the usefulness of schemas, and the
`elements` information field is a hint at the direction vrtcl will take in
support of schemas.  So why also support type tags in record fields? Because
schemas can be a hassle to write, and making them mandatory would be a barrier
to usability in many cases.   Even if a programmer has an idea of what the data
schema is, it can take a while during development for the schema to settle
down.  In the meantime, the programmer often just wants to throw data around
and work with the data in an ad-hoc fashion for a while.  vrtcl aims to
be a superior alternative at all stages.  Besides, if both schemas and inline
information are available, interesting patterns might emerge.  In short, the
reasons are what you guessed I had in mind, and then some.

Supporting information at each field could end up giving schemas a [CSS]
characteristic, where schemas for subdocuments are built up in a cascading
manner.  The information record is a vrtcl document in its own right, giving it a flexibility
that [XML], with its special syntax for attributes, doesn't have.  How far down
the information rabbit hole will people go?  vrtcl leaves that undefined to keep
the possibilities open. 

I have no intention of using [[[tcl::unsupported::representation]]] :)

[PYK] 2014-10-15 2016-08-08:  vrtcl defaults have recently been tuned to reduce the brace
count.  Extra braces usually happen now in the context of a list of lists.  In
the example below, `three four` is the first item in the third list:

======
example {list; list} {one two {{three four}}}
======

If a field contains only one word, that word is the value of the field, and is
interpreted as a list.  This rule minimizes the number of characters necessary
to convey the structure in some common cases.  The idea is that the programmer
doesn't want to be bothered with naming the field or providing any
information.  In most cases, a `[list]` value will be exactly what's wanted, or
at least it isn't that difficult to format a value as a list containing one
item.

At points where it feels like there are too many braces, vrtcl probably
supports some syntax that will eliminate those extra braces.  In the previous
example, the default `ctype` of a `list` is also `string`, meaning that it
can be written like this:

======
example {one two {three four}}
======
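Both forms of `example` can be checked with `lindex`: in the braced form, `three four` is the first item of the third item, while in the shorter form it is the third item itself.

======
set braced {one two {{three four}}}
puts [lindex $braced 2 0]   ;# three four

set short {one two {three four}}
puts [lindex $short 2]      ;# three four
======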

Regarding semicolons, yes, they have their [dodekalogue%|%normal] meaning.
Some of the examples already illustrate semicolons, and a few examples
contain comments.  Regarding substitutions, double quotes instead of braces,
backslashes, blank lines, whitespace, and indenting:  Yes!  I haven't ruled
any of these things out for vrtcl, and I hope it can continue to support all of
them.  All normal Tcl processing will be performed on the words of a field
prior to its evaluation by vrtcl.  Tcl syntax makes those things easy enough to
avoid when they are unwanted.  But this is where the '''embryonic''' status of
vrtcl comes into play.  Any of these features that turn out to be too
distracting or unwieldy might still get cut, though I don't anticipate that.
As is the case with standard Tcl, substituted values will not be rescanned, but
macros such as those found in [Config file using slave interp] could be
implemented in a more controlled manner through the information "hook".

I'm not currently considering adding explicit general scripting capability in
the form of looping, conditionals, or other programming features.  vrtcl is,
after all, a data format.  I think the [XSLT] approach, in which a separate
document is created that describes transformations to effect on other
documents, makes for a good separation of concerns.  The information records of
vrtcl turn it into the Swiss cheese of data formats, so there's plenty of room
to fill in the blanks as occasion arises.

Thank you for the feedback!



** Page Authors **

   [PYK]:   


<<categories>> JSON | data format