Cloverfield - References

Cloverfield references using the {&id} word modifier is a solution to the problem of defining and serializing structures with circular references (see Cloverfield - Tridekalogue). Let's consider a tree whose nodes have to reference their root or parent node in some way (for instance, a DOM tree from XML data). E.g:

    root
    |
    +-node1
    |
    +-node2
    | |
    | +-node21
    | |
    | +-node22
    | |
    | +-node23
    |
    +-node3

Each node (including the root) is a structure that holds a value (here root, node1...), a list of children, and a reference to its root parent. Expressing this structure in plain Tcl gives:

    set tree {root {
        {node1 {}}
        {node2 {
            {node21 {}}
            {node22 {}}
            {node23 {}}
        }
        {node3 {}}
    }}

This is a tree expressed as a recursive list. Each node holds 2 elements: a name and a list of children. Nodes "know" their root and parent implicitly because they are part of the same Tcl variable. If we want to extract a given node, we lose the reference information so we have to retrieve this metadata somewhere. Typically one would have to maintain this information in variables, with some in-band signaling convention, or with specific naming conventions. For example, we could store the names of the root and parent nodes in the node structure (in-band signaling).

    set tree {root root root {
        {node1 root root {}}
        {node2 root root {
            {node21 root node2 {}}
            {node22 root node2 {}}
            {node23 root node2 {}}
        }
        {node3 root root {}}
    }}

Extracting node22 gives {node22 root node2 {}}.

Knowing that the data is stored in variable tree, one could find the root and parent nodes by searching the tree. But this is a potentially expensive operation and needs explicit coding.

Now consider the same structure with Cloverfield's {&id} word modifier:

    set tree {root {&}{} {&}{} (
        {&node1}(node1 {&}{} {&}{} {})
        {&node2}(node2 {&}{} {&}{} (
            {&node21}(node21 {&}{} {&node2}{} {})
            {&node22}(node22 {&}{} {&node2}{} {})
            {&node23}(node23 {&}{} {&node2}{} {})
        ))
        {&node3}(node3 {&}{} {&}{} {})
    }

(Note that this is the string representation of the tree. Behind the curtains, the structure maintains pointers to the referenced data.)

Here we chose to use the node name as the reference ID, but IDs are arbitrary. The tree is now expressed as a recursive list with all nodes tagged by a unique reference ID. Each node holds 4 elements: a name, a reference to the root node, a reference to the parent node, and a list of children. This string representation is similar to the above Tcl example with in-band signaling. The difference is that elements 1 and 2 respectively hold references to the root and parent nodes, whereas the Tcl version required an explicit naming convention.

Some example use:

    % set node [lindex $tree 3 2 3 2]; # 3's are for each node's children list.
    # Note: indentation and inline comments added to the below string representation for clarity.
    # Value:
    node22
    
    # Root: first reference to root is serialized. 
    {&root}(
        root
        
        # Subsequent references to root aren't serialized. Here, root and parent.
        {&root}{}
        {&root}{}
    
        # Children:
        (
            {&node1}(node1 {&root}{} {&root}{} {})
            {&node2}(node2 {&root}{} {&root}{} (
                # Siblings of node22 get serialized...
                {&node21}(node21 {&root}{} {&node2}{} {})
                
                # ... but not node22, which is the current word's root element.
                {&}{}
                
                {&node23}(node23 {&root}{} {&node2}{} {})
                
            ))
            {&node3}(node3 {&root}{} {&root}{} {})
        )
    )
        
    # Parent: reference was already serialized above.
    {&node2}{}
    
    # Childen:
    {}
    %
    %
    %
    %
    % # Get node name.
    % lindex $node 0
    node22
    %
    %
    %
    %
    % # Get node parent name.
    % lindex $node 2 0 ; # 2 is for parent.
    node2
    %
    %
    %
    %
    # Get siblings.
    % lindex $node 2 3 ; # 2 for parent, 3 for children.
    # Note: indentation and inline comments added to the below string representation for clarity.
    {&node21}(
        # Value:
        node21 
        
        # Root: first reference to root is serialized. 
        {&root}(
            root
            
            # Subsequent references to root aren't serialized.
            {&root}{}
            {&root}{}
        
            # Children:
            (
                {&node1}(node1 {&root}{} {&root}{} {})

                # node2's children list is the current word's root element.
                {&node2}(node2 {&root}{} {&root}{} {&}{})

                {&node3}(node3 {&root}{} {&root}{} {})
            )
        )

        # Parent: reference was already serialized above.
        {&node2}{}

        # Children.
        {}
    )
    
    # Next siblings: references were already serialized above.
    {&node22}(node22 {&root}{} {&node2}{} {})
    {&node23}(node23 {&root}{} {&node2}{} {})

Referenced structures are never serialized more than once (the referenced value is serialized at the first occurrence). That way, structures with circular references can be serialized in finite space, expressed as strings, and compared for equality. For a simpler example, consider this:

    % set l {a {&}{}}
    a {&}{}

The above structure is a list whose 2nd element is a reference to itself (designated by the special reference {&}). Hence:

    % lindex $l 1; # Or $l{1} using Cloverfield conventions.
    a {&}{}
    % expr {[lindex $l 1] == $l}
    1

References can be combined with argument expansion. For example:

    % set l {a {*}{&}{}}

gives an infinite (flat) list of "a". The following code loops indefinitely:

    foreach v {a {*}{&}{}} {
        puts $v
    }

The argument expansion word modifier must be placed before the reference so that the referenced value be expanded in-place in the containing list. Placing {*} after {&id} has no effect since the reference word modifier tags a single word.


Notice also the use of parentheses in place of braces, in order for the reference IDs to share the same scope. For example:

    {&a}(a {&a}{}) 
    {&a}{a {&a}{}}

In the first case, both references a designate the same reference, contrary to the second case because words are not evaluated at the same time. The first case is a self-referencing structure, whereas the second is equal to the list {a {}} value-wise.

The consequence is that the following works as expected:

    % set l1 ({&a}1 2 {&a}{})
    % set l2 ($l1 {&a}3 {&a}{})
    {{&a}1 2 {&a}{}} {&a}3 {&a}{}
    # The above equals {{1 2 1} 3 3} value-wise.

l1's references don't collide with l2's unexpectedly.

In summary, references don't cross brace boundaries, because braces delay the evaluation of their content. That is the main reason why parentheses are necessary for references to work.


I've stumbled upon Lisp's *print-circle* [L1 ] flag that controls how circular structures can be printed. This output method is surprisingly similar in principle to what Cloverfield introduces, let alone the syntax: each referred structure is first prefixed by #i= where i is an integer (likely a monotonically increasing value for each encountered ref), and each subsequent occurrence by #i#. A common point between Lisp and Cloverfield is that the value is only output when first encountered. However Lisp cannot read this format back contrary to Cloverfield, it is an output-only method.

In the following examples a circular list is built by adding a reference to itself as the last element:

Lisp:

    (let ((a (list 1 2 3)))
        (setf (cdddr a) a)
        (let ((*print-circle* t))
            (write a)
            :done))
#1=(1 2 3 . #1#)

Cloverfield:

    % set a (1 2 3 {*}{&}{})
    1 2 3 {*}{&}{}

The Cloverfield version uses the expansion modifier to include the list inline into itself.


See also