exhibit

Lars H, 2010-06-01: This is a package I didn't quite release two years ago, but which was now brought to my attention again by pdict: Pretty print a dict. Posting the manpage and code here in case of interest:

jdc 24-jan-2011 Doctool no longer supported in Wiki

<<doctool>>
[manpage_begin exhibit n 1.0]
[moddesc "Format complex data as text"]
[titledesc "Format complex or otherwise unintelligible data as text.\
  Typical applications are logging, debugging, and similar situations\
  in which data are to be shown to the user."]
[copyright "2008 Lars Hellstr\u00F6m <[email protected]>"]
[require Tcl 8.2]
[require exhibit 1.0]
[description]


[section {Introduction}]

The [package {exhibit}] package offers facilities for formatting 
(potentially complex or otherwise unintelligible) data before showing 
it to a user. It is not a serialisation toolbox (Tcl usually does 
that well by itself), but rather a presentation toolbox. 
The output is always plain text, often with the assumption that all 
characters have the same width when shown to the user. 
Indentation is used to express nesting of data. 
There is generally no limit on the length of lines in the output.
[para]

There are two basic, and several more advanced ways to make use of 
the commands in this package. The first basic method is to pass the 
data to be formatted to the command matching the outermost container 
type of that data, and be done with it. This gives a full 
presentation as long as containers (lists, dicts, and whatsnot) are 
not nested within each other.
[para]

The second basic method, which assumes your data is in the form of a 
dictionary following the [sectref {The type suffix convention}\
{type suffix convention}], is to pass it to the 
[cmd exhibit::dictbysuffix] command. 
This assumes entries in the dictionary that require special 
formatting have names ending with suffixes that specify the type of 
that entry, and selects formatting for the entries based on this 
assumption. If names are from the start chosen according to this 
convention, then this method of use can format data structures nested 
to arbitrary depth.
[para]

Finally, the formatters for container types generally take an option 
that lets you specify what formatter or formatters to use for the 
elements in that container. This provides full control over the 
formatting of nested data, at the price of a more complicated call. 
You may also write your own formatting command and have the 
formatters of this package make use of it, if none of them fulfills 
your needs.


[section {Variables}]

The package has two variables [var indent] and [var wrapwidth], which 
store defaults for the [option -indent] and [option -width] 
respectively options; changing their values changes the defaults for 
all users of the [package exhibit] package. [var indent] is used by 
all commands that take an [option -indent] option and is initialised 
to four spaces. [var wrapwidth] is initialised to [const 70] and is 
only used by [cmd dictbysuffix] (technically [cmd choice_bysuffix]) 
when it calls [cmd format_small_list].


[section {Commands}]

Most commands in the package take data to be formatted as the final 
argument. Previous arguments (if any) are usually option-value pairs 
and control how the data is formatted.
[para]

All of the following commands are in the [namespace exhibit] 
namespace. They are also all exported by this namespace. 
All command options are of the [arg -option] [arg value] type (two 
argument words per option) and all commands that take options ignore 
unrecognised options.

[list_begin definitions]
  [call [cmd format_bytes] [opt "[arg option] [arg value] ..."] [arg data]]
    This command formats binary data. For example, 
    [example {exhibit::format_bytes "Tool \x01 Command \xDC Language"}]
    returns
    [example\
     "000  54 6f 6f 6c 20 01 20 43 6f 6d 6d 61 6e 64 20 dc \
      |Tool . Command \xDC|\
    \n016  20 4c 61 6e 67 75 61 67 65                      \
      | Language|"]
    
    The available options are:
    [list_begin options]
      [opt_def -address "([arg boolean] | [const auto])"]
        Should the lines contain (as an extra first column) an "address" 
        of the first byte on that line? Default is [const auto], 
        which includes addresses when there is more than one line. 
        Addresses are always decimal and at least three digits, to 
        help distinguish them from the data bytes which are always 
        two hex digits in each column.
      [opt_def -base [arg integer]]
        The "base address" for the data, i.e., the address of the 
        first byte. Defaults to [const 0].
      [opt_def -encoding [arg name]]
        The encoding to use when translating the bytes of the data to 
        characters, for display in the last column. Defaults to 
        [const iso8859-1]. Use [const ascii] if bytes above 127 has 
        no interpretation as characters.
      [opt_def -nonprint [arg character]]
        The substitute character to put in the text representation 
        when the real character is nonprinting. Defaults to a period 
        (\u002E).
      [opt_def -notinenc [arg character]]
        The substitute character to put in the text representation 
        when the byte does not correspond to any character under the 
        specified encoding. Defaults to a question mark (\u003F).
      [opt_def -perline [arg integer]]
        The number of bytes per line. Defaults to [const 16].
      [opt_def -printable [arg regexp]]
        The regular expression that is used to distinguish printing 
        characters from nonprinting; a character is considered 
        printing if it matches this regexp. Defaults to the character 
        class [const graph] plus space (\u0020), i.e., 
        "[const "[lb][lb]:graph:[rb] [rb]"]". Use "[const {[ -~]}]" if 
        you're only prepared to handle ASCII characters in the result.
    [list_end]
    
  [call [cmd format_dict] [opt "[arg option] [arg value] ..."] [arg data]]
    This command formats a dictionary value. The result is similar to 
    that of the [cmd parray] command, with a line
    [example {keyName    = value}]
    for each entry, the order of entries are sorted by the keys, and 
    the equal signs are aligned horizontally. There is however a 
    variation: if the formatted form of a value contains a newline 
    character (i.e., is more than one line of text) then the entry is 
    formatted as
    [example "keyName    :\n    Value line 1\n    Value line 2\n   \
      Value line 3\n    ..."]
    with each line of the value indented by a specified amount, and 
    the equals sign replaced by a colon.
    [para]
    
    The available options are:
    [list_begin options]
      [opt_def -indent [arg text]]
        The string prepended to each line of the value when indenting 
        it. Defaults to the value of the [var exhibit::indent] 
        variable, which by default is set to four spaces.
      [opt_def -entrycmd [arg cmdprefix]]
        The command prefix used to format the values. Defaults to 
        [cmd format_identity], which returns them unchanged.
      [opt_def -choicecmd [arg cmdprefix]]
        A command prefix used to format the values while taking the 
        key names into account. It is called as
        [list_begin definitions]
          [call \{*\}[arg choicecmd] [arg key] [arg value]]
        [list_end]
        Specifying a [option -choicecmd] overrides the 
        [option -entrycmd], if any.
      [opt_def -sortcmd [arg cmdprefix]]
        A command prefix used to sort the entries in the dictionary, 
        i.e., determine in which order they should be listed. It is 
        called as
        [list_begin definitions]
          [call \{*\}[arg sortcmd] [arg list-of-keys]]
        [list_end]
        and defaults to "[cmd lsort] [option -dictionary]". Each element 
        in the return value from this call gives rise to one shown entry 
        in the result of [cmd format_dict]. 
    [list_end]
    
  [call [cmd format_identity] [arg string]]
    This command simply returns its argument.
    
  [call [cmd format_list] [opt "[arg option] [arg value] ..."] [arg data]]
    This command formats a list value. It is geared towards lists 
    with large elements, since the format is
[example_begin]
List element 0:
    [emph {First list element}]
List element 1:
    [emph {Second list element}]
List element 2:
    [emph {Third list element}]
End of list of length 3.
[example_end]
    The options are
    [list_begin options]
      [opt_def -formatcmd]
        The command prefix used to format the list elements. 
        Defaults to [cmd format_identity].
      [opt_def -indent [arg text]]
        As in [cmd format_dict].
    [list_end]
    
  [call [cmd format_matrix] [opt "[arg option] [arg value] ..."] [arg data]]
    This command formats a list of lists as a "matrix", with columns 
    that line up (provided all characters are the same width).
    [para]
    
    The alignment is attained by adjusting the whitespace between 
    list elements, and by inserting suitable indentation after every 
    newline in the text of an entry. The result is still a list of 
    lists: there is a `{' at the beginning of every row, a `}' at 
    the end of every row, and the entries themselves are 
    [cmd list]-quoted. This quoting is however applied to formatted 
    forms of the entries, so although it is possible to extract one 
    entry from the result using [cmd lindex], there is no reason to 
    expect that this recovers the actual matrix element. In 
    particular, if the formatted form of an entry is more than one 
    line, then indentation has been inserted into the entry text to 
    make subsequent lines line up with the first.
    [para]
    
    In a matrix with multiline entries, such as the following:
    [example {exhibit::format_matrix "{AnEntry {An entry}\
      {An entry\nwith two lines} {An entry\nwith 3\nlines}\
      {Final entry}} {1 2 3 4 5}"}]
    the bottom line in one entry lines up with the top line of the 
    next, like so:
[example_begin]
{ AnEntry {An entry} {An entry
                      with two lines} {An entry
                                       with 3
                                       lines}   {Final entry} }
{ 1        2          3                4         5            }
[example_end]
    Noteable here is also that braces delimiting an entry are 
    disregarded when aligning columns, although they are taken into 
    account when computing the widths of the the columns.
    [para]
    
    The options supported by this command are
    [list_begin options]
      [opt_def -colsep]
        The string to insert between two entries in a row. Defaults 
        to a single space.
      [opt_def -formatcmd]
        The command called to format the entries. Defaults to 
        [cmd format_identity].
    [list_end]

  [call [cmd format_small_list] [opt "[arg option] [arg value] ..."] [arg data]]
    This command formats a list value. It is geared towards "small" 
    lists that fit comfortably on a single line, but occationally 
    contain larger amounts of text.
    [para]
    
    In the single-line formatting, this command simply returns the 
    [arg data]. However, if there is a newline somewhere in the 
    [arg data] then it switches to a multiline formatting, in which 
    all whitespace between list elements is replaced by a newline, 
    to make sure every list element begins at the start of a line. 
    Hence
    [example "exhibit::format_small_list {{a b\nc d} e {f\ng h}}"]
    returns
    [example "{a b\nc d}\ne\n{f\ng h}"]
    The multiline formatting is also chosen if the [option -width] 
    option is specified and the string length of the [arg data] 
    exceeds the specified value. There are no other options for 
    this command.
    
  [call [cmd sort_asfollows] [arg given] [arg sortcmd] [arg keys]]
    This is a sorting command which places some [arg given] keys 
    first and then uses the provided [arg sortcmd] to sort remaining 
    [arg keys]. For example,
    [example {exhibit::sort_asfollows {name foo employment}\
      {lsort -dictionary} {age phone employment name address}}]
    returns
    [example {name employment address age phone}]
    i.e., a reordering of the elements in the list [arg keys], where 
    those that occur in [arg given] comes first, in the same order as 
    in [arg given], and remaining elements from [arg keys] follow, in 
    the order they would be put by the [arg sortcmd].
    
  [call [cmd dictbysuffix] [opt "[arg option] [arg value] ..."] [arg data]]
    This command formats a dictionary, and applies to each entry 
    formatting according to a "type suffix" in the key name. 
    Technically, it is just an alias to calling [cmd format_dict] 
    with a [option -choicecmd] of [cmd choice_bysuffix], so any 
    [arg option]s are interpreted by [cmd format_dict].
    
  [call [cmd choice_bysuffix] [arg key] [arg value]]
    This command formats the [arg value]. The [arg key] controls 
    which formatting is applied to the [arg value], according to the 
    [term {type suffix convention}].
    
[list_end]


[section {The type suffix convention}]

The [package exhibit] type suffix convention is that when the value 
of an entry in a dictionary requires special formatting, then the key 
of that entry should end with a suffix that specifies the "type" of the 
value, so that a suitable formatting command can be chosen. As 
implemented by [cmd choice_bysuffix], the type suffix convention 
recognises the following suffixes:
[list_begin definitions]
  [def [const :d]]
    The value is itself a dictionary.
  [def [const :D]]
    The value is itself a dictionary, and the names of its 
    entries follows the type suffix convention as well.
  [def [const :l]]
    The value is a small list (will be formatted by 
    [cmd format_small_list]).
  [def [const :L]]
    The value is a list (will be formatted by [cmd format_list]).
  [def [const :DL]]
    The value is a list of dictionaries, and each element 
    dictionary is to be formatted according to the type suffix 
    convention.
  [def [const :sl]]
    The value is a set-style list, i.e., the order of the elements 
    does not matter. The value is first sorted [option -dictionary] 
    style, and then formatted by [cmd format_small_list].
  [def [const :t]]
    The value is a point in time (number of seconds since the epoch), 
    which should be formatted by [cmd clock] [method format].
  [def [const :raw]]
    The value is a bytestring, and should be formatted by 
    [cmd format_bytes].
[list_end]
Entries not carrying any of the above suffixes are considered to be 
mere strings and are returned as-is.
[para]

The easiest way to extend this list is to define your own 
[option -choicecmd] command that handles the new type suffixes, and 
then calls [cmd choice_bysuffix] for everything else. Note that this 
does not affect how the entries of an embedded [const :D] or 
[const :DL] entry are interpreted however, so you may want to 
override these as well.


[section {Bugs, ideas, feedback}]



[manpage_end]
<<doctool>>

And so the code:

package require Tcl 8.2 ;  # For [string map]
package provide exhibit 1.0

namespace eval exhibit {
    # Standard indentation step
    variable indent "    "
    # -width choice_bysuffix uses for format_small_list
    variable wrapwidth 70
}

proc exhibit::format_identity {data} {set data}

proc exhibit::Choose_allsame {formatcmd key value} {
    eval [linsert $formatcmd end $value]
}

proc exhibit::format_dict {args} {
    array set A [lindex $args end]
    
    variable indent
    array set O {
        -entrycmd format_identity
        -sortcmd {lsort -dictionary}
    }
    set O(-indent) $indent
    array set O [lrange $args 0 end-1]
    if {![info exists O(-choicecmd)]} then {
        set O(-choicecmd) [list Choose_allsame $O(-entrycmd)]
    }
    
    ## 
     # The first pass over the data just determines the length 
     # of the longest key.
     ##
    
    set maxl 0
    foreach name [array names A] {
        if {[string length $name] > $maxl} {
            set maxl [string length $name]
        }
    }
    
    ## 
     # The second pass over the data actually formats 
     # the dictionary.
     ##
    
    set res ""
    foreach name [
        eval [linsert $O(-sortcmd) end [array names A]]
    ] {
        append res [format "%-*s" $maxl $name]
        set value [eval [linsert $O(-choicecmd) end $name $A($name)]]
        if {[string first \n $value]<0} then {
            append res { = } $value \n
        } else {
            append res { :} \n $O(-indent) [
                string map [list \n \n$O(-indent)] $value
            ] \n
        }
    }
    return [string trimright $res \n]
}


proc exhibit::format_small_list {args} {
    array set O [lrange $args 0 end-1]
    set L [lindex $args end]
    if {![string match *\n* $L]} then {
        if {![info exists O(-width)] || [string length $L]<=$O(-width)}\
        then {return $L}
    }
    set LL {}
    foreach item $L {lappend LL [list $item]}
    return [join $LL \n]
}


proc exhibit::format_list {args} {
    variable indent
    set O(-indent) $indent
    set O(-formatcmd) format_identity
    array set O [lrange $args 0 end-1]
    
    set count 0
    set res {}
    foreach item [lindex $args end] {
        append res "List element $count:" \n $O(-indent)
        append res [string map [list \n \n$O(-indent)] [
            eval [linsert $O(-formatcmd) end $item]
        ]] \n
        incr count
    }
    append res "End of list of length $count."
}


proc exhibit::format_bytes {args} {
    set data [lindex $args end]
    array set O {
        -encoding iso8859-1
        -nonprint .
        -notinenc ?
        -printable {[[:graph:] ]}
        -perline 16
        -base 0
        -addressing auto
    }
    array set O [lrange $args 0 end-1]
    
    ## 
     # The first block formats hex and text parts of each line, 
     # and stores them away in the $res list.
     ##
    
    set res {}
    set hexL {} ; set text {}
    foreach byte [split $data ""] {
        binary scan $byte H2 hex
        lappend hexL $hex
        if {![info exists char($hex)]} then {
            set ch [encoding convertfrom $O(-encoding) $byte]
            set byte2 [encoding convertto $O(-encoding) $ch]
            if {$byte2 != $byte} then {
                set char($hex) $O(-notinenc)
            } elseif {[regexp -- $O(-printable) $ch]} then {
                set char($hex) $ch
            } else {
                set char($hex) $O(-nonprint)
            }
        }
        append text $char($hex)
        
        if {[llength $hexL] >= $O(-perline)} then {
            lappend res [join $hexL] $text
            set hexL {} ; set text {}
        }
    }
    if {[llength $hexL]} then {lappend res [join $hexL] $text}
    
    ## 
     # The second block attaches addresses and 
     # generally formats the lines.
     ##
    
    set hexwidth [expr {$O(-perline)*3 - 1}]
    if {$O(-addressing) == "auto"} then {
        set O(-addressing) [expr {[llength $res] > 2}]
    }
    set digits [string length [expr {abs($O(-base))+[string length $data]-1}]]
    if {$digits < 3} then {set digits 3}
    set lineL {}
    set ofs $O(-base)
    foreach {hex text} $res {
        lappend lineL [
            if {$O(-addressing)} then {
                format "%0*d  %-*s  |%s|" $digits $ofs $hexwidth $hex $text
            } else {
                format "%-*s  |%s|" $hexwidth $hex $text
            }
        ]
        incr ofs $O(-perline)
    }
    return [join $lineL \n]
}


proc exhibit::choice_bysuffix {key value} {
    switch -glob -- $key *:raw {
        format_bytes $value
    } *:d {
        format_dict $value
    } *:D {
        format_dict -choicecmd choice_bysuffix $value
    } *:l {
        variable wrapwidth
        format_small_list -width $wrapwidth $value
    } *:sl {
        variable wrapwidth
        format_small_list -width $wrapwidth [lsort -dictionary $value]
    } *:L {
        format_list $value
    } *:DL {
        format_list -formatcmd dictbysuffix $value
    } *:t {
        clock format $value
    } default {
        set value
    }
}


proc exhibit::sort_asfollows {givenL restcmd keyL} {
    foreach key $keyL {set S($key) ""}
    set res {}
    foreach key $givenL {
        if {[info exists S($key)]} then {
            lappend res $key
            unset S($key)
        }
    }
    concat $res [eval [linsert $restcmd end [array names S]]]
}


proc exhibit::format_matrix {args} {
    array set O {
        -formatcmd format_identity
        -colsep " "
    }
    array set O [lrange $args 0 end-1]
    set data [lindex $args end]
    
    ## 
     # The first block formats all the entries, splits multiline entries 
     # into lines, and computes the widths of each column.
     # 
     # The $cwL list has the structure
     #    ( <hasMargin> <width> )*
     # where each such couple describes one column. <hasMargin> signals 
     # that there in some row is a brace that is allowed to stick out to 
     # the left, and that an extra space must be inserted before entries 
     # that aren't braced. The <width> is the actual width, in characters, 
     # including a closing brace for the entry.
     # 
     # The $entryM list is the matrix of formatted but not yet rendered 
     # matrix elements. Its rows are lists with the structure
     #   ( <listOfLinesOfEntry> <isBraced> )*
     # It may be observed that <isBraced> can be false only when 
     # <listOfLinesOfEntry> has exactly one element, since 
     # [list $something] can only contain newlines if it is braced.
     ##
    
    set cwL {}
    set entryM {}
    foreach row $data {
        set new_cwL {}
        set entryL {}
        foreach item $row {m w} $cwL {
            if {$m==""} then {set m 0}
            set text [list [eval [linsert $O(-formatcmd) end $item]]]
            set b [expr {"\{" == [string index $text 0]}]
            lappend entryL [split $text \n] $b
            foreach line [lindex $entryL end-1] adj [list $b] {
                set l [string length $line]
                if {$adj=="1"} then {incr l -1}
                if {$w=="" || $w<$l} then {set w $l}
            }
            lappend new_cwL [expr {$m || $b}] $w
        }
        lappend entryM $entryL
        set cwL $new_cwL
    }
    
    ## 
     # The second block actually renders the entries.
     ##
    
    set lineL {}
    foreach entryL $entryM {
        set line "\{"
        append line $O(-colsep)
        foreach {entry b} $entryL {m w} $cwL {
            if {[llength $entry]>1} then {
                regsub -all . $line " " indent
                append indent " "
                # Since this is a multiline element, it must be braced, 
                # and hence there is a margin; we know $m is 1 without 
                # testing.
                foreach eline [lrange $entry 0 end-1] {
                    append line $eline
                    lappend lineL $line
                    set line $indent
                }
            } elseif {$m && $b!=1} then {
                # Tricky: If the matrix rows are not all the same length, 
                # then $b may be an empty string in that test.
                append line " "
            } elseif {$b==1} then {
                incr w
            }
            
            append line [format %-*s $w [lindex $entry end]]
            append line $O(-colsep)
        }
        append line "\}"
        lappend lineL $line
    }
    return [join $lineL \n]
}
# Technically, there's a bug in that if the first row is shorter 
# than one of those that follows, then it will have nothing (rather 
# than an explicit empty string element) for the missing elements. 
# Making this consistent is however more trouble than it's worth; 
# it's good enough that it works flawlessly for proper matrices 
# (all rows of equal lengths) and does something sensibly for the rest.

namespace eval exhibit {
    interp alias {} [namespace current]::dictbysuffix {}\
      [namespace current]::format_dict -choicecmd choice_bysuffix
    namespace export {[a-z]*}
}