Lars H, 2010-06-01: This is a package I didn't quite release two years ago, but which was now brought to my attention again by pdict: Pretty print a dict. Posting the manpage and code here in case of interest:
jdc 24-jan-2011 Doctool no longer supported in Wiki
<<doctool>> [manpage_begin exhibit n 1.0] [moddesc "Format complex data as text"] [titledesc "Format complex or otherwise unintelligible data as text.\ Typical applications are logging, debugging, and similar situations\ in which data are to be shown to the user."] [copyright "2008 Lars Hellstr\u00F6m <[email protected]>"] [require Tcl 8.2] [require exhibit 1.0] [description] [section {Introduction}] The [package {exhibit}] package offers facilities for formatting (potentially complex or otherwise unintelligible) data before showing it to a user. It is not a serialisation toolbox (Tcl usually does that well by itself), but rather a presentation toolbox. The output is always plain text, often with the assumption that all characters have the same width when shown to the user. Indentation is used to express nesting of data. There is generally no limit on the length of lines in the output. [para] There are two basic, and several more advanced ways to make use of the commands in this package. The first basic method is to pass the data to be formatted to the command matching the outermost container type of that data, and be done with it. This gives a full presentation as long as containers (lists, dicts, and whatsnot) are not nested within each other. [para] The second basic method, which assumes your data is in the form of a dictionary following the [sectref {The type suffix convention}\ {type suffix convention}], is to pass it to the [cmd exhibit::dictbysuffix] command. This assumes entries in the dictionary that require special formatting have names ending with suffixes that specify the type of that entry, and selects formatting for the entries based on this assumption. If names are from the start chosen according to this convention, then this method of use can format data structures nested to arbitrary depth. [para] Finally, the formatters for container types generally take an option that lets you specify what formatter or formatters to use for the elements in that container. This provides full control over the formatting of nested data, at the price of a more complicated call. You may also write your own formatting command and have the formatters of this package make use of it, if none of them fulfills your needs. [section {Variables}] The package has two variables [var indent] and [var wrapwidth], which store defaults for the [option -indent] and [option -width] respectively options; changing their values changes the defaults for all users of the [package exhibit] package. [var indent] is used by all commands that take an [option -indent] option and is initialised to four spaces. [var wrapwidth] is initialised to [const 70] and is only used by [cmd dictbysuffix] (technically [cmd choice_bysuffix]) when it calls [cmd format_small_list]. [section {Commands}] Most commands in the package take data to be formatted as the final argument. Previous arguments (if any) are usually option-value pairs and control how the data is formatted. [para] All of the following commands are in the [namespace exhibit] namespace. They are also all exported by this namespace. All command options are of the [arg -option] [arg value] type (two argument words per option) and all commands that take options ignore unrecognised options. [list_begin definitions] [call [cmd format_bytes] [opt "[arg option] [arg value] ..."] [arg data]] This command formats binary data. For example, [example {exhibit::format_bytes "Tool \x01 Command \xDC Language"}] returns [example\ "000 54 6f 6f 6c 20 01 20 43 6f 6d 6d 61 6e 64 20 dc \ |Tool . Command \xDC|\ \n016 20 4c 61 6e 67 75 61 67 65 \ | Language|"] The available options are: [list_begin options] [opt_def -address "([arg boolean] | [const auto])"] Should the lines contain (as an extra first column) an "address" of the first byte on that line? Default is [const auto], which includes addresses when there is more than one line. Addresses are always decimal and at least three digits, to help distinguish them from the data bytes which are always two hex digits in each column. [opt_def -base [arg integer]] The "base address" for the data, i.e., the address of the first byte. Defaults to [const 0]. [opt_def -encoding [arg name]] The encoding to use when translating the bytes of the data to characters, for display in the last column. Defaults to [const iso8859-1]. Use [const ascii] if bytes above 127 has no interpretation as characters. [opt_def -nonprint [arg character]] The substitute character to put in the text representation when the real character is nonprinting. Defaults to a period (\u002E). [opt_def -notinenc [arg character]] The substitute character to put in the text representation when the byte does not correspond to any character under the specified encoding. Defaults to a question mark (\u003F). [opt_def -perline [arg integer]] The number of bytes per line. Defaults to [const 16]. [opt_def -printable [arg regexp]] The regular expression that is used to distinguish printing characters from nonprinting; a character is considered printing if it matches this regexp. Defaults to the character class [const graph] plus space (\u0020), i.e., "[const "[lb][lb]:graph:[rb] [rb]"]". Use "[const {[ -~]}]" if you're only prepared to handle ASCII characters in the result. [list_end] [call [cmd format_dict] [opt "[arg option] [arg value] ..."] [arg data]] This command formats a dictionary value. The result is similar to that of the [cmd parray] command, with a line [example {keyName = value}] for each entry, the order of entries are sorted by the keys, and the equal signs are aligned horizontally. There is however a variation: if the formatted form of a value contains a newline character (i.e., is more than one line of text) then the entry is formatted as [example "keyName :\n Value line 1\n Value line 2\n \ Value line 3\n ..."] with each line of the value indented by a specified amount, and the equals sign replaced by a colon. [para] The available options are: [list_begin options] [opt_def -indent [arg text]] The string prepended to each line of the value when indenting it. Defaults to the value of the [var exhibit::indent] variable, which by default is set to four spaces. [opt_def -entrycmd [arg cmdprefix]] The command prefix used to format the values. Defaults to [cmd format_identity], which returns them unchanged. [opt_def -choicecmd [arg cmdprefix]] A command prefix used to format the values while taking the key names into account. It is called as [list_begin definitions] [call \{*\}[arg choicecmd] [arg key] [arg value]] [list_end] Specifying a [option -choicecmd] overrides the [option -entrycmd], if any. [opt_def -sortcmd [arg cmdprefix]] A command prefix used to sort the entries in the dictionary, i.e., determine in which order they should be listed. It is called as [list_begin definitions] [call \{*\}[arg sortcmd] [arg list-of-keys]] [list_end] and defaults to "[cmd lsort] [option -dictionary]". Each element in the return value from this call gives rise to one shown entry in the result of [cmd format_dict]. [list_end] [call [cmd format_identity] [arg string]] This command simply returns its argument. [call [cmd format_list] [opt "[arg option] [arg value] ..."] [arg data]] This command formats a list value. It is geared towards lists with large elements, since the format is [example_begin] List element 0: [emph {First list element}] List element 1: [emph {Second list element}] List element 2: [emph {Third list element}] End of list of length 3. [example_end] The options are [list_begin options] [opt_def -formatcmd] The command prefix used to format the list elements. Defaults to [cmd format_identity]. [opt_def -indent [arg text]] As in [cmd format_dict]. [list_end] [call [cmd format_matrix] [opt "[arg option] [arg value] ..."] [arg data]] This command formats a list of lists as a "matrix", with columns that line up (provided all characters are the same width). [para] The alignment is attained by adjusting the whitespace between list elements, and by inserting suitable indentation after every newline in the text of an entry. The result is still a list of lists: there is a `{' at the beginning of every row, a `}' at the end of every row, and the entries themselves are [cmd list]-quoted. This quoting is however applied to formatted forms of the entries, so although it is possible to extract one entry from the result using [cmd lindex], there is no reason to expect that this recovers the actual matrix element. In particular, if the formatted form of an entry is more than one line, then indentation has been inserted into the entry text to make subsequent lines line up with the first. [para] In a matrix with multiline entries, such as the following: [example {exhibit::format_matrix "{AnEntry {An entry}\ {An entry\nwith two lines} {An entry\nwith 3\nlines}\ {Final entry}} {1 2 3 4 5}"}] the bottom line in one entry lines up with the top line of the next, like so: [example_begin] { AnEntry {An entry} {An entry with two lines} {An entry with 3 lines} {Final entry} } { 1 2 3 4 5 } [example_end] Noteable here is also that braces delimiting an entry are disregarded when aligning columns, although they are taken into account when computing the widths of the the columns. [para] The options supported by this command are [list_begin options] [opt_def -colsep] The string to insert between two entries in a row. Defaults to a single space. [opt_def -formatcmd] The command called to format the entries. Defaults to [cmd format_identity]. [list_end] [call [cmd format_small_list] [opt "[arg option] [arg value] ..."] [arg data]] This command formats a list value. It is geared towards "small" lists that fit comfortably on a single line, but occationally contain larger amounts of text. [para] In the single-line formatting, this command simply returns the [arg data]. However, if there is a newline somewhere in the [arg data] then it switches to a multiline formatting, in which all whitespace between list elements is replaced by a newline, to make sure every list element begins at the start of a line. Hence [example "exhibit::format_small_list {{a b\nc d} e {f\ng h}}"] returns [example "{a b\nc d}\ne\n{f\ng h}"] The multiline formatting is also chosen if the [option -width] option is specified and the string length of the [arg data] exceeds the specified value. There are no other options for this command. [call [cmd sort_asfollows] [arg given] [arg sortcmd] [arg keys]] This is a sorting command which places some [arg given] keys first and then uses the provided [arg sortcmd] to sort remaining [arg keys]. For example, [example {exhibit::sort_asfollows {name foo employment}\ {lsort -dictionary} {age phone employment name address}}] returns [example {name employment address age phone}] i.e., a reordering of the elements in the list [arg keys], where those that occur in [arg given] comes first, in the same order as in [arg given], and remaining elements from [arg keys] follow, in the order they would be put by the [arg sortcmd]. [call [cmd dictbysuffix] [opt "[arg option] [arg value] ..."] [arg data]] This command formats a dictionary, and applies to each entry formatting according to a "type suffix" in the key name. Technically, it is just an alias to calling [cmd format_dict] with a [option -choicecmd] of [cmd choice_bysuffix], so any [arg option]s are interpreted by [cmd format_dict]. [call [cmd choice_bysuffix] [arg key] [arg value]] This command formats the [arg value]. The [arg key] controls which formatting is applied to the [arg value], according to the [term {type suffix convention}]. [list_end] [section {The type suffix convention}] The [package exhibit] type suffix convention is that when the value of an entry in a dictionary requires special formatting, then the key of that entry should end with a suffix that specifies the "type" of the value, so that a suitable formatting command can be chosen. As implemented by [cmd choice_bysuffix], the type suffix convention recognises the following suffixes: [list_begin definitions] [def [const :d]] The value is itself a dictionary. [def [const :D]] The value is itself a dictionary, and the names of its entries follows the type suffix convention as well. [def [const :l]] The value is a small list (will be formatted by [cmd format_small_list]). [def [const :L]] The value is a list (will be formatted by [cmd format_list]). [def [const :DL]] The value is a list of dictionaries, and each element dictionary is to be formatted according to the type suffix convention. [def [const :sl]] The value is a set-style list, i.e., the order of the elements does not matter. The value is first sorted [option -dictionary] style, and then formatted by [cmd format_small_list]. [def [const :t]] The value is a point in time (number of seconds since the epoch), which should be formatted by [cmd clock] [method format]. [def [const :raw]] The value is a bytestring, and should be formatted by [cmd format_bytes]. [list_end] Entries not carrying any of the above suffixes are considered to be mere strings and are returned as-is. [para] The easiest way to extend this list is to define your own [option -choicecmd] command that handles the new type suffixes, and then calls [cmd choice_bysuffix] for everything else. Note that this does not affect how the entries of an embedded [const :D] or [const :DL] entry are interpreted however, so you may want to override these as well. [section {Bugs, ideas, feedback}] [manpage_end] <<doctool>>
And so the code:
package require Tcl 8.2 ; # For [string map] package provide exhibit 1.0 namespace eval exhibit { # Standard indentation step variable indent " " # -width choice_bysuffix uses for format_small_list variable wrapwidth 70 } proc exhibit::format_identity {data} {set data} proc exhibit::Choose_allsame {formatcmd key value} { eval [linsert $formatcmd end $value] } proc exhibit::format_dict {args} { array set A [lindex $args end] variable indent array set O { -entrycmd format_identity -sortcmd {lsort -dictionary} } set O(-indent) $indent array set O [lrange $args 0 end-1] if {![info exists O(-choicecmd)]} then { set O(-choicecmd) [list Choose_allsame $O(-entrycmd)] } ## # The first pass over the data just determines the length # of the longest key. ## set maxl 0 foreach name [array names A] { if {[string length $name] > $maxl} { set maxl [string length $name] } } ## # The second pass over the data actually formats # the dictionary. ## set res "" foreach name [ eval [linsert $O(-sortcmd) end [array names A]] ] { append res [format "%-*s" $maxl $name] set value [eval [linsert $O(-choicecmd) end $name $A($name)]] if {[string first \n $value]<0} then { append res { = } $value \n } else { append res { :} \n $O(-indent) [ string map [list \n \n$O(-indent)] $value ] \n } } return [string trimright $res \n] } proc exhibit::format_small_list {args} { array set O [lrange $args 0 end-1] set L [lindex $args end] if {![string match *\n* $L]} then { if {![info exists O(-width)] || [string length $L]<=$O(-width)}\ then {return $L} } set LL {} foreach item $L {lappend LL [list $item]} return [join $LL \n] } proc exhibit::format_list {args} { variable indent set O(-indent) $indent set O(-formatcmd) format_identity array set O [lrange $args 0 end-1] set count 0 set res {} foreach item [lindex $args end] { append res "List element $count:" \n $O(-indent) append res [string map [list \n \n$O(-indent)] [ eval [linsert $O(-formatcmd) end $item] ]] \n incr count } append res "End of list of length $count." } proc exhibit::format_bytes {args} { set data [lindex $args end] array set O { -encoding iso8859-1 -nonprint . -notinenc ? -printable {[[:graph:] ]} -perline 16 -base 0 -addressing auto } array set O [lrange $args 0 end-1] ## # The first block formats hex and text parts of each line, # and stores them away in the $res list. ## set res {} set hexL {} ; set text {} foreach byte [split $data ""] { binary scan $byte H2 hex lappend hexL $hex if {![info exists char($hex)]} then { set ch [encoding convertfrom $O(-encoding) $byte] set byte2 [encoding convertto $O(-encoding) $ch] if {$byte2 != $byte} then { set char($hex) $O(-notinenc) } elseif {[regexp -- $O(-printable) $ch]} then { set char($hex) $ch } else { set char($hex) $O(-nonprint) } } append text $char($hex) if {[llength $hexL] >= $O(-perline)} then { lappend res [join $hexL] $text set hexL {} ; set text {} } } if {[llength $hexL]} then {lappend res [join $hexL] $text} ## # The second block attaches addresses and # generally formats the lines. ## set hexwidth [expr {$O(-perline)*3 - 1}] if {$O(-addressing) == "auto"} then { set O(-addressing) [expr {[llength $res] > 2}] } set digits [string length [expr {abs($O(-base))+[string length $data]-1}]] if {$digits < 3} then {set digits 3} set lineL {} set ofs $O(-base) foreach {hex text} $res { lappend lineL [ if {$O(-addressing)} then { format "%0*d %-*s |%s|" $digits $ofs $hexwidth $hex $text } else { format "%-*s |%s|" $hexwidth $hex $text } ] incr ofs $O(-perline) } return [join $lineL \n] } proc exhibit::choice_bysuffix {key value} { switch -glob -- $key *:raw { format_bytes $value } *:d { format_dict $value } *:D { format_dict -choicecmd choice_bysuffix $value } *:l { variable wrapwidth format_small_list -width $wrapwidth $value } *:sl { variable wrapwidth format_small_list -width $wrapwidth [lsort -dictionary $value] } *:L { format_list $value } *:DL { format_list -formatcmd dictbysuffix $value } *:t { clock format $value } default { set value } } proc exhibit::sort_asfollows {givenL restcmd keyL} { foreach key $keyL {set S($key) ""} set res {} foreach key $givenL { if {[info exists S($key)]} then { lappend res $key unset S($key) } } concat $res [eval [linsert $restcmd end [array names S]]] } proc exhibit::format_matrix {args} { array set O { -formatcmd format_identity -colsep " " } array set O [lrange $args 0 end-1] set data [lindex $args end] ## # The first block formats all the entries, splits multiline entries # into lines, and computes the widths of each column. # # The $cwL list has the structure # ( <hasMargin> <width> )* # where each such couple describes one column. <hasMargin> signals # that there in some row is a brace that is allowed to stick out to # the left, and that an extra space must be inserted before entries # that aren't braced. The <width> is the actual width, in characters, # including a closing brace for the entry. # # The $entryM list is the matrix of formatted but not yet rendered # matrix elements. Its rows are lists with the structure # ( <listOfLinesOfEntry> <isBraced> )* # It may be observed that <isBraced> can be false only when # <listOfLinesOfEntry> has exactly one element, since # [list $something] can only contain newlines if it is braced. ## set cwL {} set entryM {} foreach row $data { set new_cwL {} set entryL {} foreach item $row {m w} $cwL { if {$m==""} then {set m 0} set text [list [eval [linsert $O(-formatcmd) end $item]]] set b [expr {"\{" == [string index $text 0]}] lappend entryL [split $text \n] $b foreach line [lindex $entryL end-1] adj [list $b] { set l [string length $line] if {$adj=="1"} then {incr l -1} if {$w=="" || $w<$l} then {set w $l} } lappend new_cwL [expr {$m || $b}] $w } lappend entryM $entryL set cwL $new_cwL } ## # The second block actually renders the entries. ## set lineL {} foreach entryL $entryM { set line "\{" append line $O(-colsep) foreach {entry b} $entryL {m w} $cwL { if {[llength $entry]>1} then { regsub -all . $line " " indent append indent " " # Since this is a multiline element, it must be braced, # and hence there is a margin; we know $m is 1 without # testing. foreach eline [lrange $entry 0 end-1] { append line $eline lappend lineL $line set line $indent } } elseif {$m && $b!=1} then { # Tricky: If the matrix rows are not all the same length, # then $b may be an empty string in that test. append line " " } elseif {$b==1} then { incr w } append line [format %-*s $w [lindex $entry end]] append line $O(-colsep) } append line "\}" lappend lineL $line } return [join $lineL \n] } # Technically, there's a bug in that if the first row is shorter # than one of those that follows, then it will have nothing (rather # than an explicit empty string element) for the missing elements. # Making this consistent is however more trouble than it's worth; # it's good enough that it works flawlessly for proper matrices # (all rows of equal lengths) and does something sensibly for the rest. namespace eval exhibit { interp alias {} [namespace current]::dictbysuffix {}\ [namespace current]::format_dict -choicecmd choice_bysuffix namespace export {[a-z]*} }