XML pretty-printing

[ CL means by this "display formatting".]

[Explain what's available in TclDOM and tdom.]

For TclDOM, just use the -indent option. For example,

dom::serialize $dom -indent yes

The man page [L1 ] explains the use of -indent and -indentspec.

[Explain pretty_print package--application, why not at SF, plans, ...] [License, too.] [Last one CL packaged was http://www.phaseit.net/claird/comp.lang.tcl/dom_pretty_print/dom_pretty_print-0.2.tar.Z ]


RS can't resist to contribute a quite pretty one-liner proc:

 package require tdom
 proc xmlpretty xml {[[dom parse $xml] documentElement] asXML}

 % xmlpretty "<foo a=\"b\"><bar>grill</bar><room/></foo>"
 <foo a="b">
     <bar>grill</bar>
     <room/>
 </foo>

Nicely functional (indenting is default in the asXML method), it's just that it leaks memory: the document object is never freed. So here is a cleaner, but not so compact version:

 proc xmlpretty xml {
     dom parse $xml mydom
     set res [[$mydom documentElement] asXML]
     $mydom delete
     set res
 } ;# RS

2004-03-13: Revisiting this page, I see that this is of course yet another use case for the K combinator:

 proc xmlpretty xml {
     dom parse $xml mydom
     K [[$mydom documentElement] asXML] [$mydom delete]
 }
 proc K {a b} {set a}

NEM notes that this can return to a one liner:

 proc xmlpretty xml { [[dom parse $xml doc] documentElement] asXML }

This takes advantage of the simple garbage-collection scheme built in to tDOM. When you use the syntax:

 dom parse $xml varName

tdom puts a trace on the varName, so that when it goes out of scope, the associated dom tree is deleted.


Also see XML and http://software.decisionsoft.com/software/xmlpp.pl


Here's a pure-Tcl pretty-print proc:

 proc pretty-print {xml} {
    set ident 0
    set idx1 0
    set idx2 0
    set buffer ""
    set result ""
    
    regsub -all {>\s+<} $xml {><} xml; # remove whitespace (newlines and tabs between tags)
    
    while {1} {
       set idx2 [string first >< $xml $idx1]
       if {$idx2 != -1} {
          set buffer [string range $xml $idx1 $idx2]
          
          # pre decrement if this is a closing tag
          if {[string index $buffer 1] == "/"} { incr ident -1 }
          
          append result "[string repeat \t $ident]$buffer\n"
          
          if {![regexp {^<\?|</|/>$} $buffer]} { incr ident }
          set idx1 [expr $idx2+1]
          
       } else {
          break
       }
    }
    
    append result [string range $xml $idx1 end]
 }

note: this is broken for <!-- comments --> and newlines. Adding:

  regsub -all {\n} $xml {} xml

fixes newlines.. haven't worked on the comments.