Playing SAX

Richard Suchenwirth 2002-11-24 - Reading "XML in a Nutshell", I was happy to see Tcl mentioned in at least one place (though not in the index). Code examples in that book are mostly in Java. Their example 19-3 develops, on little more than two pages of code, an XML document statistics program based on SAX (Simple API for XML) that counts elements, attributes, processing instructions, and text characters. That prompted me to remake it in Tcl (on half a page of code, of course ;-). The code below uses the expat SAX parser delivered with tDOM, but should run with few modifications with TclXML's parser too.

2003-07-30: added a few lines to do by-element-name statistics - RS
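
The event-counting idea, including the by-element-name tally, can be tried without any parser by calling the callbacks by hand. This is only a sketch: the proc names onStart and onText are made up for illustration, and the hand-written calls stand in for the events a SAX parser would be expected to fire for <a x="1"><b>hi</b><b/></a>. It assumes Tcl 8.5 or later, where incr creates a missing variable.

```tcl
# Hypothetical callbacks, named differently from the script's own.
proc onStart {name attlist} {
    incr ::nEl
    # attlist alternates attribute names and values, so halve its length
    incr ::nAtt [expr {[llength $attlist] / 2}]
    incr ::g($name) ;# per-name tally; incr creates the entry (Tcl 8.5+)
}
proc onText {data} {
    incr ::nChar [string length $data]
}

# Reset all counters, then replay the events by hand.
foreach v {nEl nAtt nChar} {set ::$v 0}
array unset ::g
onStart a {x 1}
onStart b {}
onText hi
onStart b {}

puts "$::nEl elements, $::nAtt attributes, $::nChar characters"
foreach name [lsort [array names ::g]] {
    puts [format %-20s%7d $name $::g($name)]
}
```

A real parser may deliver character data in several pieces, which is why the text callback adds up lengths instead of storing the last string.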


 #!/usr/bin/env tclsh

 package require tdom

 #--- Callbacks for certain parser events
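 proc el {name attlist} {
     global g
     incr ::nEl
     # attlist alternates names and values, so halve its length
     incr ::nAtt [expr {[llength $attlist] / 2}]
     incr g($name) ;# per-element-name count (incr creates it in Tcl 8.5+)
 }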
 proc ch {data} {
     incr ::nChar [string length $data]
 }
 proc pi {target data} {
     incr ::nPi
 }

 #--- "main" loop
 if {![llength $argv]} {puts "usage: $argv0 file..."}
 foreach file $argv {
     foreach i {nEl nAtt nChar nPi} {set $i 0} ;# reset counters
     array unset g                             ;# reset per-name counts
     set p [expat -elementstartcommand el \
            -characterdatacommand          ch \
            -processinginstructioncommand  pi ]
     if {[catch {$p parsefile $file} res]} {
         puts "error: $res"
     } else {
         puts "$file:\n$nEl elements, $nAtt attributes, $nChar characters,\
             $nPi processing instructions"
         foreach name [lsort [array names g]] {
             puts [format %-20s%7d $name $g($name)]
         }
     }
     $p free
 }
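
For quick experiments it can be handy to feed the parser a string instead of a file. The following is a minimal sketch, assuming tDOM is installed; the namespace demo and the proc demo::stats are hypothetical names, not part of the script above. It uses the parser's parse method in place of parsefile and returns the counters as a name/value list.

```tcl
package require tdom

namespace eval demo {
    variable stat

    # Callbacks: same idea as in the script, counters kept in one array.
    proc el {name attlist} {
        variable stat
        incr stat(el)
        # attlist alternates names and values, so halve its length
        incr stat(att) [expr {[llength $attlist] / 2}]
    }
    proc ch {data} {
        variable stat
        incr stat(char) [string length $data]
    }
    proc pi {target data} {
        variable stat
        incr stat(pi)
    }

    # Hypothetical helper: parse an XML string, return the counters.
    proc stats {xml} {
        variable stat
        array set stat {el 0 att 0 char 0 pi 0}
        set p [expat -elementstartcommand demo::el \
                     -characterdatacommand demo::ch \
                     -processinginstructioncommand demo::pi]
        # Free the parser whether or not parsing succeeded.
        set failed [catch {$p parse $xml} res]
        $p free
        if {$failed} {error "parse error: $res"}
        return [array get stat]
    }
}

set xml {<a x="1"><?note hello?><b/><b y="2" z="3">hi</b></a>}
array set result [demo::stats $xml]
puts "$result(el) elements, $result(att) attributes,\
    $result(char) characters, $result(pi) processing instructions"
```

Keeping the callbacks in a namespace avoids cluttering the global scope with short names like el, ch and pi.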