htmlparse is a module in the tcllib library of Tcl code.
The htmlparse package provides commands that allow libraries and applications to parse HTML in a string into a representation of their choice. (From the man page.)
Documentation can be found at http://tcllib.sourceforge.net/doc/htmlparse.html
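As a rough illustration of "a representation of their choice", the sketch below feeds a small document to htmlparse::parse and records each tag in a plain Tcl list. The callback name recordTag and the variable seen are made up for this example. The four callback arguments (tag, slash, attribute string, text behind the tag) follow the man page as I understand it, so check them against your tcllib version.

```tcl
package require htmlparse

# Hypothetical collector: one list entry per tag reported by the parser.
set seen {}

# Callback invoked by htmlparse::parse once per tag.
#   tag   - tag name
#   slash - "/" for a closing tag, empty for an opening tag
#   param - the tag's attribute string
#   text  - the text following the tag, up to the next tag
proc recordTag {tag slash param text} {
    lappend ::seen $slash$tag
    puts [format "%s%s attrs={%s} text={%s}" $slash $tag $param [string trim $text]]
}

set html {<html><body><h1>heading</h1><p>ayaken!</p></body></html>}
htmlparse::parse -cmd recordTag $html
```

The list in seen is just one possible representation. htmlparse::2tree, used further down this page, builds a struct::tree instead.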
MSW: Either it's me or htmlparse gets the structure of an HTML doc wrong.
    $ cat wrong.tcl
    package require htmlparse
    # struct::tree may not be loaded by htmlparse itself, so require it explicitly
    package require struct::tree

    set indent 0
    set t [struct::tree]
    set html {<html><head></head><body><h1>heading</h1><p>ayaken!</p></body></html>}

    # Print ">" on entering a node and "<" on leaving it, indented by depth
    proc painter {tree act node} {
        global indent
        if {$act == "enter"} then {
            incr indent
            puts ">[string repeat - [expr {$indent-1}]][$tree get $node type]"
        } else {
            puts "<[string repeat - [expr {$indent-1}]][$tree get $node type]"
            incr indent -1
        }
    }

    htmlparse::2tree $html $t
    $t walk root -order both -type dfs {act node} {painter $t $act $node}

    $ tclsh wrong.tcl
    >root
    >-hmstart
    >--html
    >---head
    <---head
    >---body
    >----h1
    >-----PCDATA
    <-----PCDATA
    >-----p
    >------PCDATA
    <------PCDATA
    <-----p
    <----h1
    <---body
    <--html
    <-hmstart
    <root
Check where in the tree the p ended up: as a child of the h1, rather than as its sibling under body?
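To check the placement without reading indentation, one could ask the tree directly for the parent of each p node. This is only a sketch: the variable names are made up for the example, and it assumes the struct::tree version 2 interface (walk with a loop variable, get with an attribute key), which is what the script above appears to use. Going by the transcript above, it should report h1 as the parent, though other tcllib versions may behave differently.

```tcl
package require struct::tree
package require htmlparse

set html {<html><head></head><body><h1>heading</h1><p>ayaken!</p></body></html>}
set t [struct::tree]
htmlparse::2tree $html $t

# Visit every node; for each p element record the type of its parent node.
set parents {}
$t walk root n {
    if {[$t get $n type] eq "p"} {
        lappend parents [$t get [$t parent $n] type]
    }
}
puts "parent(s) of p: $parents"
```

If the tree matched the document structure, the parent reported here would be body.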
Category Package, subset Tcllib