htmlparse is a module in the [tcllib] library of Tcl code.
The htmlparse package provides commands that allow libraries and applications to parse [HTML] in a string into a representation of their choice. (From the man page [http://tcllib.sourceforge.net/doc/htmlparse.html])
Documentation can be found at http://tcllib.sourceforge.net/doc/htmlparse.html
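A minimal sketch of the callback style of parsing, assuming the documented behaviour that ''htmlparse::parse'' calls the ''-cmd'' callback with four extra arguments (tag, slash, param, text behind the tag). The proc name ''collectTags'' and the global ''tags'' are made up for illustration.

```tcl
package require htmlparse

# Hypothetical callback: record every tag the parser reports.
# slash is "/" for a closing tag and empty for an opening tag.
proc collectTags {tag slash param textBehindTheTag} {
    lappend ::tags $slash$tag
}

set tags {}
htmlparse::parse -cmd collectTags {<p>Hello <b>world</b></p>}
puts $tags
```

The list should also contain the pseudo-tag that the parser wraps around the whole document, so expect more entries than just the tags in the input.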
----
''[escargo] 8 Aug 2005'' - Once you have parsed the [HTML] file and have it in a tree (thanks to
''htmlparse::2tree''), is there a convenient way to write the resulting tree back out as HTML?
Or is that supposed to be obvious?
[schlenk] - As the tree is implemented via the [struct]::tree data structure, you should be able to simply
[[call]] its walk method with a simple formatting proc to serialize the tree back to HTML.
The [html] package may be helpful there.
[escargo] - I'll have to dig into it a bit more, but it appears that if I want to generate opening
and closing tags, then of the eight possible traversal policies only one would be right. It looks
like ''order == both'' with ''dfs'' (depth-first search), which provides enter and leave actions and
visits the parent both before and after its children, gives the opportunity to wrap subtrees in the
proper opening and closing tags. Of course, I was hoping that htmlparse could perform round-trip
operations. That way you could use 2tree, removeVisualFluff, removeFormDefs, and then output the
modified tree. No such luck.
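A rough sketch of the walk described above, not a tested round-trip. It assumes struct::tree 2.x (where ''walk'' takes a loop variable list and a script, and ''get'' takes the key directly) and assumes that 2tree stores the tag name in the attribute ''type'', the tag parameters or text in ''data'', and marks text nodes with the type PCDATA. The proc name ''tree2html'' and the list of tags without closing tags are invented here.

```tcl
package require Tcl 8.5
package require struct::tree 2
package require htmlparse

# Hypothetical helper: write a tree built by htmlparse::2tree back out as HTML.
proc tree2html {t} {
    # Assumed list of elements that never get a closing tag.
    set void {area base br col hr img input link meta param}
    set out ""
    # -order both runs the script twice per node: on "enter" and on "leave".
    $t walk [$t rootname] -order both -type dfs {action node} {
        set type root
        set data ""
        if {[$t keyexists $node type]} { set type [$t get $node type] }
        if {[$t keyexists $node data]} { set data [$t get $node data] }
        if {$type eq "root" || $type eq "hmstart"} {
            # Pseudo nodes (assumed names): emit nothing for them.
        } elseif {$type eq "PCDATA"} {
            # Text node: emit once, re-escaping the special characters.
            if {$action eq "enter"} {
                append out [string map {& &amp; < &lt; > &gt;} $data]
            }
        } elseif {$action eq "enter"} {
            set attrs [string trim $data]
            if {$attrs ne ""} { set attrs " $attrs" }
            append out "<$type$attrs>"
        } elseif {$type ni $void} {
            append out "</$type>"
        }
    }
    return $out
}
```

Possible usage: create a tree with ''::struct::tree t'', fill it with ''htmlparse::2tree $html t'', optionally call ''htmlparse::removeVisualFluff t'' or ''htmlparse::removeFormDefs t'', then ''puts'' the result of ''tree2html t''. Whitespace and tag case will probably not survive the trip unchanged, so treat the output as equivalent HTML at best, not as a copy of the input.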
----
[MSW] Either it's me or htmlparse gets the structure of an HTML doc wrong.
(description deleted)
[schlenk] Put a bug report on tcllib at SF for this.
[MSW] Done, #1008619.
----
Does anybody know where to find an online document for the HTML [DTD]?
Try the W3C: http://www.w3.org/TR/html4/sgml/dtd.html
----
[Category Package], subset [Tcllib]