htmlparse is a module in the [tcllib] library of Tcl code. The htmlparse package provides commands that allow libraries and applications to parse [HTML] in a string into a representation of their choice. (From the man page [http://tcllib.sourceforge.net/doc/htmlparse.html])

Documentation can be found at http://tcllib.sourceforge.net/doc/htmlparse.html

----
''[escargo] 8 Aug 2005'' - Once you have parsed the [HTML] file and have it in a tree (thanks to ''htmlparse::2tree''), is there a convenient way to write the resulting tree back out as HTML? Or is that supposed to be obvious?

[schlenk] - As the tree is implemented via the [struct]::tree data structure, you should be able to simply [[call]] its walk method with a simple formatting proc to serialize the tree back to HTML. The [html] package may be helpful there.

[escargo] - I'll have to dig into it a bit more, but it appears that if I want to generate opening and closing tags, only one of the eight possible traversal policies would be right. It looks like ''order == both'' with ''dfs'' (depth-first search) is the one: it provides enter and leave actions and visits each parent both before and after its children, which gives the opportunity to wrap subtrees in the proper opening and closing tags.

Of course, I was hoping that htmlparse could perform round-trip operations. That way you could use 2tree, removeVisualFluff, removeFormDefs, and then output the modified tree. No such luck.

----
[MSW] Either it's me or htmlparse gets the structure of an HTML doc wrong. (description deleted)

[schlenk] Put a bug report on tcllib at SF for this.

[MSW] Done, #1008619 [https://sourceforge.net/tracker/index.php?func=detail&aid=1008619&group_id=12883&atid=112883].

[HJG] 2006-12-03: Issue is still open

''[escargo] 3 Dec 2006'' - There is also an issue parsing script blocks: JavaScript expressions that are allowed by the standard (like i tags inside of the literal strings) leave htmlparse seriously confused about the parse tree of the result.
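----
The ''order == both'' / ''dfs'' walk discussed above can be sketched roughly as follows. This is only an illustration, not part of htmlparse: the proc name ''tree2html'' and the list of void tags are made up for the example, and it assumes that each node produced by ''htmlparse::2tree'' carries a ''type'' key (the tag name, or PCDATA for text) and a ''data'' key (the attribute string, or the text itself). It makes no attempt to restore whitespace or entities, so the output is not a byte-for-byte round trip.

```tcl
package require Tcl 8.5
package require struct::tree
package require htmlparse

# Hypothetical helper (not part of tcllib): serialize a tree built by
# htmlparse::2tree back into HTML text.
# Assumption: nodes have a "type" key (tag name or PCDATA) and a
# "data" key (attribute string, or the text for PCDATA nodes).
proc tree2html {tree {start root}} {
    # Tags emitted without a closing tag (illustrative, not exhaustive).
    set void {br hr img input meta link}
    set out ""
    # -order both on a depth-first walk visits every node twice:
    # once with action "enter", once with action "leave".
    $tree walk $start -order both -type dfs {action node} {
        if {[$tree keyexists $node type]} {
            set type [$tree get $node type]
            set data ""
            if {[$tree keyexists $node data]} {
                set data [$tree get $node data]
            }
            if {$type eq "PCDATA"} {
                # Text node: emit the text once, on entry.
                if {$action eq "enter"} { append out $data }
            } elseif {$type ni {root hmstart}} {
                if {$action eq "enter"} {
                    # Opening tag, with attributes if there are any.
                    append out "<$type"
                    if {$data ne ""} { append out " $data" }
                    append out ">"
                } elseif {$type ni $void} {
                    # Closing tag once all children have been written.
                    append out "</$type>"
                }
            }
        }
    }
    return $out
}

# Usage: parse, prune the tree, then write it back out.
set doc [::struct::tree]
htmlparse::2tree {<p class="x">Hello <b>world</b></p>} $doc
htmlparse::removeVisualFluff $doc
puts [tree2html $doc]
$doc destroy
```

Nodes without a ''type'' key, and the pseudo types ''root'' and ''hmstart'', are skipped on the assumption that they are wrappers rather than real elements; adjust that test if your version of htmlparse labels them differently.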
----
''[escargo] 18 Feb 2008'' - After digging around in the code for htmlparse, I think it is inadequate for the task I need to perform. The HTML I need to parse has JavaScript inside of