htmlparse is a module in the [tcllib] library of Tcl code.
The htmlparse package provides commands that allow libraries and applications
to parse [HTML] in a string into a representation of their choice.
(From the man page [http://tcllib.sourceforge.net/doc/htmlparse.html])
----
''[escargo] 8 Aug 2005'' - Once you have parsed the [HTML] file and have it in a tree
(thanks to ''htmlparse::2tree''), is there a convenient way to write the resulting tree
back out as HTML?
Or is that supposed to be obvious?
[schlenk] - As the tree is implemented with the [struct]::tree data structure,
you should be able to call its ''walk'' method with a small formatting proc
to serialize the tree back to HTML.
The [html] package may be helpful there.
[escargo] - I'll have to dig into it a bit more, but it appears that if I want to generate opening
and closing tags, only one of the eight possible traversal policies (four orders times two types) is right:
''-order both'' with ''-type dfs'' (depth-first search).
It provides both enter and leave actions, visiting each parent before and after its children,
which gives the opportunity to wrap each subtree in the proper opening and closing tags.
Of course, I was hoping that htmlparse could perform round-trip operations.
That way you could use 2tree, removeVisualFluff, removeFormDefs,
and then output the modified tree.
No such luck.
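A minimal sketch of the walk-based serializer described above: traverse the tree with ''-order both -type dfs'', emit an opening tag on ''enter'' and a closing tag on ''leave''. It assumes each node carries a ''type'' key (the tag name, or PCDATA for text) and a ''data'' key (the tag's attribute string, or the text itself), which is my understanding of how ''htmlparse::2tree'' labels its nodes; check that against your tcllib version. The proc name ''tree2html'' and the list of void elements are hypothetical and not part of tcllib. This is not a true round trip: anything the parser trimmed or decoded (whitespace, possibly entities) is not restored.

```tcl
package require Tcl 8.5
package require struct::tree

# Hypothetical helper (not part of tcllib): serialize a struct::tree back to HTML.
# Assumption: nodes use the keys "type" (tag name or PCDATA) and "data"
# (attribute string for tags, text for PCDATA), as htmlparse::2tree is
# understood to produce. Text is written back as-is, without re-escaping.
proc tree2html {tree {node root}} {
    # Assumed set of void elements that never get a closing tag.
    set void {area base br col hr img input link meta param}
    set out ""
    # A both-order DFS calls the script twice per node: action is "enter"
    # before the node's children and "leave" after them.
    $tree walk $node -order both -type dfs {action n} {
        if {[$tree keyexists $n type]} {
            set type [$tree get $n type]
            set data ""
            if {[$tree keyexists $n data]} {
                set data [$tree get $n data]
            }
            switch -- $type {
                PCDATA {
                    # Text is emitted once, on entry only.
                    if {$action eq "enter"} { append out $data }
                }
                root - hmstart {
                    # Synthetic container nodes produce no markup of their own.
                }
                default {
                    if {$action eq "enter"} {
                        append out "<$type"
                        if {$data ne ""} { append out " $data" }
                        append out ">"
                    } elseif {$type ni $void} {
                        append out "</$type>"
                    }
                }
            }
        }
    }
    return $out
}
```

Usage would then be something like ''struct::tree t'', ''htmlparse::2tree $html t'', optionally ''htmlparse::removeVisualFluff t'', and finally ''puts [[tree2html t]]''. Doing the work in a single both-order walk keeps tag nesting correct without tracking a stack of open elements by hand.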
----
[MSW] Either it's me, or htmlparse gets the structure of an HTML document wrong.
(description deleted)
[schlenk] File a bug report against tcllib on SourceForge (SF) for this.
[MSW] Done, #1008619 [https://sourceforge.net/tracker/index.php?func=detail&aid=1008619&group_id=12883&atid=112883].
[HJG] 2006-12-03: The issue is still open.
''[escargo] 3 Dec 2006'' - There is also an issue parsing blocks that contain JavaScript:
when they include expressions the standard allows (such as <i> tags inside string literals),
htmlparse gets seriously confused about the structure of the resulting parse tree.
----
''[escargo] 18 Feb 2008'' - After digging around in the code for htmlparse, I think it is
inadequate for the task I need to perform. The HTML I need to parse has JavaScript inside
of