Version 19 of html

Updated 2002-04-15 17:21:02

Documentation can be found at http://tcllib.sourceforge.net/doc/html.html

This is a module in Tcllib, the Tcl Standard Library. Its purpose is to help developers generate HTML programmatically. Related modules are ncgi and javascript.
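A minimal sketch of generating HTML with the html module, assuming Tcllib is installed. It uses html::h1, html::openTag, html::closeTag and html::html_entities. Check the documentation linked above for the exact signatures, since this is an illustration, not a reference. The list of module names is only sample data.

```tcl
package require html

# Sample data (illustrative only).
set modules {html ncgi javascript}

# html::h1 wraps the string in <h1>...</h1>.
set out [html::h1 "Related Tcllib modules"]

# openTag emits <ul> and pushes "ul" onto the module's tag stack...
append out [html::openTag ul]
foreach m $modules {
    # html_entities escapes special characters such as < and & in the data.
    append out "<li>[html::html_entities $m]</li>\n"
}
# ...so closeTag can pop it and emit the matching </ul>.
append out [html::closeTag]

puts $out
```

Keeping the tag stack inside the module means the caller does not have to remember which closing tag comes next.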


Anyone interested in HTML will probably want to know about the HTML widgets page, which discusses widgets that render HTML visually.


If you only need to parse HTML, without rendering it, htmlparse (another Tcllib module) is one option. Another is tkHTML compiled for use without Tk.

Some Tclers recommend tDOM's XPath-oriented parser for HTML work. It seems particularly popular among our European brethren. See Jochen Loewer's report on his eBay Web scraping at the Second European Tcl/Tk Users Meeting ( http://www.tu-harburg.de/skf/tcltk/tclum2001.pdf.gz ).


A recent posting to comp.lang.tcl contained a very small example; it can be found at http://groups.google.com/groups?hl=en&frame=right&th=56ce65fa7fa8bdec&seekm=a8q1kf%24sg5%241%40bagan.srce.hr#link4

If the task is to pull some data out of an HTML page, I'm a strong believer in the 'parse the HTML page into a tree and query that tree' approach. For real-life problems, I claim this approach is much simpler and easier to maintain than any regexp approach. And you will certainly have to maintain such a thing, because the layout of HTML pages tends to change frequently. Sure, you have to learn another query language, XPath in this case. But if you are really in the web business, chances are you will have to learn XPath anyway.

de
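A minimal sketch of the parse-then-query approach, assuming tDOM is installed. The page is parsed into a DOM tree with dom parse -html, and the links are then selected with an XPath expression rather than a regexp. The HTML string is made-up sample data.

```tcl
package require tdom

# Sample page (made-up data); a real page might be fetched with http::geturl.
set page {<html><body>
<p><a href="/news">News</a></p>
<div><a href="/about">About</a></div>
</body></html>}

# Parse the (possibly sloppy) HTML into a DOM tree.
set doc  [dom parse -html $page]
set root [$doc documentElement]

# Query the tree with XPath: every <a> element that has an href attribute.
set hrefs {}
foreach node [$root selectNodes {//a[@href]}] {
    lappend hrefs [$node getAttribute href]
}

# Free the document once done.
$doc delete

puts $hrefs
```

Because the XPath expression names the elements wanted rather than the markup around them, moving a link from a <p> into a <div> does not break the query here; a regexp tied to the surrounding tags would have to change.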


[Refer to htmlgen and xmlgen.]


Category Package, subset Tcllib