html

Updated 2002-07-10 15:18:11

HTML stands for HyperText Markup Language.


Documentation can be found at http://tcllib.sourceforge.net/doc/html.html

This is a module in tcllib, a library of Tcl code. Its purpose is to help developers generate HTML programmatically. Related tcllib modules are ncgi and javascript.
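To give a feel for the programmatic style, here is a minimal sketch. It assumes tcllib's html package is installed and uses a few of its generator commands (html::head, html::bodyTag, html::h1, html::end). The page title and heading text are made up, and the exact markup each command emits (newlines, doctype, closing tags) should be checked against the module documentation.

```tcl
package require html

# Each html:: command returns a string of markup; build the page by appending.
set page [html::head "Price list"]   ;# <html><head> section with a <title>
append page [html::bodyTag]          ;# opening <body> tag
append page [html::h1 "Prices"]      ;# <h1>Prices</h1>
append page [html::end]              ;# close any tags still open

puts $page
```

The point of generating markup through commands rather than hand-written strings is that tag nesting and default attributes are handled in one place.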


Anyone interested in HTML will want to know about the HTML widgets page, which discusses widgets that render HTML for display.


If you only need to parse HTML, without rendering it, htmlparse (another tcllib module) is one possible solution. Other options include tkHTML compiled for use without Tk, and tcltidy.

Some Tclers recommend tDOM's XPath-oriented parser for HTML work. It seems particularly popular among our European brethren. See Jochen Loewer's report on his eBay Web scraping at the Second European Tcl/Tk Users Meeting (http://www.tu-harburg.de/skf/tcltk/tclum2001.pdf.gz).


A recent posting to comp.lang.tcl contained a very small example; it can be found at http://groups.google.com/groups?hl=en&frame=right&th=56ce65fa7fa8bdec&seekm=a8q1kf%24sg5%241%40bagan.srce.hr#link4

If the task is to 'pull out' some data from an HTML page, I'm a strong believer in the 'parse the HTML page into a tree and query that tree' approach. For real-life problems, I claim that this approach is much simpler and easier to maintain than any regexp approach. And you will certainly have to maintain such a thing, because the layout of HTML pages tends to change frequently. Sure, you have to learn another query language, XPath in this case. But if you are really in the web business, chances are you will have to learn XPath anyway.

de
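A minimal sketch of the parse-then-query approach, assuming the tdom package is installed. The HTML fragment and its class names ("item", "price") are hypothetical stand-ins for a scraped page. The fragment is parsed with tDOM's forgiving HTML parser, and the wanted cells are selected with one XPath expression instead of a regexp over the raw text.

```tcl
package require tdom

# Hypothetical page fragment; in practice this would be fetched, e.g. with http::geturl.
set html {<html><body>
<table>
  <tr><td class="item">Widget</td><td class="price">12.50</td></tr>
  <tr><td class="item">Gadget</td><td class="price">8.00</td></tr>
</table>
</body></html>}

# Parse the (possibly sloppy) HTML into a DOM tree.
set doc [dom parse -html $html]
set root [$doc documentElement]

# Query the tree with XPath: every <td> whose class attribute is "price".
set prices {}
foreach node [$root selectNodes {//td[@class='price']}] {
    lappend prices [$node text]
}

# Free the document once the data has been extracted.
$doc delete

puts $prices
```

When the page layout changes, usually only the XPath expression needs adjusting, which is the maintainability argument made above.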


If the task is to generate HTML or XML, try xmlgen, a package within TclXML, available at http://sourceforge.net/projects/tclxml/.


Category Package, subset Tcllib | Category Internet