'''HTML''', or '''HyperText [Markup Language]''', is a [markup language] used on the [WWW%|%World-Wide Web].

** Parsing Tools **

   [Tcllib html]:   a module for generating HTML
   [htmlparse]:   tools to parse HTML
   [tkHTML]:   an [extension] that parses and renders HTML; it can be compiled for use without Tk
   [tcltidy]:   a wrapper for Tidy
   [tkhtml3]:   the successor to [tkHTML]
   [tDOM]'s [XPath]-oriented parser:   can be used to manipulate HTML
   [TclXML]:   includes xmlgen for generating HTML or XML
   [Tclgumbo]:   an interface to the Gumbo [HTML5] parsing library

** Generation Tools **

   [html form generator], by [CMcC]:   generates HTML forms from Tcl lists
   [MajaMaja]:   structures and lays out a static collection of HTML pages arranging a wide variety of materials
   [Wub]:   includes a [http://wub.googlecode.com/svn/trunk/Utilities/Html.tcl%|%utility] for structured HTML tag generation
   [Wiki format to HTML]:

** See Also **

   [HTML widgets]:   discusses widgets that ''render'' HTML into a visual representation
   [Web scraping]:
   [august html editor]:
   [url-encoding]:
   [html2text]:

** Description **

For extracting data from HTML, it is generally more robust to parse the page into some document model, perhaps using [tDOM], and then use [XPath] to find the data, than to hack at the raw markup with regular expressions. A small sketch of this approach appears below.

If the task is to pull some data out of an HTML page, I'm indeed a strong believer in the 'parse the HTML page into a tree and query that tree' approach. For real-life problems, I claim that this approach is much simpler and easier to maintain than any regexp approach - and you will certainly have to maintain such a thing, because the layout of HTML pages tends to change frequently. Sure, you have to learn another query language - XPath in this case. But if you are really in the web business, chances are you will have to learn XPath anyway.

<<categories>> Category Package | Tcllib | Category Internet | Category Glossary | Markup Language | HTML
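Here is a minimal sketch of the tree-and-XPath approach described above, using [tDOM]'s lenient HTML parser. The embedded markup and the XPath expressions are only placeholders for whatever page and data you are actually after; a real page would normally arrive via the [http] package or be read from a file.

======
package require tdom

# Sample markup standing in for a fetched page; in practice the HTML
# would usually come from http::geturl or a file.
set html {
    <html><body>
      <table>
        <tr><td>Tcl</td><td>8.6</td></tr>
        <tr><td>tDOM</td><td>0.9</td></tr>
      </table>
    </body></html>
}

# -html selects tDOM's forgiving HTML parser rather than the strict XML
# parser, so sloppy real-world markup still yields a usable tree.
set doc  [dom parse -html $html]
set root [$doc documentElement]

# Query the tree with XPath instead of regular expressions.
foreach row [$root selectNodes {//table//tr}] {
    set cells {}
    foreach cell [$row selectNodes {td}] {
        lappend cells [$cell text]
    }
    puts [join $cells " | "]
}

$doc delete
======

When the page layout changes, usually only the XPath expressions need updating, which is the maintenance advantage claimed above.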