HTML, or HyperText Markup Language, is a markup language used on the World-Wide Web.
For extracting data from HTML, it's generally more robust to parse the page into a document model, perhaps using tDOM, and then use XPath to find the data, than to hack at it with regular expressions.
If the task is to pull some data out of an HTML page, I'm indeed a strong believer in the 'parse the HTML page into a tree and query that tree' approach. For real-life problems, I claim that this approach is much simpler and easier to maintain than any regexp approach (and you will certainly have to maintain such a script, because the layout of HTML pages tends to change frequently). Sure, you have to learn another query language, XPath in this case. But if you are really in the web business, chances are you will have to learn XPath anyway.
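
A minimal sketch of the parse-then-query approach, written in Python rather than Tcl/tDOM so that it needs only the standard library. The TreeBuilder class, the sample page, and extract_prices are all hypothetical names made up for illustration. ElementTree supports only a subset of XPath, and this tiny builder makes no serious attempt to repair badly nested markup, so treat it as an illustration of the idea rather than a production scraper.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Hypothetical helper: turn HTML parser events into an ElementTree."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")  # synthetic root node
        self.stack = [self.root]            # currently open elements

    def handle_starttag(self, tag, attrs):
        elem = ET.SubElement(self.stack[-1], tag,
                             {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this name, if any.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):
            # Text after a closed child belongs in that child's tail.
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data


# Made-up sample page, assumed for this example only.
SAMPLE_HTML = """
<html><body>
  <h1>Price list</h1>
  <table id="prices">
    <tr><td class="name">Widget</td><td class="price">4.50</td></tr>
    <tr><td class="name">Gadget</td><td class="price">12.00</td></tr>
  </table>
</body></html>
"""


def extract_prices(html):
    """Parse the page into a tree, then query the tree with XPath."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    # './/' finds the table wherever it sits in the page layout.
    rows = builder.root.findall(".//table[@id='prices']/tr")
    return [(row.findtext("td[@class='name']"),
             row.findtext("td[@class='price']")) for row in rows]


print(extract_prices(SAMPLE_HTML))
```

Because the query names the table by its id instead of by its position in the text, wrapping the table in an extra div or adding markup around it does not change the result, which is the maintenance advantage claimed above over a regexp tied to the exact page layout.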