20040711 [CMcC]: Here's a little [HTML]/[XML]/[SGML] attribute parser. It takes the attribute portion of a tag and returns a name/value list. It's iterative, but it uses regexps extensively.

    array set match {
        quote {^([a-zA-Z0-9_-]+)[ \t]*=[ \t]*["]([^"]*)["][ \t]*(.*)$}
        squote {^([a-zA-Z0-9_-]+)[ \t]*=[ \t]*[']([^']*)['][ \t]*(.*)$}
        uquote {^([a-zA-Z0-9_-]+)[ \t]*=[ \t]*([^ \t'"]+)[ \t]*(.*)$}
    }

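    proc parseAttr {astring} {
        global match
        array set attr {}
        set astring [string trim $astring]
        if {$astring eq ""} {
            return {}
        }

        while {$astring ne ""} {
            # Remember the input once per pass, so that a pass which
            # consumes nothing can be detected after all patterns are tried.
            set org $astring
            foreach m {quote squote uquote} {
                if {[regexp $match($m) $astring all var val suffix]} {
                    set attr($var) $val
                    set astring [string trimleft $suffix]
                    # One attribute consumed; start over on the remainder.
                    break
                }
            }
            # Compare as strings (eq), not numerically (==).
            if {$astring eq $org} {
                error "parseAttr: can't parse $astring - not a properly formed attribute string"
            }
        }
        return [array get attr]
    }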
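
Note that parseAttr expects only the attribute portion of a tag, not the whole tag. A minimal sketch of separating the two, assuming a hypothetical helper called splitTag (not part of the code above) and Tcl 8.5 or later for lassign. It only strips the angle brackets and a trailing "/", so an unquoted value that itself ends in "/" will be cut short:

    # Hypothetical helper: split a start tag such as <a href="x"> into
    # its element name and its raw attribute string.
    proc splitTag {tag} {
        # The name runs up to the first whitespace, "/" or ">";
        # everything after it, up to the final ">", is attribute text.
        if {![regexp {^<([^ \t\n/>]+)(.*)>$} $tag -> name rest]} {
            error "splitTag: not a start tag: $tag"
        }
        # Drop surrounding whitespace and the "/" of an empty-element tag.
        set rest [string trim $rest]
        set rest [string trimright [string trimright $rest /]]
        return [list $name $rest]
    }

    lassign [splitTag {<img src="logo.png" alt='Site logo' width=100>}] name attrs
    # $attrs now holds the text that parseAttr expects, e.g.:
    #   array set a [parseAttr $attrs]

Since parseAttr returns the result of array get, the order of the name/value pairs is not guaranteed; loading them back into an array and looking values up by name avoids depending on it.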
----
Since you are considering the dark side of markup parsing, you might also enjoy [XML Shallow Parsing with Regular Expressions].

[LES]: Regex are too often criticized by those who just don't know or like them. "''If only you knew the power of the dark side...''"

[NEM]: [Regular Expressions Are Not A Good Idea for Parsing XML, HTML, or e-mail Addresses]. Regular expressions can be immensely useful -- I use them frequently for pulling apart simple (''regular'') strings. However, there are genuine limits to the power of regexps, and people should be aware of them, especially in situations (such as parsing XML/HTML) where several excellent full parsers already exist.

[LES]: I find your lack of faith disturbing. E-mail '''can't''' be parsed with regex. But XML '''can'''. Feel free to ask for help whenever you need it and think it can't be done.
----

[Tcllib] also contains a module, [htmlparse], for parsing HTML code.

----
[[
[Category HTML] |
[Category XML]
]]