[L1 ] - NEM This link is dead for me on 2 June, 2005. This [L2 ] article from comp.lang.tcl certainly looks relevant, however.

[Wiki page on e-mail addresses]

[different meanings of "regular expressions"]

[ Perl disease]

[When REs go wrong]

Regular expression examples


05Apr03 Brian Theado - For XML, I'm guessing the title of this page is referring to one-off regular expressions, but see [L3 ] for a paper describing shallow parsing of XML using only a regular expression. The regular expression is about 30 lines long, but the paper documents it well. The Appendix includes sample implementation in Perl, Javascript and Flex/Lex. The Appendix also includes an interactive demo (using the Javascript implementation apparently). The demo helped me understand what they meant by "shallow parsing". For a Tcl translation, see XML Shallow Parsing with Regular Expressions.


Why are regular expressions not suited for parsing email addresses? "Regular expression to validate e-mail addresses" comments on this.

A few more comments appear in "The Limits to Regular Expressions" [L4 ] and "Regular Expressions Do Not Solve All Problems" [L5 ], themselves descendants of Jamie Zawinski's notorious judgment [L6 ] REs multiply, rather than solve, problems.


D. McC: OK, so what can you use instead of REs to solve, rather than multiply, problems?

AM In Tcl you have a number of options, depending on what you really want to do:

  • Searching for individual words - consider [lsearch]
  • Searching for particularly simple patterns - consider [string match]
  • Try coming up with simple REs that solve the matching problem to, say, 80 or 90% and use a second step to get rid of the "false positives"
  • Use a combination of all three
  • If you are trying to match text that spans multiple lines, not uncommon, turn it into one long string first, removing any unnecessary characters (like \ or \n)

That is just a handful of methods. I am sure others can come up with more methods.

DKF: For XML and HTML, use a proper parser to build a DOM tree. For email addresses, do a cheap hack that does the 99.999% of the cases seen in practice. :^)