[http://groups.google.com/groups?th=4bbebbb242ec1e1e] - [NEM] This link is dead for me on 2 June, 2005. This [http://groups-beta.google.com/group/comp.lang.tcl/browse_thread/thread/cf0ae7f9cba4c0df/021f15cfa5b61862?q=regexp+xml&rnum=6#021f15cfa5b61862] article from comp.lang.tcl certainly looks relevant, however.

[[Wiki page on e-mail addresses]]

[[different meanings of "[regular expressions]"]]

[[ [Perl] disease]

[[When REs go wrong]]

[Regular expression examples]

----

05Apr03 [Brian Theado] - For XML, I'm guessing the title of this page refers to one-off regular expressions, but see [http://www.cs.sfu.ca/~cameron/REX.html] for a paper describing shallow parsing of [XML] using only a regular expression. The regular expression is about 30 lines long, but the paper documents it well. The Appendix includes sample implementations in [Perl], Javascript and Flex/Lex, as well as an interactive demo (apparently using the [Javascript] implementation). The demo helped me understand what they meant by "shallow parsing". For a [Tcl] translation, see [XML Shallow Parsing with Regular Expressions].

----

Why are regular expressions not suited for parsing email addresses? "[Regular expression to validate e-mail addresses]" comments on this. A few more comments appear in "The Limits to Regular Expressions" [http://www.unixreview.com/documents/s=2472/uni1037388368795/] and "Regular Expressions Do Not Solve All Problems" [http://informit.com/articles/article.asp?p=102171&redir=1], themselves descendants of Jamie Zawinski's notorious judgment [http://slashdot.org/comments.pl?sid=19607&cid=1871619] that REs multiply, rather than solve, problems.

----

[D. McC]: OK, so what can you use instead of REs to solve, rather than multiply, problems?

[AM] In Tcl you have a number of options, depending on what you really want to do:

* Searching for individual words: consider [[lsearch]]
* Searching for particularly simple patterns: consider [[string match]]
* Try coming up with simple REs that solve the matching problem for, say, 80 or 90% of the cases, and use a second step to get rid of the "false positives"
* Use a combination of all three
* If you are trying to match text that spans multiple lines (not uncommon), turn it into one long string first, removing any unnecessary characters (like \ or \n)

Those are just a handful of methods. I am sure others can come up with more.
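AM's suggestions combine naturally in a few lines of Tcl. What follows is a sketch, not AM's own code. The sample text, the procs `flatten` and `findAddresses`, and the specific false-positive rules (no empty domain labels, an alphabetic top-level domain of at least two letters) are hypothetical choices made for illustration, and they are nowhere near a complete e-mail address check:

```tcl
# Sketch of the "simple tools plus a second pass" approach.
# The sample text and the procs flatten/findAddresses are hypothetical.

# 1. Individual words: treat the data as a list and use lsearch.
set words {alpha beta gamma delta}
puts [lsearch -exact $words gamma]          ;# index of the word: 2

# 2. Simple patterns: a glob via string match is often enough.
puts [string match {*.tcl} parser.tcl]      ;# 1 (it matches)

# 3. Multi-line input: flatten it into one long string first.
#    A backslash-newline pair is dropped entirely (joining the halves);
#    any other newline becomes a space so words do not run together.
proc flatten {text} {
    return [string map [list "\\\n" "" "\n" " "] $text]
}

# 4. A deliberately loose RE first, then plain Tcl checks that
#    throw away the false positives the RE lets through.
proc findAddresses {text} {
    set found {}
    set loose {[^[:space:]@]+@[^[:space:]@]+\.[^[:space:]@]+}
    foreach candidate [regexp -all -inline $loose $text] {
        # Strip surrounding punctuation the loose RE swallowed.
        set candidate [string trim $candidate {.,;:()<>}]
        # Second step: reject candidates that fail simple sanity checks.
        set labels [split [lindex [split $candidate @] 1] .]
        if {[lsearch -exact $labels {}] >= 0} continue   ;# e.g. a@b..com
        set tld [lindex $labels end]
        if {[string length $tld] < 2 || ![string is alpha -strict $tld]} continue
        lappend found $candidate
    }
    return $found
}

set text "Mail alice@example.com or bob@mail.example.org.\nIgnore a@b..com and x@y.42, but keep carol@\\\nexample.net"
puts [findAddresses [flatten $text]]
# prints: alice@example.com bob@mail.example.org carol@example.net
```

The RE alone would accept `a@b..com` and `x@y.42`. Rejecting them in ordinary Tcl keeps the pattern short and readable, instead of growing one RE until it handles every corner case.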