Version 24 of PCRE

Updated 2003-07-23 10:11:58

Perl-Compatible Regular Expressions. A superset of Regular Expressions with a few extra features introduced by Perl, less a couple of features that could not be supported outside Perl itself. Welcomed by many, hated by many others.

http://www.pcre.org/man.txt

Note that, in spite of the name, PCRE are not confined to Perl. Other programs and languages, such as PHP, can implement them too.

Tcl uses ARE.


DKF - Wow. There's more Features from the Black Lagoon in there than you can shake a B-Movie at...


A big Regular Expression fan sees DKF's remark and says: Maybe Tcl uses ARE because Tcl is so merciful. More features for the bold and fewer features for the queasy.

Another (OK, the same) big Regular Expression fan also says: Regular Expressions are not very easy, granted, but they're also overly mystified, and PCRE take the blame for some extra mystification, even by those who are good friends with Regular Expressions.

The thing is that PCRE allow tricks that are impossible with traditional RE. Some people advocate avoiding PCRE completely and, instead, writing even more complex, long-winded, probably convoluted code to replace them.

Big Regular Expression Fan will never understand these people.


Here is a very quick summary of PCRE's most relevant features. Items marked with + are supported by ARE (thank God).

 + foo(?=bar)            match "foo" only if "bar" follows it
 + foo(?!bar)            match "foo" only if "bar" does NOT follow it
  (?<=foo)bar            match "bar" only if "foo" precedes it
  (?<!foo)bar            match "bar" only if "foo" does NOT precede it

  (?<!in|on|at)foo       match "foo" only if NOT preceded by "in", "on" or "at"
  (?<=\d{3})(?<!999)foo  match "foo" only if preceded by 3 digits other than "999"
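
The two combined look-behind rows above can be tried out with a short script. This is only a sketch: it assumes Python's re module as a stand-in for PCRE (it accepts this look-behind syntax as long as every alternative has the same fixed width), and the sample strings and the matches helper are made up for illustration.

```python
import re

# Assumption: Python's re module stands in for PCRE here; it accepts the same
# look-behind syntax when every alternative has the same fixed width.
NOT_AFTER_PREPOSITION = re.compile(r"(?<!in|on|at)foo")
AFTER_3_DIGITS_NOT_999 = re.compile(r"(?<=\d{3})(?<!999)foo")

def matches(pattern, text):
    """Return True if the pattern matches anywhere in text (hypothetical helper)."""
    return pattern.search(text) is not None

# Made-up sample strings.
print(matches(NOT_AFTER_PREPOSITION, "xyfoo"))    # "xy" is not in the list
print(matches(NOT_AFTER_PREPOSITION, "onfoo"))    # "on" precedes "foo"
print(matches(AFTER_3_DIGITS_NOT_999, "123foo"))  # three digits, not "999"
print(matches(AFTER_3_DIGITS_NOT_999, "999foo"))  # ruled out by (?<!999)
print(matches(AFTER_3_DIGITS_NOT_999, "12foo"))   # only two digits
```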

 + (?i)abc               case-insensitive match of abc, ABC, aBc, ABc, etc.
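 + ab(?i)c               same as above in older PCRE, where a top-level (?i) applies throughout the pattern;
                         Perl and newer PCRE apply it only from that point on (abc or abC)
  (ab(?i)c)              matches abc or abC; inside parens the (?i) applies only to the rest of that group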
 + (?m)                  multi-line mode, "^" and "$" also match at line breaks: same as "s/FIND/REPL/M"
 + (?s)                  set "." to match newline also: same as "s/FIND/REPL/S"
 + (?x)                  ignore whitespace and #comments in the pattern
 + (?:abc)foo            match "abcfoo", but do not capture 'abc' in \1
 + (?:ab|cd)ef           match "abef" or "cdef"; neither 'ab' nor 'cd' is captured in \1
 + (?#remark)xy          match "xy"; remarks after "#" in the parens are ignored.
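
A few of the option and grouping rows above, tried out in a short script. Again this is only a sketch that assumes Python's re module as a stand-in for PCRE; the sample strings are made up. Python accepts these inline options only at the very start of a pattern, so the mid-pattern forms are left out.

```python
import re

# Assumption: Python's re module stands in for PCRE. Python only accepts these
# inline options at the very start of a pattern.
case_insensitive = re.search(r"(?i)abc", "xABcx") is not None
line_anchors = re.findall(r"(?m)^b$", "a\nb\nc")
dot_newline = re.search(r"(?s)a.b", "a\nb") is not None
extended = re.search(r"(?x) a b c  # spaces and this remark are ignored", "abc") is not None

# (?:...) groups the alternatives without capturing, so no \1 is created.
non_capturing = re.search(r"(?:ab|cd)ef", "cdef")
remark = re.search(r"(?#remark)xy", "xy") is not None

print(case_insensitive, line_anchors, dot_newline, extended)
print(non_capturing.group(0), non_capturing.groups())
print(remark)
```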

 (?(condition)yes-pattern)
 (?(condition)yes-pattern|no-pattern)  
 ...matches conditionally, like "if" statements.
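
As a rough illustration of the conditional forms, here is a sketch in Python, whose re module is assumed as a stand-in and supports only the case where the condition is a group number or name (PCRE also allows assertions as conditions). The patterns and sample strings are invented for the example.

```python
import re

# Assumption: Python's re module stands in for PCRE; the condition "1" means
# "group 1 took part in the match".

# yes-pattern only: a closing paren is required only if an opening one was seen.
BALANCED = re.compile(r"^(\()?\d+(?(1)\))$")

# yes-pattern|no-pattern: a leading minus demands " debit", otherwise " credit".
LEDGER = re.compile(r"^(-)?\d+(?(1) debit| credit)$")

def ok(pattern, text):
    """Return True if the whole text matches (hypothetical helper)."""
    return pattern.search(text) is not None

for sample in ["(42)", "42", "(42", "42)"]:
    print(sample, ok(BALANCED, sample))

for sample in ["-5 debit", "5 credit", "5 debit", "-5 credit"]:
    print(sample, ok(LEDGER, sample))
```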

 (?R)                    recursive match. OK, this one is really tough.

 \l                make the next letter small
 \L                make letters small until \E
 \u                make the next letter capital
 \U                make letters capital until \E
 \Q                escape all until \E
 \E                end of modifier's action
 \G                position where the previous match ended

Roy Terry, 22July2003: It seems that Tcl's (?=foo) and (?!foo) are equivalent to the PCRE (?<...) feature. No?

Big Regular Expression Fan, 22July2003: Not exactly.

  • "Look ahead assertions" - supported by ARE and PCRE:

foo(?=bar) will match "foo" only if "bar" follows it. For example:

 http://(?=www\.)

The RE above will only find Web addresses whose subdomain is 'www'.

foo(?!bar) will match "foo" only if "bar" does not follow it. So, conversely,

 http://(?!www\.)

...will only find Web addresses whose subdomain is NOT 'www'.
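
The two look-ahead patterns can be checked with a small script. This is a sketch only, assuming Python's re module as a stand-in (look-ahead works the same way in ARE and PCRE); the addresses are made up. Note that the assertion consumes nothing, so the matched text is just "http://".

```python
import re

# Assumption: Python's re module stands in for ARE/PCRE here.
WWW = re.compile(r"http://(?=www\.)")
NOT_WWW = re.compile(r"http://(?!www\.)")

# Made-up addresses.
www_address = "http://www.example.com/"
ftp_address = "http://ftp.example.com/"

m = WWW.search(www_address)
print(m.group(0))                    # "http://", the look-ahead is not consumed
print(WWW.search(ftp_address))       # None: "www." does not follow
print(NOT_WWW.search(www_address))   # None: "www." does follow
print(NOT_WWW.search(ftp_address).group(0))
```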

  • "Look behind assertions" - supported by PCRE only:

(?<=foo)bar will match "bar" only if "foo" precedes it. For example:

 (?<=http://)ftp\.

The RE above will only find Web addresses whose subdomain is 'ftp' but whose protocol is 'http'.

(?<!foo)bar will match "bar" only if "foo" does not precede it. Therefore, conversely,

 (?<!http://)ftp\.

...will only find Web addresses whose subdomain is 'ftp', expressly ruling out any whose protocol is 'http'.

Of course, in many cases they can be interchanged. You can say http://(?=ftp\.) instead of (?<=http://)ftp\. but that may force you to look for what you really want "backwards", so to speak. I've actually encountered situations in which I could not do without "look behind assertions", but I cannot recall any of them right now. Stay tuned to this page. Breaking news at any moment.
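
To make the look-behind forms and the "interchange" concrete, here is a sketch in Python, whose re module is assumed as a stand-in for PCRE (it supports fixed-width look-behind). The sample text is made up. Both forms locate the same address, but the text they actually match differs, which is what "backwards" amounts to in practice.

```python
import re

# Assumption: Python's re module stands in for PCRE here.
BEHIND = re.compile(r"(?<=http://)ftp\.")      # "ftp." preceded by "http://"
NOT_BEHIND = re.compile(r"(?<!http://)ftp\.")  # "ftp." NOT preceded by "http://"
AHEAD = re.compile(r"http://(?=ftp\.)")        # "http://" followed by "ftp."

# Made-up sample text with one address of each kind.
text = "http://ftp.example.com/ and ftp://ftp.example.org/"

behind = BEHIND.search(text)
ahead = AHEAD.search(text)
not_behind = NOT_BEHIND.search(text)

print(behind.group(0), behind.span())   # matches "ftp." inside the first address
print(ahead.group(0), ahead.span())     # matches "http://" of the same address
# The negative form skips the first address and lands in the ftp:// one.
print(text[not_behind.start() - 6:not_behind.end()])
```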

Meanwhile, check out this beautifully formatted page: http://www.slabihoud.de/spampal/pcrepattern.html

DKF notes that look-behind assertions could probably be added to Tcl's RE package without stomping over the theoretical basis for it in any way worse than look-ahead assertions already do. But it would take some work from someone energetic...


[ Category Acronym ]