Error processing request

Parameters

CONTENT_LENGTH: 0
REQUEST_METHOD: GET
REQUEST_URI: /revision/Regular+Expression+Examples?V=178
QUERY_STRING: V=178
CONTENT_TYPE:
DOCUMENT_URI: /revision/Regular+Expression+Examples
DOCUMENT_ROOT: /var/www/nikit/nikit/nginx/../docroot
SCGI: 1
SERVER_PROTOCOL: HTTP/1.1
HTTPS: on
REMOTE_ADDR: 172.70.35.11
REMOTE_PORT: 47944
SERVER_PORT: 4443
SERVER_NAME: wiki.tcl-lang.org
HTTP_HOST: wiki.tcl-lang.org
HTTP_CONNECTION: Keep-Alive
HTTP_ACCEPT_ENCODING: gzip
HTTP_CF_IPCOUNTRY: US
HTTP_X_FORWARDED_FOR: 35.172.111.71
HTTP_CF_RAY: 711cb3f2cac65a76-IAD
HTTP_X_FORWARDED_PROTO: https
HTTP_CF_VISITOR: {"scheme":"https"}
HTTP_USER_AGENT: CCBot/2.0 (https://commoncrawl.org/faq/)
HTTP_ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
HTTP_ACCEPT_LANGUAGE: en-US,en;q=0.5
HTTP_CF_CONNECTING_IP: 35.172.111.71
HTTP_CDN_LOOP: cloudflare

Body


Error

Unknow state transition: LINE -> END

-code: 1

-level: 0

-errorstack:

INNER {returnImm {Unknow state transition: LINE -> END} {}} CALL {my render_wikit {Regular Expression Examples} '''Regular\ Expression\ Examples'''\ is\ a\ list,\ roughly\ sorted\ by\ complexity,\ of\nregular\ expression\ examples.\ \ It\ also\ serves\ as\ both\ a\ library\ of\ useful\nexpressions\ to\ include\ in\ your\ own\ code.\n\nFor\ advanced\ examples,\ see\ \[Advanced\ Regular\ Expression\ Examples\]\ You\ can\ also\nfind\ some\ regular\ expressions\ on\ \[Regular\ Expressions\]\ and\ \[Bag\ of\ algorithms\]\npages.\n\n\n\n**\ See\ Also\ **\n\n\ \ \ \[http://www.regular-expressions.info/examplesprogrammer.html%|%Example\ Regexes\ to\ Match\ Common\ Programming\ Language\ Constructs\]:\ \ \ \n\n\ \ \ \[https://groups.google.com/d/msg/comp.lang.tcl/uHEWT5LuuVg/LNa0PgvlBtQJ%|%Extracting\ numbers\ from\ text\ strings,\ removing\ unwanted\ characters\],\ \[comp.lang.tcl\],\ 2002-06-23:\ \ \ a\ delightful\ explication\ by\ \[Michael\ Cleverly\]\n\n\ \ \ \[re_syntax\]:\ \ \ \n\n\ \ \ \[URI\ detector\ for\ arbitrary\ text\ as\ a\ regular\ expression\]:\ \ \ \n\n\ \ \ \[Arts\ and\ crafts\ of\ Tcl-Tk\ programming\]:\ \ \ \n\n\ \ \ \[Regular\ Expressions\]:\ \ \ \n\n\ \ \ \[Regular\ Expression\ Debugging\ Tips\]:\ \ \ \n\n\ \ \ \[Visual\ Regexp\]:\ \ \ A\ '''terrific'''\ way\ to\ learn\ about\ REs.\n\n\ \ \ \[Redet\]:\ \ \ Another\ tool\ for\ learning\ about\ and\ working\ with\ REs.\n\n\ \ \ \[Regular\ Expression\ Debugging\ Tips\]:\ \ \ More\ tools.\n\n\n\n**\ Simple\ `\[regexp\]`\ Examples\ **\n\n`\[regexp\]`\ has\ syntax:\n\n\ \ \ \ :\ \ \ regexp\ ?switches?\ exp\ string\ ?matchVar?\ ?subMatchVar\ subMatchVar\ ...?\n\nIf\ ''matchVar''\ is\ specified,\ its\ value\ will\ be\ only\ the\ part\ of\ the\n''string''\ that\ was\ matched\ by\ the\ ''exp''.\ \ As\ an\ example:\ \ \ \n\n======\nregexp\ \{c.*g\}\ \"abcdefghi\"\ matched\nputs\ \$matched\ \ \ \ \ \ \ \;#\ ==>\ cdefg\n======\n\nIf\ any\ ''subMatchVar''s\ are\ specified,\ their\ values\ will\ be\ the\ part\nof\ 
the\ ''string''\ that\ were\ matched\ by\ parenthesized\ bits\ in\ the\ ''exp'',\ counting\ open\ parentheses\ from\ left\ to\ right.\ \ For\ example:\n\n======\nregexp\ \{c((.*)g)(.*)\}\ \"abcdefghi\"\ matched\ sub1\ sub2\ sub3\nputs\ \$matched\ \ \ \ \ \ \ \;#\ ==>\ cdefghi\nputs\ \$sub1\ \ \ \ \ \ \ \ \ \ \;#\ ==>\ defg\nputs\ \$sub2\ \ \ \ \ \ \ \ \ \ \;#\ ==>\ def\nputs\ \$sub3\ \ \ \ \ \ \ \ \ \ \;#\ ==>\ hi\n======\n\nMany\ times,\ people\ only\ care\ about\ the\ ''subMatchVar''s\ and\ want\ to\nignore\ ''matchVar''.\ \ They\ use\ a\ \"dummy\"\ variable\ as\ a\ placeholder\ in\nthe\ command\ for\ the\ ''matchVar''.\ \ You\ will\ often\ see\ things\ like\n\n======\nregexp\ \$exp\ \$string\ ->\ sub1\ sub2\n======\n\nwhere\ `\$\{->\}`\ holds\ the\ matched\ part.\ \ It\ is\ a\ sneaky\ but\ legal\ Tcl\nvariable\ name.\n\n\[PYK\]\ 2015-10-29:\ \ As\ a\ matter\ of\ fact,\ '''every'''\ string\ is\ a\ legal\ Tcl\nvariable\ name.\n\n\n\n**\ Splitting\ a\ String\ Into\ Words\ **\n\n''\"How\ do\ I\ split\ an\ arbitrary\ string\ into\ words?\"''\ is\ a\ frequently\nasked\ question.\ \ If\ you\ use\ `\[split\]\ \$string\ \{\ \}`,\ then\ multiple\ \nspaces\ will\ produce\ a\ list\ with\ empty\ elements.\ \ If\ you\ try\ to\ use\ `\[foreach\]`\ or\ \n`\[lindex\]`\ or\ some\ other\ \[list\]\ operation,\ then\ you\ must\ be\ sure\ that\nthe\ string\ is\ a\ well-formed\ list.\ \ (Braces\ could\ cause\ problems.)\ \ So\ use\ a\ regular\ expression\ like\ this\ very\ simple\ shorthand\ for\ non-space\ characters:\n\n\n======none\n\{\\S+\}\n======\n\nYou\ can\ even\ split\ a\ string\ of\ text\ with\ arbitrary\ spaces\ and\ special\ characters\ into\ a\ list\ of\ words\ by\ using\ the\ '''-inline'''\ and\ '''-all'''\ switches\ to\ `\[regexp\]`:\n\n======\nset\ text\ \"Some\ arbitrary\ text\ which\ might\ include\ \\\$\ or\ \{\"\nset\ wordList\ \[regexp\ -inline\ -all\ --\ \{\\S+\}\ \$text\]\n======\n\n\n\n**\ Split\ into\ Words,\ Respecting\ Acronyms\ **\n\n======\nset\ data\ 
\{Marvels.Agents.of.S.H.I.E.L.D\}\nregsub\ -all\ \{(\\w\{2,\})\\.\}\ \$data\ \{\\1\ \}\n======\n\n\ \ \ \ :\ \ \ from\ \[Tcl\ Chatroom\],\ 2013-10-09\n\n\n\n**\ Floating\ Point\ Number\ **\n\nThis\ expression\ includes\ options\ for\ leading\ +/-\ character,\ digits,\ decimal\ points,\ and\ a\ trailing\ exponent.\ \ Note\ the\ use\ of\ nearly\ duplicate\ expressions\ joined\ with\ the\ ''or''\ operator\ `|`\ to\ permit\ the\ decimal\ point\ to\ lead\ or\ follow\ digits.\ \n\n======none\n^\[-+\]?\[0-9\]*\\.?\[0-9\]+(\[eE\]\[-+\]?\[0-9\]+)?\$\n======\n\nExpression\ to\ find\ if\ a\ string\ have\ any\ substring\ maching\ a\ floating\ point\ number\ (\ This\ was\ posted\ to\ comp.lang.tcl\ by\ Roland\ B.\ Roberts.):\n\n======none\n\[-+\]?(\[0-9\]+\\.?\[0-9\]*|\\.\[0-9\]+)(\[eE\]\[-+\]?\[0-9\]+)?\n======\n\n\nMore\ information\ (http://www.regular-expressions.info/floatingpoint.html)\n\n\n\n**\ Letters\ **\n\nThanks\ to\ Brent\ Welch\ for\ these\ examples,\ showing\ the\ difference\ between\ a\ traditional\ character\ matching\ and\ \"the\ Unicode\ way.\"\n\nOnly\ letters:\n\n======none\n^\[A-Za-z\]+\$\n======\n\nOnly\ letters,\ the\ Unicode\ way:\n\n======none\n^\[\[:alpha:\]\]+\$\n======\n\n\n\n**\ Special\ Characters\ **\n\nThanks\ again\ to\ Brent\ Welch\ for\ these\ two\ examples.\n\nThe\ set\ of\ Tcl\ special\ characters:\ `\]\]\ \[\[\ \$\ \{\ \}\ \\`:\n\n======\n\[\]\[\$\{\}\\\\\]\n======\n\nThe\ set\ of\ regular\ expression\ special\ characters:\ `'\]\]\ \[\[\ \$\ ^\ ?\ +\ *\ (\ )\ |\ \\'`\n\n\n%|CHARACTER|DESCRIPTION|%\n&|*|The\ sub-pattern\ before\ '*'\ can\ occur\ zero\ or\ more\ times|&\n&|+|The\ sub-pattern\ before\ '+'\ can\ occur\ one\ or\ more\ times|&\n&|?|The\ sub-pattern\ before\ '?'\ can\ only\ occur\ zero\ or\ one\ time|&\n&|<<pipe>>|(Alteration)\ Matches\ any\ one\ sub-pattern\ separated\ by\ '<<pipe>>'s.\ \ Similar\ to\ logical\ 'OR'.\ |&\n&|()|Groups\ a\ pattern|&\n&|\[\[\]\]|Defines\ a\ set\ of\ characters,\ or\ range\ of\ characters\ 
\[\[a-z,A-Z,0-9\]\]|&\n\n\n======\n\[\]\[\$^?+*()|\\\\\]\n======\n\ \n\nI\ don't\ understand\ these\ examples.\ \ Why\ have\ `\[\[`,\ `\]\]`,\ and\ then\ the\ rest\nof\ the\ characters\ inside\ a\ `\[\[\]\]`\ -\ that\ just\ makes\ the\ string\ have\ `\[\[`\ and\ `\]\]`\nthere\ twice,\ right?\n\n\[LV\]:\ the\ first\ regular\ expression\ should\ be\ seen\ like\ this:\n\n\ \ \ `\{\ ...\ \}`:\ \ \ Protect\ the\ 9\ inner\ characters.\n\n\ \ \ `\[\[\ ...\ \]\]`:\ \ \ Define\ a\ set\ of\ characters\ to\ process.\n\n\ \ \ `\]\]`:\ \ \ If\ your\ set\ of\ characters\ is\ going\ to\ include\ the\ right\ bracket\ character\ `\]\]`\ as\ a\ specific\ matching\ character,\ then\ it\ needs\ to\ be\ first\ in\ the\ set/class\ definition.\n\n\ \ \ `\[\[\$\{\}`:\ \ \ More\ individual\ characters.\n\n\ \ \ `\\\\`:\ \ \ Doubled\ because\ when\ `regexp`\ goes\ to\ evaluate\ the\ characters,\ it\ would\ otherwise\ treat\ a\ single\ backslash\ `\\`\ as\ a\ request\ to\ quote\ the\ next\ character,\ the\ ending\ right\ bracket\ of\ the\ set/class.\n\nThe\ second\ regular\ expression\ is\ interpreted\ in\ a\ similar\ fashion.\ \ There\ are\ more\ characters\ because\ there\ are\ more\ metacharacters.\n\nAlso,\ not\ all\ characters\ are\ there\ -\ where\ are\ the\ period,\ equals,\ bang\ (exclamation\ sign),\ dash,\ colon,\ alphas\ that\ are\ a\ part\ of\ character\ entry\ escapes\ or\ classes,\ 0,\ hash/pound\ sign,\ and\ angle\ brackets\ (<\ and\ >)?\ \ These\ special\ characters\ all\ have\ meta\ meanings\ within\ regular\ expressions...\n\n\[LV\]:\ Apparently\ no\ one\ has\ come\ along\ and\ updated\ the\ above\ expression\ to\ cover\ these.\n\nExample\ posted\ by\ KC:\n\nA\ set\ containing\ both\ angle\ brackets:\n\n======none\n\[\\<\\>\]\n======\n\n\n***\ newline/carriage\ return\ ***\n\nCould\ someone\ replace\ this\ line\ with\ some\ verbiage\ regarding\ the\ way\ one\ uses\nregular\ expressions\ for\ specific\ newline-carriage\ return\ handling\ (as\ opposed\nto\ the\ use\ of\ the\ 
`\$`\ metacharacter)?\n\n\[Janos\ Holanyi\]:\ I\ would\ really\ need\ to\ build\ up\ a\ re\ that\ would\ match\ one\ line\nand\ only\ one\ line\ -\ that\ is,\ excluding\ carriage-return-newline's\ (\\r\\n)\ from\nmatching...\ How\ would\ such\ a\ re\ look\ like?\n\n----\n\n\[LV\]:\ how\ about\ something\ like\ this?\n\n======none\n%\ set\ a\ \"abc\nev\"\n#\ a\ now\ has\ two\ lines\ in\ it\n%\ regexp\ -line\ --\ \{(.*)\}\ \$a\ b\ c\ d\n1\n%\ puts\ \$b\nabc\n%\ puts\ \$c\nabc\n======\n\nIf\ you\ want\ to\ keep\ carriage\ returns\ or\ newlines\ by\ themselves,\ but\ not\ when\nthey\ are\ together,\ you\ need\ something\ like:\n\n======\nregexp\ --\ \ \{^(\[^\\r\]|\\r(?!\\n))*\}\ \ \$a\ b\ c\ d\n======\n\nThis\ allows\ plain\ carriage\ return\ or\ plain\ newline.\n\nThanks\ to\ \[bbh\]\ and\ \[Donal\ Fellows\]\ for\ this\ regular\ expression.\n\n\n\n**\ Back\ References\ **\n\nFrom\ \[comp.lang.tcl\]:\n\nI\ did\ some\ experimenting\ with\ other\ strings,\ like\ \"just\ a\nHHHHEEEEAAAADDDDEEEERRRR\".\ The\ regular\ expression\ `(.)\\1\\1\\1`\ does\ the\ job\ I\nwould\ have\ wanted,\ whereas\ `(.)\{4\}`\ will\ return\ the\ last\ of\ each\ four\ncharacters\ -\ as\ posted\ as\ well.\n\nThat\ surprised\ me\ too\ --\ being\ able\ to\ place\ backreferences\ within\ the\ regex\ is\nan\ extremely\ powerful\ technique.\n\n======\nregsub\ -all\ \{(.)\\1\{3\}\}\ \$string\ \{\\1\}\ result\n======\n\nfor\ exactly\ 4\ char\ repeats,\ and\ `(.)\\1+`\ for\ arbitrary\ repeats.\n\n\n\n**\ IP\ Numbers\ **\n\nYou\ can\ create\ a\ regular\ expression\ to\ check\ an\ IP\ address\ for\ \ncorrect\ syntax.\ \ Note\ that\ this\ regular\ expression\ only\ checks\nfor\ groups\ of\ 1-3\ digits\ separated\ by\ periods.\ \ If\ you\ want\nto\ ensure\ that\ the\ digit\ groups\ are\ from\ 0-255,\ or\ that\ you\ \nhave\ a\ valid\ IP\ address,\ you'll\ have\ to\ do\ additional\n(non\ regexp)\ work.\ \ This\ code\ posted\ to\ comp.lang.tcl\ by\n\[George\ Peter\ Staplin\]\n\ \n======\nset\ str\ 
66.70.7.154\n\nregexp\ \"(\\\[0-9\]\{1,3\})\\.(\\\[0-9\]\{1,3\})\\.(\\\[0-9\]\{1,3\})\\.(\\\[0-9\]\{1,3\})\"\ \$str\ all\ first\ second\ third\ fourth\n\nputs\ \"\$all\ \\n\ \$first\ \\n\ \$second\ \\n\ \$third\ \\n\ \$fourth\ \\n\"\n======\n\nThe\ above\ regular\ expression\ matches\ any\ string\ where\ there\ are\ four\ groups\ of\ 1-3\ digits\ separated\ by\ periods.\ \ Since\ it's\ not\ anchored\ to\ the\ start\ and\ end\ of\ the\ string\ (with\ ^\ and\ \$)\ it\ will\ match\ any\ string\ that\ contains\ four\ groups\ of\ 1-3\ digits\ separated\ by\ periods,\ such\ as:\ \"66.70.7.154.9\".\n\nIf\ you\ don't\ mind\ a\ longer\ regexp,\ there\ is\ no\ reason\ you\ can't\ ensure\ that\ each\ group\ of\ 1-3\ digits\ is\ in\ the\ range\ of\ 0-255.\ \ For\ example\ (broken\ up\ a\ bit\ to\ make\ it\ more\ readable):\n\n======\nset\ octet\ \{(\\d|\[1-9\]\\d|1\\d\\d|2\[0-4\]\\d|25\[0-5\])\}\nset\ RE\ \"^\[join\ \[list\ \$octet\ \$octet\ \$octet\ \$octet\]\ \{\\.\}\]\\\$\"\nregexp\ \$RE\ \$str\ all\ first\ second\ third\ fourth\ \;#\ Michael\ A.\ Cleverly\n======\n\nrecently\ on\ comp.lang.tcl,\ someone\ mentioned\ that\ \nhttp://www.oreilly.com/catalog/regex/chapter/ch04.html#Be_Specific\ntalks\ about\ matching\ IP\ addresses.\n\n'''Gururajesh:'''\ A\ Perfect\ regular\ expression\ to\ validate\ ip\ address\ with\ a\ single\ expression.\n\n======\nif\ \{\[regexp\ \{(^\[2\]\[5\]\[0-5\].|^\[2\]\[0-4\]\[0-9\].|^\[1\]\[0-9\]\[0-9\].|^\[0-9\]\[0-9\].|^\[0-9\].)(\[2\]\[0-5\]\[0-5\].|\[2\]\[0-4\]\[0-9\].|\[1\]\[0-9\]\[0-9\].|\[0-9\]\[0-9\].|\[0-9\].)(\[2\]\[0-5\]\[0-5\].|\[2\]\[0-4\]\[0-9\].|\[1\]\[0-9\]\[0-9\].|\[0-9\]\[0-9\].|\[0-9\].)(\[2\]\[0-5\]\[0-5\]|\[2\]\[0-4\]\[0-9\]|\[1\]\[0-9\]\[0-9\]|\[0-9\]\[0-9\]|\[0-9\])\$\}\ \$string\ match\ v1\ v2\ v3\ v4\]\}\ \{puts\ \"\$v1\$v2\$v3\$v4\"\}\ else\ \{puts\ \"none\"\}\n======\n\nFor\ `245.254.253.2`,\ output\ is\ `245.254.253.2`\n\nFor\ `265.254.243.2`,\ output\ is\ `none`,\ As\ ip-address\ can`t\ have\ a\ number\ greater\ than\ 
255.\n\n\[Lars\ H\]:\ Perfect?\ No,\ it\ looks\ like\ it\ would\ accept\ `99a99b99c99`,\ since\ `.`\ will\ match\ any\ character.\ Also,\ it\ can\ be\ shortened\ significantly\ by\ making\ use\ of\ `\{4\}`\ and\ the\ like\ (see\ \[Regular\ expressions\]).\n\nBetter\ is\n\n======\nif\ \{\[regexp\ \{^(((\[2\]\[5\]\[0-5\]|(\[2\]\[0-4\]|\[1\]\[0-9\]|\[0-9\])?\[0-9\])\\.)\{3\})(\[2\]\[5\]\[0-5\]|(\[2\]\[0-4\]|\[1\]\[0-9\]|\[0-9\])?\[0-9\])\$\}\ \$IP\ \$string\ match\ v1\ v2\ v3\ v4\]\}\ \{puts\ \"\$v1\$v2\$v3\$v4\"\}\ else\ \{puts\ \"none\"\}\n======\n\nTcllib\ should\ be\ useful\n\n\ \ \ *\ http://docs.activestate.com/activetcl/8.5/tcllib/dns/tcllib_ip.html\n\ \ \ *\ http://tcllib.sourceforge.net/doc/tcllib_ip.html\n\n\[freethomas\]:\ I\ thinks\ this\ regexp\ is\ much\ simple\ and\ easier\ for\ IP\ number\n\n======\nset\ str\ \"66.70.7.154\"\nregexp\ \{(\\d+)(\\D)(\\d+)(\\D)(\\d+)(\\D)(\\d+)\}\ \$str\ match\n======\n\n\[AMG\]:\ This\ expression\ allows\ any\ character\ to\ separate\ the\ octets,\ not\ just\ period.\ \ I\ sincerely\ doubt\ this\ is\ what\ you\ want.\ \ Use\ `\\.`\ instead\ of\ `\\D`.\ \ Also\ it's\ not\ anchored\ with\ `^`\ and\ `\$`,\ so\ it\ works\ on\ substrings\ rather\ than\ requiring\ that\ the\ whole\ string\ match.\ \ Though\ maybe\ this\ is\ what\ you\ want\ since\ you\ explicitly\ capture\ the\ matching\ substring.\n\nI\ already\ fixed\ the\ syntax\ issue\ of\ saying\ `\{`\ at\ the\ beginning\ but\ leaving\ out\ the\ closing\ `\}`,\ also\ of\ leaving\ out\ the\ first\ `(`.\n\nI\ see\ no\ reason\ to\ use\ `(`\ and\ `)`\ grouping.\ \ You\ don't\ give\ variables\ into\ which\ the\ subexpressions\ would\ be\ captured,\ and\ it's\ pointless\ to\ capture\ the\ dots\ between\ the\ octets.\ \ (See\ what\ I\ did\ there?)\ \ Try\ this:\n\n======\nregexp\ \{^\\d+\\.\\d+\\.\\d+\\.\\d+\$\}\ \$str\n======\n\n\[AMG\]:\ Here's\ a\ very\ similar\ script\ (to\ \[Lars\ H\]'s\ contribution)\ that\ uses\ `\[scan\]`\ instead\ of\ `\[regexp\]`.\ \ It's\ much\ more\ 
readable,\ in\ my\ opinion.\n\n======\nif\ \{\[scan\ \$string\ %d.%d.%d.%d\ a\ b\ c\ d\]\ ==\ 4\n\ &&\ 0\ <=\ \$a\ &&\ \$a\ <=\ 255\ &&\ 0\ <=\ \$b\ &&\ \$b\ <=\ 255\n\ &&\ 0\ <=\ \$c\ &&\ \$c\ <=\ 255\ &&\ 0\ <=\ \$d\ &&\ \$d\ <=\ 255\}\ \{\n\ \ \ \ puts\ \$a.\$b.\$c.\$d\n\}\ else\ \{\n\ \ \ \ puts\ none\n\}\n======\n\nThere\ are\ a\ few\ differences.\ \ One,\ the\ trailing\ dot\ is\ omitted\ from\ the\ first\ three\ output\ variables\ (which\ I\ call\ `a`,\ `b`,\ `c`,\ `d`\ instead\ of\ `v1`,\ `v2`,\ `v3`,\ `v4`).\ \ Two,\ leading\ zeroes\ are\ permitted\ and\ discarded.\ \ Three,\ `-0`\ is\ accepted\ as\ `0`.\ \ Four,\ garbage\ at\ the\ end\ of\ \$string\ is\ silently\ discarded.\ \ Five,\ each\ octet\ can\ have\ a\ leading\ `+`,\ e.g.\ `+255.+255.+255.+255`.\ \ Six,\ it's\ ''OVER\ FIVE\ TIMES\ FASTER!''\ \ On\ this\ machine,\ my\ version\ using\ `\[scan\]`\ takes\ 15\ microseconds,\ whereas\ your\ version\ using\ `\[regexp\]`\ takes\ 78\ microseconds.\ \ Use\ `\[time\]`\ to\ measure\ performance.\ \ (I\ replaced\ `\[puts\]`\ with\ `\[return\]`\ when\ testing.)\n\nNow,\ here's\ a\ hybrid\ version\ that\ uses\ regexp.\n\n======\nif\ \{\[regexp\ \{^(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)\$\}\ \$string\ _\ a\ b\ c\ d\]\n\ &&\ 0\ <=\ \$a\ &&\ \$a\ <=\ 255\ &&\ 0\ <=\ \$b\ &&\ \$b\ <=\ 255\n\ &&\ 0\ <=\ \$c\ &&\ \$c\ <=\ 255\ &&\ 0\ <=\ \$d\ &&\ \$d\ <=\ 255\}\ \{\n\ \ \ \ puts\ \$a.\$b.\$c.\$d\n\}\ else\ \{\n\ \ \ \ puts\ none\n\}\n======\n\nThis\ version\ takes\ 46\ microseconds\ to\ execute.\ \ It\ doesn't\ accept\ leading\ `+`\ or\ `-`.\ \ It\ rejects\ garbage\ at\ the\ end\ of\ the\ string.\ \ It\ treats\ the\ octets\ as\ octal\ if\ they\ are\ given\ leading\ zeroes,\ and\ invalid\ octal\ is\ always\ accepted.\ \ The\ reason\ for\ this\ last\ is\ because\ `\[if\]`\ treats\ strings\ containing\ invalid\ octal\ as\ nonnumeric\ text,\ so\ the\ \[<=\]\ operator\ is\ used\ to\ sort\ text\ rather\ than\ compare\ numbers.\ \ Corrected\ version:\n\n======\nif\ \{\[regexp\ 
\{^(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)\$\}\ \$string\ _\ a\ b\ c\ d\]\n\ &&\ \[string\ is\ integer\ \$a\]\ &&\ 0\ <=\ \$a\ &&\ \$a\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$b\]\ &&\ 0\ <=\ \$b\ &&\ \$b\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$c\]\ &&\ 0\ <=\ \$c\ &&\ \$c\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$d\]\ &&\ 0\ <=\ \$d\ &&\ \$d\ <=\ 255\}\ \{\n\ \ \ \ puts\ \$a.\$b.\$c.\$d\n\}\ else\ \{\n\ \ \ \ puts\ none\n\}\n======\n\nThis\ version\ takes\ 47\ microseconds\ and\ it\ rejects\ invalid\ octal.\ \ However,\ it\ still\ interprets\ numbers\ as\ octal\ if\ leading\ zeroes\ are\ given,\ so\ `0377.255.255.255`\ is\ accepted\ (but\ `0400.255.255.255`\ is\ rejected).\ \ To\ fix\ this,\ it\ would\ be\ necessary\ to\ make\ a\ pattern\ that\ rejects\ leading\ zeroes\ unless\ the\ octet\ is\ exactly\ zero,\ something\ like:\ `(0|\[\[^1-9\]\]\\d*)`.\ \ But\ this\ is\ getting\ clumsy\ and\ slow\;\ I\ prefer\ the\ `\[scan\]`\ solution.\ \ `\[regexp\]`:\ not\ always\ the\ right\ tool!\n\nGururajesh:\n\n======\nset\ string\ \"0377.255.255.255\"\nif\ \{\[regexp\ \{^(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)\$\}\ \$string\ _\ a\ b\ c\ d\]\n\ &&\ \[string\ is\ integer\ \$a\]\ &&\ \[scan\ \$a\ %d\ v1\]\ &&\ 0\ <=\ \$v1\ &&\ \$v1\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$b\]\ &&\ \[scan\ \$b\ %d\ v2\]\ &&\ 0\ <=\ \$v2\ &&\ \$v2\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$c\]\ &&\ \[scan\ \$c\ %d\ v3\]\ &&\ 0\ <=\ \$v3\ &&\ \$v3\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$d\]\ &&\ \[scan\ \$d\ %d\ v4\]\ &&\ 0\ <=\ \$v4\ &&\ \$v4\ <=\ 255\}\ \{puts\ \$v1.\$v2.\$v3.\$v4\}\ else\ \{puts\ none\}\n======\n\nThis\ will\ be\ ok...\ for\ above\ mentioned\ issue.\n\n\[AMG\]:\ Why\ call\ `\[scan\]`\ four\ times?\ \ A\ single\ invocation\ can\ do\ the\ job:\n\n======\nset\ string\ \"0377.255.255.255\"\nif\ \{\[regexp\ \{^\\d+\\.\\d+\\.\\d+\\.\\d+\$\}\ \$string\]\n\ &&\ \[scan\ \$string\ %d.%d.%d.%d\ a\ b\ c\ d\]\ ==\ 4\n\ &&\ 0\ <=\ \$a\ &&\ \$a\ <=\ 255\ &&\ 0\ <=\ \$b\ &&\ \$b\ <=\ 255\n\ &&\ 0\ <=\ 
\$c\ &&\ \$c\ <=\ 255\ &&\ 0\ <=\ \$d\ &&\ \$d\ <=\ 255\}\ \{\n\ \ \ \ puts\ \$a.\$b.\$c.\$d\n\}\ else\ \{\n\ \ \ \ puts\ none\n\}\n======\n\nI\ don't\ see\ any\ drawbacks\ to\ this\ approach.\ \ The\ regular\ expression\ is\ simple\ and\ is\ used\ only\ to\ reject\ `+`\ and\ `-`\ signs\ and\ garbage\ at\ the\ end,\ `\[scan\]`\ does\ the\ job\ of\ splitting\ and\ converting\ to\ integers,\ and\ math\ expressions\ check\ ranges.\ \ Three\ tools,\ each\ doing\ what\ they're\ designed\ for.\n\n\[CJB\]:\ Here\ is\ a\ pure\ `\[regexp\]`\ version\ with\ comparable\ performance.\ \ It\ matches\ any\ valid\ ip,\ rejecting\ octals.\ \ However\ it\ does\ not\ split\ the\ integers\ and\ is\ therefore\ only\ useful\ for\ validation.\ \ The\ timings\ on\ my\ computer\ were\ about\ 22\ microseconds\ for\ this\ version\ compared\ to\ 28\ microseconds\ for\ the\ regexp/scan\ combo\ (I\ removed\ the\ \[puts\]\ statements\ for\ the\ comparison\ because\ they\ are\ slow\ and\ tend\ to\ vary).\n\nNote\ that\ the\ pure\ `\[scan\]`\ version\ is\ still\ fastest\ (about\ 20\ microseconds),\ splits,\ and\ has\ the\ same\ rejections\ (`%d`\ stores\ integers\ and\ ignores\ extra\ leading\ `0`\ characters).\n\n======\nset\ string\ 123.255.189.255\ \nregexp\ \{^(?:(?:\[2\]\[5\]\[0-5\]|\[1\]?\[1-9\]\{1,2\}|0)(?:\\.|\$))\{4\}\}\ \$string\ match\n======\n\n\[fh\]\ 2012-02-13\ 11:54:30:\n\nTo\ search\ IP\ ADDRESS\ using\ Regular\ Expression\n\n======\nset\ \ IP\ \"The\ Interface\ IP\ Address\ is\ 198.176.17.16\ \"\nregexp\ \{(25\[0-5\]|2\[0-9\]\[0-9\]|\[0-9\]?\[0-9\]\[0-9\]?)\\.\{3\}(25\[0-5\]|2\[0-9\]\[0-9\]|\[0-1\]?\[0-9\]\[0-9\]?)\ \$IP\ match\n======\n\n**\ Domain\ names\ **\n\n(First\ shot)\n\n======none\n^\[a-zA-Z\](\[a-zA-Z0-9-\]\{0,61\}\[a-zA-Z0-9\])?\\.\[a-zA-Z\](\[a-zA-Z0-9-\]\{0,61\}\[a-zA-Z0-9\])?(\\.\[a-zA-Z\](\[a-zA-Z0-9-\]\{0,61\}\[a-zA-Z0-9\])?)?\$\n======\n\nThis\ code\ does\ NOT\ attempt,\ obviously,\ to\ ensure\ that\ the\ last\ level\ of\ the\ regular\ expression\ matches\ a\ 
known\ domain...\n\n\n\n**\ Regular\ Expression\ for\ parsing\ http\ string\ **\n\n======\nregexp\ \{(\[\[^:\]\]+)://(\[\[^:/\]\]+)(:(\[\[0-9\]\]+))\}\ \[ns_conn\ location\]\ match\ protocol\ server\ x\ port\n======\n\nthe\ above\ author\ should\ remember\ this\ is\ a\ Tcl\ wiki,\ and\ not\ an\ \[aolserver\]\none,\ but\ thanks\ for\ the\ submission\ \;)\n\n\[PYK\]\ 2016-02-28:\ In\ the\ previous\ edit,\ a\ `-`\ character\ was\ added\ to\ the\ regular\ expression,\ prohibiting\ the\ occurrence\ of\ `-`\ in\ ''scheme''\ component\ of\ a\ URL.\ \ As\ far\ as\ I\ can\ tell,\ `-`\ is\ allowed\ in\ the\ ''scheme''\ component,\ so\ I've\ reverted\ that\ change\ in\ the\ expression\ above.\n\n\n**\ E-mail\ addresses\ **\ \n\nRS:\ No\ warranty,\ just\ a\ first\ shot:\n\n======none\n^\[A-Za-z0-9._-\][email protected]\[\[A-Za-z0-9.-\]+\$\n======\n\nUnderstand\ that\ this\ expression\ is\ an\ attempt\ to\ see\ if\ a\ string\ has\ a\ format\ that\ is\ compatible\ with\ ''normal''\ RFC\ SMTP\ email\ address\ formats.\ \ It\ does\ not\ attempt\ to\ see\ whether\ the\ email\ address\ is\ correct.\ \ Also,\ it\ does\ not\ account\ for\ comments\ embedded\ within\ email\ addresses,\ which\ are\ defined\ even\ though\ seldom\ used.\n\n\[bll\]\ 2017-6-30\ E-mail\ addresses\ are\ quite\ complicated.\ \ You\ must\ be\ careful\ not\ to\ reject\ valid\ e-mail\ addresses.\ \ For\ example,\ `%`\ and\ `+`\ characters\ are\ valid.\nNobody\ uses\ the\ `%`\ sign\ any\ more\ as\ it\ is\ not\ secure.\ \nThe\ `+`\ character\ is\ very\ useful,\ but\ unfortunately,\ there\ are\ a\ lot\ of\ incorrect\ e-mail\ validation\ routines\ that\ reject\ it.\n\nThe\ following\ pattern\ will\ still\ reject\ an\ e-mail\ of\ the\ form\ [email protected]\[\[ip-address\]\].\nNo\ lengths\ are\ checked.\ \ It\ does\ not\ check\ that\ the\ top-level\ domain\ (e.g.\ .org,\ .com,\ .solutions)\ is\ valid.\n\n======\nset\ ::emailpat\ \{\n^\n(\ \ #\ local-part\n\ \ (?:\n\ \ \ \ (?:\n\ \ \ \ \ \ 
(?:\[^\"().,:\;\\\[\\\]\\s\\\\@\]+)\ \ \ #\ one\ or\ more\ non-special\ characters\ (not\ dot)\n\ \ \ \ \ \ |\n\ \ \ \ \ \ (?:\n\ \ \ \ \ \ \ \ \"\ \ #\ begin\ quoted\ string\n\ \ \ \ \ \ \ \ (?:\n\ \ \ \ \ \ \ \ \ \[^\\\\\"\]\ \ #\ any\ character\ other\ than\ backslash\ or\ double\ quote\n\ \ \ \ \ \ \ \ \ |\n\ \ \ \ \ \ \ \ \ (?:\\\\.)\ #\ or\ a\ backslash\ followed\ by\ another\ character\n\ \ \ \ \ \ \ \ )+\ \ \ #\ repeated\ one\ or\ more\ times\n\ \ \ \ \ \ \ \ \"\ \ #\ end\ quote\n\ \ \ \ \ \ )\n\ \ \ \ )\n\ \ \ \ \\.\ \ \ #\ followed\ by\ a\ dot\n\ \ )*\ \ \ \ #\ local\ portion\ with\ trailing\ dot\ repeated\ zero\ or\ more\ times.\n\ \ (?:\[^\"().,:\;\\\[\\\]\\s\\\\@\]+)|(?:\"(?:\[^\\\\\"\]|(?:\\\\.))+\")\ \ #\ as\ above,\ the\ final\ portion\ may\ not\ contain\ a\ trailing\ dot\n)\[email protected]\n(\ \ #\ domain-name,\ underscores\ are\ not\ allowed\n\ \ (?:(?:\[A-Za-z0-9\]\[A-Za-z0-9-\]*)?\[A-Za-z0-9\]\\.)+\ #\ one\ or\ more\ domain\ specifiers\ followed\ by\ a\ dot\n\ \ (?:\[A-Za-z0-9\]\[A-Za-z0-9-\]*)?\[A-Za-z0-9\]\ \ \ \ \ #\ top-level\ domain\n\ \ \\.?\ \ \ \ \ \ \ \ \ \ \ #\ may\ be\ fully-qualified\n)\n\$\n\}\n\nproc\ testit\ \{\ valid\ testaddr\ \}\ \{\n\ \ set\ rc\ NG\n\ \ if\ \{\ \[regexp\ -expanded\ \$::emailpat\ \$testaddr\ emailaddr\ local\ domain\]\ \}\ \{\n\ \ \ \ set\ rc\ OK\n\ \ \}\n\ \ if\ \{\ \$rc\ ne\ \$valid\ \}\ \{\n\ \ \ \ puts\ \"Fail:\ (\$valid)\ \$testaddr\"\n\ \ \}\ elseif\ \{\ \$valid\ eq\ \"OK\"\ \}\ \{\n\ \ \ \ puts\ \"ok:\ \$testaddr\ \$local\ \$domain\"\n\ \ \}\n\}\n\n#\ valid\ e-mails\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{internal-quote.\"*()\"[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{\"[email protected]\"@example.com\}\ntestit\ OK\ \{\"very.(),:\;<>\[\]\\\".VERY.\\\"[email protected]\\\\\ \\\"very\\\".unusual\"@strange.example.com\}\ntestit\ 
OK\ \{[email protected]\}\ntestit\ OK\ \{#!\$%&'*+-/=?^_`\{\}|[email protected]\}\ntestit\ OK\ \{\"()<>\[\]:,\;@\\\\\\\"!#\$%&'-/=?^_`\{\}|\ ~.a\"@example.org\}\ntestit\ OK\ \{\"\ \"@example.org\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\n#\ invalid\ tests\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{Abc.example.com\}\ntestit\ NG\ \{[email protected]@[email protected]\}\ntestit\ NG\ \{a\"b(c)d,e:f\;g<h>i\[j\\k\][email protected]\}\ntestit\ NG\ \{just\"not\"[email protected]\}\ntestit\ NG\ \{this\ is\"not\\[email protected]\}\ntestit\ NG\ \{this\\\ still\\\"not\\\\[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]_dom.com\}\n======\n\nReference:\ https://en.wikipedia.org/wiki/Email_address#Examples\n\n\n\n\n\n**\ XML-like\ data\ **\n\nTo\ match\ something\ similar\ to\ XML-tags\ you\ can\ use\ regular-expressions,\ too.\nLet's\ assume\ we\ have\ this\ text:\n\n======none\n%\ set\ text\ \{<bo>s</bo><it><bo>M</bo></it>\}\n======\n\nWe\ can\ match\ the\ body\ of\ '''bo'''\ with\ this\ regexp:\n\n======none\n%\ regexp\ \"<(bo)>(.*?)</bo>\"\ \$text\ dummy\ tag\ body\n======\n\nNow\ we\ extend\ our\ XML-text\ with\ some\ attributes\ for\ the\ tags,\ say:\n\n======\nset\ text2\ \{<bo\ h=\"m\">s</bo><it><bo>M</bo></it>\}\n======\n\nIf\ we\ try\ to\ match\ this\ with:\n\n======none\nregexp\ \"<(bo)\\\\s+(.+?)>(.*?)</bo>\"\ \$text2\ dummy\ tag\ attributes\ body\n======\n\nit\ ''won't\ work''\ anymore.\ \nThis\ is\ because\ `\\\\s+`\ is\ greedy\n(in\ contrary\ to\ the\ non-greedy\ `(.+?)`\nand\ `(.*?)`)\ and\ that\ (the\ one\ greedy-operator)\nmakes\ the\ whole\ expression\ greedy.\n\nSee\ \[Henry\ Spencer\]'s\ reply\ in\n\[http://groups.google.com/d/msg/comp.lang.tcl/FddeFPbTFw8/asoMuv7dWqIJ%|%tcl\n8.2\ regexp\ not\ doing\ non-greedy\ 
matching\ correctly\],\ \[comp.lang.tcl\],\ 1999-09-20.\n\nThe\ ''correct''\ way\ is:\n\n======\nregexp\ \"<(bo)\\\\s+?(.+?)>(.*?)</bo>\"\ \$text2\ dummy\ tag\ attributes\ body\n======\n\nNow\ we\ can\ write\ a\ more\ general\ XML-to-whatever-translater\ like\ this:\n\ \ \ 1.\ Substitute\ `\[\[`\ and\ `\]\]`\ with\ their\ corresponding\ `\\\[\[`\ and\ `\\\]\]`\ to\ avoid\ confusion\ with\ `\[subst\]`\ in\ 3.\n\ \ \ 2.\ Substitute\ the\ tags\ and\ attributes\ with\ commands\n\ \ \ 3.\ Do\ a\ `\[subst\]`\ on\ the\ whole\ text,\ thereby\ calling\ the\ inserted\ commands\n\n======\nproc\ xml2whatever\ \{text\ userCallback\}\ \{\n\ \ \ \ set\ text\ \[string\ map\ \{\[\ \\\\\[\ \]\ \\\\\]\}\ \$text\]\n\ \ \ \ #\ replace\ all\ tags\ with\ a\ call\ to\ userCallback\n\ \ \ \ #\ this\ has\ to\ be\ done\ multiple\ times,\ because\ of\ nested\ tags\n\ \ \ \ #\ match\ each\ tag\ (everything\ not\ space\ after\ <)\n\ \ \ \ #\ and\ all\ the\ attributes\ (everything\ behind\ the\ tag\ until\ >)\n\ \ \ \ #\ then\ match\ body\ and\ the\ end-tag\ (which\ should\ be\ the\ same\ as\ the\n\ \ \ \ #\ first\ matched\ one\ (\\1))\n\ \ \ \ while\ \{\[regsub\ -all\ \{<(\\S+?)(\\s+\[^\\s>\].*?)?\\s*?>(.*?)</\\1>\}\ \$text\ \"\\\[\[list\ \$userCallback\ \\\\1\ \\\\2\ \\\\3\]\\\]\"\ text\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ puts\ doop:\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ puts\ \$text\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ puts\ \{\}\n\ \ \ \ \ \ \ \ #\ do\ nothing\n\ \ \ \ \}\n\ \ \ \ return\ \[subst\ -novariables\ -nobackslashes\ \$text\]\n\}\n\n#\ is\ called\ from\ xml2whatever\ with\n#\ element:\ the\ xml-element\n#\ attributes:\ the\ attributes\ of\ xml-element\n#\ body:\ body\ of\ xml-element\nproc\ myTranslate\ \{element\ attributes\ body\}\ \{\n\ \ \ \ #\ Remove\ the\ bracket\ armour\ added\ by\ xml2whatever\n\ \ \ \ set\ element\ \[string\ map\ \{\\\\\[\ \[\ \\\\\]\ \]\}\ \$element\]\n\ \ \ \ set\ attributes\ \[string\ map\ \{\\\\\[\ \[\ \\\\\]\ \]\}\ \$attributes\]\n\ \ \ \ set\ body\ 
\[string\ map\ \{\\\\\[\ \[\ \\\\\]\ \]\}\ \$body\]\n\ \ \n\ \ \ \ #\ map\ bo\ -\ b\;\ it\ -\ i\ (leave\ rest\ alone)\n\ \ \ \ #\ do\ a\ subst\ for\ the\ body,\ because\ of\ possible\ nested\ tags\n\ \ \ \ switch\ --\ \$element\ \{\n\ \ \ \ \ \ \ \ bo\ \{\ return\ \"<b>\[subst\ -novariables\ -nobackslashes\ \$body\]</b>\"\}\n\ \ \ \ \ \ \ \ it\ \{\ return\ \"<i>\[subst\ -novariables\ -nobackslashes\ \$body\]</i>\"\}\n\ \ \ \ \ \ \ \ default\ \{\ return\ \"<\$element\$attributes>\[subst\ -novariables\ -nobackslashes\ \$body\]</\$element>\"\ \}\n\ \ \ \ \}\n\}\n======\n\nCall\ the\ parser\ with:\n\n======\nxml2whatever\ \$text2\ myTranslate\n======\n\nYou\ have\ to\ be\ careful,\ though.\ Don't\ do\ this\ for\ large\ texts\ or\ texts\nwith\ many\ nested\ xml-tags\ because\ the\ regular-expression-machine\ is\ not\ the\nthe\ right\ tool\ to\ parse\ large,nested\ files\ efficiently.\ (\[Stefan\ Vogel\])\n\n\[DKF\]:\ I\ agree\ with\ that\ last\ point.\ \ If\ you\ are\ really\ dealing\ with\ \[XML\],\ it\ is\ better\ to\ use\ a\ proper\ tool\ like\ \[TclDOM\]\ or\ \[tDOM\].\n\n\[PYK\]\ 2015-10-30:\ \ I\ patched\ the\ regular\ expression\ to\ fix\ an\ issue\ where\ the\ attributes\ group\ could\ pick\ up\ part\ of\ the\ tag\ in\ documents\ containing\ tags\ with\ similar\ prefixes.\ \ The\ fix\ is\ to\ use\ whitespace\ followed\ by\ non-whitespace\ other\ than\ `>`\ to\ detect\ the\ beginning\ of\ attributes.\ \ There\ are\ other\ things\ \n\n\n\n\n**\ Negated\ string\ **\n\n''\[Bruce\ Hartweg\]\ wrote\ in\ comp.lang.tcl:''\nYou\ can't\ negate\ a\ regular\ expression,\ but\ you\ CAN\ negate\ a\ regular\ expression\ that\ is\ only\ a\ simple\ string.\ Logically,\ it's\ the\ following:\n\ \ \ *\ match\ any\ single\ char\ except\ first\ letter\ in\ the\ string.\n\ \ \ *\ match\ the\ first\ char\ in\ string\ if\ followed\ by\ any\ letter\ except\ the\ 2nd\n\ \ \ *\ match\ the\ first\ two\ if\ followed\ by\ any\ but\ the\ third,\ et\ cetera\n\nThen\ the\ only\ thing\ more\ is\ 
to\ allow\ a\ partial\ match\ of\ the\ string\ at\ end\ of\ line.\ So\ for\ a\ regexp\ that\ matches\n\ any\ line\ that\ DOES\ NOT\ have\ the\ word\ ''foo'':\n\n======\nset\ exp\ \{^(\[^f\]|f\[^o\]|fo\[^o\])*.\{0,2\}\$\}\n======\n\nThe\ following\ proc\ will\ build\ the\ expression\ for\ any\ given\ string\n\n======\nproc\ reg_negate\ \{str\}\ \{\n\ \ \ \ set\ partial\ \"\"\n\ \ \ \ set\ branches\ \[list\]\n\ \ \ \ foreach\ c\ \[split\ \$str\ \"\"\]\ \{\n\ \ \ \ \ \ \ \ lappend\ branches\ \[format\ \{%s\[^%s\]\}\ \$partial\ \$c\]\n\ \ \ \ \ \ \ \ append\ partial\ \$c\n\ \ \ \ \}\n\ \ \ \ set\ exp\ \[format\ \{^(%s)*.\{0,%d\}\$\}\ \[join\ \$branches\ \"|\"\]\ \\\n\ \ \ \ \ \ \ \ \[expr\ \[string\ length\ \$str\]\ -1\]\]\n\}\n======\n\n\[Donal\ Fellows\]\ followed\ up\ with:\n\nThat's\ just\ set\ me\ thinking\;\ you\ can\ do\ this\ by\ specifying\ that\ the\ \nwhole\ string\ must\ be\ either\ not\ the\ character\ of\ the\ ''antimatch''*,\ \nor\ the\ first\ character\ of\ the\ antimatch\ so\ long\ as\ it\ is\ not\ \nfollowed\ by\ the\ rest\ of\ the\ antimatch.\ \ This\ leads\ to\ a\ fairly\ \nsimply\ expressed\ pattern.\n\n======\nset\ exp\ \{^(?:\[^f\]|f(?!oo))*\$\}\n======\n\nIn\ fact,\ this\ allows\ us\ to\ strengthen\ what\ you\ say\ above\ to\ allow\ \nthe\ matching\ of\ any\ negated\ regular\ expression\ directly\ so\ long\ as\ the\ first\ \ncomponent\ of\ the\ antimatch\ is\ a\ literal,\ and\ the\ rest\ of\ the\ \nantimatch\ is\ expressible\ in\ an\ ERE\ lookahead\ constraint\ (which\ \nimposes\ a\ number\ of\ restrictions,\ but\ still\ allows\ for\ some\ fairly\ \nsophisticated\ patterns.)\n\n*\ Anything's\ better\ than\ overloading\ 'string'\ here!\n\n\[JMN\]\ 2005-12-22:\nCould\ someone\ please\ explain\ what\ is\ meant\ by\ a\ 'negated\ string'\ here?\nSpecifically\ -\ what\ do\ the\ above\ achieve\ that\ isn't\ satisfied\ by\ the\ simpler:\ \n\n======\nset\ exp\ \{^(?!(.*foo.*))\}\n======\n\nDoesn't\ the\ following\ snippet\ from\ the\ regexp\ manpage\ 
indicate\ that\ a\ regexp\ can\ be\ negated?\ \nwhere\ does(or\ did?)\ the\ 'simple\ string'\ requirement\ come\ in?\ -\ is\ this\ info\ no\ longer\ current?\n\n\ (?!re)\n\ negative\ lookahead\ (AREs\ only),\ matches\ at\ any\ point\ where\ no\ substring\ matching\ re\ begins\ \n\n\[Lars\ H\]:\ It\ indeed\ seems\ the\ entire\ problem\ is\ rather\ trivial.\ In\ Tcl\ 7\ (before\ AREs)\ one\ sometimes\ had\ to\ do\ funny\ tricks\ like\ the\ ones\ Bruce\ Hartweg\ performs\ above,\ but\ his\ use\ of\ `\{0,2\}`\ means\ he\ must\ be\ assuming\ AREs.\ Perhaps\ there\ was\ a\ transitory\ period\ where\ one\ was\ available\ but\ not\ the\ other.\n\n\[Oleg\]\ 2009-12-11:\nIf\ one\ needs\ to\ match\ any\ string\ but\ 'foo',\ then\ the\ following\ will\ do\ the\ work:\n\n======none\nset\ exp\ \{^((?!foo).*)|^(foo.+)\}\n======\n\nAnd\ in\ general\ case\ when\ one\ needs\ to\ match\ any\ string\ that\ is\ neither\ 'foo'\ nor\ 'bar',\ then\ the\ following\ will\ do\ the\ work:\n\n======none\nset\ exp\ \{^((?!(foo|bar)).*)|^((foo|bar).+)\}\n======\n\n\[CRML\]\ 2013-11-06\nIn\ general\ case\ when\ one\ needs\ to\ match\ any\ string\ that\ is\ neither\ 'foo'\ nor\ 'bar'\ might\ be\ done\ using:\n\n======none\nset\ exp\ \{^(?!((foo|bar)\$))\}\n======\n\n\[AMG\]:\ Oleg's\ regexps\ confuse\ me.\ \ Translated\ literally,\ I\ read\ them\ as\ \"match\ any\ string\ that\ does\ not\ begin\ with\ `foo`\ (or\ `bar`)\ unless\ that\ string\ has\ more\ characters\ after\ the\ `foo`\ (or\ `bar`).\"\ \ Very\ indirect,\ I\ must\ say.\ \ CRML's\ suggestion\ I\ like\ better,\ though\ I\ would\ drop\ the\ extra\ parentheses\ to\ obtain:\ `^(?!(foo|bar)\$)`.\ \ This\ says,\ \"match\ any\ string\ that\ does\ not\ begin\ with\ either\ `foo`\ or\ `bar`\ when\ immediately\ followed\ by\ end\ of\ string.\"\ \ In\ other\ words,\ \"match\ any\ string\ that\ is\ not\ exactly\ `foo`\ or\ `bar`.\"\n\n\n\n**\ Turn\ a\ string\ into\ %hex-escaped\ (url\ encoded)\ characters:\ **\n\ne.g.\ `Csan\ ->\ 
%43%73%61%6E`\n\n======\nregsub\ -all\ --\ \{(.)\}\ \$string\ \{%\[format\ \"%02lX\"\ \[scan\ \\1\ \"%c\"\]\]\}\ new_string\nsubst\ \$new_string\n======\n\nThis\ demonstrates\ the\ power\ of\ using\ \n\[regsub\]\ together\ with\ \n\[subst\],\ which\ is\ regarded\ as\ one\ of\ the\ most\ powerful\ ways\ to\ use\ regular\ expressions\ in\ Tcl.\n\n\n\n**\ Turn\ a\ string\ into\ %hex-escaped\ (url\ encoded)\ characters\ (part\ 2)\ **\n\nThis\ one\ makes\ the\ result\ more\ readable\ and\ still\ quite\ safe\ to\ use\ in\ URLs\ne.g.\ http://wiki.tcl.tk\ ->\ http%3A%2F%2Fwiki%2Etcl%2Etk\n\n======\nregsub\ -all\ --\ \{(\[^A-Za-z0-9_-\])\}\ \$string\ \{%\[format\ \"%02lX\"\ \[scan\ \\1\ \"%c\"\]\]\}\ new_string\nsubst\ \$new_string\n======\n\n\[nl\]\n\n----\n\n\[Joe\ Mistachkin\]\n\nThe\ inverse\ of\ the\ above\ (not\ optimized):\n\n======\nregsub\ -all\ --\ \{%(\[0123456789ABCDEF\]\[0123456789ABCDEF\])\}\ \$string\ \{\[format\ \"%c\"\ 0x\\1\]\}\ new_string\nsubst\ \$new_string\n======\n\n\n\n**\ Caveats\ about\ using\ `\[regsub\]`\ with\ `\[subst\]`\ **\n\n\[glennj\]\ 2008-12-16:\ It\ can\ be\ dangerous\ to\ blindly\ apply\ `\[subst\]`\ to\ the\ results\ of\ `\[regsub\]`,\ particularly\ if\ you\ have\ not\ validated\ the\ input\ string.\ \ Here's\ an\ example\ that's\ not\ too\ contrived:\n\n======\nset\ string\ \{\[some\ malicious\ command\]\}\nregsub\ -all\ \{\\w+\}\ \$string\ \{\[string\ totitle\ &\]\}\ result\nsubst\ \$result\n======\n\nThis\ results\ in\ `invalid\ command\ name\ \"Some\"`.\ \ What\ if\ `\$string`\ was\ `\[\[exec\ format\ c:\]\]`?\n\nSee\ DKF's\ \"proc\ regsub-eval\"\ contribution\ in\ `\[regsub\]`\ to\ properly\ prepare\ the\ input\ string\ for\ substitution.\ \ Paraphrased:\n\n======\nset\ string\ \{\[some\ malicious\ command\]\}\nset\ escaped\ \[string\ map\ \{\\\[\ \\\\\[\ \\\]\ \\\\\]\ \\\$\ \\\\\$\ \\\\\ \\\\\\\\\}\ \$string\]\nregsub\ -all\ \{\\w+\}\ \$escaped\ \{\[string\ totitle\ &\]\}\ result\nsubst\ \$result\n======\n\nwhich\ results\ in\ 
what\ you'd\ expect:\ \ the\ string\ \"\[\[Some\ Malicious\ Command\]\]\"\n\n\[APN\]\ I\ don't\ follow\ why\ all\ the\ extra\ \\\ are\ needed\ in\ the\ string\ map.\ The\nfollowing\ should\ work\ just\ as\ well?\n\n======\nset\ escaped\ \[string\ map\ \{\[\ \\\\\[\ \]\ \\\\\]\ \$\ \\\\\$\ \\\\\ \\\\\\\\\}\ \$string\]\n======\n\n\[PYK\]\ 2016-05-28:\ \ Indeed:\n\n======\nexpr\ \{\ \[list\ \{*\}\{\ \[\ \ \\\\\[\ \ \]\ \ \\\\\]\ \ \$\ \ \\\\\$\ \\\\\ \\\\\\\\\}\]\n\ \ \ \ eq\ \[list\ \{*\}\{\\\[\ \ \\\\\[\ \\\]\ \ \\\\\]\ \\\$\ \ \\\\\$\ \\\\\ \\\\\\\\\}\]\}\ \;#\ ->\ 1\n======\n\n\n----\n\n**\ Maintain\ proper\ spacing\ when\ formatting\ for\ HTML\ **\n\n\[DG\]\ got\ this\ from\ \[Kevin\ Kenny\]\ on\ c.l.t.\n\n======none\nregsub\ -all\ \{\ (?=\ )\}\ \$line\ \{\\&nbsp\;\}\ line\n\nset\ line\ \{this\ is\ an\ \ \ \ example\}\nregsub\ -all\ \{\ (?=\ )\}\ \$line\ \{\\&nbsp\;\}\ line\nset\ line\n======\n\nAnd\ the\ output\ is:\n\n======\nthis\ is\ an&nbsp\;&nbsp\;&nbsp\;\ example\n======\n\nTabs\ require\ replacement,\ too:\n\n======\nset\ tabFill\ \"\[string\ repeat\ \\\\&nbsp\\\;\ 7\]\ \"\nregsub\ -all\ \{\\t\}\ \$line\ \$tabFill\ line\n======\n\n----\n\n\[glennj\]:\ \ Taken\ from\ comp.lang.perl.misc,\ transform\ variable\ names\ into\ StudlyCapsNames:\n\n======\nset\ old_vars\ \{VARIABLE_ONE\ VARIABLE_NUMBER_TWO\ a_really_long_VARIABLE_name\}\nset\ NewVars\ \{\}\nforeach\ v\ \$old_vars\ \{\n\ \ \ regsub\ -all\ \{_?(.)(\[^_\]*)\}\ \$v\ \{\[string\ toupper\ \"\\1\"\]\[string\ tolower\ \"\\2\"\]\}\ new\n\ \ \ lappend\ NewVars\ \[subst\ \$new\]\n\}\n======\n\n----\n\nWhen\ using\ \[ASED%|%ASED's\]\ syntax\ checker\ you\ get\ an\ error\ of\ you\ don't\ use\ the\ `--`\ option\ to\ `\[regexp\]`.\ Instead\ of\ `regexp\ \{(\[\[^A-Za-z0-9_-\]\])\}\ \$string`\ you\ have\ to\ write\ `regexp\ --\ \{(\[\[^A-Za-z0-9_-\]\])\}\ \$string`\n\n----\n\n\[LV\]:\ A\ user\ recently\ asked:\n\nI\ have\ a\ string\ that\ I'm\ trying\ to\ parse.\ Why\ doesn't\ this\ seem\ to\ work?\n\n======\n%\ 
set\ str\ \{Acc\ No:\ 12345\}\n%\ set\ num\ \[regexp\ \{.*?(\\d+).*\}\ \$str\ junk\ result\]\n%\ puts\ \$result\n1\n======\n\nIt\ looks\ to\ me\ like\ the\ `*?`\ causes\ the\ subsequent\ `\\d+`\ to\ also\ be\ greedy\nand\ only\ match\ the\ first\ hit.\ Did\ I\ figure\ that\ out\ correctly?\ I\ presume\nthat\ we\ currently\ don't\ have\ a\ way\ to\ ''turn\ off''\ the\ greediness\ item?\n\nOf\ course,\ in\ this\ simplified\ problem,\ one\ could\ just\ drop\ the\ greediness\ and\ncode\n\ \n======none\n%\ set\ num\ \[regexp\ \{(\\d+)\}\ \$str\ junk\ result\]\n%\ puts\ \$result\n12345\n======\n\nI'll\ let\ the\ user\ decide\ if\ that\ suffices.\n\n----\n\n**\ How\ do\ you\ select\ from\ two\ words?\ **\n\n======none\n%\ set\ word\ \"foo\"\n%\ set\ result\ \[regexp\ \{(foo|bar)\}\ match\ zzz\]\n%\ set\ zzz\ncan't\ read\ \"zzz\":\ no\ such\ variable\n???\n======\n\n\[LES\]:\ You\ got\ the\ regexp\ syntax\ wrong\ and\ tried\ to\ match\ the\ regular\ expression\ with\ the\ string\ \"match\".\ There\ is\ no\ \"zzz\"\ variable\ (the\ actual\ match\ variable\ in\ your\ code)\ because\ your\ regular\ expression\ does\ not\ match\ the\ string\ \"match\".\ Try\ this:\n\n======none\n%\ set\ word\ \"foo\"\n%\ set\ result\ \[regexp\ \{(foo|bar)\}\ \$word\ match\ zzz\]\n%\ set\ match\n======\n\nNote\ that\ I\ could\ have\ dropped\ the\ \"zzz\"\ variable,\ but\ left\ it\ there\ as\ a\ second\ match\ variable,\ as\ an\ exercise\ to\ you.\ You\ should\ understand\ why\ and\ what\ it\ does\ if\ you\ read\ the\ \[regexp\]\ page\ and\ assimilate\ the\ syntax.\n\n\n\n**\ Infinite\ spaces\ at\ start\ and\ end\ **\n\n\[RUJ\]:\ Could\ you\ match\ the\ following\ pattern\ of\ following\ string:\ infinite\ spaces\ at\ start\ and\ end.\n\n======none\n%\ set\ str\ \"\ \ sjkhf\ sdhj\ \ \ \"\n======\n\n\[LV\]:\ try\n\n======\nset\ rest\ \[regexp\ \{^\ +.*\ +\$\}\ \$str\ match\]\nputs\ \$rest\n======\n\nwhich\ should\ have\ a\ value\ of\ 1\ (in\ other\ words,\ it\ matched).\nOf\ course,\ if\ those\ 
leading\ and\ trailing\ spaces\ are\ optional,\ then\ change\nthe\ +\ to\ a\ *.\n\n\[CRML\]\ non\ greedy\ or\ greedy\ does\ not\ give\ the\ same\ result.\ In\ the\ previous\ example,\ the\ .*\ matches\ all\ the\ string\ up\ to\ the\ last\ but\ one\ char.\n\n======\nset\ rest\ \[regexp\ \{^\ +(.*?)\ *\$\}\ \$str\ match\ noinfinite\]\nputs\ \$rest\nputs\ \"|\$noinfinte|\"\nset\ rest\ \[regexp\ \{^\ +(.*)\ *\$\}\ \$str\ match\ noinfinite\]\nputs\ \$rest\nputs\ \"|\$noinfinte|\"\n======\n\n----\n\n**\ URL\ Parser\ **\n\nSee\ \[URL\ Parser\].\n\n----\n\n**\ Match\ a\ \"quoted\ string\"\ **\n\n\[AMG\]:\ Adapted\ from\ \[Wibble\]:\n\n======\nproc\ quoted-string\ \{str\}\ \{\n\ \ \ \ regexp\ \{^\"(?:\[^\\\\\"\]|\\\\.)*\"\$\}\ \$str\n\}\n======\n\nThis\ recognizes\ strings\ starting\ and\ ending\ with\ double\ quote\ characters.\ \ Any\ character\ can\ be\ embedded\ in\ the\ string,\ even\ double\ quotes,\ when\ preceded\ by\ an\ odd\ number\ of\ backslashes.\n\n\n\n**\ Word\ Splitting,\ Respecting\ Quoted\ Strings\ **\n\ngiven\ some\ text,\ e.g.\n\n======none\nhere\ \ \ \ is\ some\ \"quoted\ \ \ \ \ text\ with\ \ \ lots\ \ \ \ of\ space\"\ \ and\ \ \ \ more\ \ \ \n======\n\nhow\ to\ parse\ it\ into\n\n======none\nhere\ is\ some\ \{quoted\ \ \ \ \ text\ with\ \ \ lots\ \ \ \ of\ space\}\ and\ more\n======\n\n======\nregexp\ -all\ -inline\ \{(?:\[^\ \"\]|\\\"\[^\"\]*\\\")+\}\n======\n\nsee\ \[KBK\],\ #tcl\ irc\ channel,\ 2012-12-02\n\n**\ split\ a\ string\ into\ n-length\ substrings\ **\n\n======\nregexp\ -all\ -inline\ \".\{\$n\}\"\ \$string\n======\n\nevilotto,\ #tcl,\ 2013-02-07\n\n\[https://groups.google.com/forum/#!topic/comp.lang.tcl/mQenwyY578o/discussion%|%:)\ Contest:\ fast\ way\ to\ chop\ string\ in\ short\ fixed\ pieces,\ comp.lang.tcl,\ 2004-07-19%|%\]\n\n\n\n**\ At\ Least\ 1\ Alpha\ Character\ Interspersed\ with\ 0\ or\ More\ Digits\ **\n\n======none\nregexp\ \{\[\[:alnum:\]\]*\[\[:alpha:\]\]\[\[:alnum:\]\]*\}\ \$string\n======\n\n\n**\ Matching\ a\ group\ of\ 
strings\ **\n\n\nregexp\ -nocase\ \{string1,string2,string3\ ...\}\ \$string\n======\n\nWe\ can\ match\ a\ group\ of\ strings\ or\ subjects\ in\ a\ single\ regular\ expression\n\n\n**\ \[Sqlite\]\ Numeric\ Literal\ **\n\n======\nregexp\ \{^(\[\[:digit:\]\]*)(?:\\.(\[\[:digit:\]\]+))?(?:\[eE\]\[+-\]?(\[\[:digit:\]\]+))?\$|^0x\[a-fA-F\]+\$\}\ number\ int\ mant\ exp\ \n======\n\n\n----\n'''\[ak\]\ -\ 2017-08-08\ 03:32:33'''\n\nRegarding\ negation\ of\ regular\ expressions.\n\nWhile\ the\ regular\ expression\ syntax\ does\ not\ allow\ for\ simple\ negation\ the\ underlying\ formalism\ of\ (non)deterministic\ finite\ automata\ does.\ Simply\ swap\ final\ and\ non-final\ states\ to\ negate,\ i.e.\ complement\ it.\n\nSee\ for\ example\ the\nhttps://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/grammar_fa/fa.html%|%grammar::fa%|%\npackage\ in\ Tcllib,\ which\ provides\ a\nhttps://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/grammar_fa/fa.html#51%|%complement%|%\ method.\nIt\ is\ implemented\ in\ the\nhttps://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/grammar_fa/faop.html%|%operations%|%\npackage.\ As\ are\ methods\ to\ convert\ from\ and\ to\ regular\ expressions.\n\n<<categories>>\ Tutorial\ |\ String\ Processing regexp2} CALL {my render {Regular Expression Examples} '''Regular\ Expression\ Examples'''\ is\ a\ list,\ roughly\ sorted\ by\ complexity,\ of\nregular\ expression\ examples.\ \ It\ also\ serves\ as\ both\ a\ library\ of\ useful\nexpressions\ to\ include\ in\ your\ own\ code.\n\nFor\ advanced\ examples,\ see\ \[Advanced\ Regular\ Expression\ Examples\]\ You\ can\ also\nfind\ some\ regular\ expressions\ on\ \[Regular\ Expressions\]\ and\ \[Bag\ of\ algorithms\]\npages.\n\n\n\n**\ See\ Also\ **\n\n\ \ \ \[http://www.regular-expressions.info/examplesprogrammer.html%|%Example\ Regexes\ to\ Match\ Common\ Programming\ Language\ Constructs\]:\ \ \ \n\n\ \ \ 
\[https://groups.google.com/d/msg/comp.lang.tcl/uHEWT5LuuVg/LNa0PgvlBtQJ%|%Extracting\ numbers\ from\ text\ strings,\ removing\ unwanted\ characters\],\ \[comp.lang.tcl\],\ 2002-06-23:\ \ \ a\ delightful\ explication\ by\ \[Michael\ Cleverly\]\n\n\ \ \ \[re_syntax\]:\ \ \ \n\n\ \ \ \[URI\ detector\ for\ arbitrary\ text\ as\ a\ regular\ expression\]:\ \ \ \n\n\ \ \ \[Arts\ and\ crafts\ of\ Tcl-Tk\ programming\]:\ \ \ \n\n\ \ \ \[Regular\ Expressions\]:\ \ \ \n\n\ \ \ \[Regular\ Expression\ Debugging\ Tips\]:\ \ \ \n\n\ \ \ \[Visual\ Regexp\]:\ \ \ A\ '''terrific'''\ way\ to\ learn\ about\ REs.\n\n\ \ \ \[Redet\]:\ \ \ Another\ tool\ for\ learning\ about\ and\ working\ with\ REs.\n\n\ \ \ \[Regular\ Expression\ Debugging\ Tips\]:\ \ \ More\ tools.\n\n\n\n**\ Simple\ `\[regexp\]`\ Examples\ **\n\n`\[regexp\]`\ has\ syntax:\n\n\ \ \ \ :\ \ \ regexp\ ?switches?\ exp\ string\ ?matchVar?\ ?subMatchVar\ subMatchVar\ ...?\n\nIf\ ''matchVar''\ is\ specified,\ its\ value\ will\ be\ only\ the\ part\ of\ the\n''string''\ that\ was\ matched\ by\ the\ ''exp''.\ \ As\ an\ example:\ \ \ \n\n======\nregexp\ \{c.*g\}\ \"abcdefghi\"\ matched\nputs\ \$matched\ \ \ \ \ \ \ \;#\ ==>\ cdefg\n======\n\nIf\ any\ ''subMatchVar''s\ are\ specified,\ their\ values\ will\ be\ the\ part\nof\ the\ ''string''\ that\ were\ matched\ by\ parenthesized\ bits\ in\ the\ ''exp'',\ counting\ open\ parentheses\ from\ left\ to\ right.\ \ For\ example:\n\n======\nregexp\ \{c((.*)g)(.*)\}\ \"abcdefghi\"\ matched\ sub1\ sub2\ sub3\nputs\ \$matched\ \ \ \ \ \ \ \;#\ ==>\ cdefghi\nputs\ \$sub1\ \ \ \ \ \ \ \ \ \ \;#\ ==>\ defg\nputs\ \$sub2\ \ \ \ \ \ \ \ \ \ \;#\ ==>\ def\nputs\ \$sub3\ \ \ \ \ \ \ \ \ \ \;#\ ==>\ hi\n======\n\nMany\ times,\ people\ only\ care\ about\ the\ ''subMatchVar''s\ and\ want\ to\nignore\ ''matchVar''.\ \ They\ use\ a\ \"dummy\"\ variable\ as\ a\ placeholder\ in\nthe\ command\ for\ the\ ''matchVar''.\ \ You\ will\ often\ see\ things\ like\n\n======\nregexp\ \$exp\ \$string\ ->\ sub1\ 
sub2\n======\n\nwhere\ `\$\{->\}`\ holds\ the\ matched\ part.\ \ It\ is\ a\ sneaky\ but\ legal\ Tcl\nvariable\ name.\n\n\[PYK\]\ 2015-10-29:\ \ As\ a\ matter\ of\ fact,\ '''every'''\ string\ is\ a\ legal\ Tcl\nvariable\ name.\n\n\n\n**\ Splitting\ a\ String\ Into\ Words\ **\n\n''\"How\ do\ I\ split\ an\ arbitrary\ string\ into\ words?\"''\ is\ a\ frequently\nasked\ question.\ \ If\ you\ use\ `\[split\]\ \$string\ \{\ \}`,\ then\ multiple\ \nspaces\ will\ produce\ a\ list\ with\ empty\ elements.\ \ If\ you\ try\ to\ use\ `\[foreach\]`\ or\ \n`\[lindex\]`\ or\ some\ other\ \[list\]\ operation,\ then\ you\ must\ be\ sure\ that\nthe\ string\ is\ a\ well-formed\ list.\ \ (Braces\ could\ cause\ problems.)\ \ So\ use\ a\ regular\ expression\ like\ this\ very\ simple\ shorthand\ for\ non-space\ characters:\n\n\n======none\n\{\\S+\}\n======\n\nYou\ can\ even\ split\ a\ string\ of\ text\ with\ arbitrary\ spaces\ and\ special\ characters\ into\ a\ list\ of\ words\ by\ using\ the\ '''-inline'''\ and\ '''-all'''\ switches\ to\ `\[regexp\]`:\n\n======\nset\ text\ \"Some\ arbitrary\ text\ which\ might\ include\ \\\$\ or\ \{\"\nset\ wordList\ \[regexp\ -inline\ -all\ --\ \{\\S+\}\ \$text\]\n======\n\n\n\n**\ Split\ into\ Words,\ Respecting\ Acronyms\ **\n\n======\nset\ data\ \{Marvels.Agents.of.S.H.I.E.L.D\}\nregsub\ -all\ \{(\\w\{2,\})\\.\}\ \$data\ \{\\1\ \}\n======\n\n\ \ \ \ :\ \ \ from\ \[Tcl\ Chatroom\],\ 2013-10-09\n\n\n\n**\ Floating\ Point\ Number\ **\n\nThis\ expression\ includes\ options\ for\ leading\ +/-\ character,\ digits,\ decimal\ points,\ and\ a\ trailing\ exponent.\ \ Note\ the\ use\ of\ nearly\ duplicate\ expressions\ joined\ with\ the\ ''or''\ operator\ `|`\ to\ permit\ the\ decimal\ point\ to\ lead\ or\ follow\ digits.\ \n\n======none\n^\[-+\]?\[0-9\]*\\.?\[0-9\]+(\[eE\]\[-+\]?\[0-9\]+)?\$\n======\n\nExpression\ to\ find\ if\ a\ string\ have\ any\ substring\ maching\ a\ floating\ point\ number\ (\ This\ was\ posted\ to\ comp.lang.tcl\ by\ Roland\ B.\ 
Roberts.):\n\n======none\n\[-+\]?(\[0-9\]+\\.?\[0-9\]*|\\.\[0-9\]+)(\[eE\]\[-+\]?\[0-9\]+)?\n======\n\n\nMore\ information\ (http://www.regular-expressions.info/floatingpoint.html)\n\n\n\n**\ Letters\ **\n\nThanks\ to\ Brent\ Welch\ for\ these\ examples,\ showing\ the\ difference\ between\ a\ traditional\ character\ matching\ and\ \"the\ Unicode\ way.\"\n\nOnly\ letters:\n\n======none\n^\[A-Za-z\]+\$\n======\n\nOnly\ letters,\ the\ Unicode\ way:\n\n======none\n^\[\[:alpha:\]\]+\$\n======\n\n\n\n**\ Special\ Characters\ **\n\nThanks\ again\ to\ Brent\ Welch\ for\ these\ two\ examples.\n\nThe\ set\ of\ Tcl\ special\ characters:\ `\]\]\ \[\[\ \$\ \{\ \}\ \\`:\n\n======\n\[\]\[\$\{\}\\\\\]\n======\n\nThe\ set\ of\ regular\ expression\ special\ characters:\ `'\]\]\ \[\[\ \$\ ^\ ?\ +\ *\ (\ )\ |\ \\'`\n\n\n%|CHARACTER|DESCRIPTION|%\n&|*|The\ sub-pattern\ before\ '*'\ can\ occur\ zero\ or\ more\ times|&\n&|+|The\ sub-pattern\ before\ '+'\ can\ occur\ one\ or\ more\ times|&\n&|?|The\ sub-pattern\ before\ '?'\ can\ only\ occur\ zero\ or\ one\ time|&\n&|<<pipe>>|(Alteration)\ Matches\ any\ one\ sub-pattern\ separated\ by\ '<<pipe>>'s.\ \ Similar\ to\ logical\ 'OR'.\ |&\n&|()|Groups\ a\ pattern|&\n&|\[\[\]\]|Defines\ a\ set\ of\ characters,\ or\ range\ of\ characters\ \[\[a-z,A-Z,0-9\]\]|&\n\n\n======\n\[\]\[\$^?+*()|\\\\\]\n======\n\ \n\nI\ don't\ understand\ these\ examples.\ \ Why\ have\ `\[\[`,\ `\]\]`,\ and\ then\ the\ rest\nof\ the\ characters\ inside\ a\ `\[\[\]\]`\ -\ that\ just\ makes\ the\ string\ have\ `\[\[`\ and\ `\]\]`\nthere\ twice,\ right?\n\n\[LV\]:\ the\ first\ regular\ expression\ should\ be\ seen\ like\ this:\n\n\ \ \ `\{\ ...\ \}`:\ \ \ Protect\ the\ 9\ inner\ characters.\n\n\ \ \ `\[\[\ ...\ \]\]`:\ \ \ Define\ a\ set\ of\ characters\ to\ process.\n\n\ \ \ `\]\]`:\ \ \ If\ your\ set\ of\ characters\ is\ going\ to\ include\ the\ right\ bracket\ character\ `\]\]`\ as\ a\ specific\ matching\ character,\ then\ it\ needs\ to\ be\ first\ in\ the\ set/class\ 
definition.\n\n\ \ \ `\[\[\$\{\}`:\ \ \ More\ individual\ characters.\n\n\ \ \ `\\\\`:\ \ \ Doubled\ because\ when\ `regexp`\ goes\ to\ evaluate\ the\ characters,\ it\ would\ otherwise\ treat\ a\ single\ backslash\ `\\`\ as\ a\ request\ to\ quote\ the\ next\ character,\ the\ ending\ right\ bracket\ of\ the\ set/class.\n\nThe\ second\ regular\ expression\ is\ interpreted\ in\ a\ similar\ fashion.\ \ There\ are\ more\ characters\ because\ there\ are\ more\ metacharacters.\n\nAlso,\ not\ all\ characters\ are\ there\ -\ where\ are\ the\ period,\ equals,\ bang\ (exclamation\ sign),\ dash,\ colon,\ alphas\ that\ are\ a\ part\ of\ character\ entry\ escapes\ or\ classes,\ 0,\ hash/pound\ sign,\ and\ angle\ brackets\ (<\ and\ >)?\ \ These\ special\ characters\ all\ have\ meta\ meanings\ within\ regular\ expressions...\n\n\[LV\]:\ Apparently\ no\ one\ has\ come\ along\ and\ updated\ the\ above\ expression\ to\ cover\ these.\n\nExample\ posted\ by\ KC:\n\nA\ set\ containing\ both\ angle\ brackets:\n\n======none\n\[\\<\\>\]\n======\n\n\n***\ newline/carriage\ return\ ***\n\nCould\ someone\ replace\ this\ line\ with\ some\ verbiage\ regarding\ the\ way\ one\ uses\nregular\ expressions\ for\ specific\ newline-carriage\ return\ handling\ (as\ opposed\nto\ the\ use\ of\ the\ `\$`\ metacharacter)?\n\n\[Janos\ Holanyi\]:\ I\ would\ really\ need\ to\ build\ up\ a\ re\ that\ would\ match\ one\ line\nand\ only\ one\ line\ -\ that\ is,\ excluding\ carriage-return-newline's\ (\\r\\n)\ from\nmatching...\ How\ would\ such\ a\ re\ look\ like?\n\n----\n\n\[LV\]:\ how\ about\ something\ like\ this?\n\n======none\n%\ set\ a\ \"abc\nev\"\n#\ a\ now\ has\ two\ lines\ in\ it\n%\ regexp\ -line\ --\ \{(.*)\}\ \$a\ b\ c\ d\n1\n%\ puts\ \$b\nabc\n%\ puts\ \$c\nabc\n======\n\nIf\ you\ want\ to\ keep\ carriage\ returns\ or\ newlines\ by\ themselves,\ but\ not\ when\nthey\ are\ together,\ you\ need\ something\ like:\n\n======\nregexp\ --\ \ \{^(\[^\\r\]|\\r(?!\\n))*\}\ \ \$a\ b\ c\ d\n======\n\nThis\ 
allows\ plain\ carriage\ return\ or\ plain\ newline.\n\nThanks\ to\ \[bbh\]\ and\ \[Donal\ Fellows\]\ for\ this\ regular\ expression.\n\n\n\n**\ Back\ References\ **\n\nFrom\ \[comp.lang.tcl\]:\n\nI\ did\ some\ experimenting\ with\ other\ strings,\ like\ \"just\ a\nHHHHEEEEAAAADDDDEEEERRRR\".\ The\ regular\ expression\ `(.)\\1\\1\\1`\ does\ the\ job\ I\nwould\ have\ wanted,\ whereas\ `(.)\{4\}`\ will\ return\ the\ last\ of\ each\ four\ncharacters\ -\ as\ posted\ as\ well.\n\nThat\ surprised\ me\ too\ --\ being\ able\ to\ place\ backreferences\ within\ the\ regex\ is\nan\ extremely\ powerful\ technique.\n\n======\nregsub\ -all\ \{(.)\\1\{3\}\}\ \$string\ \{\\1\}\ result\n======\n\nfor\ exactly\ 4\ char\ repeats,\ and\ `(.)\\1+`\ for\ arbitrary\ repeats.\n\n\n\n**\ IP\ Numbers\ **\n\nYou\ can\ create\ a\ regular\ expression\ to\ check\ an\ IP\ address\ for\ \ncorrect\ syntax.\ \ Note\ that\ this\ regular\ expression\ only\ checks\nfor\ groups\ of\ 1-3\ digits\ separated\ by\ periods.\ \ If\ you\ want\nto\ ensure\ that\ the\ digit\ groups\ are\ from\ 0-255,\ or\ that\ you\ \nhave\ a\ valid\ IP\ address,\ you'll\ have\ to\ do\ additional\n(non\ regexp)\ work.\ \ This\ code\ posted\ to\ comp.lang.tcl\ by\n\[George\ Peter\ Staplin\]\n\ \n======\nset\ str\ 66.70.7.154\n\nregexp\ \"(\\\[0-9\]\{1,3\})\\.(\\\[0-9\]\{1,3\})\\.(\\\[0-9\]\{1,3\})\\.(\\\[0-9\]\{1,3\})\"\ \$str\ all\ first\ second\ third\ fourth\n\nputs\ \"\$all\ \\n\ \$first\ \\n\ \$second\ \\n\ \$third\ \\n\ \$fourth\ \\n\"\n======\n\nThe\ above\ regular\ expression\ matches\ any\ string\ where\ there\ are\ four\ groups\ of\ 1-3\ digits\ separated\ by\ periods.\ \ Since\ it's\ not\ anchored\ to\ the\ start\ and\ end\ of\ the\ string\ (with\ ^\ and\ \$)\ it\ will\ match\ any\ string\ that\ contains\ four\ groups\ of\ 1-3\ digits\ separated\ by\ periods,\ such\ as:\ \"66.70.7.154.9\".\n\nIf\ you\ don't\ mind\ a\ longer\ regexp,\ there\ is\ no\ reason\ you\ can't\ ensure\ that\ each\ group\ of\ 1-3\ digits\ is\ 
in\ the\ range\ of\ 0-255.\ \ For\ example\ (broken\ up\ a\ bit\ to\ make\ it\ more\ readable):\n\n======\nset\ octet\ \{(\\d|\[1-9\]\\d|1\\d\\d|2\[0-4\]\\d|25\[0-5\])\}\nset\ RE\ \"^\[join\ \[list\ \$octet\ \$octet\ \$octet\ \$octet\]\ \{\\.\}\]\\\$\"\nregexp\ \$RE\ \$str\ all\ first\ second\ third\ fourth\ \;#\ Michael\ A.\ Cleverly\n======\n\nrecently\ on\ comp.lang.tcl,\ someone\ mentioned\ that\ \nhttp://www.oreilly.com/catalog/regex/chapter/ch04.html#Be_Specific\ntalks\ about\ matching\ IP\ addresses.\n\n'''Gururajesh:'''\ A\ Perfect\ regular\ expression\ to\ validate\ ip\ address\ with\ a\ single\ expression.\n\n======\nif\ \{\[regexp\ \{(^\[2\]\[5\]\[0-5\].|^\[2\]\[0-4\]\[0-9\].|^\[1\]\[0-9\]\[0-9\].|^\[0-9\]\[0-9\].|^\[0-9\].)(\[2\]\[0-5\]\[0-5\].|\[2\]\[0-4\]\[0-9\].|\[1\]\[0-9\]\[0-9\].|\[0-9\]\[0-9\].|\[0-9\].)(\[2\]\[0-5\]\[0-5\].|\[2\]\[0-4\]\[0-9\].|\[1\]\[0-9\]\[0-9\].|\[0-9\]\[0-9\].|\[0-9\].)(\[2\]\[0-5\]\[0-5\]|\[2\]\[0-4\]\[0-9\]|\[1\]\[0-9\]\[0-9\]|\[0-9\]\[0-9\]|\[0-9\])\$\}\ \$string\ match\ v1\ v2\ v3\ v4\]\}\ \{puts\ \"\$v1\$v2\$v3\$v4\"\}\ else\ \{puts\ \"none\"\}\n======\n\nFor\ `245.254.253.2`,\ output\ is\ `245.254.253.2`\n\nFor\ `265.254.243.2`,\ output\ is\ `none`,\ As\ ip-address\ can`t\ have\ a\ number\ greater\ than\ 255.\n\n\[Lars\ H\]:\ Perfect?\ No,\ it\ looks\ like\ it\ would\ accept\ `99a99b99c99`,\ since\ `.`\ will\ match\ any\ character.\ Also,\ it\ can\ be\ shortened\ significantly\ by\ making\ use\ of\ `\{4\}`\ and\ the\ like\ (see\ \[Regular\ expressions\]).\n\nBetter\ is\n\n======\nif\ \{\[regexp\ \{^(((\[2\]\[5\]\[0-5\]|(\[2\]\[0-4\]|\[1\]\[0-9\]|\[0-9\])?\[0-9\])\\.)\{3\})(\[2\]\[5\]\[0-5\]|(\[2\]\[0-4\]|\[1\]\[0-9\]|\[0-9\])?\[0-9\])\$\}\ \$IP\ \$string\ match\ v1\ v2\ v3\ v4\]\}\ \{puts\ \"\$v1\$v2\$v3\$v4\"\}\ else\ \{puts\ \"none\"\}\n======\n\nTcllib\ should\ be\ useful\n\n\ \ \ *\ http://docs.activestate.com/activetcl/8.5/tcllib/dns/tcllib_ip.html\n\ \ \ *\ 
http://tcllib.sourceforge.net/doc/tcllib_ip.html\n\n\[freethomas\]:\ I\ thinks\ this\ regexp\ is\ much\ simple\ and\ easier\ for\ IP\ number\n\n======\nset\ str\ \"66.70.7.154\"\nregexp\ \{(\\d+)(\\D)(\\d+)(\\D)(\\d+)(\\D)(\\d+)\}\ \$str\ match\n======\n\n\[AMG\]:\ This\ expression\ allows\ any\ character\ to\ separate\ the\ octets,\ not\ just\ period.\ \ I\ sincerely\ doubt\ this\ is\ what\ you\ want.\ \ Use\ `\\.`\ instead\ of\ `\\D`.\ \ Also\ it's\ not\ anchored\ with\ `^`\ and\ `\$`,\ so\ it\ works\ on\ substrings\ rather\ than\ requiring\ that\ the\ whole\ string\ match.\ \ Though\ maybe\ this\ is\ what\ you\ want\ since\ you\ explicitly\ capture\ the\ matching\ substring.\n\nI\ already\ fixed\ the\ syntax\ issue\ of\ saying\ `\{`\ at\ the\ beginning\ but\ leaving\ out\ the\ closing\ `\}`,\ also\ of\ leaving\ out\ the\ first\ `(`.\n\nI\ see\ no\ reason\ to\ use\ `(`\ and\ `)`\ grouping.\ \ You\ don't\ give\ variables\ into\ which\ the\ subexpressions\ would\ be\ captured,\ and\ it's\ pointless\ to\ capture\ the\ dots\ between\ the\ octets.\ \ (See\ what\ I\ did\ there?)\ \ Try\ this:\n\n======\nregexp\ \{^\\d+\\.\\d+\\.\\d+\\.\\d+\$\}\ \$str\n======\n\n\[AMG\]:\ Here's\ a\ very\ similar\ script\ (to\ \[Lars\ H\]'s\ contribution)\ that\ uses\ `\[scan\]`\ instead\ of\ `\[regexp\]`.\ \ It's\ much\ more\ readable,\ in\ my\ opinion.\n\n======\nif\ \{\[scan\ \$string\ %d.%d.%d.%d\ a\ b\ c\ d\]\ ==\ 4\n\ &&\ 0\ <=\ \$a\ &&\ \$a\ <=\ 255\ &&\ 0\ <=\ \$b\ &&\ \$b\ <=\ 255\n\ &&\ 0\ <=\ \$c\ &&\ \$c\ <=\ 255\ &&\ 0\ <=\ \$d\ &&\ \$d\ <=\ 255\}\ \{\n\ \ \ \ puts\ \$a.\$b.\$c.\$d\n\}\ else\ \{\n\ \ \ \ puts\ none\n\}\n======\n\nThere\ are\ a\ few\ differences.\ \ One,\ the\ trailing\ dot\ is\ omitted\ from\ the\ first\ three\ output\ variables\ (which\ I\ call\ `a`,\ `b`,\ `c`,\ `d`\ instead\ of\ `v1`,\ `v2`,\ `v3`,\ `v4`).\ \ Two,\ leading\ zeroes\ are\ permitted\ and\ discarded.\ \ Three,\ `-0`\ is\ accepted\ as\ `0`.\ \ Four,\ garbage\ at\ the\ end\ of\ \$string\ is\ 
silently\ discarded.\ \ Five,\ each\ octet\ can\ have\ a\ leading\ `+`,\ e.g.\ `+255.+255.+255.+255`.\ \ Six,\ it's\ ''OVER\ FIVE\ TIMES\ FASTER!''\ \ On\ this\ machine,\ my\ version\ using\ `\[scan\]`\ takes\ 15\ microseconds,\ whereas\ your\ version\ using\ `\[regexp\]`\ takes\ 78\ microseconds.\ \ Use\ `\[time\]`\ to\ measure\ performance.\ \ (I\ replaced\ `\[puts\]`\ with\ `\[return\]`\ when\ testing.)\n\nNow,\ here's\ a\ hybrid\ version\ that\ uses\ regexp.\n\n======\nif\ \{\[regexp\ \{^(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)\$\}\ \$string\ _\ a\ b\ c\ d\]\n\ &&\ 0\ <=\ \$a\ &&\ \$a\ <=\ 255\ &&\ 0\ <=\ \$b\ &&\ \$b\ <=\ 255\n\ &&\ 0\ <=\ \$c\ &&\ \$c\ <=\ 255\ &&\ 0\ <=\ \$d\ &&\ \$d\ <=\ 255\}\ \{\n\ \ \ \ puts\ \$a.\$b.\$c.\$d\n\}\ else\ \{\n\ \ \ \ puts\ none\n\}\n======\n\nThis\ version\ takes\ 46\ microseconds\ to\ execute.\ \ It\ doesn't\ accept\ leading\ `+`\ or\ `-`.\ \ It\ rejects\ garbage\ at\ the\ end\ of\ the\ string.\ \ It\ treats\ the\ octets\ as\ octal\ if\ they\ are\ given\ leading\ zeroes,\ and\ invalid\ octal\ is\ always\ accepted.\ \ The\ reason\ for\ this\ last\ is\ because\ `\[if\]`\ treats\ strings\ containing\ invalid\ octal\ as\ nonnumeric\ text,\ so\ the\ \[<=\]\ operator\ is\ used\ to\ sort\ text\ rather\ than\ compare\ numbers.\ \ Corrected\ version:\n\n======\nif\ \{\[regexp\ \{^(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)\$\}\ \$string\ _\ a\ b\ c\ d\]\n\ &&\ \[string\ is\ integer\ \$a\]\ &&\ 0\ <=\ \$a\ &&\ \$a\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$b\]\ &&\ 0\ <=\ \$b\ &&\ \$b\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$c\]\ &&\ 0\ <=\ \$c\ &&\ \$c\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$d\]\ &&\ 0\ <=\ \$d\ &&\ \$d\ <=\ 255\}\ \{\n\ \ \ \ puts\ \$a.\$b.\$c.\$d\n\}\ else\ \{\n\ \ \ \ puts\ none\n\}\n======\n\nThis\ version\ takes\ 47\ microseconds\ and\ it\ rejects\ invalid\ octal.\ \ However,\ it\ still\ interprets\ numbers\ as\ octal\ if\ leading\ zeroes\ are\ given,\ so\ `0377.255.255.255`\ is\ accepted\ (but\ `0400.255.255.255`\ is\ 
rejected).\ \ To\ fix\ this,\ it\ would\ be\ necessary\ to\ make\ a\ pattern\ that\ rejects\ leading\ zeroes\ unless\ the\ octet\ is\ exactly\ zero,\ something\ like:\ `(0|\[\[^1-9\]\]\\d*)`.\ \ But\ this\ is\ getting\ clumsy\ and\ slow\;\ I\ prefer\ the\ `\[scan\]`\ solution.\ \ `\[regexp\]`:\ not\ always\ the\ right\ tool!\n\nGururajesh:\n\n======\nset\ string\ \"0377.255.255.255\"\nif\ \{\[regexp\ \{^(\\d+)\\.(\\d+)\\.(\\d+)\\.(\\d+)\$\}\ \$string\ _\ a\ b\ c\ d\]\n\ &&\ \[string\ is\ integer\ \$a\]\ &&\ \[scan\ \$a\ %d\ v1\]\ &&\ 0\ <=\ \$v1\ &&\ \$v1\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$b\]\ &&\ \[scan\ \$b\ %d\ v2\]\ &&\ 0\ <=\ \$v2\ &&\ \$v2\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$c\]\ &&\ \[scan\ \$c\ %d\ v3\]\ &&\ 0\ <=\ \$v3\ &&\ \$v3\ <=\ 255\n\ &&\ \[string\ is\ integer\ \$d\]\ &&\ \[scan\ \$d\ %d\ v4\]\ &&\ 0\ <=\ \$v4\ &&\ \$v4\ <=\ 255\}\ \{puts\ \$v1.\$v2.\$v3.\$v4\}\ else\ \{puts\ none\}\n======\n\nThis\ will\ be\ ok...\ for\ above\ mentioned\ issue.\n\n\[AMG\]:\ Why\ call\ `\[scan\]`\ four\ times?\ \ A\ single\ invocation\ can\ do\ the\ job:\n\n======\nset\ string\ \"0377.255.255.255\"\nif\ \{\[regexp\ \{^\\d+\\.\\d+\\.\\d+\\.\\d+\$\}\ \$string\]\n\ &&\ \[scan\ \$string\ %d.%d.%d.%d\ a\ b\ c\ d\]\ ==\ 4\n\ &&\ 0\ <=\ \$a\ &&\ \$a\ <=\ 255\ &&\ 0\ <=\ \$b\ &&\ \$b\ <=\ 255\n\ &&\ 0\ <=\ \$c\ &&\ \$c\ <=\ 255\ &&\ 0\ <=\ \$d\ &&\ \$d\ <=\ 255\}\ \{\n\ \ \ \ puts\ \$a.\$b.\$c.\$d\n\}\ else\ \{\n\ \ \ \ puts\ none\n\}\n======\n\nI\ don't\ see\ any\ drawbacks\ to\ this\ approach.\ \ The\ regular\ expression\ is\ simple\ and\ is\ used\ only\ to\ reject\ `+`\ and\ `-`\ signs\ and\ garbage\ at\ the\ end,\ `\[scan\]`\ does\ the\ job\ of\ splitting\ and\ converting\ to\ integers,\ and\ math\ expressions\ check\ ranges.\ \ Three\ tools,\ each\ doing\ what\ they're\ designed\ for.\n\n\[CJB\]:\ Here\ is\ a\ pure\ `\[regexp\]`\ version\ with\ comparable\ performance.\ \ It\ matches\ any\ valid\ ip,\ rejecting\ octals.\ \ However\ it\ does\ not\ split\ the\ 
integers\ and\ is\ therefore\ only\ useful\ for\ validation.\ \ The\ timings\ on\ my\ computer\ were\ about\ 22\ microseconds\ for\ this\ version\ compared\ to\ 28\ microseconds\ for\ the\ regexp/scan\ combo\ (I\ removed\ the\ \[puts\]\ statements\ for\ the\ comparison\ because\ they\ are\ slow\ and\ tend\ to\ vary).\n\nNote\ that\ the\ pure\ `\[scan\]`\ version\ is\ still\ fastest\ (about\ 20\ microseconds),\ splits,\ and\ has\ the\ same\ rejections\ (`%d`\ stores\ integers\ and\ ignores\ extra\ leading\ `0`\ characters).\n\n======\nset\ string\ 123.255.189.255\ \nregexp\ \{^(?:(?:\[2\]\[5\]\[0-5\]|\[1\]?\[1-9\]\{1,2\}|0)(?:\\.|\$))\{4\}\}\ \$string\ match\n======\n\n\[fh\]\ 2012-02-13\ 11:54:30:\n\nTo\ search\ IP\ ADDRESS\ using\ Regular\ Expression\n\n======\nset\ \ IP\ \"The\ Interface\ IP\ Address\ is\ 198.176.17.16\ \"\nregexp\ \{(25\[0-5\]|2\[0-9\]\[0-9\]|\[0-9\]?\[0-9\]\[0-9\]?)\\.\{3\}(25\[0-5\]|2\[0-9\]\[0-9\]|\[0-1\]?\[0-9\]\[0-9\]?)\ \$IP\ match\n======\n\n**\ Domain\ names\ **\n\n(First\ shot)\n\n======none\n^\[a-zA-Z\](\[a-zA-Z0-9-\]\{0,61\}\[a-zA-Z0-9\])?\\.\[a-zA-Z\](\[a-zA-Z0-9-\]\{0,61\}\[a-zA-Z0-9\])?(\\.\[a-zA-Z\](\[a-zA-Z0-9-\]\{0,61\}\[a-zA-Z0-9\])?)?\$\n======\n\nThis\ code\ does\ NOT\ attempt,\ obviously,\ to\ ensure\ that\ the\ last\ level\ of\ the\ regular\ expression\ matches\ a\ known\ domain...\n\n\n\n**\ Regular\ Expression\ for\ parsing\ http\ string\ **\n\n======\nregexp\ \{(\[\[^:\]\]+)://(\[\[^:/\]\]+)(:(\[\[0-9\]\]+))\}\ \[ns_conn\ location\]\ match\ protocol\ server\ x\ port\n======\n\nthe\ above\ author\ should\ remember\ this\ is\ a\ Tcl\ wiki,\ and\ not\ an\ \[aolserver\]\none,\ but\ thanks\ for\ the\ submission\ \;)\n\n\[PYK\]\ 2016-02-28:\ In\ the\ previous\ edit,\ a\ `-`\ character\ was\ added\ to\ the\ regular\ expression,\ prohibiting\ the\ occurrence\ of\ `-`\ in\ ''scheme''\ component\ of\ a\ URL.\ \ As\ far\ as\ I\ can\ tell,\ `-`\ is\ allowed\ in\ the\ ''scheme''\ component,\ so\ I've\ reverted\ that\ change\ in\ the\ 
expression\ above.\n\n\n**\ E-mail\ addresses\ **\ \n\nRS:\ No\ warranty,\ just\ a\ first\ shot:\n\n======none\n^\[A-Za-z0-9._-\][email protected]\[\[A-Za-z0-9.-\]+\$\n======\n\nUnderstand\ that\ this\ expression\ is\ an\ attempt\ to\ see\ if\ a\ string\ has\ a\ format\ that\ is\ compatible\ with\ ''normal''\ RFC\ SMTP\ email\ address\ formats.\ \ It\ does\ not\ attempt\ to\ see\ whether\ the\ email\ address\ is\ correct.\ \ Also,\ it\ does\ not\ account\ for\ comments\ embedded\ within\ email\ addresses,\ which\ are\ defined\ even\ though\ seldom\ used.\n\n\[bll\]\ 2017-6-30\ E-mail\ addresses\ are\ quite\ complicated.\ \ You\ must\ be\ careful\ not\ to\ reject\ valid\ e-mail\ addresses.\ \ For\ example,\ `%`\ and\ `+`\ characters\ are\ valid.\nNobody\ uses\ the\ `%`\ sign\ any\ more\ as\ it\ is\ not\ secure.\ \nThe\ `+`\ character\ is\ very\ useful,\ but\ unfortunately,\ there\ are\ a\ lot\ of\ incorrect\ e-mail\ validation\ routines\ that\ reject\ it.\n\nThe\ following\ pattern\ will\ still\ reject\ an\ e-mail\ of\ the\ form\ [email protected]\[\[ip-address\]\].\nNo\ lengths\ are\ checked.\ \ It\ does\ not\ check\ that\ the\ top-level\ domain\ (e.g.\ .org,\ .com,\ .solutions)\ is\ valid.\n\n======\nset\ ::emailpat\ \{\n^\n(\ \ #\ local-part\n\ \ (?:\n\ \ \ \ (?:\n\ \ \ \ \ \ (?:\[^\"().,:\;\\\[\\\]\\s\\\\@\]+)\ \ \ #\ one\ or\ more\ non-special\ characters\ (not\ dot)\n\ \ \ \ \ \ |\n\ \ \ \ \ \ (?:\n\ \ \ \ \ \ \ \ \"\ \ #\ begin\ quoted\ string\n\ \ \ \ \ \ \ \ (?:\n\ \ \ \ \ \ \ \ \ \[^\\\\\"\]\ \ #\ any\ character\ other\ than\ backslash\ or\ double\ quote\n\ \ \ \ \ \ \ \ \ |\n\ \ \ \ \ \ \ \ \ (?:\\\\.)\ #\ or\ a\ backslash\ followed\ by\ another\ character\n\ \ \ \ \ \ \ \ )+\ \ \ #\ repeated\ one\ or\ more\ times\n\ \ \ \ \ \ \ \ \"\ \ #\ end\ quote\n\ \ \ \ \ \ )\n\ \ \ \ )\n\ \ \ \ \\.\ \ \ #\ followed\ by\ a\ dot\n\ \ )*\ \ \ \ #\ local\ portion\ with\ trailing\ dot\ repeated\ zero\ or\ more\ times.\n\ \ 
(?:\[^\"().,:\;\\\[\\\]\\s\\\\@\]+)|(?:\"(?:\[^\\\\\"\]|(?:\\\\.))+\")\ \ #\ as\ above,\ the\ final\ portion\ may\ not\ contain\ a\ trailing\ dot\n)\[email protected]\n(\ \ #\ domain-name,\ underscores\ are\ not\ allowed\n\ \ (?:(?:\[A-Za-z0-9\]\[A-Za-z0-9-\]*)?\[A-Za-z0-9\]\\.)+\ #\ one\ or\ more\ domain\ specifiers\ followed\ by\ a\ dot\n\ \ (?:\[A-Za-z0-9\]\[A-Za-z0-9-\]*)?\[A-Za-z0-9\]\ \ \ \ \ #\ top-level\ domain\n\ \ \\.?\ \ \ \ \ \ \ \ \ \ \ #\ may\ be\ fully-qualified\n)\n\$\n\}\n\nproc\ testit\ \{\ valid\ testaddr\ \}\ \{\n\ \ set\ rc\ NG\n\ \ if\ \{\ \[regexp\ -expanded\ \$::emailpat\ \$testaddr\ emailaddr\ local\ domain\]\ \}\ \{\n\ \ \ \ set\ rc\ OK\n\ \ \}\n\ \ if\ \{\ \$rc\ ne\ \$valid\ \}\ \{\n\ \ \ \ puts\ \"Fail:\ (\$valid)\ \$testaddr\"\n\ \ \}\ elseif\ \{\ \$valid\ eq\ \"OK\"\ \}\ \{\n\ \ \ \ puts\ \"ok:\ \$testaddr\ \$local\ \$domain\"\n\ \ \}\n\}\n\n#\ valid\ e-mails\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{internal-quote.\"*()\"[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{\"[email protected]\"@example.com\}\ntestit\ OK\ \{\"very.(),:\;<>\[\]\\\".VERY.\\\"[email protected]\\\\\ \\\"very\\\".unusual\"@strange.example.com\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{#!\$%&'*+-/=?^_`\{\}|[email protected]\}\ntestit\ OK\ \{\"()<>\[\]:,\;@\\\\\\\"!#\$%&'-/=?^_`\{\}|\ ~.a\"@example.org\}\ntestit\ OK\ \{\"\ \"@example.org\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\ntestit\ OK\ \{[email protected]\}\n#\ invalid\ tests\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{Abc.example.com\}\ntestit\ NG\ \{[email protected]@[email protected]\}\ntestit\ NG\ \{a\"b(c)d,e:f\;g<h>i\[j\\k\][email protected]\}\ntestit\ NG\ \{just\"not\"[email protected]\}\ntestit\ NG\ \{this\ is\"not\\[email protected]\}\ntestit\ NG\ 
\{this\\\ still\\\"not\\\\[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]\}\ntestit\ NG\ \{[email protected]_dom.com\}\n======\n\nReference:\ https://en.wikipedia.org/wiki/Email_address#Examples\n\n\n\n\n\n**\ XML-like\ data\ **\n\nTo\ match\ something\ similar\ to\ XML-tags\ you\ can\ use\ regular\ expressions,\ too.\nLet's\ assume\ we\ have\ this\ text:\n\n======none\n%\ set\ text\ \{<bo>s</bo><it><bo>M</bo></it>\}\n======\n\nWe\ can\ match\ the\ body\ of\ '''bo'''\ with\ this\ regexp:\n\n======none\n%\ regexp\ \"<(bo)>(.*?)</bo>\"\ \$text\ dummy\ tag\ body\n======\n\nNow\ we\ extend\ our\ XML-text\ with\ some\ attributes\ for\ the\ tags,\ say:\n\n======\nset\ text2\ \{<bo\ h=\"m\">s</bo><it><bo>M</bo></it>\}\n======\n\nIf\ we\ try\ to\ match\ this\ with:\n\n======none\nregexp\ \"<(bo)\\\\s+(.+?)>(.*?)</bo>\"\ \$text2\ dummy\ tag\ attributes\ body\n======\n\nit\ ''won't\ work''\ anymore.\ \nThis\ is\ because\ `\\\\s+`\ is\ greedy\n(in\ contrast\ to\ the\ non-greedy\ `(.+?)`\nand\ `(.*?)`)\ and\ that\ (the\ one\ greedy-operator)\nmakes\ the\ whole\ expression\ greedy.\n\nSee\ \[Henry\ Spencer\]'s\ reply\ in\n\[http://groups.google.com/d/msg/comp.lang.tcl/FddeFPbTFw8/asoMuv7dWqIJ%|%tcl\n8.2\ regexp\ not\ doing\ non-greedy\ matching\ correctly\],\ \[comp.lang.tcl\],\ 1999-09-20.\n\nThe\ ''correct''\ way\ is:\n\n======\nregexp\ \"<(bo)\\\\s+?(.+?)>(.*?)</bo>\"\ \$text2\ dummy\ tag\ attributes\ body\n======\n\nNow\ we\ can\ write\ a\ more\ general\ XML-to-whatever-translator\ like\ this:\n\ \ \ 1.\ Substitute\ `\[\[`\ and\ `\]\]`\ with\ their\ corresponding\ `\\\[\[`\ and\ `\\\]\]`\ to\ avoid\ confusion\ with\ `\[subst\]`\ in\ 3.\n\ \ \ 2.\ Substitute\ the\ tags\ and\ attributes\ with\ commands\n\ \ \ 3.\ Do\ a\ `\[subst\]`\ on\ the\ whole\ text,\ thereby\ calling\ the\ inserted\ commands\n\n======\nproc\ xml2whatever\ \{text\ userCallback\}\ \{\n\ \ \ \ set\ 
text\ \[string\ map\ \{\[\ \\\\\[\ \]\ \\\\\]\}\ \$text\]\n\ \ \ \ #\ replace\ all\ tags\ with\ a\ call\ to\ userCallback\n\ \ \ \ #\ this\ has\ to\ be\ done\ multiple\ times,\ because\ of\ nested\ tags\n\ \ \ \ #\ match\ each\ tag\ (everything\ not\ space\ after\ <)\n\ \ \ \ #\ and\ all\ the\ attributes\ (everything\ behind\ the\ tag\ until\ >)\n\ \ \ \ #\ then\ match\ body\ and\ the\ end-tag\ (which\ should\ be\ the\ same\ as\ the\n\ \ \ \ #\ first\ matched\ one\ (\\1))\n\ \ \ \ while\ \{\[regsub\ -all\ \{<(\\S+?)(\\s+\[^\\s>\].*?)?\\s*?>(.*?)</\\1>\}\ \$text\ \"\\\[\[list\ \$userCallback\ \\\\1\ \\\\2\ \\\\3\]\\\]\"\ text\]\}\ \{\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ puts\ doop:\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ puts\ \$text\n\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ puts\ \{\}\n\ \ \ \ \ \ \ \ #\ do\ nothing\n\ \ \ \ \}\n\ \ \ \ return\ \[subst\ -novariables\ -nobackslashes\ \$text\]\n\}\n\n#\ is\ called\ from\ xml2whatever\ with\n#\ element:\ the\ xml-element\n#\ attributes:\ the\ attributes\ of\ xml-element\n#\ body:\ body\ of\ xml-element\nproc\ myTranslate\ \{element\ attributes\ body\}\ \{\n\ \ \ \ #\ Remove\ the\ bracket\ armour\ added\ by\ xml2whatever\n\ \ \ \ set\ element\ \[string\ map\ \{\\\\\[\ \[\ \\\\\]\ \]\}\ \$element\]\n\ \ \ \ set\ attributes\ \[string\ map\ \{\\\\\[\ \[\ \\\\\]\ \]\}\ \$attributes\]\n\ \ \ \ set\ body\ \[string\ map\ \{\\\\\[\ \[\ \\\\\]\ \]\}\ \$body\]\n\ \ \n\ \ \ \ #\ map\ bo\ -\ b\;\ it\ -\ i\ (leave\ rest\ alone)\n\ \ \ \ #\ do\ a\ subst\ for\ the\ body,\ because\ of\ possible\ nested\ tags\n\ \ \ \ switch\ --\ \$element\ \{\n\ \ \ \ \ \ \ \ bo\ \{\ return\ \"<b>\[subst\ -novariables\ -nobackslashes\ \$body\]</b>\"\}\n\ \ \ \ \ \ \ \ it\ \{\ return\ \"<i>\[subst\ -novariables\ -nobackslashes\ \$body\]</i>\"\}\n\ \ \ \ \ \ \ \ default\ \{\ return\ \"<\$element\$attributes>\[subst\ -novariables\ -nobackslashes\ \$body\]</\$element>\"\ \}\n\ \ \ \ \}\n\}\n======\n\nCall\ the\ parser\ with:\n\n======\nxml2whatever\ \$text2\ 
myTranslate\n======\n\nYou\ have\ to\ be\ careful,\ though.\ Don't\ do\ this\ for\ large\ texts\ or\ texts\nwith\ many\ nested\ xml-tags\ because\ the\ regular-expression-machine\ is\ not\ the\nright\ tool\ to\ parse\ large,\ nested\ files\ efficiently.\ (\[Stefan\ Vogel\])\n\n\[DKF\]:\ I\ agree\ with\ that\ last\ point.\ \ If\ you\ are\ really\ dealing\ with\ \[XML\],\ it\ is\ better\ to\ use\ a\ proper\ tool\ like\ \[TclDOM\]\ or\ \[tDOM\].\n\n\[PYK\]\ 2015-10-30:\ \ I\ patched\ the\ regular\ expression\ to\ fix\ an\ issue\ where\ the\ attributes\ group\ could\ pick\ up\ part\ of\ the\ tag\ in\ documents\ containing\ tags\ with\ similar\ prefixes.\ \ The\ fix\ is\ to\ use\ whitespace\ followed\ by\ non-whitespace\ other\ than\ `>`\ to\ detect\ the\ beginning\ of\ attributes.\ \ There\ are\ other\ things\ \n\n\n\n\n**\ Negated\ string\ **\n\n''\[Bruce\ Hartweg\]\ wrote\ in\ comp.lang.tcl:''\nYou\ can't\ negate\ a\ regular\ expression,\ but\ you\ CAN\ negate\ a\ regular\ expression\ that\ is\ only\ a\ simple\ string.\ Logically,\ it's\ the\ following:\n\ \ \ *\ match\ any\ single\ char\ except\ first\ letter\ in\ the\ string.\n\ \ \ *\ match\ the\ first\ char\ in\ string\ if\ followed\ by\ any\ letter\ except\ the\ 2nd\n\ \ \ *\ match\ the\ first\ two\ if\ followed\ by\ any\ but\ the\ third,\ et\ cetera\n\nThen\ the\ only\ thing\ more\ is\ to\ allow\ a\ partial\ match\ of\ the\ string\ at\ end\ of\ line.\ So\ for\ a\ regexp\ that\ matches\nany\ line\ that\ DOES\ NOT\ have\ the\ word\ ''foo'':\n\n======\nset\ exp\ \{^(\[^f\]|f\[^o\]|fo\[^o\])*.\{0,2\}\$\}\n======\n\nThe\ following\ proc\ will\ build\ the\ expression\ for\ any\ given\ string\n\n======\nproc\ reg_negate\ \{str\}\ \{\n\ \ \ \ set\ partial\ \"\"\n\ \ \ \ set\ branches\ \[list\]\n\ \ \ \ foreach\ c\ \[split\ \$str\ \"\"\]\ \{\n\ \ \ \ \ \ \ \ lappend\ branches\ \[format\ \{%s\[^%s\]\}\ \$partial\ \$c\]\n\ \ \ \ \ \ \ \ append\ partial\ \$c\n\ \ \ \ \}\n\ \ \ \ set\ exp\ \[format\ 
\{^(%s)*.\{0,%d\}\$\}\ \[join\ \$branches\ \"|\"\]\ \\\n\ \ \ \ \ \ \ \ \[expr\ \[string\ length\ \$str\]\ -1\]\]\n\}\n======\n\n\[Donal\ Fellows\]\ followed\ up\ with:\n\nThat's\ just\ set\ me\ thinking\;\ you\ can\ do\ this\ by\ specifying\ that\ the\ \nwhole\ string\ must\ be\ either\ not\ the\ character\ of\ the\ ''antimatch''*,\ \nor\ the\ first\ character\ of\ the\ antimatch\ so\ long\ as\ it\ is\ not\ \nfollowed\ by\ the\ rest\ of\ the\ antimatch.\ \ This\ leads\ to\ a\ fairly\ \nsimply\ expressed\ pattern.\n\n======\nset\ exp\ \{^(?:\[^f\]|f(?!oo))*\$\}\n======\n\nIn\ fact,\ this\ allows\ us\ to\ strengthen\ what\ you\ say\ above\ to\ allow\ \nthe\ matching\ of\ any\ negated\ regular\ expression\ directly\ so\ long\ as\ the\ first\ \ncomponent\ of\ the\ antimatch\ is\ a\ literal,\ and\ the\ rest\ of\ the\ \nantimatch\ is\ expressible\ in\ an\ ERE\ lookahead\ constraint\ (which\ \nimposes\ a\ number\ of\ restrictions,\ but\ still\ allows\ for\ some\ fairly\ \nsophisticated\ patterns.)\n\n*\ Anything's\ better\ than\ overloading\ 'string'\ here!\n\n\[JMN\]\ 2005-12-22:\nCould\ someone\ please\ explain\ what\ is\ meant\ by\ a\ 'negated\ string'\ here?\nSpecifically\ -\ what\ do\ the\ above\ achieve\ that\ isn't\ satisfied\ by\ the\ simpler:\ \n\n======\nset\ exp\ \{^(?!(.*foo.*))\}\n======\n\nDoesn't\ the\ following\ snippet\ from\ the\ regexp\ manpage\ indicate\ that\ a\ regexp\ can\ be\ negated?\ \nwhere\ does(or\ did?)\ the\ 'simple\ string'\ requirement\ come\ in?\ -\ is\ this\ info\ no\ longer\ current?\n\n\ (?!re)\n\ negative\ lookahead\ (AREs\ only),\ matches\ at\ any\ point\ where\ no\ substring\ matching\ re\ begins\ \n\n\[Lars\ H\]:\ It\ indeed\ seems\ the\ entire\ problem\ is\ rather\ trivial.\ In\ Tcl\ 7\ (before\ AREs)\ one\ sometimes\ had\ to\ do\ funny\ tricks\ like\ the\ ones\ Bruce\ Hartweg\ performs\ above,\ but\ his\ use\ of\ `\{0,2\}`\ means\ he\ must\ be\ assuming\ AREs.\ Perhaps\ there\ was\ a\ transitory\ period\ where\ one\ was\ 
available\ but\ not\ the\ other.\n\n\[Oleg\]\ 2009-12-11:\nIf\ one\ needs\ to\ match\ any\ string\ but\ 'foo',\ then\ the\ following\ will\ do\ the\ work:\n\n======none\nset\ exp\ \{^((?!foo).*)|^(foo.+)\}\n======\n\nAnd\ in\ general\ case\ when\ one\ needs\ to\ match\ any\ string\ that\ is\ neither\ 'foo'\ nor\ 'bar',\ then\ the\ following\ will\ do\ the\ work:\n\n======none\nset\ exp\ \{^((?!(foo|bar)).*)|^((foo|bar).+)\}\n======\n\n\[CRML\]\ 2013-11-06\nIn\ general\ case\ when\ one\ needs\ to\ match\ any\ string\ that\ is\ neither\ 'foo'\ nor\ 'bar'\ might\ be\ done\ using:\n\n======none\nset\ exp\ \{^(?!((foo|bar)\$))\}\n======\n\n\[AMG\]:\ Oleg's\ regexps\ confuse\ me.\ \ Translated\ literally,\ I\ read\ them\ as\ \"match\ any\ string\ that\ does\ not\ begin\ with\ `foo`\ (or\ `bar`)\ unless\ that\ string\ has\ more\ characters\ after\ the\ `foo`\ (or\ `bar`).\"\ \ Very\ indirect,\ I\ must\ say.\ \ CRML's\ suggestion\ I\ like\ better,\ though\ I\ would\ drop\ the\ extra\ parentheses\ to\ obtain:\ `^(?!(foo|bar)\$)`.\ \ This\ says,\ \"match\ any\ string\ that\ does\ not\ begin\ with\ either\ `foo`\ or\ `bar`\ when\ immediately\ followed\ by\ end\ of\ string.\"\ \ In\ other\ words,\ \"match\ any\ string\ that\ is\ not\ exactly\ `foo`\ or\ `bar`.\"\n\n\n\n**\ Turn\ a\ string\ into\ %hex-escaped\ (url\ encoded)\ characters:\ **\n\ne.g.\ `Csan\ ->\ %43%73%61%6E`\n\n======\nregsub\ -all\ --\ \{(.)\}\ \$string\ \{%\[format\ \"%02lX\"\ \[scan\ \\1\ \"%c\"\]\]\}\ new_string\nsubst\ \$new_string\n======\n\nThis\ demonstrates\ the\ power\ of\ using\ \n\[regsub\]\ together\ with\ \n\[subst\],\ which\ is\ regarded\ as\ one\ of\ the\ most\ powerful\ ways\ to\ use\ regular\ expressions\ in\ Tcl.\n\n\n\n**\ Turn\ a\ string\ into\ %hex-escaped\ (url\ encoded)\ characters\ (part\ 2)\ **\n\nThis\ one\ makes\ the\ result\ more\ readable\ and\ still\ quite\ safe\ to\ use\ in\ URLs\ne.g.\ http://wiki.tcl.tk\ ->\ http%3A%2F%2Fwiki%2Etcl%2Etk\n\n======\nregsub\ -all\ --\ 
\{(\[^A-Za-z0-9_-\])\}\ \$string\ \{%\[format\ \"%02lX\"\ \[scan\ \\1\ \"%c\"\]\]\}\ new_string\nsubst\ \$new_string\n======\n\n\[nl\]\n\n----\n\n\[Joe\ Mistachkin\]\n\nThe\ inverse\ of\ the\ above\ (not\ optimized):\n\n======\nregsub\ -all\ --\ \{%(\[0123456789ABCDEF\]\[0123456789ABCDEF\])\}\ \$string\ \{\[format\ \"%c\"\ 0x\\1\]\}\ new_string\nsubst\ \$new_string\n======\n\n\n\n**\ Caveats\ about\ using\ `\[regsub\]`\ with\ `\[subst\]`\ **\n\n\[glennj\]\ 2008-12-16:\ It\ can\ be\ dangerous\ to\ blindly\ apply\ `\[subst\]`\ to\ the\ results\ of\ `\[regsub\]`,\ particularly\ if\ you\ have\ not\ validated\ the\ input\ string.\ \ Here's\ an\ example\ that's\ not\ too\ contrived:\n\n======\nset\ string\ \{\[some\ malicious\ command\]\}\nregsub\ -all\ \{\\w+\}\ \$string\ \{\[string\ totitle\ &\]\}\ result\nsubst\ \$result\n======\n\nThis\ results\ in\ `invalid\ command\ name\ \"Some\"`.\ \ What\ if\ `\$string`\ was\ `\[\[exec\ format\ c:\]\]`?\n\nSee\ DKF's\ \"proc\ regsub-eval\"\ contribution\ in\ `\[regsub\]`\ to\ properly\ prepare\ the\ input\ string\ for\ substitution.\ \ Paraphrased:\n\n======\nset\ string\ \{\[some\ malicious\ command\]\}\nset\ escaped\ \[string\ map\ \{\\\[\ \\\\\[\ \\\]\ \\\\\]\ \\\$\ \\\\\$\ \\\\\ \\\\\\\\\}\ \$string\]\nregsub\ -all\ \{\\w+\}\ \$escaped\ \{\[string\ totitle\ &\]\}\ result\nsubst\ \$result\n======\n\nwhich\ results\ in\ what\ you'd\ expect:\ \ the\ string\ \"\[\[Some\ Malicious\ Command\]\]\"\n\n\[APN\]\ I\ don't\ follow\ why\ all\ the\ extra\ \\\ are\ needed\ in\ the\ string\ map.\ The\nfollowing\ should\ work\ just\ as\ well?\n\n======\nset\ escaped\ \[string\ map\ \{\[\ \\\\\[\ \]\ \\\\\]\ \$\ \\\\\$\ \\\\\ \\\\\\\\\}\ \$string\]\n======\n\n\[PYK\]\ 2016-05-28:\ \ Indeed:\n\n======\nexpr\ \{\ \[list\ \{*\}\{\ \[\ \ \\\\\[\ \ \]\ \ \\\\\]\ \ \$\ \ \\\\\$\ \\\\\ \\\\\\\\\}\]\n\ \ \ \ eq\ \[list\ \{*\}\{\\\[\ \ \\\\\[\ \\\]\ \ \\\\\]\ \\\$\ \ \\\\\$\ \\\\\ \\\\\\\\\}\]\}\ \;#\ ->\ 1\n======\n\n\n----\n\n**\ Maintain\ proper\ 
spacing\ when\ formatting\ for\ HTML\ **\n\n\[DG\]\ got\ this\ from\ \[Kevin\ Kenny\]\ on\ c.l.t.\n\n======none\nregsub\ -all\ \{\ (?=\ )\}\ \$line\ \{\\&nbsp\;\}\ line\n\nset\ line\ \{this\ is\ an\ \ \ \ example\}\nregsub\ -all\ \{\ (?=\ )\}\ \$line\ \{\\&nbsp\;\}\ line\nset\ line\n======\n\nAnd\ the\ output\ is:\n\n======\nthis\ is\ an&nbsp\;&nbsp\;&nbsp\;\ example\n======\n\nTabs\ require\ replacement,\ too:\n\n======\nset\ tabFill\ \"\[string\ repeat\ \\\\&nbsp\\\;\ 7\]\ \"\nregsub\ -all\ \{\\t\}\ \$line\ \$tabFill\ line\n======\n\n----\n\n\[glennj\]:\ \ Taken\ from\ comp.lang.perl.misc,\ transform\ variable\ names\ into\ StudlyCapsNames:\n\n======\nset\ old_vars\ \{VARIABLE_ONE\ VARIABLE_NUMBER_TWO\ a_really_long_VARIABLE_name\}\nset\ NewVars\ \{\}\nforeach\ v\ \$old_vars\ \{\n\ \ \ regsub\ -all\ \{_?(.)(\[^_\]*)\}\ \$v\ \{\[string\ toupper\ \"\\1\"\]\[string\ tolower\ \"\\2\"\]\}\ new\n\ \ \ lappend\ NewVars\ \[subst\ \$new\]\n\}\n======\n\n----\n\nWhen\ using\ \[ASED%|%ASED's\]\ syntax\ checker\ you\ get\ an\ error\ if\ you\ don't\ use\ the\ `--`\ option\ to\ `\[regexp\]`.\ Instead\ of\ `regexp\ \{(\[\[^A-Za-z0-9_-\]\])\}\ \$string`\ you\ have\ to\ write\ `regexp\ --\ \{(\[\[^A-Za-z0-9_-\]\])\}\ \$string`\n\n----\n\n\[LV\]:\ A\ user\ recently\ asked:\n\nI\ have\ a\ string\ that\ I'm\ trying\ to\ parse.\ Why\ doesn't\ this\ seem\ to\ work?\n\n======\n%\ set\ str\ \{Acc\ No:\ 12345\}\n%\ set\ num\ \[regexp\ \{.*?(\\d+).*\}\ \$str\ junk\ result\]\n%\ puts\ \$result\n1\n======\n\nIt\ looks\ to\ me\ like\ the\ `*?`\ causes\ the\ subsequent\ `\\d+`\ to\ also\ be\ non-greedy\nand\ only\ match\ the\ first\ hit.\ Did\ I\ figure\ that\ out\ correctly?\ I\ presume\nthat\ we\ currently\ don't\ have\ a\ way\ to\ ''turn\ off''\ the\ greediness\ item?\n\nOf\ course,\ in\ this\ simplified\ problem,\ one\ could\ just\ drop\ the\ greediness\ and\ncode\n\ \n======none\n%\ set\ num\ \[regexp\ \{(\\d+)\}\ \$str\ junk\ result\]\n%\ puts\ \$result\n12345\n======\n\nI'll\ let\ the\ 
user\ decide\ if\ that\ suffices.\n\n----\n\n**\ How\ do\ you\ select\ from\ two\ words?\ **\n\n======none\n%\ set\ word\ \"foo\"\n%\ set\ result\ \[regexp\ \{(foo|bar)\}\ match\ zzz\]\n%\ set\ zzz\ncan't\ read\ \"zzz\":\ no\ such\ variable\n???\n======\n\n\[LES\]:\ You\ got\ the\ regexp\ syntax\ wrong\ and\ tried\ to\ match\ the\ regular\ expression\ with\ the\ string\ \"match\".\ There\ is\ no\ \"zzz\"\ variable\ (the\ actual\ match\ variable\ in\ your\ code)\ because\ your\ regular\ expression\ does\ not\ match\ the\ string\ \"match\".\ Try\ this:\n\n======none\n%\ set\ word\ \"foo\"\n%\ set\ result\ \[regexp\ \{(foo|bar)\}\ \$word\ match\ zzz\]\n%\ set\ match\n======\n\nNote\ that\ I\ could\ have\ dropped\ the\ \"zzz\"\ variable,\ but\ left\ it\ there\ as\ a\ second\ match\ variable,\ as\ an\ exercise\ to\ you.\ You\ should\ understand\ why\ and\ what\ it\ does\ if\ you\ read\ the\ \[regexp\]\ page\ and\ assimilate\ the\ syntax.\n\n\n\n**\ Infinite\ spaces\ at\ start\ and\ end\ **\n\n\[RUJ\]:\ Could\ you\ match\ the\ following\ pattern\ of\ following\ string:\ infinite\ spaces\ at\ start\ and\ end.\n\n======none\n%\ set\ str\ \"\ \ sjkhf\ sdhj\ \ \ \"\n======\n\n\[LV\]:\ try\n\n======\nset\ rest\ \[regexp\ \{^\ +.*\ +\$\}\ \$str\ match\]\nputs\ \$rest\n======\n\nwhich\ should\ have\ a\ value\ of\ 1\ (in\ other\ words,\ it\ matched).\nOf\ course,\ if\ those\ leading\ and\ trailing\ spaces\ are\ optional,\ then\ change\nthe\ +\ to\ a\ *.\n\n\[CRML\]\ non\ greedy\ or\ greedy\ does\ not\ give\ the\ same\ result.\ In\ the\ previous\ example,\ the\ .*\ matches\ all\ the\ string\ up\ to\ the\ last\ but\ one\ char.\n\n======\nset\ rest\ \[regexp\ \{^\ +(.*?)\ *\$\}\ \$str\ match\ noinfinite\]\nputs\ \$rest\nputs\ \"|\$noinfinite|\"\nset\ rest\ \[regexp\ \{^\ +(.*)\ *\$\}\ \$str\ match\ noinfinite\]\nputs\ \$rest\nputs\ \"|\$noinfinite|\"\n======\n\n----\n\n**\ URL\ Parser\ **\n\nSee\ \[URL\ Parser\].\n\n----\n\n**\ Match\ a\ \"quoted\ string\"\ **\n\n\[AMG\]:\ Adapted\ 
from\ \[Wibble\]:\n\n======\nproc\ quoted-string\ \{str\}\ \{\n\ \ \ \ regexp\ \{^\"(?:\[^\\\\\"\]|\\\\.)*\"\$\}\ \$str\n\}\n======\n\nThis\ recognizes\ strings\ starting\ and\ ending\ with\ double\ quote\ characters.\ \ Any\ character\ can\ be\ embedded\ in\ the\ string,\ even\ double\ quotes,\ when\ preceded\ by\ an\ odd\ number\ of\ backslashes.\n\n\n\n**\ Word\ Splitting,\ Respecting\ Quoted\ Strings\ **\n\ngiven\ some\ text,\ e.g.\n\n======none\nhere\ \ \ \ is\ some\ \"quoted\ \ \ \ \ text\ with\ \ \ lots\ \ \ \ of\ space\"\ \ and\ \ \ \ more\ \ \ \n======\n\nhow\ to\ parse\ it\ into\n\n======none\nhere\ is\ some\ \{quoted\ \ \ \ \ text\ with\ \ \ lots\ \ \ \ of\ space\}\ and\ more\n======\n\n======\nregexp\ -all\ -inline\ \{(?:\[^\ \"\]|\\\"\[^\"\]*\\\")+\}\ \$text\n======\n\nsee\ \[KBK\],\ #tcl\ irc\ channel,\ 2012-12-02\n\n**\ split\ a\ string\ into\ n-length\ substrings\ **\n\n======\nregexp\ -all\ -inline\ \".\{\$n\}\"\ \$string\n======\n\nevilotto,\ #tcl,\ 2013-02-07\n\n\[https://groups.google.com/forum/#!topic/comp.lang.tcl/mQenwyY578o/discussion%|%:)\ Contest:\ fast\ way\ to\ chop\ string\ in\ short\ fixed\ pieces,\ comp.lang.tcl,\ 2004-07-19%|%\]\n\n\n\n**\ At\ Least\ 1\ Alpha\ Character\ Interspersed\ with\ 0\ or\ More\ Digits\ **\n\n======none\nregexp\ \{\[\[:alnum:\]\]*\[\[:alpha:\]\]\[\[:alnum:\]\]*\}\ \$string\n======\n\n\n**\ Matching\ a\ group\ of\ strings\ **\n\n======\nregexp\ -nocase\ \{string1|string2|string3\}\ \$string\n======\n\nWe\ can\ match\ a\ group\ of\ strings\ or\ subjects\ in\ a\ single\ regular\ expression\n\n\n**\ \[Sqlite\]\ Numeric\ Literal\ **\n\n======\nregexp\ \{^(\[\[:digit:\]\]*)(?:\\.(\[\[:digit:\]\]+))?(?:\[eE\]\[+-\]?(\[\[:digit:\]\]+))?\$|^0x\[0-9a-fA-F\]+\$\}\ \$number\ match\ int\ mant\ exp\ \n======\n\n\n----\n'''\[ak\]\ -\ 2017-08-08\ 03:32:33'''\n\nRegarding\ negation\ of\ regular\ expressions.\n\nWhile\ the\ regular\ expression\ syntax\ does\ not\ allow\ for\ simple\ negation\ the\ underlying\ formalism\ of\ (non)deterministic\ 
finite\ automata\ does.\ Simply\ swap\ final\ and\ non-final\ states\ to\ negate,\ i.e.\ complement\ it.\n\nSee\ for\ example\ the\nhttps://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/grammar_fa/fa.html%|%grammar::fa%|%\npackage\ in\ Tcllib,\ which\ provides\ a\nhttps://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/grammar_fa/fa.html#51%|%complement%|%\ method.\nIt\ is\ implemented\ in\ the\nhttps://core.tcl.tk/tcllib/doc/trunk/embedded/www/tcllib/files/modules/grammar_fa/faop.html%|%operations%|%\npackage.\ As\ are\ methods\ to\ convert\ from\ and\ to\ regular\ expressions.\n\n<<categories>>\ Tutorial\ |\ String\ Processing} CALL {my revision {Regular Expression Examples}} CALL {::oo::Obj5049020 process revision/Regular+Expression+Examples} CALL {::oo::Obj5049018 process}

-errorcode

NONE

-errorinfo

Unknow state transition: LINE -> END
    while executing
"error $msg"
    (class "::Wiki" method "render_wikit" line 6)
    invoked from within
"my render_$default_markup $N $C $mkup_rendering_engine"
    (class "::Wiki" method "render" line 8)
    invoked from within
"my render $name $C"
    (class "::Wiki" method "revision" line 31)
    invoked from within
"my revision $page"
    (class "::Wiki" method "process" line 51)
    invoked from within
"$server process [string trim $uri /]"

-errorline

4