Regular expression to validate e-mail addresses

[Explain why it's a bad idea. Always. Point to Friedl RE, link to REs, refer to Perl FAQ, and ...]

bll 2017-6-30 : See Regular Expression Examples

One reason this is a bad idea: just because a string of characters could, potentially, be an email address does NOT mean that the address is a valid one. The closest one can come to 'validating' that an email address exists is to send mail to it and then receive mail back that, when parsed, appears to be valid. Note that even THAT isn't a guarantee.


Geez, can we all just lighten up? All this moralizing and absolutism about plain ol' email addresses? What are we protecting that's so precious? If all this is true, then when a human sees an email address like "joe at moonbase2 DOT never", should the human just go stupid and assume the address is good? Please!

REs have a useful place in working with email addresses. They can serve as a very useful screening step to identify bogus addresses (e*tShlt@@@@home, ....). If you haven't seen a string of rejects from online registrations, you might be surprised at the junk that comes through. Some obvious heuristics: the address contains an @ char, ends in a legal domain (.com, .net, etc.), has at least one char preceding the @, and so on. - Roy Terry
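A minimal sketch of the screening heuristics listed above, for illustration only. The proc name looksBogus and the sample addresses are hypothetical, and the check for a "legal" ending domain is left out because that list keeps changing. Passing this screen says nothing about whether mail can actually be delivered.

```tcl
# Hypothetical helper: returns 1 when addr fails basic sanity checks, else 0.
# It only screens out obvious junk; it does not validate deliverability.
proc looksBogus {addr} {
    set addr [string trim $addr]
    # There must be exactly one @ sign.
    if {[regexp -all {@} $addr] != 1} {return 1}
    lassign [split $addr @] local domain
    # Something must precede the @.
    if {$local eq ""} {return 1}
    # The domain needs at least one dot, with non-empty labels around it.
    if {![regexp {^[^.[:space:]]+(\.[^.[:space:]]+)+$} $domain]} {return 1}
    return 0
}

puts [looksBogus {e*tShlt@@@@home}]
puts [looksBogus {joe@example.com}]
```

Note that this needs Tcl 8.5 or later for lassign, and that an address such as joe@moonbase2.never still gets through: the screen is syntactic only.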


... but many websites use broken regular expressions that reject perfectly valid email addresses. In particular, they often seem to assume that "+" is not allowed in the address, which is a major annoyance if you want to use the built-in sendmail/qmail features to construct "spam-traceable" addresses. See e.g. [L1 ] for a more detailed rant on this subject. -- MNO
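To illustrate the "+" complaint, here is a small sketch. Both patterns are made up for this example (they are not taken from any particular website), and the address is hypothetical. The only difference between them is whether "+" is in the character class for the local part.

```tcl
# Assumed example of an over-strict pattern of the kind complained about.
set strict  {^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+$}
# The same pattern with "+" allowed in the local part.
set relaxed {^[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+$}

set addr "joe+wiki@example.com"
puts [regexp $strict  $addr]   ;# prints 0: the tagged address is rejected
puts [regexp $relaxed $addr]   ;# prints 1: the tagged address is accepted
```

Even the relaxed pattern is far narrower than what the mail standards allow, so it is better treated as a screen than as a validator.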


All that a regular expression can do is determine whether a string matches a predetermined pattern of characters. This has NOTHING to do with whether a particular string corresponds to a mailbox known by a mail server, nor whether said server is permitted to actually deposit email into an existing box, etc. None of that can be determined by a regular expression, or for that matter by much of anything other than a mail delivery program!

Pertinent Perl stuff:

Don Libes wrote "Authentication by Email Reception" [L2 ] to describe "use of email addresses as an authentication mechanism ... [which] provides reasonable security at very low cost ..."

tcllib / mime contains commands (mime::parseaddress, mostly) to parse email addresses.
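A rough sketch of calling mime::parseaddress, assuming tcllib is installed. As far as I know, the command returns one key/value list per address found, with keys such as address and error; the dict calls below guard against a key being absent in case that assumption is wrong. The addresses are hypothetical.

```tcl
package require mime

# Comma-separated input, as the mime documentation describes.
set input "foo@example.com,bar@example.com"

foreach parsed [mime::parseaddress $input] {
    # Each element is treated here as a dict-compatible key/value list.
    if {[dict exists $parsed error] && [dict get $parsed error] ne ""} {
        puts "could not parse: [dict get $parsed error]"
    } elseif {[dict exists $parsed address]} {
        puts "address: [dict get $parsed address]"
    }
}
```

A clean parse only means the string fits the address syntax; it does not mean a send will succeed.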


But mime::parseaddress fails on simple, common cases. (LV: By "fails", do you mean that it does not perform the functions it is defined to do, or that it does not perform the functions you want it to do? The documentation for the function states that if more than one address is provided, they will be separated by commas!) Specifically, it fails for some fairly typical user input:

     [email protected],[email protected]

is OK but mime::parseaddress seems unable to deal with

     [email protected] [email protected]

Then, when I try to stress it and send mail to

     foo<>[email protected]

The parse "succeeds" but the send fails. -- CLN


Please report such things as either bugs or feature requests at http://sourceforge.net/tracker/?group_id=12883 (tcllib trackers)


How about REs plus programming? Here's Tcl code. RT 3June2004

 proc EmailAddrSuspect {addr msgV {addrV ""} } {
    upvar $msgV msg
    set msg ""

    # Let caller have the core address back if desired
    if {$addrV ne ""} {upvar $addrV email}
    

    # reduce whiteness to spaces only
    set addr [string map [list \n " " \t " " \r " "] $addr]

    if { ! [regexp {<([^>]+)>} $addr -> email]} {
        # Didn't find a delimited address.
        # Try to remove a (...) comment
        # Note: "xxx"  are not true comments and can only occur
        # if the real address is surrounded by < > (above)
        if { ! [regsub { *\([^)]*\) *} $addr "" email]} {
            set email $addr
        }
    }
    set aperson {[^@.]+}
    set at      {@}
    set n1      {[^@.]+}
    set middle  {.*}
    set dot     {\.}
    set dom     {[a-z]{2,3}$}
    set RE $aperson$at$n1$middle$dot$dom
    if { ! [regexp $RE $email]} {
        # Not a proper email address
        if       { ! [regexp $aperson $email]} {
            set msg "missing person part of address"
        } elseif { ! [regexp $aperson$at $email]} {
            set msg "no @-sign"
        } elseif { ! [regexp $aperson$at$n1 $email]} {
            set msg "missing organization part of address"
        } elseif { ! [regexp $aperson$at$n1$middle$dot$dom $email]} {
            set msg "missing .xx or .xxx ending part"
        } else {
            set msg "abnormal format"
        }
    } else {
        # check for more than 1 @ sign
        if {[regexp -all @ $email] > 1} {
            set msg "too many @-signs"
        } else {
            # check for "bad" characters
            # let nnnn,[email protected] go thru
            if {[regexp {[<>()+=#$%;'"^]|[^0-9],} $email]} {
                set msg "appears to have illegal characters"
            }
        }
    }
    if {$msg ne ""} {return 1} {return 0}
 }

"The Limits of Regular Expressions" [L3 ] says a bit more.


Bah! Validating e-mail addresses is a piece of cake: [L4 ]