A Primitive POP3-based Spam Killer

Here's how it is:

The "Swen" worm [L1 ] achieved critical mass this week (2003-09-23). My account gets 50 such mails per hour, and each mail has some 150KB. Downloading and processing these locally with my generic spam filter creates huge temporary mailboxes and uses a lot of time. I use a freemailer via POP3, which doesn't seem to do anything about it, and I am much too lazy to switch my mail account.

So I thought, Tcl certainly has a POP3 client library, and these mails look all alike, why don't I create a Tcl script to delete these remotely. Said and done, this is the cleaned-up result. Afterwards I learned that Mailfilter [L2 ] uses the same technique.

The script requires tcllib (I used 1.4). The script also requires that the POP3 "list" command returns the size of the mails and that the POP3 server supports the "top" command. Both features are mentioned as optional in the documentation of the Tcllib POP3 package. Log output goes to stdout, I just redirect that to a logfile in my integration setup.

I use this script as a preconnect filter for fetchmail, see the "preconnect" keyword in the fetchmail configuration file. In that combination it will miss some mails between calling the script and fetchmail retrieving the rest of the mails. But that doesn't bother me here, my generic spam filters can handle those remnants.

The script is based on the size of mails and on keywords in the "Subject" and "From" headers. Other filters that I have seen mentioned, detect Windows EXE binaries, but that requires a download of a much larger part of the posts than just the mail headers.

Some of the code, like the reconnect, when it encounters a mail that should already have been deleted, may be just due to a buggy server implementation with my provider. Or I may be missing something about the POP3 protocol. The code could use some more cleanup and comments.

Have fun,

benny (BR)

 # ------------------------------------------------------------------------
 #      pop3-filter.tcl
 #-------------------------------------------------------------------------
 #      $Log: 10003,v $
 #      Revision 1.18  2004-04-06 06:00:07  jcw
 #      10003-1081195649-213.58.80.69
 #
 #      2003-09-19 benny        Created
 # ------------------------------------------------------------------------

 # configuration
 set host xxxxx
 set user xxxxx
 set pass xxxxx

 set retry_interval [expr {1000 * 15}] ;# every 15 seconds


 package require pop3
 package require log

 ::log::lvSuppress debug ;# silence the pop3 package


 # FIXME: Should use the log package here, when we have it required anyway.
 proc timestamp {} {
        return [clock format [clock seconds] -format "%Y-%m-%d %H:%M:%S"]
 }
 proc log {s} {
        puts "[timestamp] - $s"
 }

 proc process_all {host user pass} {
        log "Opening connection ..."
        if {[catch {set pop [::pop3::open $host $user $pass]} eresult]} {
                log "No connection: $eresult"
                return 0
        }
        log "Opened ..."

        set done 0
        set lasthigh -1
        while {!$done} {
                set done 1
                log "Retrieving list ..."
                set posts [::pop3::list $pop]
                log "We have [expr {[llength $posts]/2}] posts ..."

                log "Processing ..."
                foreach {post size} $posts {
                        set signature "\[$size\]"
                        if {[info exists deleted($post)]} {
                                log "Closing for flush ..."
                                catch {unset deleted}
                                set lasthigh -1
                                ::pop3::close $pop
                                log "Closed"
                                log "Opening connection ..."
                                if {[catch {
                                        set pop [::pop3::open \
                                                     $host $user $pass]
                                } eresult]} {
                                        log "No connection: $eresult"
                                        return 0
                                }
                                log "Re-opened ..."
                                set done 0
                                break   ;# from foreach
                        }

                        if {$post <= $lasthigh} {
                                continue
                        }
                        set lasthigh $post
                        if {$size < 20000} {
                                set good_by_signature($signature) 1
                                continue
                        }
                        set top [::pop3::top $pop $post 50]
                        foreach {subject matched} [analyse $top] {break}
                        append signature " \"$subject\""
                        if {{} != $matched} {
                                set done 0
                                log "Matched post $post $signature: $matched"
                                set deleted($post) 1
                                set deleted_by_signature($signature) 1
                                ::pop3::delete $pop $post
                                log "Deleted post $post $signature"
                        } else {
                                set good_by_signature($signature) 1
                        }
                }
        }

        set good [llength [array names good_by_signature]]
        set bad [llength [array names deleted_by_signature]]
        set all [expr {$good + $bad}]
        log "Checked $all, bad $bad, good $good"

        log "Closing ..."
        catch {unset deleted}
        set lasthigh 0
        ::pop3::close $pop
        log "Done"

        return 1
 }

 proc format_patterns {patterns} {
        set result {}
        foreach {type pattern} $patterns {
                set re ""
                switch $type {
                        from  {
                                set label {(to|cc|from|sender)}
                                append re {\n} $label ":" {[^\n]*} \
                                    {[[:<:]]} $pattern {[[:>:]]}
                        }
                        fullsubject {
                                set label {subject}
                                append re {\n} $label ": " \
                                    $pattern {\n}
                        }
                        subject {
                                set label {subject}
                                append re {\n} $label ":" {[^\n]*} \
                                    {[[:<:]]} $pattern {[[:>:]]}
                        }
                }
                lappend result $type $pattern $re
        }
        return $result
 }

 set patterns [format_patterns {
        from "customer bulletin"
        from "customer services"
        from "delivery service"
        from "delivery system"
        from "email service"
        from "email system"
        from "inet"
        from "internet"
        from "mail service"
        from "mail system"
        from "microsoft"
        from "ms"
        from "net"
        from "network"
        from "security"
        from "storage"
        fullsubject " *"
        fullsubject "abort advice"
        fullsubject "abort announcement"
        fullsubject "abort letter"
        fullsubject "abort message"
        fullsubject "abort notice"
        fullsubject "abort report"
        fullsubject "advice"
        fullsubject "announcement"
        fullsubject "bug advice"
        fullsubject "bug announcement"
        fullsubject "bug letter"
        fullsubject "bug message"
        fullsubject "bug notice"
        fullsubject "bug report"
        fullsubject "error advice"
        fullsubject "error announcement"
        fullsubject "error letter"
        fullsubject "error message"
        fullsubject "error notice"
        fullsubject "error report"
        fullsubject "failure advice"
        fullsubject "failure announcement"
        fullsubject "failure letter"
        fullsubject "failure message"
        fullsubject "failure notice"
        fullsubject "failure report"
        fullsubject "letter"
        fullsubject "message"
        fullsubject "new patch"
        fullsubject "notice"
        fullsubject "report"
        subject "critical pack"
        subject "critical patch"
        subject "critical update"
        subject "critical upgrade"
        subject "failure notice"
        subject "internet pack"
        subject "internet patch"
        subject "internet update"
        subject "internet upgrade"
        subject "last pack"
        subject "last patch"
        subject "last update"
        subject "last upgrade"
        subject "latest pack"
        subject "latest patch"
        subject "latest update"
        subject "latest upgrade"
        subject "microsoft pack"
        subject "microsoft patch"
        subject "microsoft update"
        subject "microsoft upgrade"
        subject "net pack"
        subject "net patch"
        subject "net update"
        subject "net upgrade"
        subject "network pack"
        subject "network patch"
        subject "network update"
        subject "network upgrade"
        subject "security"
        subject "service pack"
        subject "service patch"
        subject "service update"
        subject "service upgrade"
        subject "undeliverable"
 }]

 proc analyse {top} {
        set headerend [string first "\n\n" $top]
        if {-1 != $headerend} {
                set top [string range $top 0 $headerend]
        }
        set top "\n$top\n"
        if {![regexp -nocase {\nSubject: ([^\n]+)\n} $top all subject]} {
                set subject "<no subject>"
        }
        #log "Trying $subject"
        set matched {}
        foreach {type pattern re} $::patterns {
                if {[regexp -nocase $re $top]} {
                        lappend matched [list $type $pattern]
                }
        }
        return [list $subject $matched]
 }

 proc process {} {
        if {[process_all $::host $::user $::pass]} {
                set ::quit 1
                return
        } else {
                log "Wait for some time"
                after $::retry_interval process
        }
 }

 process
 if {{} != [after info]} {
        set quit 0
        vwait quit
 }

 # ------------------------------------------------------------------------
 #                              eof
 # ------------------------------------------------------------------------

See also


Thanks for sharing this, it's very useful!