Blue Line - spam filter

Blue Line was an e-mail spam filter written in Tcl that let you preview your e-mail before it hit your e-mail client. It was mainly useful for looking over incoming e-mail and easily deleting it. It is currently missing in action.

Attributes

current version
2.0
release time
2004-01-16
website
Blue Line

Version 2.0 Discussion

escargo 2004-01-22: I added a value in the "From" column to my friends list, and pressed shift-Refresh to update the color coding. The color coding updated, but the window title bar (which had shown 4/1 before) did not change. Minimizing and restoring the window made it switch to 5/0, as I would have expected.

RT 2004-01-22: If you get a fresh download I think you'll find this is corrected.

Also, the application displays itself as "Blueline" not "Blue Line". (It would not be the first mismatch between the wiki and an application's own view of its name.)

RT: Yup. It's really "Blueline", Honest! :-) BTW, I just got the slightly patched version of DKF's Bayesian filter code from Dossy and it looks like only a few days to integrate that into the rule system.

escargo: My ISP uses SpamAssassin to detect spam. That means our mail has X-Spam-Status: fields added to each message. Some of these indicate that the mail has executable files in the attachments. Also, there are times when I would like to blacklist e-mail with certain domains in the Received: fields. (I don't know of any e-mail from the .ru domain that I really need ;) It would be nice to be able to select such values to blacklist (instead of whitelist as I can now). RT, 23Jan2004 - OK. I'll make 2.0.1 handle blacklists through the GUI and also add a rule predicate to allow testing of arbitrary headers.
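For illustration only, here is a minimal sketch in Python (not Blueline's Tcl code) of what a predicate that tests arbitrary headers might look like. The function names header_values and header_contains are hypothetical, and the sample header values are invented for the example.

```python
from email import message_from_string


def header_values(raw_headers, name):
    """Return every value of header `name` (case-insensitive), or []."""
    msg = message_from_string(raw_headers)
    return msg.get_all(name) or []


def header_contains(raw_headers, name, needle):
    """True if any occurrence of header `name` contains `needle`."""
    needle = needle.lower()
    return any(needle in value.lower()
               for value in header_values(raw_headers, name))


# Invented sample message; a real one would come from the POP3 server.
SAMPLE = (
    "Received: from mail.example.ru (mail.example.ru [192.0.2.10])\n"
    "\tby mx.example.net; Thu, 22 Jan 2004 10:00:00 -0500\n"
    "Received: from localhost by mx.example.net\n"
    "X-Spam-Status: Yes, hits=7.2 tests=MICROSOFT_EXECUTABLE\n"
    "From: someone@example.ru\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

# A blacklist entry is then just a (header, substring) pair to test.
blacklisted = header_contains(SAMPLE, "Received", "example.ru")
has_executable = header_contains(SAMPLE, "X-Spam-Status", "EXECUTABLE")
```

A plain substring test is deliberately crude: ".ru" alone would also match unrelated text such as ".run", so a real blacklist would probably want a stricter pattern.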

male 2004-01-23: Wow - blueline changed a lot and became really good!

RT - Thanks for the feedback. :)

In my daily work I use MailWasher Pro which supports regular expressions too!

RT Yes that's one of the apps I have installed and compare with.

What about adding the conditions containsRE and lacksRE, as well as containsGLOB and lacksGLOB, to the rule bodies? Just to allow filtering via regular expressions and glob patterns! RT, 23Jan2004 - I am planning to expand these options to include RE's in 2.0.1
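As a rough sketch of the four proposed conditions, written in Python rather than Tcl: the condition names come from the suggestion above, but the semantics shown here (search anywhere in the text, case-sensitive) are an assumption, not Blueline's behavior.

```python
import re
from fnmatch import fnmatchcase


def contains_re(text, pattern):
    """True if the regular expression matches anywhere in the text."""
    return re.search(pattern, text) is not None


def lacks_re(text, pattern):
    """True if the regular expression matches nowhere in the text."""
    return not contains_re(text, pattern)


def contains_glob(text, pattern):
    """True if the glob pattern matches some substring of the text.

    fnmatchcase matches the whole string, so the pattern is wrapped
    in '*' on both sides to get "contains" semantics.
    """
    return fnmatchcase(text, "*" + pattern + "*")


def lacks_glob(text, pattern):
    """True if the glob pattern matches no substring of the text."""
    return not contains_glob(text, pattern)


subject = "Buy V1AGRA now"
hit_re = contains_re(subject, r"V[1I]AGRA")
hit_glob = contains_glob(subject, "V?AGRA")
```

In Tcl the same pair of tests would most naturally be built on regexp and string match.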

If I understood it right, then each rule can contain a separate black- and whitelist, right? If so, why not buttons to edit such lists? I only found a way to edit these lists by choosing "Add and Edit" after right-clicking an email. RT, 23Jan2004 - well, each rule can contain 2 lists of properties: one to attach when the rule matches the message and another set to attach when the rule fails to match the message. I will certainly be making more things available by the GUI and a blacklist is first in line. For generalized rule editing I'm not settled on an approach yet. But they are easy to edit by hand if one is so inclined.
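To make RT's description concrete, here is one hypothetical way to model a rule that carries two property lists, sketched in Python. The class name Rule, the field names, and the property keys "spam" and "color" are all assumptions made for the example; Blueline's real rule format is not shown on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Rule:
    name: str
    # Predicate that decides whether the rule matches a message summary.
    predicate: Callable[[dict], bool]
    # Properties attached when the rule matches the message.
    on_match: Dict[str, object] = field(default_factory=dict)
    # Properties attached when the rule fails to match the message.
    on_fail: Dict[str, object] = field(default_factory=dict)

    def apply(self, message):
        """Return a copy of the property set selected by the predicate."""
        chosen = self.on_match if self.predicate(message) else self.on_fail
        return dict(chosen)


friends = {"alice@example.org"}
friends_rule = Rule(
    name="friends",
    predicate=lambda m: m["from"] in friends,
    on_match={"spam": 1, "color": "green"},
    on_fail={},
)

props = friends_rule.apply({"from": "alice@example.org"})
```

Under this model a blacklist is the same kind of rule as a friends list, just with a high spam value in on_match.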

Another thing would be to allow the user to specify the number of lines to be downloaded from each email. This could be important for finding matching rules! Sometimes 50 lines are too few.

RT: Please try stopping the program and editing blueline.cfg. Replace "setitem toplines xxx" with whatever value you like. Save and restart.

And - a question - I added an email to the rule "interests". Before it had a spam value of 50. And afterwards the spam value didn't change, even if the rule interests with the spam value of 20 should match. What is going wrong? RT Did you try Right-clicking the message and choosing the "Explain rating..." menu item? If you do and it's still not making sense please post some more details or email me directly (include the explain output). BTW, the formula for combining spam values (predictions) is taken from http://www.paulgraham.com/naivebayes.html RT 2nd followup (23Jan2004) - Oh, I think I get the problem now. After editing friend list, etc. you need to ask for a re-calc of rules. Hold down shift and push the refresh button; that should do it.
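The combining formula on that page, for probabilities a and b, is ab / (ab + (1-a)(1-b)), and it generalizes to any number of predictions. The Python sketch below implements that formula; treating Blueline's spam values as percentages on a 0 to 100 scale is an assumption based on the values 50 and 20 mentioned above, and the function names are hypothetical.

```python
from math import prod


def combine(probabilities):
    """Combine independent spam probabilities (each between 0 and 1).

    Computes prod(p) / (prod(p) + prod(1 - p)). Raises
    ZeroDivisionError if the inputs contain both 0.0 and 1.0.
    """
    p_all = prod(probabilities)
    q_all = prod(1.0 - p for p in probabilities)
    return p_all / (p_all + q_all)


def combine_percent(values):
    """Same formula for values given on an assumed 0..100 scale."""
    return 100.0 * combine([v / 100.0 for v in values])


rating = combine_percent([50, 20])
```

Under this formula a value of 50 is neutral, so combining 50 with 20 gives 20. That is consistent with the expectation in the question that a matching rule with spam value 20 should pull the rating down once the rules are re-calculated.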

So - when will the Bayesian filter be integrated? RT It really shouldn't take more than 3-4 of my part-time available days. A couple weeks on the calendar tops I think (crossing fingers ;^).

Hold on!

Martin

Version 1.0 Discussion

Roy Terry 2003-10-08:

In future I'll be adding automated filtering based on rules and (perhaps) Bayesian analysis.

This single-file Tcl app requires the tcllib module pop3; for Windows it comes zipped with a slightly patched version of the pop3 file. It also requires the excellent Tcl package tablelist.

Here is (version 1.0) main window http://tclbuzz.com/v0/blueline/assets/images/mainwindow-arrownotes.gif

Download from http://tclbuzz.com/v0/blueline


escargo 2003-11-13: I have started using it, but you might want to change your copyright date (otherwise it won't go into effect for another 18000 years;). Roy Terry - yeah. fixed, I guess too much java makes my zero finger twitchy :)

escargo 2003-11-22: I just wanted to say that Blue Line has become my favorite first line of defense against spam and viruses now. When you do add spam and virus filtering, one feature I hope you will add as well is a way of making the system explain why a message was deleted. My e-mail client now (Forte Agent) has very powerful filtering, but no way to make it explain why a message got filtered. I have had problems with mail that I wanted to receive being deleted (or filed into a particular folder) for no apparent reason. If I have a filter rule wrong, or if a rule is matching too generally, I have no way of determining why. (I can guess and even experiment, but I cannot just get the agent to tell me.)

escargo 2003-11-25: I just thought I would mention something that I find irritating. As Blue Line is processing messages (and giving its blue line progress bar), the text above the bar keeps changing in size. I find this distracting. I don't mind what it says, but having it keep stretching (instead of just being wide enough to start with) is one of life's little annoyances. Not major, by any means, but I am always noticing it.

Roy Terry: yeah me too! This is fixed and some simple rule-based processing is also in place. I'll post a rev in the next week or so then it's on to Bayesian filtering... BTW, I will plan to have the filter offer any "reasons" it may have for calling a message spam

escargo: Nice to hear about progress. I had a spam filter that I was working on (which I lost because of a recent hard disk crash), but it had a couple of white lists, one for subjects (mainly used for easy-to-identify mailing lists), and one for hosts. This saved a lot of work in spam identification. I hope your filters are pluggable. What I was imagining in my filter was having a set of filters (which could be dynamically discovered), where a filter would be passed the message summary (headers prefetched) and the filter would return a result that would indicate 1) I can't tell if it's spam, ask somebody else, 2) It's spam and here's why, or 3) It's not spam and here's why. A message could then be passed along a chain of filters until one answers 2 or 3. Users of the program could write their own filters (since filters all have the same interface) to extend the filtering any way they wanted it. I even thought that filters could be stored here on the wiki for people to download.
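The filter chain escargo describes could be sketched roughly as follows, in Python for illustration. Every name here (Verdict, run_chain, the two sample filters, the summary keys) is hypothetical; the only parts taken from the description are the common interface, the three possible answers, and stopping at the first filter that answers 2 or 3.

```python
from enum import Enum


class Verdict(Enum):
    UNKNOWN = 1    # can't tell, ask somebody else
    SPAM = 2       # it's spam, and here's why
    NOT_SPAM = 3   # it's not spam, and here's why


def subject_whitelist(summary):
    """Accept messages whose subject carries a known mailing-list tag."""
    for tag in ("[tcl-core]", "[tcllib]"):
        if tag in summary.get("subject", ""):
            return Verdict.NOT_SPAM, "subject contains " + tag
    return Verdict.UNKNOWN, ""


def executable_attachment(summary):
    """Reject messages flagged as carrying an executable attachment."""
    if "EXECUTABLE" in summary.get("x-spam-status", ""):
        return Verdict.SPAM, "X-Spam-Status reports an executable"
    return Verdict.UNKNOWN, ""


def run_chain(filters, summary):
    """Pass the summary along the chain until a filter decides."""
    for check in filters:
        verdict, reason = check(summary)
        if verdict is not Verdict.UNKNOWN:
            return verdict, reason
    return Verdict.UNKNOWN, "no filter could decide"


chain = [subject_whitelist, executable_attachment]
result = run_chain(chain, {"subject": "[tcllib] pop3 patch"})
```

Because every filter returns a reason along with its verdict, this design also gives the "explain why a message was deleted" feature requested earlier essentially for free.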


escargo 2003-12-01: It might be interesting to contrast Blue Line with Mailbox Sweeper (which appears to be implemented in Java).

escargo 2003-12-08: What kind of limitations does Blue Line have? When I was away from home for an extended time, and started Blue Line, it informed me about some number of messages (about 180). After filtering them, I started up my e-mail client, which then informed me that I had 300 messages (including several virus payloads that I didn't want to download). Is there an upper limit on the number of messages Blue Line can handle? Roy Terry: No known limit. I ran it up to 508 messages when my wife's mailbox got some weird earthlink hiccup. I know of several potential "loopholes" for getting unexpected messages in your email client:

  1. A batch of messages arrives between the last time blueline refreshes and when your client empties the box.
  2. In earlier versions of the code I occasionally saw some unexplained failures to auto refresh (I no longer notice that behavior)
  3. The first version has a timeout of 50 seconds to fetch messages - that could explain what you saw (Sorry - :-/ ). The new version allows 1 second per message so it's very unlikely to time out inappropriately.

Bottom line: should be no obvious limitation on number of messages. Though I did observe a "hang" of several seconds between when the code finished downloading 500+ messages and when they hit the tablelist display. That delay has yet to be investigated.

Also, there are times when I found it would be useful to sort by something other than sequence number. Sorting by recipient, sender, subject, and number of lines in the message would be really handy. All that and more will be in the next release real soon now as I have it running - RT

SEH 2006-06-23 -- How does this application compare with tkbiff? And it would be interesting if you could integrate this with Sift-tcl (http://www.island-resort.com/sift-tcl.htm )


LV 2007-02-23 Is this code related to the DKF code in Bayesian Spam Filtering?