USS Clueless - POPFile
     
     
 

Stardate 20031025.1737

(On Screen): Everyone's got problems with spam. But Roadrunner has a good spam filter in its email system, and where some people talk about getting a hundred spams for every legitimate email, I haven't had that kind of problem.

(By the way, as long as this message is on the front page, the horizontal formatting will be a bit screwed up. Just wanted to let you know.)

My email program (Agent) also has pretty good tools permitting me to set up filtration rules based on message headers, and over a period of time I'd gotten it tuned pretty well. The only real problem was that my rules produced a lot of false positives, so I had to constantly monitor the reject folder looking for legitimate letters from readers.

But a couple of months ago, a lot more spam started getting through. Maybe RR changed their filter to make it reject less. (Possibly there had been complaints about legitimate mail not getting through.) Or maybe the spammers had figured out how to fool it. Anyway, it had become clear that header filtration didn't cut it, and I started looking into more powerful tools.

And a couple of the ones I looked at ended up requiring me to program them to access RR's mail server, which was a real problem. I couldn't for the life of me remember what my email password was; I programmed it into Agent years ago and never used it, and I could not locate the paper RR gave me when my first cable modem was installed which said what it was. Agent knew, but it was encrypted in Agent's configuration file.

But when, a few days ago, I mentioned this to a friend of mine, he hacked out a quick program which pretended to be a POP3 server for the first two handshakes. Setting up Agent to talk to it permitted me to find out what my password was again. (Whew!) So I started looking again into tools.
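For the curious, a program like that can be very small. What follows is not my friend's code, just a minimal sketch of the idea in Python: it sends a POP3 greeting, answers the USER and PASS commands (which POP3 sends in the clear), and then refuses the login. The function names and port 1110 are assumptions made up for the illustration.

```python
import socket


def capture_credentials(conn):
    """Play the server side of a POP3 login; return (user, password)."""
    reader = conn.makefile("rb")
    conn.sendall(b"+OK fake POP3 server ready\r\n")  # server greeting
    user = password = None
    for raw in reader:
        line = raw.decode("latin-1").rstrip("\r\n")
        verb, _, arg = line.partition(" ")
        verb = verb.upper()
        if verb == "USER":
            user = arg
            conn.sendall(b"+OK send PASS\r\n")
        elif verb == "PASS":
            password = arg
            # We have what we came for; refuse the login and stop.
            conn.sendall(b"-ERR login refused\r\n")
            break
        elif verb == "QUIT":
            conn.sendall(b"+OK bye\r\n")
            break
        else:
            # Anything else (CAPA, AUTH, ...) is rejected so that the
            # client falls back to a plain USER/PASS login.
            conn.sendall(b"-ERR not supported\r\n")
    return user, password


def serve_once(host="127.0.0.1", port=1110):
    """Accept a single connection and return the captured login.

    The port number is an arbitrary choice for this sketch.
    """
    with socket.create_server((host, port)) as srv:
        conn, _ = srv.accept()
        with conn:
            return capture_credentials(conn)
```

To use something like this, you would point the mail program at 127.0.0.1 and the chosen port instead of the real server, tell it to check mail once, and read the result of serve_once(). It only works if the mail program logs in with plain USER and PASS.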

The last time I had looked, a program called K9 looked like a particularly good choice. That was the one I was going to use, in particular because it seemed to have excellent documentation. I also had considered SAProxy (which is used by Mike Trettel). But then I saw this post by Scott Wasson, about POPFile.

It's open source, which is fine. I'm not allergic to open source. He's been using it for a while, and if it was crap he would have tossed it. It was originally developed for Linux, but there was a version for Windows, which was the one he was using.

My main fear was that the user interface would be hostile and opaque and that the documentation would be skimpy and cryptic, so I looked around, and found some examples of the user interface and enough documentation to make me feel as if I could probably deal with it. So I decided to give it a try, and I installed it on Thursday.

(And this is the point where I add a "Don't Write Letters". Don't write to tell me about other alternatives I "might want to consider" or send me testimonials about other tools. I say this because every time in the past I've ever discussed some tool I was using, my mailbox got flooded with well-meaning email full of alternatives I didn't really have any interest in. I appreciate the concern, but please don't waste either your time or mine. Receiving one such letter is a pleasant surprise. Receiving fifty such letters is a royal pain.)

POPFile and the other tools I considered process the entire mail message, not just the header. K9 and POPFile are different in one critical way: K9 checks with the email server on its own schedule and downloads and processes mail which it stores. It also looks like a POP server to the email program, which accesses K9 to download the stuff K9 has buffered. K9 will download from the real mail server on its own schedule even if there's no email program running at the time.

Both K9 and POPFile run constantly, and leave an icon in the tray. But POPFile only actually runs (in the sense of doing a lot of work) when the email program wants to access mail. Which is why it turned out that I didn't need to tell POPFile my email password; what POPFile does is to insert itself into the return datastream from the email server. It transparently allows the email program to log in, but recognizes when messages are downloaded, and processes each such email as it comes by. That would be a problem if it were slow, but it isn't. The idea is that as it receives and processes a message it tries to recognize whether it's spam or not, and tags it in some way which is easy for the mail program to recognize before passing it on through (while keeping a copy).

POPFile can, for instance, put "[spam]" into the subject line, but I'm not using it that way. Rather, I've set it up to add a special line to the header, and I have set up Agent to look for that line.
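To make the two tagging styles concrete, here is a rough sketch in Python of what each one does to a raw message as it passes through. This is an illustration and not POPFile's own code. The header name X-Text-Classification is an assumption for the example (any header name the mail program can filter on would do), and both function names are made up.

```python
def tag_header(raw, bucket, header="X-Text-Classification"):
    """Add a classification line at the end of the message header."""
    # The header ends at the first blank line (CRLF CRLF).
    head, sep, body = raw.partition(b"\r\n\r\n")
    tag = header.encode("ascii") + b": " + bucket.encode("ascii")
    if not sep:
        # No body at all: treat the whole message as header.
        return head.rstrip(b"\r\n") + b"\r\n" + tag + b"\r\n\r\n"
    return head + b"\r\n" + tag + sep + body


def tag_subject(raw, bucket):
    """Put "[bucket]" at the front of the Subject line instead."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    for i, line in enumerate(lines):
        if line.lower().startswith(b"subject:"):
            rest = line[len(b"subject:"):]
            lines[i] = b"Subject: [" + bucket.encode("ascii") + b"]" + rest
            break
    return b"\r\n".join(lines) + sep + body
```

The header approach leaves the subject line alone, so the message looks untouched to a human reader, while a rule in the mail program keyed on that one header line can still sort it.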

POPFile is actually a general tool, and you can set up an arbitrarily large number of "Buckets" with unique names. The installer defaults to having four buckets (which I think were "spam", "inbox", "work" and "hobby"). I have a feeling that if there are too many buckets whose messages are too similar, then it can make a lot of mistakes. But I don't actually need any more than just two: "spam" and "inbox".
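The bucket idea can be sketched with a toy classifier. What follows is a bare-bones naive Bayes word counter in Python, which is the general kind of technique filters like this are built on; it is a made-up illustration under that assumption and not a description of POPFile's internals. The class name and the training messages are invented.

```python
import math
from collections import Counter, defaultdict


class BucketClassifier:
    """Toy naive Bayes classifier with any number of named buckets."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # bucket -> word -> count
        self.message_counts = Counter()          # bucket -> messages seen

    def train(self, bucket, text):
        self.word_counts[bucket].update(text.lower().split())
        self.message_counts[bucket] += 1

    def classify(self, text):
        vocab = set().union(*self.word_counts.values())
        total = sum(self.message_counts.values())
        best, best_score = None, -math.inf
        for bucket, counts in self.word_counts.items():
            # Log prior: how common this bucket has been so far.
            score = math.log(self.message_counts[bucket] / total)
            # Add-one smoothing, with one extra slot for unseen words.
            denom = sum(counts.values()) + len(vocab) + 1
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best, best_score = bucket, score
        return best


clf = BucketClassifier()
clf.train("spam", "cheap pills buy now")
clf.train("spam", "buy cheap watches now")
clf.train("inbox", "your post about email filters")
clf.train("inbox", "question about your post")
print(clf.classify("cheap watches"))    # spam
print(clf.classify("about your post"))  # inbox
```

With a scheme like this, two buckets whose messages use mostly the same words end up with nearly equal scores, which fits the suspicion that many similar buckets would cause more mistakes.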

So I set it up last night, and tested it to make sure that my email program was correctly working th

Captured by MemoWeb from http://denbeste.nu/cd_log_entries/2003/10/POPFile.shtml on 9/16/2004