bogofilter
This mail filter is said to have a very low false positive rate, according to Dr. Gord Cormack. That made me think it would be worth setting up. Here's how that went.
getting bogofilter
I'm a Debian user, so things were pretty easy in that department for me, but bogofilter should run in userspace if one has no access to the system-wide configuration.
apt-get install bogofilter
Or, if you like, download the fresh tarball and build it yourself. Go to the bogofilter SourceForge project page and download the latest version.
tar -jxvf bogofilter-1.1.1.tar.bz2
cd bogofilter-1.1.1
./configure
make
sudo make install
Bogofilter will now be placed in /usr/local/bin/.
Once that's done, one can start moving spam from the trash directories to a 'spam to learn' directory. Do this now, so that bogofilter can begin processing it while you tend to other duties.
For me, I store spam in an easy to find location within my IMAP directory, 'spam_to_learn'. For those new to the Maildir format, beneath the directory itself there are 'cur', 'new', and 'tmp' directories.
'cur' holds current mail, which your mail client already knows about. 'new' holds mail that has been written fully to disk but that your mail client has not yet seen. 'tmp' holds partially written mail. Once a message is fully written to disk, it is renamed into 'new'.
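To make the tmp-then-new handoff concrete, here is a small sketch of a delivery into a throwaway Maildir. The temporary directory and the message file name are made up for illustration, and real delivery agents pick their unique names more carefully than this.

```shell
#!/bin/sh
# Sketch of a Maildir delivery in a throwaway directory.
# The folder location and the message name are invented for this example.
MAILDIR=$(mktemp -d)/Maildir
mkdir -p "$MAILDIR/tmp" "$MAILDIR/new" "$MAILDIR/cur"

# Step 1: write the message into tmp under a (hopefully) unique name.
name="$(date +%s).$$.example"
printf 'Subject: test\n\nhello\n' > "$MAILDIR/tmp/$name"

# Step 2: once it is fully written, rename it into new.
mv "$MAILDIR/tmp/$name" "$MAILDIR/new/$name"

# A mail client would later move the file from new to cur.
ls "$MAILDIR/new"
```

Because the rename only happens after the write has finished, a reader looking in 'new' should never see a half-written message.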
The first thing to do is run bogofilter -s on all those nasty pieces of spam in the cur directory of your designated spam folder. If this is your first encounter with the command line, you might be thinking, "Sheesh, running something on every file, that's gotta be a painstaking job". Luckily, this is a perfect job for a shell script.
We're going to learn each piece of spam then immediately delete it.
SPAM=$HOME/Maildir/.spam_to_learn/cur
cd "$SPAM" && for i in $( find . -type f ) ; do echo "$i" ; bogofilter -s -I "$i" ; rm "$i" ; done
Well, that's too simple, isn't it? Here's a Perl script for those who want, or like, to modify things somewhat.
#!/usr/bin/perl
use strict;
use warnings;

# Locate bogofilter on the PATH and build the path to the spam folder.
my $bogo = `which bogofilter`;
my $spam = $ENV{'HOME'};
chomp($bogo);
$spam =~ s/\/$//;    # remove trailing slash, if it exists
$spam .= "/Maildir/.spam_to_learn/cur";

# Register one message as spam, echo bogofilter's output, then delete it.
sub updatespam {
    my $file = shift;
    open( my $fh, '-|', $bogo, '-s', '-I', $file )
        or die "cannot run bogofilter: $!";
    while ( defined( my $i = <$fh> ) ) {
        print( STDERR "$i" );
    }
    close($fh);
    unlink($file);
}

opendir( my $d, $spam ) or die "cannot open $spam: $!";
while ( defined( my $f = readdir($d) ) ) {
    my $fn = "$spam/$f";
    next if ( -d $fn );    # also skips '.' and '..'
    print("$fn\n");
    updatespam($fn);
}
closedir($d);
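Before pointing a learn-and-delete loop at real mail, it may be worth trying it on a scratch directory. The sketch below is one way to do that in shell. The names learn_spam and BOGOFILTER are made up for this example, and it assumes bogofilter exits with status 0 after a successful registration, which is worth confirming against the man page for your version.

```shell
#!/bin/sh
# Sketch: learn every message in a folder as spam, deleting each one
# only after the learner reports success.
# BOGOFILTER and learn_spam are hypothetical names for this example; the
# variable lets a stand-in command replace bogofilter for a trial run.
BOGOFILTER=${BOGOFILTER:-bogofilter}

learn_spam() {
    dir=$1
    count=0
    for msg in "$dir"/*; do
        [ -f "$msg" ] || continue    # skip directories and an unmatched glob
        # Assumption: exit status 0 means the message was registered.
        if "$BOGOFILTER" -s -I "$msg"; then
            rm -- "$msg"
            count=$((count + 1))
        fi
    done
    echo "learned $count messages"
}
```

The real call would be learn_spam "$HOME/Maildir/.spam_to_learn/cur". For a trial run, set BOGOFILTER=true and pass a scratch directory of dummy files; the files are still deleted, so don't try that on mail you care about.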
The Perl script is then inserted into a crontab entry, like so:
*/5 8-22 * * * /usr/bin/perl /home/ed/code/scripts/bogo.pl
Every five minutes, from 8am until just before 11pm, my spam to learn directory is processed. It stays idle overnight because the computer operates rather close to where I sleep.
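If the schedule fields are hard to read at a glance, a quick sketch like this one lists the times that "*/5 8-22 * * *" matches. It only reproduces the arithmetic of the minute and hour fields (a step of five minutes, hours 8 through 22 inclusive); it does not talk to cron.

```shell
#!/bin/sh
# Sketch: enumerate the times matched by the schedule "*/5 8-22 * * *".
# Minutes step by five; the hour range 8-22 is inclusive at both ends.
times=$(
    hour=8
    while [ "$hour" -le 22 ]; do
        minute=0
        while [ "$minute" -le 55 ]; do
            printf '%02d:%02d\n' "$hour" "$minute"
            minute=$((minute + 5))
        done
        hour=$((hour + 1))
    done
)
echo "first run: $(echo "$times" | head -n 1)"
echo "last run: $(echo "$times" | tail -n 1)"
echo "runs per day: $(echo "$times" | wc -l | tr -d ' ')"
```

Since the hour range is inclusive, the last run of the day lands at 22:55 rather than 22:00.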
so what now
Well, now we can configure maildrop to destroy those nasty pieces of spam for us, or move them to a junk directory. Bogofilter adds one of three lines to mail headers:
X-Bogosity: Ham, tests=bogofilter, spamicity=0.025715, version=1.0.2
X-Bogosity: Unsure, tests=bogofilter, spamicity=0.500000, version=1.0.2
X-Bogosity: Spam, tests=bogofilter, spamicity=0.999899, version=1.0.2
This gives us the information we need to configure maildrop. This is how I have my mailfilter set up.
if( /^X-Bogosity: Spam/ )
{
to Maildir/.spam/
}
to Maildir/
There are other rules in my mailfilter, but it's pointless to list them all here. Suffice it to say, maildrop works on regular expressions, so configuring other rules for your mail automation is simple enough.
So, we now have to do some work on the mail delivery system. Bogofilter, in its default configuration, runs in user space at the time of delivery, on a per-delivery basis. So it's not going to run for all users, only some of them. This is quite good, as it's rather pointless running the spam filter on a mailbox that only processes mail fitting certain criteria, such as an automated business process mailbox.
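Before touching delivery, it can help to see exactly what a rule has to match. The sketch below pulls the verdict and the score out of an X-Bogosity line in the format quoted earlier. parse_bogosity is a made-up helper name, not something bogofilter ships, and the pattern assumes the header layout matches those samples.

```shell
#!/bin/sh
# Sketch: extract the verdict and spamicity from an X-Bogosity header.
# parse_bogosity is a hypothetical helper name for this example.
parse_bogosity() {
    sed -n 's/^X-Bogosity: \([A-Za-z]*\), tests=bogofilter, spamicity=\([0-9.]*\),.*/\1 \2/p'
}

# Demonstration on canned header lines, so no mail tools are needed.
printf '%s\n' \
    'Subject: hello' \
    'X-Bogosity: Spam, tests=bogofilter, spamicity=0.999899, version=1.0.2' |
    parse_bogosity
```

Lines that are not X-Bogosity headers are dropped, so for the canned input this prints only "Spam 0.999899".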
.qmail
.qmail files generally contain three types of definition:
./Maildir/
&person@example.com
|/home/user/bin/program
The first is a file system location. If the line ends with a slash, the delivery is to a Maildir format directory; otherwise the delivery is to a file in mbox format.
The second is a delivery to an email address.
The third is delivery to a program via standard input. This is the method that we are going to use to process the delivery. Normally the delivery with a maildrop system looks like this:
| /usr/local/bin/maildrop .mailfilter
However, we need to run bogofilter prior to running maildrop so that the headers are inserted before maildrop begins to process the mail. To solve this, here are two Perl scripts:
#!/usr/bin/perl
use strict;
use warnings;

# Pipe the incoming message through bogofilter, then on to maildrop.
open( B, "|/usr/bin/bogofilter -p |/usr/local/bin/maildrop ~/.mailfilter" );
while ( my $l = <STDIN> ) {
    print( B $l );
}
close(B);
and
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open2;

my $bogofilter = "/usr/local/bin/bogofilter";
my $maildrop   = "/usr/local/bin/maildrop";

# Start both programs with pipes attached to their stdin and stdout.
my $bogopid     = open2( my $brd, my $bwr, "$bogofilter -p" );
my $maildroppid = open2( my $mrd, my $mwr, "$maildrop ~/.mailfilter" );
my $inheaders   = 1;

# Example modification: prefix an Unsure verdict with Spam, so that the
# maildrop rule matching "X-Bogosity: Spam" catches it as well.
sub modifyheaders {
    my $head = shift;
    if ( $head =~ /^(X-Bogosity: )(Unsure)(, tests=bogofilter, spamicity=)([0-9.]+)(.*)$/ ) {
        $head = "$1Spam,$2$3$4$5\n";
    }
    return ($head);
}

# Feed the incoming message to bogofilter.
while ( my $l = <STDIN> ) {
    print( $bwr "$l" );
}
close($bwr);

open( LOGFILE, ">/home/ed/log.txt" );
while ( my $l = <$brd> ) {
    # at this point we can do some header modification should we
    # choose to
    if ( $l =~ /^\r?\n$/ ) {    # a blank line ends the headers
        $inheaders = 0;
    }
    if ($inheaders) {
        $l = modifyheaders($l);
    }
    print( $mwr $l );
    print( LOGFILE "$l" );
}
close($brd);
close($mwr);
close($mrd);
close(LOGFILE);

# Wait for both children so they are reaped before the script exits.
waitpid( $bogopid,     0 );
waitpid( $maildroppid, 0 );
Both scripts do similar jobs. The first just takes the output from
bogofilter and pipes that to maildrop. The second has the ability to
modify the headers. Either should be inserted into the .qmail file as
|/home/user/scripts/bogodrop.pl
This looks innocent enough, and sure, you could put the command bogofilter -p | maildrop inside the .qmail file. This, however, does not provide the ability to alter the contents between bogofilter and maildrop, should one wish to make changes to the headers.
Once again, I chose Perl for its incredible text handling abilities.
Now that we have the components in place for processing mail on delivery, you can begin to check the directory where maildrop puts spammy mail.
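For comparison, a header tweak between bogofilter and maildrop can also be sketched in shell with sed. This is only an illustration of the idea, not a replacement for the Perl scripts: relabel_unsure is a made-up name, the rewrite it performs (prefixing an Unsure verdict with Spam) is just one possible modification, and it assumes lines end in a plain newline.

```shell
#!/bin/sh
# Sketch: rewrite the X-Bogosity header inside the header section only.
# relabel_unsure is a hypothetical helper name for this example. The sed
# range 1,/^$/ ends at the first blank line, which separates the headers
# from the body, so matching text in the body is left alone.
relabel_unsure() {
    sed '1,/^$/ s/^X-Bogosity: Unsure,/X-Bogosity: Spam,Unsure,/'
}

# Demonstration on a canned message, so no mail tools are needed.
printf '%s\n' \
    'Subject: hi' \
    'X-Bogosity: Unsure, tests=bogofilter, spamicity=0.500000, version=1.0.2' \
    '' \
    'X-Bogosity: Unsure, this line is body text' |
    relabel_unsure
```

In a delivery wrapper script, a function like this would sit between bogofilter -p and maildrop in the pipeline.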