SpamAssassin with MySQL on Debian wheezy

This post is about how to configure SpamAssassin to store its volatile data in a MySQL database. Most of the information is based on the documentation in /usr/share/doc/spamassassin/sql, the SpamAssassin SQL wiki and this page. As the title says these instructions are heavily Debian-based.

In /etc/default/spamassassin add the options --sql-config (-q) and --nouser-config (-x) to the OPTIONS configuration.

First, create a database and a user in MySQL:

mysql -u root -p mysql <<-EOF
        CREATE DATABASE spamassassin;
        CREATE USER 'spamassassin'@'localhost' IDENTIFIED BY PASSWORD 'my_spamassassin_password';
        GRANT ALL ON spamassassin.* TO 'spamassassin'@'localhost';
        FLUSH PRIVILEGES;
EOF

Setting up users' score

SpamAssassin can load user specific settings from the database, if the user_scores_dsn variable is set (and spamd is started with the --sql-config option).

user_scores_dsn              dbi:mysql:spamassassin:localhost
user_scores_sql_username     spamassassin
user_scores_sql_password     my_spamassassin_password

For the default query to work the userpref table must have at least the username, preference and value fields. The username field contains the username whose e-mail is being filtered or @GLOBAL for a global option.

CREATE TABLE userpref (
        username varchar(100) NOT NULL default '',
        preference varchar(50) NOT NULL default '',
        value varchar(100) NOT NULL default '',
        prefid int(11) NOT NULL auto_increment,
        PRIMARY KEY (prefid),
        KEY username (username));

INSERT INTO userpref (username, preference, value) VALUES ('@GLOBAL', 'required_hits', '4.0');
# INSERT INTO userpref (username, preference, value) VALUES ('alice', 'whitelist_from', '*@example.com');

Setting up the auto-whitelist configuration

The setting below tells SpamAssassin to use the auto-whitelist in the SQL database.

auto_whitelist_factory       Mail::SpamAssassin::SQLBasedAddrList

user_awl_dsn                 dbi:mysql:spamassassin:localhost
user_awl_sql_username        spamassassin
user_awl_sql_password        my_spamassassin_password
user_awl_sql_table           awl

Here is the default table layout:

CREATE TABLE awl (
        username varchar(100) NOT NULL default '',
        email varchar(255) NOT NULL default '',
        ip varchar(40) NOT NULL default '',
        count int(11) NOT NULL default '0',
        totscore float NOT NULL default '0',
        signedby varchar(255) NOT NULL default '',
        PRIMARY KEY (username, email, signedby, ip));

SpamAssassin only adds data to the auto-whitelist table and does not delete from it. Hence it is suggested to change the table to include a timestamp of the last modification for each record. With this additional column it is possible to selectively delete old and unused records.

ALTER TABLE awl ADD lastupdate timestamp default CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;
UPDATE awl SET lastupdate = NOW() WHERE lastupdate < 1;

The following statements delete entries that are older than 6 months and addresses that occurred only once in the past 15 days.

DELETE FROM awl WHERE lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 6 MONTH);
DELETE FROM awl WHERE count = 1 AND lastupdate <= DATE_SUB(SYSDATE(), INTERVAL 15 DAY);

Setting up the Bayes configuration

The Bayes settings are similar to the ones seen above.

bayes_store_module           Mail::SpamAssassin::BayesStore::SQL

bayes_sql_dsn                dbi:mysql:spamassassin:localhost
bayes_sql_username           spamassassin
bayes_sql_password           my_spamassassin_password
#bayes_sql_override_username  mail

The tables with the default layout are created with the following commands.

CREATE TABLE bayes_expire (
        id int(11) NOT NULL default '0',
        runtime int(11) NOT NULL default '0',
        KEY bayes_expire_idx1 (id));

CREATE TABLE bayes_global_vars (
        variable varchar(30) NOT NULL default '',
        value varchar(200) NOT NULL default '',
        PRIMARY KEY  (variable));

INSERT INTO bayes_global_vars VALUES ('VERSION','3');

CREATE TABLE bayes_seen (
        id int(11) NOT NULL default '0',
        msgid varchar(200) binary NOT NULL default '',
        flag char(1) NOT NULL default '',
        PRIMARY KEY (id, msgid));

CREATE TABLE bayes_token (
        id int(11) NOT NULL default '0',
        token char(5) NOT NULL default '',
        spam_count int(11) NOT NULL default '0',
        ham_count int(11) NOT NULL default '0',
        atime int(11) NOT NULL default '0',
        PRIMARY KEY (id, token), INDEX bayes_token_idx1 (id, atime));

CREATE TABLE bayes_vars (
        id int(11) NOT NULL AUTO_INCREMENT,
        username varchar(200) NOT NULL default '',
        spam_count int(11) NOT NULL default '0',
        ham_count int(11) NOT NULL default '0',
        token_count int(11) NOT NULL default '0',
        last_expire int(11) NOT NULL default '0',
        last_atime_delta int(11) NOT NULL default '0',
        last_expire_reduce int(11) NOT NULL default '0',
        oldest_token_age int(11) NOT NULL default '2147483647',
        newest_token_age int(11) NOT NULL default '0',
        PRIMARY KEY (id),
        UNIQUE bayes_vars_idx1 (username));