Chapter 15. Housekeeping and Reporting
Once SpamAssassin is installed and configured, it operates well with little or no intervention. A busy system administrator will be keen to automate every aspect of system operations and make life easier for users. In this chapter, some further filters and regular scripts are described.
Separating Levels of Spam
Spam does not need to be saved on the server, except as a corpus for training the Bayesian database and for score regeneration. Generally, spam emails are stored so that users can reclaim any false positives. If auto-learning is used, you can also use these stored spam emails to ensure that false positives have not been learned as spam. This involves checking the folder of spam on a daily or weekly basis.
One technique to lower the number of spam emails to be examined is to divide them into two folders: one for high-scoring spam emails and another for comparatively low-scoring spam emails. False positives are unlikely to be in the high-scoring category, so the user need not examine emails in this folder.
This filtering can be effected using a Procmail recipe. The X-Spam-Level header contains a number of asterisks to indicate the score of the email. Emails that score between one and two get one asterisk, while emails that score between 12...
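The same decision can be sketched outside Procmail. The following Python function is only an illustration of the idea, not the recipe the chapter goes on to describe: it counts the asterisks in X-Spam-Level and picks a folder. The folder names (spam-high, spam-low, inbox) and the cutoff of 10 asterisks are hypothetical values chosen for the example, and the sketch assumes that spam has been tagged with an X-Spam-Flag: YES header.

```python
from email import message_from_string

# Hypothetical cutoff: this many asterisks or more counts as high-scoring spam.
HIGH_SCORE_STARS = 10


def spam_folder(raw_message, threshold=HIGH_SCORE_STARS):
    """Choose a folder for a tagged message from its SpamAssassin headers.

    The folder names returned here are illustrative only.
    """
    msg = message_from_string(raw_message)

    # Messages not flagged as spam go to the normal inbox.
    if msg.get("X-Spam-Flag", "").strip().upper() != "YES":
        return "inbox"

    # X-Spam-Level holds one asterisk per whole point of score.
    stars = msg.get("X-Spam-Level", "").count("*")
    return "spam-high" if stars >= threshold else "spam-low"
```

With these assumed values, a user would only need to review the spam-low folder for false positives.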
Detecting When SpamAssassin Fails
SpamAssassin is very reliable in most circumstances. When SpamAssassin is used as a daemon, email clients call spamc. If the spamd daemon is not running, then spamc cannot tag email, and spam emails are delivered to users' mailboxes. If the email solution relies on SpamAssassin, then we should regularly confirm that spamd is running. One common reason for a service outage is that the daemon has stopped. Daemons can be tested by connecting to the port that they listen on. This involves writing a test client or using an existing client in test mode, which can be complex. A simpler solution is to test whether the daemon appears among the processes running on the system.
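As a minimal sketch of the process-list approach, the Python below scans the output of ps -ef for a command that invokes spamd. It assumes the usual eight-column ps -ef layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function names are made up for this example, and what to do when the check fails (send mail, restart the daemon) is left to the administrator.

```python
import subprocess


def daemon_in_process_list(ps_output, name="spamd"):
    """Return True if a process in `ps -ef` output appears to be running `name`."""
    for line in ps_output.splitlines()[1:]:  # skip the column header line
        fields = line.split(None, 7)  # the eighth field is the full command line
        if len(fields) < 8:
            continue
        # Compare base names so /usr/sbin/spamd and "perl ... /usr/bin/spamd" both match.
        basenames = [token.rsplit("/", 1)[-1] for token in fields[7].split()]
        if basenames[0] == "grep":
            continue  # ignore someone else's "grep spamd"
        if name in basenames:
            return True
    return False


def spamd_is_running():
    """Run `ps -ef` and check its output for spamd."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return daemon_in_process_list(result.stdout)
```

A script that calls spamd_is_running() could be scheduled from cron every few minutes and raise an alert whenever it returns False.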
Large companies may use products like IBM's Tivoli or HP's OpenView for systems management, and these can be extended to watch the appropriate processes and send an alert in one of many ways. For smaller companies, the cost of such products is prohibitive. One inexpensive...
There may be a need to supply statistics on email processing. These might be necessary to justify the time invested in email administration. Alternatively, it may be desirable to chart the trends in general email use and in the proportions of ham and spam.
One basic statistic is the number of emails processed. By subtotaling both ham and spam, and then creating per-user statistics, a good representation of the dynamics of the email in a corporation can be built up.
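A report of this kind can be sketched by scanning the spamd log. The example below assumes log lines of the form "spamd: identified spam (15.2/5.0) for alice:1001 in 2.3 seconds, 4521 bytes." and "spamd: clean message (0.3/5.0) for bob:1002 in ...". That format is an assumption about the local logging setup, so the regular expression may need adjusting, and the function name is invented for this illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed shape of a spamd result line; adjust to match the local log format.
RESULT_RE = re.compile(
    r"spamd: (?P<verdict>identified spam|clean message) "
    r"\((?P<score>-?[\d.]+)/(?P<required>[\d.]+)\) "
    r"for (?P<user>[^:\s]+):\d+"
)


def count_ham_and_spam(log_lines):
    """Return (overall, per_user) counters of 'ham' and 'spam' results."""
    overall = Counter()
    per_user = defaultdict(Counter)
    for line in log_lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # not a scan result line
        kind = "spam" if match.group("verdict") == "identified spam" else "ham"
        overall[kind] += 1
        per_user[match.group("user")][kind] += 1
    return overall, per_user
```

Passing an open log file to count_ham_and_spam() gives site-wide totals in the first result and one counter per user in the second.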
Another useful report is the length of time taken by SpamAssassin to process email. Apart from giving an immediate statistic on the delay that spam processing adds to email delivery, this report is useful for long-term planning: if email processing takes longer each month, the load on the system is increasing, and additional resources may be required to maintain the quality of service.
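The timing report can be sketched in the same way. This example assumes that each spamd result line in the log ends with a phrase such as "in 2.3 seconds, 4521 bytes."; that format is an assumption, and the function name and the shape of the summary are illustrative choices rather than part of SpamAssassin.

```python
import re
from statistics import mean

# Assumed tail of a spamd result line: "... in 2.3 seconds, 4521 bytes."
TIMING_RE = re.compile(r"in (?P<seconds>\d+(?:\.\d+)?) seconds, (?P<size>\d+) bytes")


def processing_time_summary(log_lines):
    """Summarise scan times found in the log; return None if there are none."""
    times = []
    for line in log_lines:
        match = TIMING_RE.search(line)
        if match is not None:
            times.append(float(match.group("seconds")))
    if not times:
        return None
    return {"messages": len(times), "mean": mean(times), "max": max(times)}
```

Running the summary over each month's log and comparing the mean values from month to month gives the trend described above.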
A spam counter can be used to count the number of spam emails and the number of ham emails...
Spam email is usually archived for the purposes of creating a corpus for training a filter and identifying false positives. The user is required to manually go through spam messages to perform this sorting. One way of reducing this effort on the part of the user is to use automatic scripts and cron jobs to process and filter spam emails.
Separating spam into levels based on scores aids users by presenting them with a smaller folder of spam to check for false positives. Spam reports can be generated by using similar scripts that parse the spam statistics that result from commands such as ps -ef. This is done by calling simple scripts from Procmail, or by running reports on system logs. These scripts and their results can be modified as required. For example, you can have site-wide and hourly reports. A little effort will relieve the system administrator and users of the considerable work of sifting through large numbers of spam emails.