The least complex rules are the body and header rules. Meta rules are more complex and are described later in the chapter.
Each body or header rule is built around a Perl regular expression (regex). Once a rule is defined, it runs unless its score is set to 0. The default score for a rule is 1.0. Rules whose names begin with T_ are test rules, and SpamAssassin gives these a default score of 0.01. Rule names should be 22 characters or fewer and, by convention, are written in uppercase.
A rule must also have a description. The describe configuration directive is used for this.
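As a minimal sketch of how these pieces fit together, a header rule with its description and score might look like the following. The rule name LOCAL_FREE_OFFER, the regex, and the score value are hypothetical and chosen only for illustration.

```
# Hypothetical header rule: fires when the Subject contains "free offer"
# (case-insensitive, whole words only).
header   LOCAL_FREE_OFFER  Subject =~ /\bfree offer\b/i

# The description is attached with the describe directive.
describe LOCAL_FREE_OFFER  Subject mentions a free offer

# Without this line the rule would score the default 1.0;
# a score of 0 would disable the rule entirely.
score    LOCAL_FREE_OFFER  0.5
```

The name is in uppercase and well under the 22-character limit. Renaming it to T_LOCAL_FREE_OFFER and dropping the score line would turn it into a test rule with the 0.01 default score.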
Rules should be placed in a file with the extension .cf, and that file placed in /etc/mail/spamassassin. Rules can be defined for an individual user only if allow_user_rules is set in /etc/mail/spamassassin/local.cf.
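The setting itself is a single line. The fragment below assumes the standard allow_user_rules directive and the site-wide file path given above. The directive matters when mail is scanned through spamd, which needs a restart to pick up the change.

```
# /etc/mail/spamassassin/local.cf
# Allow rules defined in each user's own preferences file to be loaded.
allow_user_rules 1
```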
User-defined rules are placed in ~/.spamassassin/user_prefs. Rules can be developed using a user account. Once a rule is tested and scored, it can be moved to the site-wide configuration.
Rules can be written to search for single words...
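For instance, a body rule that looks for one word could be sketched as follows. The rule name and the word searched for are hypothetical. The \b anchors are an assumption about what "single word" should mean here: they keep the regex from matching inside longer words.

```
# Hypothetical body rule: matches the single word "refinance" in the
# message body, ignoring case. \b marks a word boundary, so
# "refinanced" does not match.
body     LOCAL_REFINANCE  /\brefinance\b/i
describe LOCAL_REFINANCE  Body contains the word refinance
score    LOCAL_REFINANCE  0.3
```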
Many other rulesets are published on the Internet. These are often themed: for example, detecting invalid HTML, matching URLs commonly used in spam, or catching character sequences that often appear in spam emails. Custom rulesets are updated frequently in response to changes in the spam being sent.
An Internet search for 'spamassassin rulesets' will return many pages linking to rulesets. The SpamAssassin wiki (a collaborative information website) links to many custom rulesets at http://wiki.apache.org/spamassassin/CustomRulesets.
Custom rulesets may have specific installation instructions that should be read and followed. In general, installation involves copying a rule file and a score file into /etc/mail/spamassassin/ and restarting spamd.
The more rules SpamAssassin uses, the more resources it needs to process each email. System performance may degrade if too many custom rulesets are used. Performance issues are covered in Chapter 14.
This chapter discussed the building blocks of SpamAssassin—rules. SpamAssassin allows the user to define rules to respond to the spam a site is currently receiving. There are a variety of rule types for processing different parts of the email.
User-defined rules are based on Perl regular expressions. Rule scoring is an important part of SpamAssassin filtering techniques.
Using a corpus and calculating the effectiveness of rules can assist in re-scoring rules to improve filtering. Several custom rulesets can be added to a site, and these should be frequently updated.