Chapter 11. Network Tests
SpamAssassin on its own can detect a high proportion of spam. By using network tests, spam detection can be further improved. SpamAssassin includes support for Realtime BlockLists (RBLs) and Spam URI Realtime BlockLists (SURBLs). All these external services are easy to integrate into SpamAssassin.
The effectiveness of network tests varies from a 60% detection rate upwards. By using them in conjunction with SpamAssassin, spam detection rates are much higher, typically over 95%! However, network tests slow down spam detection. Each SpamAssassin process takes longer to complete, so more processes run at the same time and the memory usage of the email server increases.
This chapter describes the support SpamAssassin has for RBLs and SURBLs, and focuses on three external services: Vipul's Razor, Pyzor, and DCC.
RBLs are blocklists of known sources of spam. By default, SpamAssassin uses a number of RBLs to check the source of the email.
A SURBL is a blocklist...
A number of RBLs are enabled with the default configuration of SpamAssassin. These are defined in /usr/share/spamassassin/20_dnsbl_tests.cf. An example definition is shown here:
One set of definitions appears for each RBL configured. Rule definitions are explained in more detail in Chapter 12.
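The lookup behind an RBL check can be sketched in a few lines of Python. By the usual DNS blocklist convention, the octets of the sending IP address are reversed and prepended to the blocklist's zone, and the resulting name is looked up in DNS; a listed address resolves, an unlisted one does not. The function name and the zone dnsbl.example.org below are placeholders chosen for illustration, not part of SpamAssassin.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that would be queried for an IPv4 address.

    The octets are reversed and the blocklist zone is appended.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


if __name__ == "__main__":
    # 192.0.2.99 is a documentation-only address; the zone is hypothetical.
    print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

SpamAssassin performs the equivalent query itself for every RBL rule, so this sketch is only useful for understanding or debugging what a rule asks the blocklist.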
All the rules include a line that sets tflags to net. This groups the rules as network tests, and allows SpamAssassin to treat them as a group. There are two main reasons for this. The first is that network tests may take a long time to complete, especially at busy times. SpamAssassin uses a timeout for network tests, but it also applies this timeout in a progressive manner. If most of the network tests have completed, SpamAssassin will not wait for the last tests to complete. Specific details are given in the Mail::SpamAssassin::Conf man page...
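The idea of a progressive timeout can be illustrated with a small Python sketch. It assumes that the remaining wait shrinks with the square of the fraction of tests already completed. That formula, the function name, and the 15-second base value are assumptions made for illustration; they are not taken from the SpamAssassin source.

```python
def effective_timeout(base_timeout: float, finished: int, total: int) -> float:
    """Return how long to keep waiting for the outstanding network tests.

    Assumed scaling: the wait shrinks quadratically as more tests finish,
    so a few slow servers cannot hold up a message for the full timeout.
    """
    if total <= 0:
        return 0.0  # nothing was started, so there is nothing to wait for
    fraction_done = min(max(finished / total, 0.0), 1.0)
    return base_timeout * (1.0 - fraction_done ** 2)


if __name__ == "__main__":
    # Show the remaining wait as 0..10 of 10 tests complete.
    for done in range(0, 11):
        print(done, round(effective_timeout(15.0, done, 10), 1))
```

With this scaling, the full timeout applies while no test has answered, and the wait drops to zero once all tests have completed.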
Spam URI Realtime BlockLists are a relatively recent technique and SpamAssassin 3.0 supports a relatively small number of SURBLs. SURBLs are configured much like RBLs. SpamAssassin 2.63 can use a different plug-in, described later. Details on SURBLs can be found at http://www.surbl.org.
The SURBLs are defined in /usr/share/spamassassin/25_uribl.cf. An example definition is shown below:
One set of definitions appears for each SURBL configured.
As with RBLs, the SURBL rules set the tflags to net, to enable timeouts to be used, and to enable the rules to be switched on and off together.
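The difference from an RBL is what gets looked up: a SURBL check is keyed on the hosts named in URIs in the message body rather than on the sending IP address. The following Python sketch shows that idea under simplifying assumptions. The zone uribl.example.org and all function names are hypothetical, and real services typically expect the host to be reduced to its registered domain first, which is omitted here.

```python
import re
from urllib.parse import urlsplit

# Deliberately simple pattern: http/https URLs up to whitespace or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def uri_hosts(body: str) -> set:
    """Collect the lower-cased host names of all URLs found in the text."""
    hosts = set()
    for match in URL_PATTERN.findall(body):
        candidate = match.rstrip(".,;:!?)")  # drop trailing punctuation
        try:
            host = urlsplit(candidate).hostname
        except ValueError:
            continue  # skip malformed URLs instead of failing
        if host:
            hosts.add(host.lower())
    return hosts


def uribl_query_names(body: str, zone: str) -> list:
    """Build one DNS query name per advertised host (simplified)."""
    return sorted(f"{host}.{zone}" for host in uri_hosts(body))


if __name__ == "__main__":
    text = "Buy now at http://spam.example.com/offer today."
    print(uribl_query_names(text, "uribl.example.org"))
```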
SURBLs are implemented as a SpamAssassin plug-in. Plug-ins allow SpamAssassin to be extended with new types of tests and rules without changing SpamAssassin itself. To be enabled, the plug-in must be loaded. On SpamAssassin version 3...
Perl is required to install and use Vipul's Razor. This will already be installed as SpamAssassin uses it. A C compiler is also required, except on Debian Linux, for which a binary package is available.
To operate, Razor requires a constant internet connection. The Razor communication uses TCP port 2703, and Razor also uses TCP pings on port 7 to determine which servers are closest, so firewalls will have to be configured to enable these ports.
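Before installing, it can be useful to confirm that the firewall permits outgoing TCP connections on these ports. The Python sketch below attempts a plain TCP connection and reports whether it succeeded. The host name razor.example.net is a placeholder, not a real Razor server, and a successful connection only shows that the port is reachable, not that Razor itself will work.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if an outgoing TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


if __name__ == "__main__":
    # Hypothetical server name; substitute a server your site actually uses.
    for port in (2703, 7):
        state = "open" if tcp_port_open("razor.example.net", port) else "blocked"
        print(f"TCP port {port}: {state}")
```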
There are no mainstream RPM packages available for Razor. However, Razor is available in Gentoo and Debian Linux. To install in Gentoo, use the emerge razor command, and to install in Debian, use apt-get install razor. On other Linux distributions and UNIX variants it can be installed from source.
Razor is available for download from the home page at http://razor.sourceforge.net/. Razor is not available via CPAN. Two packages are available, razor-agents and razor-agents-sdk. Both packages should be downloaded. The razor-agents...
Pyzor is written in Python, so the Python language needs to be installed. This is included with most modern Linux distributions and is available for other operating systems including AIX, Solaris, and HP-UX. Pyzor source is packaged in a tar.bz2 file, using the bzip2 compression scheme. A bunzip2 program is required, and is installed on most Linux distributions. Binary bunzip2 utilities for other UNIX-like operating systems can be downloaded from the Internet.
Note
Pyzor uses UDP port 24441 for communicating with a server, so any firewall must be configured to allow outgoing traffic on that port.
Pyzor is available in RPM format only for Mandrake Linux. The rpm -i command can be used to install the RPM once it is downloaded. Packages are also available for Gentoo Linux and Debian Linux. Use emerge pyzor or apt-get install pyzor respectively.
For all other distributions and operating systems, Pyzor should be installed from source. Pyzor can be downloaded from the Pyzor website...
Although the correct term is 'The Distributed Checksum Clearinghouse', it is referred to as DCC here to enhance readability. DCC is the most effective network service, but also the most complex.
DCC is written in C. To build from source (binary packages are rare) a C compiler is required. DCC uses UDP port 6277 to communicate with servers, so this should be enabled through any firewall in use.
DCC is available in RPM format for Mandrake, but for no other RPM-based distribution. Use the rpm -i command to install it. DCC is available in Gentoo Linux and Debian Linux; use emerge net-mail/dcc under Gentoo, and apt-get install dcc-client in Debian. For other distributions and versions of UNIX, DCC should be installed from source.
The source for DCC can be downloaded from http://www.dcc-servers.net/dcc/.
The source is packaged as a tar file. Unpack this and then run the configure script. This script will automatically detect the required software libraries and report any that are missing...
A spamtrap is an email address that has never been associated with a real person or role in a company. The spamtrap is placed on web pages in such a way that it can only be picked up by spammers' web spiders. Any email received at the spamtrap address can therefore only be spam, and so it can be reported to the Razor network as definite spam.
Normally a spamtrap is hidden from view by using a tiny font, by hiding the email address behind another element of the page, by using the same color for the text and the background, or by another technique. The spammer's web spider will nevertheless detect the email address and add it to its database of valid email addresses.
Spamtraps can also be added to postings on Usenet, as long as it is made clear that the email address should not be used for real replies.
Choosing a Spamtrap Address
A spamtrap address should be made of completely random characters. Using an address such as info@domain.com, contact@domain.com, or other popular generic addresses...
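A random address of this kind is easy to generate. The Python sketch below is one possible approach. The function name, the default length of 16 characters, and the choice to start with a letter are assumptions made for illustration, and domain.com stands in for the site's own domain.

```python
import secrets
import string

# Lower-case letters and digits are safe in the local part of an address.
ALPHABET = string.ascii_lowercase + string.digits


def random_spamtrap(domain: str, length: int = 16) -> str:
    """Return an address whose local part is random and unguessable."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # Start with a letter so the address looks like an ordinary mailbox.
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(secrets.choice(ALPHABET) for _ in range(length - 1))
    return f"{first}{rest}@{domain}"


if __name__ == "__main__":
    print(random_spamtrap("domain.com"))
```

Because the local part is random, a legitimate sender is very unlikely to guess the address, so mail arriving there can be treated as spam.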
Network tests allow a site to benefit from other sites reporting email relays and spam-advertised websites. SpamAssassin includes support for RBLs and SURBLs. The latter provide a promising new technique against spam, which works by detecting the URIs that are advertised in spam emails. Razor, Pyzor, and DCC are email comparison systems. DCC is considered the most effective. These tests can be used together and most settings are configurable on a site-wide or per-user basis.