You're reading from SpamAssassin: A practical guide to integration and configuration

Product type Book

Published in Sep 2004

Publisher Packt

ISBN-13 9781904811121

Pages 240 pages

Edition 1st Edition

Table of Contents (24) Chapters

SpamAssassin

Credits

About the Author

About the Reviewers

1. Introduction

1. Introducing Spam

2. Spam and Anti-Spam Techniques

3. Open Relays

4. Protecting Email Addresses

5. Detecting Spam

6. Installing SpamAssassin

7. Configuration Files

8. Using SpamAssassin

9. Bayesian Filtering

10. Look and Feel

11. Network Tests

12. Rules

13. Improving Filtering

14. Performance

15. Housekeeping and Reporting

16. Building an Anti-Spam Gateway

17. Email Clients

18. Choosing Other Spam Tools

Glossary

Chapter 11. Network Tests

SpamAssassin on its own can detect a high proportion of spam. By using network tests, spam detection can be further improved. SpamAssassin includes support for Realtime BlockLists (RBLs) and Spam URI Realtime BlockLists (SURBLs). All these external services are easy to integrate into SpamAssassin.

The effectiveness of individual network tests varies, from a 60% detection rate upwards. Used in conjunction with SpamAssassin's other tests, they raise spam detection rates considerably, typically to over 95%. However, network tests slow down spam detection: each SpamAssassin process takes longer to complete, so more processes run at the same time and the memory usage of the email server increases.

This chapter describes the support SpamAssassin has for RBLs and SURBLs, and focuses on three external services:

Vipul's Razor
Pyzor
The Distributed Checksum Clearinghouse (DCC)

RBLs are blocklists of known sources of spam. By default, SpamAssassin uses a number of RBLs to check the source of the email.

A SURBL is a blocklist...

RBLs

A number of RBLs are enabled with the default configuration of SpamAssassin. These are defined in /usr/share/spamassassin/20_dnsbl_tests.cf. An example definition is shown here:

header RCVD_IN_NJABL eval:check_rbl('njabl', 'dnsbl.njabl.org.')
describe RCVD_IN_NJABL Received via a relay in dnsbl.njabl.org
tflags RCVD_IN_NJABL net

One set of definitions appears for each RBL configured. Rule definitions are explained in more detail in Chapter 12.
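To make the rule definition above more concrete, the following Python sketch shows how a DNS blocklist query name is conventionally built: the octets of the relay's IPv4 address are reversed and the blocklist zone is appended. The function name is illustrative and is not part of SpamAssassin, and the address used is a reserved documentation address. A real check would then look the resulting name up in DNS, where an answer means that the address is listed.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blocklist zone.

    Illustrative helper (not a SpamAssassin API): reverse the octets of the
    address and append the zone, so 192.0.2.10 becomes 10.2.0.192.<zone>.
    """
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone.lstrip('.')}"


if __name__ == "__main__":
    # 192.0.2.10 is a reserved documentation address, used only as an example.
    print(dnsbl_query_name("192.0.2.10", "dnsbl.njabl.org."))
```

The sketch only builds the name; it performs no network lookup.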

All the rules include a line that sets tflags to net. This marks the rules as network tests, and allows SpamAssassin to treat them as a group. There are two main reasons for this. The first is that network tests may take a long time to complete, especially at busy times. SpamAssassin uses a timeout for network tests, but it also applies this timeout in a progressive manner. If most of the network tests have completed, SpamAssassin will not wait for the last tests to complete. Specific details are given in the Mail::SpamAssassin::Conf man page...
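The idea of a progressive timeout can be sketched in Python as follows. This is not SpamAssassin's actual algorithm; the Mail::SpamAssassin::Conf documentation is the authority for that. The function name, the quadratic scaling, and the figures are assumptions chosen only to show how the permitted wait can shrink as more tests finish.

```python
def allowed_wait(base_timeout: float, total: int, completed: int) -> float:
    """Return how long to keep waiting for the outstanding network tests.

    Hypothetical scaling, for illustration only: the full timeout applies
    while nothing has finished, and the wait shrinks quadratically towards
    zero as the completed fraction approaches 1.
    """
    if total <= 0:
        return 0.0
    if not 0 <= completed <= total:
        raise ValueError("completed must be between 0 and total")
    done_fraction = completed / total
    return base_timeout * (1 - done_fraction ** 2)
```

With this made-up scaling, a 15-second timeout still allows most of the time when half of the tests are outstanding, but only a few seconds once nine out of ten have answered.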

SURBLs

Spam URI Realtime BlockLists are a relatively recent technique and SpamAssassin 3.0 supports a relatively small number of SURBLs. SURBLs are configured much like RBLs. SpamAssassin 2.63 can use a different plug-in, described later. Details on SURBLs can be found at http://www.surbl.org.

The SURBLs are defined in /usr/share/spamassassin/25_uribl.cf. An example definition is shown below:

uridnsbl URIBL_SBL sbl.spamhaus.org. TXT
header URIBL_SBL eval:check_uridnsbl('URIBL_SBL')
describe URIBL_SBL Contains a URL listed in the SBL blocklist
tflags URIBL_SBL net

One set of definitions appears for each SURBL configured.

As with RBLs, the SURBL rules set the tflags to net, to enable timeouts to be used, and to enable the rules to be switched on and off together.
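As a rough illustration of the URI-based approach, the Python sketch below pulls the host names out of the URLs in a message body and builds the DNS name that would be looked up for each one. It is a simplification under stated assumptions: the zone uribl.example. is a placeholder rather than a real blocklist, the function name is invented for this example, and the domain is taken to be the last two labels of the host name, which is wrong for suffixes such as co.uk. URLs that use a numeric IP address are simply skipped here.

```python
import ipaddress
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def uri_query_names(body: str, zone: str) -> list[str]:
    """Return one DNS query name per distinct domain found in body."""
    names = []
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL
        if not host:
            continue
        host = host.rstrip(".")
        try:
            ipaddress.ip_address(host)
            continue  # numeric hosts are out of scope for this sketch
        except ValueError:
            pass
        # Simplification: treat the last two labels as the domain.
        domain = ".".join(host.split(".")[-2:])
        name = f"{domain}.{zone}"
        if name not in names:
            names.append(name)
    return names
```

Each returned name would then be queried in DNS in the same way as an RBL entry.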

SURBLs are implemented as a SpamAssassin plug-in. Plug-ins allow SpamAssassin to be extended with new types of tests and rules without changing SpamAssassin itself. To be enabled, the plug-in must be loaded. On SpamAssassin version 3...

Vipul's Razor

Perl is required to install and use Vipul's Razor. This will already be installed as SpamAssassin uses it. A C compiler is also required, except on Debian Linux, for which a binary package is available.

To operate, Razor requires a constant internet connection. The Razor communication uses TCP port 2703, and Razor also uses TCP pings on port 7 to determine which servers are closest, so firewalls will have to be configured to enable these ports.
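Before installing Razor, it can be useful to confirm that the firewall permits outgoing TCP connections on the ports mentioned above. The Python sketch below is one way to do that; the function name is illustrative, and any server name passed to it is the reader's choice, since no Razor server names are given here.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if an outgoing TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

For example, can_connect("razor.example.net", 2703) would test port 2703 against a placeholder host name. A False result means only that the connection failed; it does not distinguish a firewall block from a server that is down.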

Installing Razor

There are no mainstream RPM packages available for Razor. However, Razor is available in Gentoo and Debian Linux. To install in Gentoo, use the emerge razor command, and to install in Debian, use apt-get install razor. On other Linux distributions and UNIX variants it can be installed from source.

Razor is available for download from the home page at http://razor.sourceforge.net/. Razor is not available via CPAN. Two packages are available, razor-agents and razor-agents-sdk. Both packages should be downloaded. The razor-agents...

Pyzor

Pyzor is written in Python, and so the Python language needs to be installed. This is included with most modern Linux distributions and is available for other operating systems including AIX, Solaris, and HP-UX. Pyzor source is packaged in a tar.bz2 file, using the bzip2 compression scheme. A bunzip2 program is required, and is installed on most Linux distributions. Binary bunzip2 utilities for other UNIX-like operating systems can be downloaded from the Internet.
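Where a bunzip2 program is not at hand, Python itself can usually unpack a tar.bz2 archive, provided it was built with bzip2 support. The sketch below shows this alternative; it is not a step from the Pyzor installation instructions, and the function name and any archive file name passed to it are illustrative.

```python
import tarfile
from pathlib import Path


def unpack_tar_bz2(archive: str, dest: str) -> list[str]:
    """Extract a .tar.bz2 archive into dest and return the member names."""
    dest_path = Path(dest).resolve()
    with tarfile.open(archive, "r:bz2") as tar:
        members = tar.getmembers()
        for member in members:
            # Refuse entries that would be written outside dest.
            target = (dest_path / member.name).resolve()
            if dest_path != target and dest_path not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest_path, members=members)
    return [member.name for member in members]
```

The path check is a precaution for archives obtained over the network; it rejects any entry whose name would resolve outside the destination directory.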

Note

Pyzor uses UDP port 24441 for communicating with a server, so any firewall must be configured to allow outgoing traffic on that port.

Installing Pyzor

Pyzor is available in RPM format only for Mandrake Linux. The rpm -i command can be used to install the RPM once it is downloaded. Packages are also available for Gentoo Linux and Debian Linux. Use emerge pyzor or apt-get install pyzor respectively.

For all other distributions and operating systems, Pyzor should be installed from source. Pyzor can be downloaded from the Pyzor website...

DCC

Although the correct term is 'The Distributed Checksum Clearinghouse', it is referred to as DCC here to enhance readability. DCC is the most effective network service, but also the most complex.

DCC is written in C. To build from source (binary packages are rare) a C compiler is required. DCC uses UDP port 6277 to communicate with servers, so this should be enabled through any firewall in use.

Installing DCC

DCC is available in RPM format for Mandrake, but for no other RPM-based distribution. Use the rpm -i command to install it. DCC is available in Gentoo Linux and Debian Linux; use emerge net-mail/dcc under Gentoo, and apt-get install dcc-client in Debian. For other distributions and versions of UNIX, DCC should be installed from source.

The source for DCC can be downloaded from http://www.dcc-servers.net/dcc/.

The source is packaged as a tar file. Unpack this and then run the configure script. This script will automatically detect any required software libraries, or report any that are missing...

Spamtraps

A spamtrap is an email address that has never been associated with a real person or role in a company. The spamtrap is placed on web pages in such a way that it can only be picked up by spammers' web spiders. When email is received at the spamtrap address, it can only be spam, and so the email can be sent to the Razor network as definite spam.

Normally a spamtrap is hidden from view (by using a tiny font, by hiding the email address behind another element of the page, by using the same color for the text and the background, or by another technique). The spammer's web spider will nevertheless detect the email address and add it to its database of valid email addresses.

Spamtraps can also be added to postings on Usenet, as long as it is made clear that the email address should not be used for real replies.

Choosing a Spamtrap Address

A spamtrap address should be made of completely random characters. Using an address such as info@domain.com, contact@domain.com, or other popular generic addresses...
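One simple way to produce such an address is to generate the local part from random characters. The Python sketch below does this with the standard secrets module; the function name, the alphabet, and the default length are assumptions made for this example, and domain.com stands in for the site's own domain.

```python
import secrets
import string

# Lowercase letters and digits only, to stay within safe address characters.
ALPHABET = string.ascii_lowercase + string.digits


def random_spamtrap_address(domain: str, length: int = 12) -> str:
    """Return an address with a random local part for use as a spamtrap."""
    if length < 1:
        raise ValueError("length must be at least 1")
    local_part = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{local_part}@{domain}"


if __name__ == "__main__":
    print(random_spamtrap_address("domain.com"))
```

Because the local part is random, nobody is likely to guess or mistype the address, so mail arriving there can reasonably be attributed to address harvesting.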

Summary

Network tests allow a site to benefit from other sites reporting email relays and spam-advertised websites. SpamAssassin includes support for RBLs and SURBLs. The latter provide a promising new technique against spam, which works by detecting the URIs that are advertised in spam emails. Razor, Pyzor, and DCC are email comparison systems. DCC is considered the most effective. These tests can be used together and most settings are configurable on a site-wide or per-user basis.