You're reading from SpamAssassin: A practical guide to integration and configuration

Product type Book

Published in Sep 2004

Publisher Packt

ISBN-13 9781904811121

Pages 240 pages

Edition 1st Edition

Chapter 13. Improving Filtering

SpamAssassin has a high spam detection rate, but despite this, some spam emails always escape detection. Conversely, legitimate emails are sometimes marked as spam.

This chapter looks at whitelists and blacklists—techniques for spam filtering that mark known good and bad senders. We then discuss the situation where emails have been wrongly classified, and how to resolve this by altering scoring on rules. Finally, we discuss filtering out certain foreign languages and character sets as a method of reducing spam.

Whitelists and Blacklists

SpamAssassin works very well at detecting spam, but there is always a risk of false positives or false negatives. By using a list of email addresses that are known spam producers (a blacklist), email from spammers who consistently use the same email addresses or domains can be filtered out. With a list of email addresses that are legitimate email senders (a whitelist), emails from regular or important correspondents are guaranteed...

Blacklists that list individual email addresses have limited use, because spammers normally use different or random email addresses for each spam run. However, some spammers use the same domain for multiple runs. As SpamAssassin allows wildcards in its blacklisting, entire domains can be blacklisted, which is more useful for filtering out spam.
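To illustrate why a wildcard entry catches more than a single address, the following Python sketch matches sender addresses against glob-style patterns. It is an illustration only, not SpamAssassin's own matching code: the function name is_listed and the example domains are hypothetical, and Python's fnmatch also accepts bracket expressions that SpamAssassin's patterns may not.

```python
from fnmatch import fnmatchcase


def is_listed(address, patterns):
    """Return True if `address` matches any glob-style pattern, ignoring case."""
    addr = address.lower()
    return any(fnmatchcase(addr, pattern.lower()) for pattern in patterns)


# Hypothetical blacklist: one whole domain, and one local part on any subdomain.
blacklist = ["*@spamdomain.example", "offers@*.bulk.example"]

print(is_listed("Random123@SpamDomain.example", blacklist))  # True
print(is_listed("offers@mail.bulk.example", blacklist))      # True
print(is_listed("friend@example.org", blacklist))            # False
```

The first pattern matches every sender at the domain, however random the local part is, which is the property that makes domain-level entries more durable than individual addresses.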

SpamAssassin uses a manual blacklist and whitelist, and also manages an automatic whitelist...

The Auto-Whitelist

SpamAssassin manages an automatic whitelist (AWL). It actually functions as both an auto-blacklist and an auto-whitelist. Generally, an auto-blacklist is ineffective as spammers rarely use the same email address for any period of time. However, SpamAssassin tracks both the IP address of email sources and the email addresses used, adding to its effectiveness.

The auto-whitelist keeps a record of the SpamAssassin scores for emails from senders. Senders that only send ham emails receive a weighting towards ham by the auto-whitelist. If they later send an email marked as spam, then the SpamAssassin score for the new email will be adjusted downwards, due to their past behavior as a source of only ham emails. The converse is true of those who normally send spam.

The AWL works by adjusting the score of the email being processed towards the average of all previous emails received from that sender. The amount or strength of this adjustment can be altered using the auto_whitelist_factor...
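The adjustment towards the sender's average can be sketched in a few lines of Python. This is a simplified model, not SpamAssassin's implementation: the SenderHistory class and awl_adjust function are hypothetical names, the formula (score plus the difference between mean and score, multiplied by the factor) follows our reading of the Mail::SpamAssassin::Conf documentation, and the factor of 0.5 used as the default here is an assumption that should be checked against the installed version.

```python
from dataclasses import dataclass


@dataclass
class SenderHistory:
    """Running totals for one sender (hypothetical structure for illustration)."""
    total_score: float = 0.0
    count: int = 0

    def mean(self):
        """Average score of earlier messages, or None if there are none."""
        return self.total_score / self.count if self.count else None


def awl_adjust(score, history, factor=0.5):
    """Move `score` towards the sender's historical mean by `factor` (0 to 1)."""
    mean = history.mean()
    final = score if mean is None else score + (mean - score) * factor
    # Record the unadjusted score so the history reflects the raw results.
    history.total_score += score
    history.count += 1
    return final


# A sender with four earlier messages averaging 1.0 sends one that scores 8.0.
history = SenderHistory(total_score=4.0, count=4)
print(awl_adjust(8.0, history))  # 4.5
```

With a spam threshold of 5.0, the message in this example would drop from 8.0 to 4.5 and be delivered as ham. A factor of 0 disables the adjustment, and a factor of 1 replaces the score with the sender's average.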

Resolving Incorrect Classifications

The consequences of an email being wrongly classified can range from a minor inconvenience to a major catastrophe. If a spam email is marked as ham, then the recipient will only spend a few seconds removing it from their inbox. If an unimportant ham email is marked as spam, it may be no great problem. However, marking an important email as spam could be embarrassing at the least, and could well result in serious consequences, such as financial loss.

Consequently, it is worthwhile making a backup of emails marked as spam before purging them. Emails compress well, so it may be possible to keep this backup on local disk storage rather than tape or other removable media. When it becomes apparent that an email has gone missing, the archive of spam can be searched and the email retrieved.
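One possible way to keep and search such a backup is sketched below in Python, assuming the spam is stored in a single plain-text mailbox file. The function names archive_spam and search_archive, the file names, and the sample message are all hypothetical; a production archive would more likely be rotated by date and searched with existing mail tools.

```python
import gzip
import os
import shutil
import tempfile


def archive_spam(mailbox_path, archive_path):
    """Compress a spam mailbox file into a gzip archive."""
    with open(mailbox_path, "rb") as src, gzip.open(archive_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def search_archive(archive_path, needle):
    """Return the lines of a compressed mailbox that contain `needle` (any case)."""
    wanted = needle.lower()
    with gzip.open(archive_path, "rt", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if wanted in line.lower()]


# Demonstration on a throwaway file; the message content is made up.
with tempfile.TemporaryDirectory() as tmp:
    mailbox = os.path.join(tmp, "spam.mbox")
    archive = os.path.join(tmp, "spam.mbox.gz")
    with open(mailbox, "w", encoding="utf-8") as fh:
        fh.write("From: alice@example.org\nSubject: Invoice 1042\n\nPlease find attached.\n")
    archive_spam(mailbox, archive)
    hits = search_archive(archive, "invoice")

print(hits)  # ['Subject: Invoice 1042']
```

Searching by subject or sender in this way is usually enough to confirm that a missing email was filtered, after which the full message can be extracted from the archive.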

Due to the potential cost of an incorrect classification, users should be encouraged to review any spam they have received on a regular basis. Any false positives should be brought...

Character Sets and Languages

SpamAssassin can detect certain languages and character sets. Both language and character set information are added by email clients when emails are composed and sent, so that the receiving email client can display the message correctly. There are many languages and character sets in use. If received messages are expected or known to use only some of them, then the others can be filtered out.

Disallowing Languages

SpamAssassin detects languages by using email headers. There is a large list of languages that SpamAssassin can detect; these are listed in the documentation for Mail::SpamAssassin::Conf. Use the man or perldoc commands to view the documentation:

$ perldoc Mail::SpamAssassin::Conf
$ man Mail::SpamAssassin::Conf

Many man and perldoc implementations use the / key to search for text. Once the page is displayed, enter /ok_languages to locate the correct part in the documentation. Press the space bar to scroll forward through the documentation.

Once the languages...

Summary

SpamAssassin allows the user and system administrator to improve detection of spam using a number of techniques. Whitelists help to prevent false positives. SpamAssassin's auto-whitelist will prevent an occasional spam-like email from an otherwise spam-free correspondent from being filtered as spam.

The spam threshold can be altered to reduce false positives or false negatives, and individual rule scores can be altered to prevent incorrect classifications. Character sets and languages can be used to filter out spam from other countries or regions.