Linux on servers

Server based Spam Filtering Methods

Jerome Griessmeier

Abstract

Unsolicited commercial email (UCE), better known as spam, has become a major problem for Internet users. Once a user has registered for an online account, game, newsletter or similar service, it does not take long until the first unsolicited mails find their way into the user's mailbox. To keep users from wasting hours deleting and sorting their mail, several forms of filtering exist. These filter programs help the user by sorting out unsolicited emails automatically. This paper presents different filtering methods and gives some suggestions on what can be done to prevent spam.

Introduction

What is Spam

Today most people have an email account, even if they do not have frequent or permanent access to the Internet. Sending an email costs almost nothing, so huge numbers of emails can be sent over the Internet very cheaply. In addition, there is practically no regulation of email traffic, and nothing prevents mail from being sent to accounts whose owners do not want commercial email. This is one of the reasons why many businesses advertise their products by sending large numbers of mails without asking the recipients whether they want to receive them.

How do Spammers get my Email-Address

All spammers have to do is collect as many email addresses as possible, and there are different ways of doing so. For example, they write programs that crawl the World Wide Web and harvest every email address they find: on private websites, in newsgroups, in forums and so on. Once an address has been confirmed as valid, meaning that spam sent to it has been delivered or that someone has replied to one of these mails, it can be sold to other companies that advertise in the same way. The mails sent by such companies are known as "spam", "junk mail" or "unsolicited commercial email (UCE)". Unfortunately, spamming has become a business of its own. Spamming companies run farms of email servers and make money by selling email addresses and sending commercial mails.

How to reduce these Spam-Mails

What you should never do to prevent Spam Mails

First of all, it has turned out that the worst thing people can do is to reply to a spam mail or to follow a link in it that leads to a web form where the user can enter an email address in order to be removed from a newsletter or advertising mailing list. These tricks only serve to validate the email address so that it can be used for further mailings or sold to other companies.

What can be done to prevent Spam Mails

To really get rid of spam, other measures can and should be taken. The first step towards reducing the amount of spam is not to register for every Internet service with one's primary email address.

When a user knows that an account will be needed only once or just a few times, he should register with an address that he can easily change or give up once it has fallen into the hands of spammers and is no longer practical to use.

The measures mentioned above can help keep the daily amount of spam from growing. In practice, however, email accounts often cannot be given up, a spammer may have obtained the address in some other way, and people do not want to switch accounts several times a year.

Nevertheless, these accounts are not lost and do not have to be given up. In these cases filtering programs can be used. A filtering program looks for keywords and checks the address the mail was sent from. Then it decides whether the mail is delivered or not. There are two different approaches to filtering.
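
The decision just described can be illustrated with a minimal Python sketch. The blocked sender address and the keyword list are hypothetical values chosen only for this example; a real filter would load far larger lists and use more refined matching.

```python
# Hypothetical lists, chosen only for illustration.
BLOCKED_SENDERS = {"offers@spam.example"}
SPAM_KEYWORDS = ("free money", "act now")


def should_deliver(sender, subject, body):
    """Return False if the sender is blocked or a spam keyword appears."""
    if sender.strip().lower() in BLOCKED_SENDERS:
        return False
    # Compare case-insensitively so "ACT NOW" and "act now" are treated alike.
    text = (subject + "\n" + body).lower()
    return not any(keyword in text for keyword in SPAM_KEYWORDS)
```

A mail from an unknown sender with a harmless subject and body is delivered, while a mail from the blocked address or one containing a listed keyword is not.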

Client Based Filtering

One method of filtering emails is to install a filtering program on the client computer where the emails are read. Mails are received by the mail server as usual; once they have been downloaded, the filtering program looks for patterns that are typical of spam or for header addresses that are known to belong to spammers, and sorts those mails out.

One problem with this approach is that these mails are first sent to the server and then downloaded to the local computer. This causes a lot of traffic. Considering that millions of spam mails of a few kilobytes each are sent and downloaded several times a day, it is easy to see that spammers cause a large amount of traffic, which in the end has to be paid for by the people downloading their mail.
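
To give a feeling for the order of magnitude, the calculation can be written down in a few lines of Python. All figures are hypothetical assumptions made for this example, not measurements.

```python
# All figures below are hypothetical and only illustrate the calculation.
SPAM_MAILS_PER_DAY = 1_000_000  # assumed number of spam mails per day
AVERAGE_SIZE_KB = 5             # assumed average size of one spam mail in kB


def daily_traffic_gb(mails, size_kb, transfers=2):
    """Traffic in gigabytes per day.

    Each mail is counted `transfers` times: once when it is sent to the
    server and once when the client downloads it.
    """
    return mails * size_kb * transfers / 1_000_000  # 1 GB = 1,000,000 kB


print(daily_traffic_gb(SPAM_MAILS_PER_DAY, AVERAGE_SIZE_KB))
```

Under these assumptions the spam alone accounts for 10 GB of traffic per day, half of which is caused by downloading mails that nobody wanted.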

Because of this problem I will introduce another way of email filtering.

Server Based Filtering

Server-based filtering works much like client-based filtering, but with an important difference:
Mails from sender addresses known to belong to spammers are refused by the server immediately after it has received the header of the email. This avoids a large amount of network traffic and a waste of server resources. At this point I want to show how a widely used mail server called Postfix can be configured to block spam.

Postfix Filtering Methods

Postfix is a widely used Mail Transfer Agent (MTA), a program for receiving and sending mail, and it comes with its own facilities for blocking spam. Postfix is configured mainly through the file main.cf, which contains the configuration options for the Postfix system.

In addition to this configuration file, rules can be defined that tell Postfix whether an incoming mail should be accepted or rejected.

For example, several check files can be enabled simply by adding an entry to the main configuration file main.cf. The following entry enables a header check file:

header_checks = regexp:/etc/postfix/maps/header_checks

The header check file contains rules that define which headers trigger an action. For example, it is possible to perform an action when certain keywords appear in the subject line of the header, or to reject all mails with a date earlier than a certain reference date. Other important files that can also be enabled in main.cf include the body_checks file, which contains rules for actions concerning the body of the message, and the access file, in which actions can be configured for whole domains or single email addresses.
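
How such a rule file behaves can be sketched in Python. The two rules below are hypothetical examples written in the "/pattern/ ACTION optional text" layout of a Postfix regexp table. The parser is a deliberately simplified emulation for illustration and is not Postfix's own implementation, which supports more syntax than is handled here.

```python
import re

# Hypothetical rules: one keyword rule and one rule for implausibly old dates.
RULES_TEXT = """
# comment lines and blank lines are skipped
/^Subject:.*free money/    REJECT Spam keyword in subject
/^Date:.* 19[0-9][0-9] /   REJECT Implausible date
"""


def parse_rules(text):
    """Turn the rule text into a list of (compiled pattern, action, note)."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = re.match(r"/(.+)/\s+(\S+)\s*(.*)$", line)
        if match:
            pattern, action, note = match.groups()
            # Patterns are matched case-insensitively in this sketch.
            rules.append((re.compile(pattern, re.IGNORECASE), action, note))
    return rules


def check_header(rules, header_line):
    """Return (action, note) of the first rule that matches, else None."""
    for pattern, action, note in rules:
        if pattern.search(header_line):
            return action, note
    return None
```

Each header line of an incoming mail is passed to check_header in turn; the first rule that matches decides what happens to the mail.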

The actions that can be performed are almost the same for all of the configuration files mentioned above:

  • REJECT
    This is probably the most common action. It tells the system to refuse the mail before it can enter the system. Optionally, a text can be given after REJECT; it appears in the standard mail log on the server and in the bounce message that is returned to the sender of the original mail.
  • IGNORE
    With this action the matching header line is removed from the incoming mail, and the rest of the mail is delivered to the recipient.
  • WARN
    Mails matching a rule with this action are recorded with a warning in the mail log. This can be very useful for testing new spam rules.
  • HOLD
    This option holds the mail in the mail queue so that the system administrator can decide what to do with it.
  • DISCARD
    A discarded mail is not delivered, but the sending server is told that the mail has been delivered properly.
  • FILTER
    With this action the mail is passed on to another server or to a defined filter. After FILTER, the next hop has to be specified.
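
The effect of the six actions can be summarised in a small Python sketch. It only models the outcomes described in the list above; the function name, the reply strings and the dispositions are assumptions made for illustration and do not correspond to any Postfix interface.

```python
def apply_action(action, headers, matched):
    """Return (reply_to_sender, disposition, headers_to_deliver).

    `action` is the right-hand side of a rule, e.g. "REJECT no spam please".
    `matched` is the header line that triggered the rule.
    """
    name, _, argument = action.partition(" ")
    if name == "REJECT":
        # The mail is refused; the optional text is reported to the sender.
        return ("refused: " + (argument or "rejected"), "not stored", None)
    if name == "DISCARD":
        # The sender is told everything is fine, but the mail is dropped.
        return ("accepted", "silently dropped", None)
    if name == "HOLD":
        # The mail waits in the queue for the administrator.
        return ("accepted", "held in queue", headers)
    if name == "FILTER":
        # The argument names the next hop, e.g. an external filter.
        return ("accepted", "sent to " + argument, headers)
    if name == "IGNORE":
        # Only the matching header line is removed.
        return ("accepted", "delivered", [h for h in headers if h != matched])
    if name == "WARN":
        # The print call stands in for an entry in the mail log.
        print("warning: " + matched)
        return ("accepted", "delivered", headers)
    raise ValueError("unknown action: " + name)
```

Written this way, the difference between REJECT and DISCARD becomes visible: both prevent delivery, but only REJECT tells the sending server about it.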

The main disadvantage is that configuring a mail server this way takes a lot of time: all these files have to be edited and then updated at least once a week. Although some providers offer up-to-date files for download, and although the download can be automated with a cron job, I was looking for a completely different method of fighting spam.

What I finally found was a system called Spam Assassin, which does not need to be updated as often and which was quite easy to install.

Spam Assassin

This open-source package, written mostly in Perl, is a free server-based email filtering tool designed mainly for UNIX server environments; commercial versions are also available for Windows and other platforms. The program is distributed under the same licence as Perl and works with a wide variety of setups. This means that it works well with different mail systems such as procmail, qmail, Postfix and many others.

In the following I want to describe in general terms how Spam Assassin works. One of the advantages of this filtering software is that it combines several spam identification tactics:

  • Text analysis: spam mails often contain characteristic disclaimers and are written in a characteristic style.
  • Header analysis: spammers often manipulate email headers to hide their identity or to make an email appear legitimate.
  • Blacklists: Spam Assassin supports existing blacklists such as ordb.org, spamcop.net or mail-abuse.org. These blacklists track the IP addresses of open-relay mail servers that are not properly configured and are used by spammers.
  • Vipul's Razor: Spam Assassin can use this spam-tracking database. Once a user has reported a spam mail to this database through his account, the same mail is automatically blocked for other users of Vipul's Razor.
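
As an illustration of the blacklist tactic, the following Python sketch shows the lookup convention commonly used by DNS-based blacklists: the octets of the IP address are reversed and prepended to the name of the blacklist zone, and the address counts as listed if that name resolves. The zone name in the usage note is a hypothetical placeholder, and the sketch does not claim to show how Spam Assassin performs its own lookups.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name that is looked up for an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blacklist zone has an entry for the address.

    This performs a real DNS query, so it needs network access.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

For the address 192.0.2.10 and a zone called dnsbl.example.org, the name that is looked up would be 10.2.0.192.dnsbl.example.org.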

First of all, once the tool is installed on a server, it can be enabled for all users as well as for single users or groups of users. The software automatically creates two mail folders, by default in the home directory of the system user. One folder contains the mails that the system is certain are UCE, and the other contains all mails that the system considers potential spam. The software runs very well out of the box without any changes to the rules or to their weights. For interested system administrators, however, it is easy to add or modify rules, which are all stored in plain-text configuration files.
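
The interplay of rules, weights and the two folders can be sketched as follows. The rule names, the weights and both thresholds are hypothetical values chosen for this example; they are not taken from Spam Assassin's actual rule set.

```python
# Hypothetical rule names, weights and thresholds, for illustration only.
RULE_WEIGHTS = {
    "subject_all_caps": 1.5,
    "contains_disclaimer": 2.0,
    "forged_header": 3.0,
    "listed_on_blacklist": 4.0,
}
POTENTIAL_SPAM_THRESHOLD = 5.0
CERTAIN_SPAM_THRESHOLD = 10.0


def score(hits):
    """Add up the weights of all rules that matched a mail."""
    return sum(RULE_WEIGHTS[name] for name in hits)


def classify(hits):
    """Pick the folder for a mail based on its total score."""
    total = score(hits)
    if total >= CERTAIN_SPAM_THRESHOLD:
        return "certain-spam"
    if total >= POTENTIAL_SPAM_THRESHOLD:
        return "potential-spam"
    return "inbox"
```

Because the decision depends on the sum of many small indications rather than on a single rule, an administrator can tune the behaviour by changing individual weights without rewriting the rules themselves.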

Since version 2.50 of Spam Assassin it has also been possible to train the system with spam and non-spam mails that have already been received.
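
One common way to implement such training is a naive Bayes classifier that counts how often each word occurs in spam and in non-spam mails. The following Python sketch shows the general idea only; the class name and the training texts are made up for illustration, and the code is not Spam Assassin's own learning implementation.

```python
import math
from collections import Counter


class TinyBayes:
    """Word-count based naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.mail_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Learn from one mail; label is "spam" or "ham" (non-spam)."""
        self.word_counts[label].update(text.lower().split())
        self.mail_counts[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the text is spam."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_mails = sum(self.mail_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Smoothed prior: how common this class was during training.
            log_score = math.log((self.mail_counts[label] + 1) / (total_mails + 2))
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from giving zero.
                log_score += math.log(
                    (count + 1) / (total_words + len(vocabulary) + 1)
                )
            log_scores[label] = log_score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        difference = min(log_scores["ham"] - log_scores["spam"], 700.0)
        return 1.0 / (1.0 + math.exp(difference))
```

After a few calls such as train("buy cheap pills now", "spam") and train("meeting agenda for monday", "ham"), the method spam_probability rates new mails according to the words they share with the training mails.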

Conclusion

In my opinion, using spam filters is necessary. These filters should run on the existing mail servers instead of on the clients. This keeps people from wasting their time deleting unsolicited mails, and it also reduces traffic and Internet access costs. One problem of server-based filtering is that it is quite complex and that the number of filtering programs is very large. Many tools use just one of several forms of blocking and filtering, so they usually do not work efficiently or with intelligent algorithms.

In my opinion, Spam Assassin combines just the right components and is therefore a very powerful tool for fighting spam. It is able to learn by analysing mails that have already been received, it can work together with online databases, and it is relatively easy to install and maintain. This is important because configuring a Mail Transfer Agent such as sendmail or Postfix takes too much time. One interesting improvement to Spam Assassin would be the development and integration of new learning algorithms. Another would be to integrate Spam Assassin into several commonly used web-based email programs, for easier use and wider configuration possibilities for individual users.

Bibliography

  1. Postfix http://www.postfix.org
  2. Spam Assassin http://spamassassin.apache.org
  3. Mail-FAQ http://www.faqs.org/faqs/de-net-abuse/mail-faq/
  4. Retrieving and Filtering email