How to Stop Spam

This is a long, detailed article. We also have a much shorter summary article about spam available.

Any consideration of internet safety for kids must include a look at spam, because spam is the most common way that inappropriate material arrives in the family home. Keeping children safe online means making email safe for them, and for the whole family.

What is spam?

Spam is the commonly used term for what is more accurately called Unsolicited Commercial Email (UCE). Spamming is the misuse of email to send messages, typically advertisements, to large numbers of people who usually don't want to receive them.

Because sending an email costs so little, less scrupulous companies can use spam as a very cheap way of marketing their products. It is not entirely free: for example, even as individuals we pay an Internet Service Provider to connect us to the internet, and this can be considered part of the cost of sending an email. Laws on unsolicited commercial email vary from country to country, but spam is universally seen as undesirable, and it is a big problem for individuals and businesses the world over.

If you have an email address, chances are you've already seen some spam. Spammers typically advertise products that can be bought online and shipped to the buyer. The most common are prescription and non-prescription medications, but products of an adult nature are also common, as are financial services such as credit cards and loans. Given how widely spam is resented, an often-asked question is why companies bother to promote their products this way. The answer is a hard commercial reality: money. Because sending spam is so cheap, just a handful of buyers among millions of recipients is enough to make it worthwhile.

Why is spam bad?

Whilst the cost of sending spam is very low, the cost to recipients, particularly businesses, is very high. There is the cost of receiving the messages, the storage needed to hold them, the cost of anti-spam systems and, most of all, the cost of people having to "process" the messages, typically by reading as little of each as possible to decide that it's spam and then deleting it. As an example, the California legislature estimated that spam cost corporations in the United States over $13 billion in 2007.

Even without the monetary impact of spam, it's generally considered highly undesirable. Spam takes time for people to receive and remove, some people find the contents of spam messages very disturbing and others find that under a deluge of spam it is difficult to use email effectively at all.

How do spammers get email addresses?

There are several ways in which spammers build a list of email addresses to which to send their messages.

Bought mailing lists

One of the most common ways for spammers to get your email address is to buy lists of addresses. The entries on these lists typically originate from one of the other methods described below. The critical point is that once you're on one spammer's list, that list is likely to be given or sold to other spammers, amplifying the problem. Children's email addresses are indistinguishable from adults', so they are just as likely to be sold on.

"Sign up" details being sold

When you sign up for services on some websites, buy products online, or even buy products offline (for example in a shop), you are sometimes asked for your email address. Reputable companies will publish a privacy policy (and in some countries you may be protected by privacy and data protection laws), but others may choose to sell your personal information, including your email address. These addresses are collated by companies specialising in mailing lists and sold to spammers.

Web scraping and bots

If you write a message that ends up displayed on the internet in any form, for example on a discussion forum, a web page, or as a comment on a blog entry, photograph or video site, it is generally visible to everyone. If you include your email address in that message, automated programs called "bots" can "scrape" it from the page. Because email addresses are easy for a machine to detect (they follow a recognisable format: some characters, an @, then more characters separated by dots), building a list of addresses this way is trivial.
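To show how little effort this takes, here is a minimal Python sketch of the pattern matching a scraping bot might do. The regular expression is a simplified assumption for illustration; the real address syntax (RFC 5322) is considerably more permissive.

```python
import re

# Simplified pattern (an assumption for illustration): some characters, an @,
# then a domain containing at least one dot. Real address syntax is far looser.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def scrape_addresses(page_text: str) -> list[str]:
    """Return each address-like string in the text once, in order of appearance."""
    found: list[str] = []
    for match in EMAIL_PATTERN.findall(page_text):
        if match not in found:
            found.append(match)
    return found


comment = "Great photo! Email me at fred@email.com or fred.smith@example.co.uk."
print(scrape_addresses(comment))  # ['fred@email.com', 'fred.smith@example.co.uk']
```

A bot simply runs something like this over every page it downloads and adds the results to its list.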

Compromised address books

Less common, but still used, is the highly unscrupulous (and in most countries illegal) practice of stealing email addresses from people's own address books. Typically a virus gets on to the victim's computer and transmits the contents of their address book to the spammer. Alternatively, it may instruct the victim's computer to send the spam directly to everyone in the address book.

Random addresses

Increasingly common is the use of completely random addresses. Spammers take publicly available lists of domains (the part of the address after the @) and put random collections of letters in front of them, producing something that may or may not be a valid address. They may also try common names in the hope of hitting a real address. This approach is very hit and miss, but because sending each message costs so little, it is fairly widely used.

What else do I need to know about spammers' address lists?

Asking to be removed from a true spammer's list is futile. Some lists are cleaned occasionally, so that an address to which mail can never be delivered may be removed, but this is fairly uncommon, although more sophisticated spammers do monitor the acceptance rate of the addresses on their lists. Some spammers also collect demographic information, particularly about buying preferences, which makes their lists more valuable. Typically this means attaching to your address details of products you've bought before, on the assumption that you might be interested in buying more of the same.

Why can't spam be stopped easily?

Spam is not easy to stop because of the design of the protocol used to send email, SMTP (Simple Mail Transfer Protocol). The protocol was designed many years ago, and its designers didn't foresee the misuse that is commonplace today. Its most fundamental flaw is that anyone can send an email purporting to come from anyone else, which makes it very difficult (impossible using basic SMTP alone) to confirm that a message really comes from its claimed source. So even if you set up a system that only accepts email from certain senders, spammers can still reach you by pretending to be one of them.
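A few lines of Python make the point. This sketch, using only the standard `email` package, builds a message that claims to come from steven@xyz.com (a hypothetical address); nothing in building it checks the claim. If the message were then handed to a server with `smtplib`'s `SMTP.sendmail(from_addr, to_addrs, msg)`, the envelope sender `from_addr` would likewise be whatever the sending program chose.

```python
from email.message import EmailMessage

# The From header is simply text supplied by whoever composes the message.
# Both addresses below are hypothetical examples.
msg = EmailMessage()
msg["From"] = "steven@xyz.com"  # claims to be Steven; nothing checks this
msg["To"] = "fred@email.com"
msg["Subject"] = "Hello"
msg.set_content("Steven never wrote this message.")

print(msg["From"])  # steven@xyz.com
```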

Replacing SMTP, whilst technically possible, would be very difficult: it is so widespread that servers which switched to a new protocol would no longer be able to communicate with the many that still use SMTP.

Another problem is that email is so cheap to send. Some people propose a small charge or tax on sending email: an amount too small to deter normal use, but one that would become significant for anyone sending millions of emails every day, as spammers do. However, the idea has found little favour, not least because it would put a price on every ordinary email too.
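A quick worked example shows why the idea is attractive in principle. The fee of a tenth of a US cent per email below is purely hypothetical, as are the sending volumes.

```python
from decimal import Decimal

# Hypothetical fee of a tenth of a US cent per email; no such fee exists.
FEE_PER_EMAIL = Decimal("0.001")


def daily_cost(emails_per_day: int, fee: Decimal = FEE_PER_EMAIL) -> Decimal:
    """Cost of one day's sending at a flat per-message fee."""
    return emails_per_day * fee


print(f"Typical user, 20 emails a day: ${daily_cost(20):,.2f}")
print(f"Spammer, 10 million emails a day: ${daily_cost(10_000_000):,.2f}")
```

This prints $0.02 for the typical user and $10,000.00 for the spammer: the same tiny fee is negligible for one and a serious expense for the other.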

Finally, overcoming spam through technical means, such as the methods outlined below, is difficult because spammers know these methods and have evolved their own techniques to counter them. Spam is a lucrative industry and spammers are often very sophisticated. You may have seen this in action: spammers pad emails with legitimate-looking text beneath the spam message, deliberately misspell words that would otherwise be easy to spot, use images instead of text because images are harder to analyze, and so on.

What methods are used to try to stop spam?

There are several approaches to stopping spam, but it's worth remembering that no approach is likely to be 100% effective without some "false positives" (i.e. rejecting as spam genuine email that you would actually have wanted to receive) or a good deal of manual effort that may end up exceeding the effort required to just delete the spam in the first place.

Success in avoiding spam generally comes from combining a number of measures. Here we will explain some of the most popular techniques and follow them with a recommendation of a good, effective, practical approach.

Spam detection software

Spam detection software can run on an email server (typically operated by an email provider like SAFEnSOUNDmail or an Internet Service Provider) or within an email client (the program you use to read and write email on your computer, e.g. Microsoft Outlook, Mac Mail, Thunderbird). These systems use various techniques to identify spam, with varying effectiveness. The most commonly used method is simple content analysis: scanning the message for certain keywords. Because spammers know this, spam often contains deliberately misspelled words; for example, a spammer might write "pi11s" instead of "pills", using 1s in place of ls. In the ever-evolving battle between anti-spam developers and spammers, the developers try to detect all of these variations.
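A minimal Python sketch of keyword analysis that undoes some look-alike substitutions before matching follows. The substitution table and keyword list are hypothetical; real filters use much larger, regularly updated lists and must cope with ambiguity (a "1" might stand for an "l" or an "i").

```python
import re

# Hypothetical look-alike table: digits and symbols spammers swap in for letters.
LOOKALIKES = str.maketrans({"1": "l", "0": "o", "3": "e", "4": "a", "@": "a", "$": "s"})

# Hypothetical keyword list for illustration only.
SPAM_KEYWORDS = {"pills", "casino", "lottery"}


def normalise(text: str) -> str:
    """Lower-case the text and undo common look-alike substitutions."""
    return text.lower().translate(LOOKALIKES)


def keyword_hits(text: str) -> set[str]:
    """Return the spam keywords present after normalisation."""
    words = set(re.findall(r"[a-z]+", normalise(text)))
    return SPAM_KEYWORDS & words


print(keyword_hits("Cheap pi11s, no prescription needed!"))  # {'pills'}
```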

In addition to simple analysis, some tools use various forms of what is known as Bayesian analysis. This is a machine learning technique in which the software is given examples of both spam and non-spam messages and learns to tell them apart; it then uses what it has learned to assess each new email. One strength of this technique is that if you take the time to "train" it with examples of the spam and the genuine email you receive, it adapts to your own needs. Keep in mind that what is spam to one person may be wanted by someone else: if you're in the market for non-prescription medication, you might be pleased to receive emails about it. Also, all kinds of legitimate bulk messages, such as mailing lists and notifications, can be incorrectly identified as spam.
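One common form of Bayesian analysis is a naive Bayes classifier. The Python sketch below is a simplified illustration of the idea, not the algorithm any particular product uses: it counts words in "spam" and "ham" (genuine) training messages, then combines per-word probabilities, with add-one smoothing so an unseen word doesn't zero out the result. It needs at least one training example of each kind; the training messages are made up.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Tiny naive Bayes spam classifier with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record one example; label is "spam" or "ham" (genuine mail)."""
        self.word_counts[label].update(tokens(text))
        self.message_counts[label] += 1

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text). Needs at least one example of each label."""
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            score = math.log(self.message_counts[label] / total_messages)  # prior
            label_words = sum(self.word_counts[label].values())
            for word in tokens(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (label_words + vocab_size))
            log_score[label] = score
        # P(spam|text) = 1 / (1 + exp(log P(ham, text) - log P(spam, text))), clamped.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


f = NaiveBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("win money now cheap", "spam")
f.train("lunch meeting on friday", "ham")
f.train("homework due friday", "ham")
print(f.spam_probability("buy cheap pills") > 0.5)  # True
print(f.spam_probability("friday lunch") > 0.5)     # False
```

Training it on your own mail is what lets it adapt to what you personally consider spam.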

Spam detection software also typically analyzes the structure and content of an email's "headers": the details of the sender, the recipient, the date and time the message was sent, the servers it passed through and other information about the email. Spammers often don't follow the conventions of normal email in these respects, and the software can detect these anomalies.

These techniques are generally used in combination and emails are typically given scores, accumulating points for each indicator that the message might be spam. A threshold is set, which can sometimes be adjusted by the user, and if the email exceeds that threshold it is treated as spam.
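A minimal Python sketch of this kind of scoring follows, combining a few header checks with a couple of content checks. The rules, point values and threshold are all hypothetical; real filters such as SpamAssassin combine hundreds of tuned rules in the same spirit.

```python
from email import message_from_string
from email.message import Message

SPAM_THRESHOLD = 5.0  # hypothetical; some systems let the user adjust it


def rule_hits(msg: Message) -> list[tuple[str, float]]:
    """Return (reason, points) for every hypothetical indicator the message triggers."""
    hits = []
    # Header checks: legitimate mail software normally adds these headers.
    if msg["Date"] is None:
        hits.append(("missing Date header", 2.0))
    if msg["Message-ID"] is None:
        hits.append(("missing Message-ID header", 2.0))
    if msg["To"] is None:
        hits.append(("missing To header", 1.5))
    # Content checks on the subject line.
    subject = msg.get("Subject", "")
    if subject.isupper():
        hits.append(("subject in capitals", 1.5))
    if "free" in subject.lower():
        hits.append(("'free' in subject", 1.0))
    return hits


def classify(raw_message: str) -> tuple[float, bool]:
    """Total the points and compare them with the threshold."""
    score = sum(points for _, points in rule_hits(message_from_string(raw_message)))
    return score, score >= SPAM_THRESHOLD


print(classify("From: promo@example.com\nSubject: FREE OFFER\n\nBuy now"))  # (8.0, True)
```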

Whitelists

Many anti-spam systems support whitelists, on either the email server or the email client program. A whitelist is a list of email addresses maintained by the user; email from a sender on the list is immediately classified as not spam. Some anti-spam systems take the highly restrictive approach of accepting mail only from senders on the whitelist, which means you can never receive email from someone you haven't yet added. Even this approach, which is quite cumbersome to operate, does not guarantee that no spam will arrive, because spammers sometimes send spam purporting to come from someone already on your whitelist. This is particularly likely when spammers have gathered your address from a compromised address book, as described above.

Blacklists

Blacklists work by simply rejecting email from any sender on the list. As with whitelists, they can exist on the email server or in the email client program. The idea is that you add each spam sender to your blacklist. This can work well against a single persistent sender, and so can help in cases of online bullying. Against spam, however, it is largely ineffective, because spammers long ago started sending from a variety of different addresses.
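Both kinds of list come down to a lookup on the sender address the message claims, which is exactly why they are easy to get round. In this minimal Python sketch (list contents hypothetical), a spammer who forges a whitelisted address is waved through, and one who changes a single character of a blacklisted address falls back to the normal filters.

```python
from email.utils import parseaddr

WHITELIST = {"mum@example.com", "school@example.org"}  # hypothetical entries
BLACKLIST = {"offers@spam.example"}                     # hypothetical entries


def list_decision(from_header: str) -> str:
    """Decide using only the address the sender claims in the From header."""
    _, address = parseaddr(from_header)
    address = address.lower()
    if address in WHITELIST:
        return "accept"  # skips every other check
    if address in BLACKLIST:
        return "reject"
    return "filter"      # hand on to the other spam checks


print(list_decision("Mum <mum@example.com>"))  # accept, even if forged
print(list_decision("offers@spam.example"))    # reject
print(list_decision("offers2@spam.example"))   # filter: a new address evades the list
```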

Realtime Blacklists

Realtime blacklists are lists compiled by groups of people and shared for others to use. Because they have many contributors, the effort of identifying spam is spread across many people. When a message arrives, its sender and/or its contents are checked against a realtime blacklist, and if they appear on it the message is rejected. These blacklists are quite effective, but they don't catch all spam: they take time to update, and spam moves very quickly.
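Many realtime blacklists are published through DNS (so-called DNSBLs, described in RFC 5782): the receiving server reverses the octets of the connecting IPv4 address, appends the list's zone name and looks the result up, and any answer means the address is listed. A minimal Python sketch follows. The zone `dnsbl.example.org` is a hypothetical placeholder, real lists have their own usage policies, and this sketch treats any lookup failure as "not listed".

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name that asks `zone` about an IPv4 address."""
    octets = ipaddress.IPv4Address(ip).exploded.split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the query name resolves, i.e. the address is on the list."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # "no such name" means not listed; other DNS errors also land here
        return False


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))  # 99.2.0.192.dnsbl.example.org
```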

Greylisting

Despite the name, greylisting is quite distinct from whitelisting and blacklisting. When a process attempts to deliver a message to the mail server, the server checks whether this is the first time it has heard from this sender. If it is, it replies with a code meaning "I am temporarily unable to accept your email, please try again later". A legitimate mail server will understand this and try again later, typically after 15 to 60 minutes. When it does, the server recognises the same sender retrying and accepts the email. This is effective because spammers generally don't retry: they use simple automated processes that don't handle such responses, so they just ignore that recipient and move on to the next. Greylisting is not a common technique but is highly effective, although its effectiveness may decline in years to come as spammers work around it.
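A minimal in-memory sketch in Python follows. A widely used refinement, assumed here, is to key each delivery attempt on the combination of connecting IP address, sender and recipient rather than on the sender alone. The 5-minute minimum delay is a hypothetical setting, and a real implementation would also store its entries persistently and expire old ones.

```python
import time
from typing import Dict, Optional, Tuple

MIN_RETRY_DELAY = 300  # hypothetical: seconds a sender must wait before retrying


class Greylist:
    """In-memory greylisting keyed on (client IP, sender, recipient)."""

    def __init__(self, delay: float = MIN_RETRY_DELAY) -> None:
        self.delay = delay
        self.first_seen: Dict[Tuple[str, str, str], float] = {}

    def check(self, client_ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        """Return the SMTP reply for this delivery attempt."""
        now = time.time() if now is None else now
        key = (client_ip, sender.lower(), recipient.lower())
        first = self.first_seen.setdefault(key, now)  # remember the first attempt
        if now - first < self.delay:
            # A 4xx reply is a temporary failure: proper mail servers retry later.
            return "451 Greylisted, please try again later"
        return "250 OK"


g = Greylist()
print(g.check("192.0.2.1", "a@xyz.com", "fred@email.com", now=0))    # 451 ... (first attempt)
print(g.check("192.0.2.1", "a@xyz.com", "fred@email.com", now=900))  # 250 OK (retry 15 minutes later)
```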

Challenge and response

A challenge and response system first checks incoming email against a whitelist. Email from a whitelisted sender goes straight through to your inbox. Email from any other sender triggers an automatic reply called a "challenge", which typically explains that, as a precaution against spam, the sender must reply to it. Once that reply (the "response") arrives, the sender is whitelisted and their original email is delivered. The theory is that spammers, who use automated systems to send their emails, are generally not sophisticated enough to reply to a challenge. In addition, spammers rarely use genuine addresses as the "from" address of their emails.

However, herein lies a significant problem with this otherwise fairly effective mechanism. Imagine a spammer sends you an email that claims to come from steven@xyz.com. Your challenge and response system sends a challenge to steven@xyz.com. Unfortunately, Steven never sent the original message and doesn't know why you are challenging him. Worse still, if Steven also uses a challenge and response system, it will send a challenge in reply to your challenge. Problems also arise with automated notification emails, such as those an online retailer sends when you buy something: these systems generally cannot answer your challenge, so you never receive the email. The solution is to add such senders to the whitelist manually and/or to check the queue of pending messages, if your challenge and response system provides one.

Challenge and response is very effective, but the need for manual intervention and the generation of many extra emails, some of which are delivered to unintended recipients, make it a less than ideal solution.
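The flow described above can be sketched in a few lines of Python. This is a minimal illustration with hypothetical class and method names; instead of actually sending challenges it records who would be challenged, which makes the forgery problem easy to see.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ChallengeResponse:
    """Hold mail from unknown senders until they answer a challenge."""
    whitelist: Set[str] = field(default_factory=set)
    pending: Dict[str, List[str]] = field(default_factory=dict)
    inbox: List[str] = field(default_factory=list)
    challenges_sent: List[str] = field(default_factory=list)  # addresses challenged

    def receive(self, claimed_sender: str, body: str) -> None:
        sender = claimed_sender.lower()
        if sender in self.whitelist:
            self.inbox.append(body)
            return
        if sender not in self.pending:
            # The challenge goes to whatever address the message claims to be
            # from, which for forged spam is an innocent third party.
            self.challenges_sent.append(sender)
            self.pending[sender] = []
        self.pending[sender].append(body)

    def response_received(self, sender: str) -> None:
        """The sender answered: whitelist them and release their held mail."""
        sender = sender.lower()
        self.whitelist.add(sender)
        self.inbox.extend(self.pending.pop(sender, []))


cr = ChallengeResponse(whitelist={"mum@example.com"})
cr.receive("mum@example.com", "Dinner at six")
cr.receive("steven@xyz.com", "Cheap pills")  # forged: Steven never sent it
print(cr.inbox)            # ['Dinner at six']
print(cr.challenges_sent)  # ['steven@xyz.com']
```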

SPF and DomainKeys

Sender Policy Framework (SPF) and DomainKeys are complex techniques that have been adopted by some of the large free email providers like Hotmail and Yahoo. The idea is that they supplement SMTP and provide additional protection by making it harder to forge the source of emails. However, until they gain universal acceptance, which they may never do, they are of limited use. Some spam detection systems do use these mechanisms, but generally only as one factor in a spam-likelihood score, because they cannot be relied on completely until everyone uses them.
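SPF works by letting a domain publish, in a DNS TXT record, which servers are allowed to send its mail; the receiving server checks the connecting IP address against that list. The Python sketch below evaluates only the `ip4` and `all` mechanisms of a record passed in as a string. It is a deliberate simplification: real SPF (RFC 7208) also handles `include`, `a`, `mx`, redirects and the DNS lookups themselves, and the record shown is hypothetical. DomainKeys, which instead signs each message cryptographically, is not shown.

```python
import ipaddress

# SPF qualifier prefixes: + pass, - fail, ~ softfail, ? neutral.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def evaluate_spf(record: str, client_ip: str) -> str:
    """Evaluate only the ip4 and all mechanisms of an SPF record (a simplification)."""
    ip = ipaddress.IPv4Address(client_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record
    for term in terms[1:]:
        qualifier = "+"  # the default when no prefix is given
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term.lower() == "all":
            return QUALIFIERS[qualifier]
        if term.lower().startswith("ip4:"):
            if ip in ipaddress.IPv4Network(term[4:], strict=False):
                return QUALIFIERS[qualifier]
        # Other mechanisms (include, a, mx, ...) are skipped in this sketch.
    return "neutral"  # nothing matched


record = "v=spf1 ip4:192.0.2.0/24 -all"  # hypothetical published record
print(evaluate_spf(record, "192.0.2.55"))   # pass
print(evaluate_spf(record, "203.0.113.9"))  # fail
```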

Sender Verify

Sender Verify is a technique implemented by some of the more sophisticated email servers. Before accepting an email for delivery, the server attempts to verify that the sender's email address is genuine. In essence, the mail server asks: "You say you're sending from this address; if I tried to send a reply to it, what would happen?" If the address is fictitious, as it often is with spam, the verification fails and the server can refuse to accept the message.
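The "what would happen?" question can be asked over SMTP itself, a technique often called a callout. The Python sketch below, using the standard `smtplib` module, connects to the sender's mail server, announces an empty sender (so no real mail is ever sent) and asks whether the server would accept mail for the address. Finding the right server (the domain's MX record) needs a DNS lookup the standard library doesn't provide, so `mail_host` is an assumed input, and the `smtp_factory` parameter is a hypothetical hook that lets the function be tried without a network.

```python
import smtplib
from typing import Optional


def verify_sender(address: str, mail_host: str,
                  smtp_factory=smtplib.SMTP) -> Optional[bool]:
    """Ask the sender's mail server whether it would accept mail for `address`.

    Returns True (accepted), False (refused outright) or None (no clear answer).
    `mail_host` is assumed to have been found already, e.g. from the MX record.
    """
    try:
        with smtp_factory(mail_host, 25, timeout=10) as server:
            server.helo()
            code, _ = server.mail("")  # empty sender "<>": nothing will be sent
            if code != 250:
                return None
            code, _ = server.rcpt(address)
    except OSError:  # network errors; smtplib.SMTPException is an OSError too
        return None
    if 200 <= code < 300:
        return True
    if 500 <= code < 600:
        return False
    return None  # 4xx: temporary failure (perhaps greylisting), undecided
```

Note that many servers accept every address or defer unknown callers, so an undecided result is common in practice.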

Moderated email

Moderated email is a method in which all incoming mail is moderated: a moderator reads each email and confirms it is not spam before passing it on to the intended recipient. It is possible, though rare, to pay a human being, typically located in a low-wage country, to perform this task. More commonly, an assistant does it on behalf of an executive, or a parent on behalf of a child. This technique is very effective and is the only 100% reliable way of blocking all spam. However, it is somewhat labour intensive, particularly for an executive who may receive large numbers of emails each day, and its success also depends on how efficient the mechanism for approving or rejecting emails is.
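The key property of a moderated mailbox is that nothing is delivered until someone approves it. A minimal Python sketch of such a default-deny queue follows; the class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModerationQueue:
    """Every incoming message waits here until a moderator approves it."""
    pending: Dict[int, str] = field(default_factory=dict)
    delivered: List[str] = field(default_factory=list)
    last_id: int = 0

    def submit(self, message: str) -> int:
        """Queue a message; it is NOT delivered yet."""
        self.last_id += 1
        self.pending[self.last_id] = message
        return self.last_id

    def approve(self, message_id: int) -> None:
        self.delivered.append(self.pending.pop(message_id))

    def reject(self, message_id: int) -> None:
        self.pending.pop(message_id, None)


queue = ModerationQueue()
invite = queue.submit("Party invitation from a school friend")
advert = queue.submit("Cheap pills")
queue.approve(invite)
queue.reject(advert)
queue.submit("Message nobody has looked at yet")
print(queue.delivered)  # ['Party invitation from a school friend']
```

Because delivery happens only in `approve`, a message the moderator ignores simply stays in the queue and never reaches the recipient.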

Limited use email addresses

Some providers let you create additional email addresses with built-in limits; typically such an address works only for a limited period of time or accepts only a limited number of incoming messages. These can be quite effective, but they take effort to create and manage, which limits their usefulness as a general means of controlling spam.

Dedicated email addresses

This is a simple and effective technique: whenever you give out your email address for a particular purpose, typically when signing up for a service on a website, you use a unique address just for that purpose. Not many email providers support it, but it offers two useful benefits. For example, if your usual address is "fred@email.com", you might give "fred-ebay@email.com" when signing up with eBay. If "fred-ebay@email.com" were to fall into the hands of spammers, you could set up a rule in your email system to reject all email to that address, without affecting the email arriving at your usual address. You would also know the source of the leaked address.
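A rule of this kind is easy to express. The Python sketch below assumes, as in the example above, that a "-" separates the usual name from the tag; many providers use "+" instead, so the separator is a parameter. The blocked tag list is hypothetical, and the sketch assumes the usual name itself contains no separator.

```python
from typing import Tuple

BLOCKED_TAGS = {"ebay"}  # hypothetical: tags whose addresses have leaked


def split_tagged(address: str, separator: str = "-") -> Tuple[str, str, str]:
    """Split 'fred-ebay@email.com' into ('fred', 'ebay', 'email.com')."""
    local, _, domain = address.lower().rpartition("@")
    base, _, tag = local.partition(separator)
    return base, tag, domain


def accept_recipient(address: str, separator: str = "-") -> bool:
    """Reject mail sent to a blocked tag; the tag also shows who leaked it."""
    _, tag, _ = split_tagged(address, separator)
    return tag not in BLOCKED_TAGS


print(accept_recipient("fred-ebay@email.com"))  # False: this address has leaked
print(accept_recipient("fred@email.com"))       # True: usual address unaffected
```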

Recommended anti-spam measures

As mentioned above, the most successful way to block spam is a multi-layered approach that combines several of the techniques described above. A good approach must not only block as much spam as possible, but also let legitimate email through and require the minimum of manual effort. A comprehensive solution therefore layers several of these techniques, so that spam missed by one layer can be caught by another.

Additionally, in cases where avoiding spam is particularly important, such as for children, a final layer of moderated email is strongly advisable.

This multi-layered approach is the one taken by SAFEnSOUNDmail. The vast majority of spam is rejected by the system, and generally only legitimate email is passed to the parent for moderation. If a spam message does get through the initial filtering, it's a simple task for the parent to reject it (indeed, if the parent does nothing and doesn't explicitly approve the message, it is never delivered to the child).

Other techniques for avoiding being targeted by spammers

Despite the widespread nature of spam there are various techniques you can adopt to minimize the amount of spam you receive.

Avoid publishing your email address. This won't stop all spam, but the fewer places your address appears, the less spam it will receive. Avoid publishing it on websites, blogs, forums and the like. Be careful when signing up for things online: check the privacy policy to make sure the site won't sell your address to spammers. In practice this is difficult, since nobody likes reading pages and pages of small print, but if the website belongs to a reputable business you're probably safe. If you must display your email address somewhere public, try disguising it so that a computer finds it hard to recognise as an email address but a human can still read it. Keep in mind, however, that spammers are becoming wise to this tactic and can often decode common disguises such as "steven at hotmail dot com".
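To see why such disguises give limited protection, here is a minimal Python sketch of how a harvester might undo the commonest ones before applying an ordinary address pattern. The patterns are simplified assumptions; real harvesters handle many more variants.

```python
import re

# Simplified decoders for "at" and "dot" spelled out, optionally in brackets.
AT = re.compile(r"\s*[\[(]?\s*\bat\b\s*[\])]?\s*", re.IGNORECASE)
DOT = re.compile(r"\s*[\[(]?\s*\bdot\b\s*[\])]?\s*", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def undisguise(text: str) -> str:
    """Turn 'steven at hotmail dot com' back into 'steven@hotmail.com'."""
    return DOT.sub(".", AT.sub("@", text))


def find_disguised(text: str) -> list[str]:
    return EMAIL_PATTERN.findall(undisguise(text))


print(find_disguised("Contact steven at hotmail dot com for details"))  # ['steven@hotmail.com']
```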

When children sign up to kids' websites, it is good practice for them to get into the habit of adding trusted senders to their whitelists or address books. That way, safe, trusted mail isn't lost as spam.

It is often recommended that you shouldn't reply to spam. In some respects this advice is misplaced, since the vast majority of spam messages don't have genuine reply addresses anyway. There is, however, a good reason not to reply: some of the more sophisticated anti-spam systems, including the one used by SAFEnSOUNDmail, automatically whitelist anybody to whom you send email.

Similarly, it is commonly suggested that you should avoid the "unsubscribe" mechanism often included in mass emails, because using it confirms to the spammer that your address is live. This isn't bad advice, but it needs to be applied selectively. If the email is true spam from a spammer, don't follow the unsubscribe instructions, as the spammer is unlikely to honour your request. However, if the sender is a legitimate business, it will probably stop emailing you if you follow its instructions, which vary from company to company. There is no easy rule for when to unsubscribe and when not to; you can only use your best judgement.

Rejecting Spam

The issue of rejecting spam messages is not straightforward. There are two basic approaches. Some systems "bounce" a message they believe to be spam, sending an automated reply saying that the message has not been accepted because it looks like spam. The other approach is to silently ignore the message: deleting it so that it's never seen, or sometimes placing it in a spam or junk folder where it can be reviewed as required.

Each approach has its problems. Sending a bounce message is wasteful if the message really is spam. The sender, the spammer, will never see it: the return address is usually fake, and even if it's real, the spammer has no interest in reading replies. Often the bounce message itself bounces, creating useless internet traffic. Worse still, if the bounce is sent to a forged address that does accept incoming email, the owner of that account is bombarded with a huge number of bounce messages for emails he or she never sent.

The second approach avoids all this extra email bouncing by silently ignoring the message. This works well for genuine spam messages - they get deleted and no extra traffic is generated. The problem comes when a legitimate email is identified as spam - a false positive. In this case, the sender doesn't know the email wasn't received. The recipient doesn't know the sender sent it. The email silently disappears.

The best way to reject spam is rather more complex: reject the message during the SMTP transaction itself. Understanding this fully requires a detailed knowledge of how SMTP works, which is beyond the scope of this article. Essentially, instead of applying the various spam-detection techniques after the server has received the message, the server applies them while it is still receiving the message from the sender. The beauty of doing it at this point is that the receiving server can tell the sending process directly that it will not accept the email. This is distinct from bouncing the message, which creates a completely new email (the bounce message) and sends it to the purported sender. Instead, the rejection goes straight back to the actual sender at the time of sending: no new email is generated, and the notification only ever reaches the process that was really sending the message, not its purported sender.
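To make the contrast concrete, here is a simplified Python sketch of the two outcomes (function names hypothetical). Rejecting during the transaction is just a reply code on the connection that is already open, so it reaches only the client that is actually sending; bouncing after acceptance builds a brand-new email addressed to whoever the message claimed to be from. 550 is a real SMTP permanent-failure code; the wording of the replies is illustrative.

```python
from email.message import EmailMessage


def reply_during_transaction(looks_like_spam: bool) -> str:
    """Reply sent on the open SMTP connection once the message data has arrived.

    A 5xx reply tells the connected client the message was refused; no new
    email is created, and the purported sender is never contacted.
    """
    if looks_like_spam:
        return "550 Message rejected as spam"
    return "250 OK"


def bounce_after_acceptance(claimed_sender: str, our_domain: str) -> EmailMessage:
    """The alternative: accept first, then send a new email to the claimed sender,
    who may be an innocent person whose address was forged."""
    bounce = EmailMessage()
    bounce["From"] = f"MAILER-DAEMON@{our_domain}"
    bounce["To"] = claimed_sender
    bounce["Subject"] = "Undelivered mail returned to sender"
    bounce.set_content("Your message was not accepted because it looked like spam.")
    return bounce


print(reply_during_transaction(True))                                # 550 Message rejected as spam
print(bounce_after_acceptance("steven@xyz.com", "email.com")["To"])  # steven@xyz.com
```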

This solution is by far the best, but unfortunately is implemented by very few email providers. SAFEnSOUNDmail uses this solution in combination with further processing once the email has been accepted, followed of course by parental moderation.

While there is no guaranteed way to avoid all spam, taking sensible precautions and using a sophisticated email provider like SAFEnSOUNDmail is a simple and effective way to protect your family email from spam and unwanted emails.