Blacklist Bias

September 6, 2017

DNS blacklists and other classes of filters are defense mechanisms with biases that affect their performance depending on geography, language and other factors. Why is that, how does it affect email senders and what can be done about it?

How does the bias affect email senders?

The effect can vary wildly depending on where you send your email from and where you are sending it to. Email senders have little choice on the “where to send” part, as this is determined by the composition of their mailing list. List composition, in turn, is affected by factors such as language, geography, business sector and possibly others. This is an area where we can help a lot.

As an example, consider that email senders wishing to engage with Chinese recipients will likely have to deal with delivering email to qq.com, while others not interested in that demographic could very well accept sub-par deliverability there.

However, email senders can decide where to send their email from. This decision can make the bias we’re talking about work for or against your email. Some sending locations will cause your email to be subjected to more scrutiny, while others might actually reduce the number of checks applied.

There are no quick and easy rules. The key here is testing and comparing. Try to send a subset of your mailing list from prospective IP addresses for a few weeks to a few months and see what the results are. In some cases you’ll be surprised what a simple change like this can mean to the effectiveness of your campaigns.
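One way to make the "testing and comparing" step less subjective is to check whether the difference between two sending locations is larger than what chance alone would produce. The sketch below runs a standard two-proportion z-test on inbox-placement counts. The function name and the counts in the usage lines are hypothetical, made up for illustration only; substitute whatever success metric you actually track.

```python
import math


def two_proportion_z(success_a, total_a, success_b, total_b):
    """Compare two delivery rates with a two-proportion z-test.

    Returns (rate_a, rate_b, z, p_value), where p_value is two-sided.
    """
    rate_a = success_a / total_a
    rate_b = success_b / total_b
    # Pooled rate under the assumption that both locations perform the same.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Both samples are all-success or all-failure: no measurable difference.
        return rate_a, rate_b, 0.0, 1.0
    z = (rate_b - rate_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical counts: messages that reached the inbox out of messages sent.
current = (3400, 4000)    # current sending IP
candidate = (3560, 4000)  # prospective sending IP
rate_a, rate_b, z, p = two_proportion_z(*current, *candidate)
print(f"current={rate_a:.3f} candidate={rate_b:.3f} z={z:.2f} p={p:.2g}")
```

A small p-value only says the two locations differ for this subset of the list over this period. It does not replace the weeks or months of observation suggested above, since filter behavior can shift over time.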

When assisting clients in Spanish-speaking countries, we often found that DNS blacklists performed far worse, in both effectiveness and false positive rates, than what we observed at customers based in the US. This is undesirable for both email senders and receivers.

Where is the bias coming from?

The primary driver for bias can be described as a sampling bias, meaning that the data used to power the DNS blacklist or filter does not have a truly uniform distribution. Keep in mind that the data used to publish DNS blacklists and train other classes of filters often comes from spamtraps, which tend to capture certain classes of email.

For instance, spamtraps associated with domain names or email addresses used in financial applications will tend to receive more spam related to that class of goods or services. Likewise, spamtraps associated with the English language will receive few, if any, pieces of spam in other languages.

This makes it hard or impossible for anti-spam operators to properly tune the effectiveness of the defense, be it a DNS blacklist or a filter, as there might not be enough representation of spam from a given region. Sometimes a lack of good email, often called “ham”, inverts the bias and causes overly aggressive listings.

The lack of variety and quality in email samples biases the results of any analysis performed on that email, because the messages will only represent a fraction of all the spam out there. Even large blacklists with access to large groups of spamtraps suffer from this bias because of the different social and economic factors at play in each country.
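To get a feel for how uneven spamtrap coverage turns into uneven protection, here is a deliberately simplified toy model. It assumes each message from a spam source independently lands on a spamtrap with probability equal to the share of trap addresses among that source's targets, and that a single trap hit is enough to get the source listed. Real blacklists do not work this simply, and every number below is a hypothetical placeholder, not a measurement.

```python
def listing_probability(trap_share, messages_per_source):
    """Chance that a spam source hits at least one spamtrap.

    trap_share: assumed fraction of targeted addresses that are spamtraps.
    messages_per_source: assumed number of messages the source sends.
    Assumes every message hits a trap independently of the others.
    """
    return 1.0 - (1.0 - trap_share) ** messages_per_source


# Hypothetical trap coverage per audience; illustrative values only.
trap_share_by_audience = {
    "well-covered audience": 1e-4,
    "under-served audience": 1e-5,
}

for audience, share in trap_share_by_audience.items():
    prob = listing_probability(share, messages_per_source=20_000)
    print(f"{audience}: chance a source is ever seen by a trap = {prob:.2f}")
```

Under these assumptions, a tenfold gap in trap coverage leaves most sources that target the under-served audience unseen, which matches the pattern of spam leaking through filters in under-represented regions.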

What can be done about it?

The sad truth is that there’s little that can be done. Currently there’s a large difference in the number of users between countries. With fewer Internet users in a specific country, it’s natural to assume that users from said country will be under-represented in terms of spamtraps capturing spam destined for them, so they’ll have to cope with more spam leaking through filters.

Other than that, reporting the spam you receive as an end user might be the best way to help until such time as spamtrap collections are expanded to cover all the under-served regions of the world.

As an email sender, you need to keep a close eye on your metrics and be open to periodically testing sending your email from other locations. When promising locations are found, consider moving part of your mailing list there to spread the risk.

What do you think? Hit us up on Twitter and let us know!