Toby Howard :: Toby's Tips :: Foreign-language spam


A simple tactic for identifying spam mail written in languages you can't read is to note that many such messages name a Windows character encoding (usually a one-byte encoding) in the raw source of their Subject line.

For example, this mail

   Subject: Важная информация для Руководителя

has this message source:

  Subject: =?windows-1251?B?wuDm7eD/IOjt9O7w7OD26P8g5Ov/INDz6u7i7uTo8uXr/w==?=

("The important information for the Head", according to rustran.com.)
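If you want to see for yourself what such a Subject line says, here is a minimal sketch using Python's standard email.header module. The function name decode_subject and the constant RAW_SUBJECT are made up for illustration; the header value is the one shown above.

```python
from email.header import decode_header

# The raw Subject value from the example message above.
RAW_SUBJECT = "=?windows-1251?B?wuDm7eD/IOjt9O7w7OD26P8g5Ov/INDz6u7i7uTo8uXr/w==?="


def decode_subject(raw):
    """Return (decoded text, list of charset names declared in the header).

    decode_subject is a hypothetical helper, not part of any mail client.
    """
    parts = []
    charsets = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Chunks with no declared charset are treated as ASCII;
            # undecodable bytes are replaced rather than raising.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # A header with no encoded words comes back as plain str.
            parts.append(data)
        if charset:
            charsets.append(charset.lower())
    return "".join(parts), charsets


text, charsets = decode_subject(RAW_SUBJECT)
print(charsets)  # ['windows-1251']
print(text)      # Важная информация для Руководителя
```

The "B" between the question marks means the text is base64-encoded; a "Q" there would mean quoted-printable. Either way, the encoding name sits in plain view at the start of the header, which is what makes the trick on this page possible.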

So, make a filter in your mailreader to look for mails with Subject lines containing "windows-1251" and chuck them away. You might also include another common Russian encoding, koi8 (which usually shows up in headers as "koi8-r").
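If your mailreader can't filter on the raw header, the same check is easy to sketch in Python. Everything named here (ENCODED_WORD_CHARSET, BLOCKED_PREFIXES, is_unwanted) is hypothetical, and the blocklist is just the two Russian encodings mentioned above. Note one deliberate difference from a plain substring search: this only looks at the encoding field of an encoded Subject, so a normal English mail that merely mentions "windows-1251" in its Subject is left alone.

```python
import re

# Captures the charset field of an encoded word in a header,
# i.e. the "windows-1251" in "=?windows-1251?B?...?=".
ENCODED_WORD_CHARSET = re.compile(r"=\?([^?\s]+)\?[bq]\?", re.IGNORECASE)

# Hypothetical blocklist: encodings you never expect in wanted mail.
# Prefixes are used so that "koi8" covers both koi8-r and koi8-u.
BLOCKED_PREFIXES = ("windows-1251", "koi8")


def is_unwanted(raw_subject, blocked=BLOCKED_PREFIXES):
    """Return True if the raw Subject declares a blocked encoding."""
    for charset in ENCODED_WORD_CHARSET.findall(raw_subject):
        # A charset may carry a language suffix ("charset*lang"); drop it.
        name = charset.lower().split("*", 1)[0]
        if name.startswith(tuple(blocked)):
            return True
    return False


print(is_unwanted("=?windows-1251?B?wuDm7eD/IOjt9O7w?="))     # True
print(is_unwanted("=?KOI8-R?B?8NLJ18XU?="))                   # True
print(is_unwanted("=?UTF-8?B?0J/RgNC40LLQtdGC?="))            # False
print(is_unwanted("Question about windows-1251 conversion"))  # False
```

To block other encodings, add their names to BLOCKED_PREFIXES.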

There are lots of Windows encodings (Microsoft page). In a Subject line each one appears as "windows-" followed by its number:

1250 (Central Europe)
1251 (Cyrillic)
1252 (Latin I)
1253 (Greek)
1254 (Turkish)
1255 (Hebrew)
1256 (Arabic)
1257 (Baltic)
1258 (Vietnam)
874 (Thai)

So you can choose whichever you don't fancy.

Of course, there are other ways to encode non-English characters (UTF-8, etc.), so this simple scheme won't work on those. But it does seem to catch a lot of spam, especially Cyrillic.

