Formspamcheck PHP Class.
Sept 6th, 2011
The phpList site was quite heavily hit by spam signups, so I decided to investigate what's available on the internet to do something about it. I found three services that can help to fight this kind of activity. I decided to make a class that uses all of them.
You can find the source here: FormspamCheck
The class is documented with phpDocumentor, and you can read it online.
There's no need to use all three services; just one or two can be used as well. When a service is not configured, it won't be used.
Quickest use is like this:
<?php
$honeypotApiKey = 'My API key for Honeypot Project';
$akismetApiKey = 'My API key for Akismet';
$akismetBlogURL = 'http://www.example.com';

include 'formspamcheck.class.php';

$fsc = new FormSpamCheck($honeypotApiKey, $akismetApiKey, $akismetBlogURL);

if ($fsc->isSpam(array('username' => 'someusername', 'email' => ''))) {
    print "This is spam";
} else {
    print "This is ham";
}
1. If you want to use this class, make sure to only make your calls on a POST request in your application. I initially made the mistake of calling it on every request, which overloaded the Stop Forum Spam API. They have a limit of 20,000 calls per day, and by noon I had reached it. When the call is made only on a POST, the number of API calls is just a few hundred per day.
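A minimal sketch of that guard, assuming a helper of my own naming (shouldCheckForSpam is hypothetical and not part of the class). It only decides whether a request is worth an API call at all:

```php
<?php
// Hypothetical helper (not part of FormSpamCheck): decide whether a request
// is worth spending an API call on.
function shouldCheckForSpam(array $server): bool
{
    // Only form submissions (POST) carry data worth checking.
    return isset($server['REQUEST_METHOD'])
        && strtoupper($server['REQUEST_METHOD']) === 'POST';
}

// Usage in an application, assuming $fsc was created as in the quick example
// and $formData is your own array of submitted fields:
// if (shouldCheckForSpam($_SERVER) && $fsc->isSpam($formData)) { ... }
```

Putting the method test first means that GET requests never reach isSpam, so page views cost no API calls.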
2. I've applied this class in both the phpList forums and the phpList Hosted signups. The class has a lot of logging built in, which is then used to graph the activity with munin. If you're interested in the munin plugins as well, let me know.
Here's an example. This graph shows the activity in the forums and the filtering applied by using this class and calling all three services. As you can see, over 50% of activity in the forums is spam.
3. When using the class, you can do a comparative analysis of the blocking of spam by any of the three services. In my case, StopForumSpam (SFS) filters most. Occasionally Honeypot (HP) and Akismet (AKI) contribute, but most often a hit in SFS has misses in HP and AKI, and only rarely is a hit in HP or AKI a miss in SFS.
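A comparison like this can be tallied from per-request results. The sketch below assumes a log format of my own invention (one array per request, service name mapped to hit or miss) and uses made-up sample entries purely for illustration; it is not the class's logging format.

```php
<?php
// Hypothetical log format with made-up sample entries:
// one entry per request, service name => hit (true) or miss (false).
$log = array(
    array('SFS' => true,  'HP' => false, 'AKI' => false),
    array('SFS' => true,  'HP' => true,  'AKI' => false),
    array('SFS' => false, 'HP' => false, 'AKI' => true),
);

// Count total hits per service, and hits that no other service shared.
function tallyHits(array $log)
{
    $totals = array();
    $unique = array();
    foreach ($log as $entry) {
        // Names of the services that reported a hit for this request.
        $hits = array_keys(array_filter($entry));
        foreach ($hits as $service) {
            $totals[$service] = ($totals[$service] ?? 0) + 1;
            if (count($hits) === 1) {
                $unique[$service] = ($unique[$service] ?? 0) + 1;
            }
        }
    }
    return array('totals' => $totals, 'unique' => $unique);
}

$result = tallyHits($log);
```

The "unique" counts are the interesting part: they show how much each service catches that the others miss.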
However, that may mean that SFS is filtering out more than necessary and is less lenient towards IPs and usernames. For example, SFS won't allow anyone to sign up with the name "Ron", which you can verify with this API call. Now, I don't think that's a big problem, as they can just register with a different name, but it depends on the context.
But it highlights the need to be specific with the class:
4. In an application you can ask the class whether a request is spam, and then ask it for a few more details.
But whatever you do, it is best not to make the spammers any wiser as to why you are blocking them.
$class->matchedBy will return which service considered it spam.
$class->matchedOn will return which field was used to determine that. This will be "ip", "username", "email" or "unknown". This level of detail is only available from SFS; HP only checks the IP, and Akismet does not reveal what they used to determine something is spam (I asked, but they won't say).
So, if you do this, you may want to do something like this:
if ($fsc->isSpam($data)) { // $data: the array of form fields being checked
    switch ($fsc->matchedOn) {
        case 'username':
            return "that name is already taken, please choose another one";
        case 'email':
            return "this email has already been registered";
        default:
            return "error processing your request, please try again later";
    }
}
5. The two main issues with a system like this are false positives and performance. For performance reasons, I've added memcached support and minimised calls to the APIs, because each call causes a delay. I'm tracking the delay and will add a graph for it. It looks like the average is well below half a second, with the occasional one that takes more than a second. That is when using all three services.
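To illustrate why caching helps, here is a rough cache-aside sketch. It is not the class's own code: the LookupCache class and its remember method are hypothetical names, and a plain array stands in for memcached so the example runs without a cache server. With a real server, the PHP Memcached extension's get and set calls would take the array's place.

```php
<?php
// Hypothetical cache-aside wrapper; an array stands in for memcached so the
// sketch runs without a cache server.
class LookupCache
{
    private $store = array();
    public $misses = 0;

    // Return the cached answer for $key, or call $lookup once and remember it.
    public function remember($key, callable $lookup)
    {
        if (array_key_exists($key, $this->store)) {
            return $this->store[$key];
        }
        $this->misses++;
        $this->store[$key] = $lookup();
        return $this->store[$key];
    }
}

$cache = new LookupCache();
$apiCalls = 0;

// Stand-in for a slow remote API call; it just counts how often it runs.
$slowLookup = function () use (&$apiCalls) {
    $apiCalls++;
    return true;
};

$first  = $cache->remember('ip:192.0.2.1', $slowLookup);
$second = $cache->remember('ip:192.0.2.1', $slowLookup);
// The second request is answered from the cache, so the lookup ran only once.
```

A repeat visitor from the same IP then costs one remote call instead of one per request.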
If you want to improve performance, use the "isSpam" call without the "checkAll" option. That will minimise the calls to the services. I have the checkAll on, in order to be able to graph a comparison between them.
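The difference between the two modes can be sketched as follows. This is my own illustration of short-circuiting, not the class's actual implementation; runChecks and the stand-in service callables are hypothetical.

```php
<?php
// Hypothetical sketch, not the class's actual code: $services maps a service
// name to a callable that returns true when that service reports spam.
function runChecks(array $services, $checkAll = false)
{
    $matchedBy = array();
    foreach ($services as $name => $check) {
        if ($check()) {
            $matchedBy[] = $name;
            if (!$checkAll) {
                break; // first hit is enough; skip the remaining API calls
            }
        }
    }
    return $matchedBy;
}

$calls = 0;
// Stand-ins for the three remote services; each one counts its own call.
$services = array(
    'SFS' => function () use (&$calls) { $calls++; return true; },
    'HP'  => function () use (&$calls) { $calls++; return false; },
    'AKI' => function () use (&$calls) { $calls++; return false; },
);

$quick = runChecks($services);       // stops at the first service that hits
$callsQuick = $calls;

$calls = 0;
$full = runChecks($services, true);  // asks every service regardless
$callsFull = $calls;
```

When the first service already reports spam, the quick mode makes one call where the full mode makes three, which is the saving described above.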
As for false positives, I've kept an eye on it, and haven't seen any yet. That doesn't mean there aren't any. In many cases, when someone hits a wall, they'll just walk away, and won't tell you about it. I'll try to see if there's a way to measure this.
Results
It looks like in general Stop Forum Spam catches most spam, but Akismet and Honeypot contribute as well, and therefore the combination of the three is the best.
Here's the relevant bit of the graph showing the spam attack on the site. In a few months' time, I'll post an update, which should show a return to the original straight line of signups. This graph shows the registered users in the site.
(You can interpolate the gaps; they are caused by moving the munin server around.)
Within the time span of a few weeks, over 10% of registered users were spam accounts.
graph: registered users until Sep 2011.
Continued on One Year Later