With many visitors and a great deal of exposure, our Save The Train web site gets the attention of unwelcome content providers - people who come on to our forum or blog and post articles and comments that are way off topic. Why do they do it? Primarily to promote their pharmaceutical products, loans and betting schemes to the search engines - to get themselves ranked on the back of our good name and popularity. Unfortunately, such posts also dilute our content, lower our ranking and at times shock and offend some of our readers. How to "solve" the problem [on the forum]?
We COULD go for a manually authorised signup procedure and (at the levels we're looking at) the three moderators of the forum could cope with this. But it adds an extra hurdle for newcomers, and having to wait, perhaps a few hours, before they can make their first post is likely to put them off.
We COULD use a captcha scheme where the new arrival has to retype a series of letters - great against the "autobots", but more and more of these signups are made by paid workers in low-wage parts of the world - kids there doing it for minimal pocket money.
We COULD add a filter to refuse messages, as they're posted, which match a pattern that we want to reject - but the posters would know straight away that their payload had not been placed, and would be prompted to look for alternatives.
So what's the solution? There's no "100% solution" that I know of, but I have implemented a "clean sweep" system that goes around the boards from time to time, deleting posts which meet certain criteria. It's run automatically under "crontab" so there's no need for any interaction on my / our administrator's part. It's been tuned to err on the side of safety - in other words, any genuine newcomer is highly unlikely to have his / her first post killed. And it means that our board-spammers leave thinking that they have successfully delivered their payload.
If anyone would like to use the algorithm on their own board ... here's my SQL that finds the rogue posts. It would, mind you, need individual tuning.
select id_msg, smf_messages.id_member, posts, totalTimeLoggedIn, membername
  from smf_messages
  left join smf_members on smf_messages.id_member = smf_members.id_member
  where posts < 2
    and (body like "%[url%[url%" or body like "%href%href%")
    and body not like "%train%"
    and body not like "%wilts%"
    and body not like "%station%"
    and body not like "%swindon%"
    and id_msg > 5000
  order by id_msg
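The query above only finds the rogue posts; the post does not show the delete step or the script that crontab runs. As a rough sketch of how the two could fit together, here is a small Python example. Everything beyond the query itself is an assumption made for illustration: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the forum's real database, the two tables are cut down to just the columns the query touches, and the names ON_TOPIC_WORDS, MAX_POSTS, MIN_MSG_ID, find_rogue_posts, sweep and demo_connection are all hypothetical.

```python
import sqlite3

# Hypothetical tuning knobs; the values mirror the query in the post.
ON_TOPIC_WORDS = ["train", "wilts", "station", "swindon"]
MAX_POSTS = 2       # only consider members with fewer posts than this
MIN_MSG_ID = 5000   # leave messages at or below this id alone


def find_rogue_posts(conn, on_topic_words=ON_TOPIC_WORDS):
    """Return the ids of messages that match the clean-sweep criteria."""
    # One "NOT LIKE" clause per on-topic word, so the list is easy to tune.
    not_like = " ".join("AND m.body NOT LIKE ?" for _ in on_topic_words)
    sql = f"""
        SELECT m.id_msg
        FROM smf_messages AS m
        LEFT JOIN smf_members AS u ON m.id_member = u.id_member
        WHERE u.posts < ?
          AND (m.body LIKE '%[url%[url%' OR m.body LIKE '%href%href%')
          {not_like}
          AND m.id_msg > ?
        ORDER BY m.id_msg
    """
    params = [MAX_POSTS] + [f"%{w}%" for w in on_topic_words] + [MIN_MSG_ID]
    return [row[0] for row in conn.execute(sql, params)]


def sweep(conn):
    """Delete the rogue posts and return their ids (assumed delete step)."""
    ids = find_rogue_posts(conn)
    conn.executemany("DELETE FROM smf_messages WHERE id_msg = ?",
                     [(i,) for i in ids])
    conn.commit()
    return ids


def demo_connection():
    """Build a tiny in-memory stand-in for the forum database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE smf_members (
            id_member INTEGER PRIMARY KEY, membername TEXT, posts INTEGER);
        CREATE TABLE smf_messages (
            id_msg INTEGER PRIMARY KEY, id_member INTEGER, body TEXT);
    """)
    conn.executemany("INSERT INTO smf_members VALUES (?, ?, ?)", [
        (1, "regular", 40),
        (2, "newcomer", 1),
        (3, "spammer", 1),
    ])
    conn.executemany("INSERT INTO smf_messages VALUES (?, ?, ?)", [
        # Two links, but from an established member: kept.
        (5001, 1, "See [url=a]this[/url] and [url=b]that[/url]"),
        # Two links from a newcomer, but on topic: kept.
        (5002, 2, "Times at [url=a]x[/url] and [url=b]y[/url] for Swindon station"),
        # Two links, new member, off topic: swept.
        (5003, 3, "Cheap loans [url=a]here[/url] and [url=b]here[/url]"),
        # Would match, but sits below the id cut-off: kept.
        (4000, 3, "Pills [url=a]here[/url] [url=b]here[/url]"),
    ])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print("swept:", sweep(conn))
```

Keeping the on-topic words in a list makes the "individual tuning" a one-line change for another board. Like the approach described in the post, this sketch only removes the message rows; it makes no attempt to recalculate anything else the forum software tracks.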
Disadvantages?
* A few spam messages make it through and still need manual deletion
* Users will see occasional recent spams before they are deleted
* The "latest post" for each board isn't recalculated; a good clue to those of us "in the know" that we have trapped a spam post, but perhaps a "bug" to users
* Rare chance of deleting a genuine post
(written 2007-05-17, updated 2007-05-18)
Comment from Alex: Akismet (http://akismet.com/) is a nice solution that we use on 24dash.com and I use on my blog. Basically, the comment/forum post/form entry is posted to Akismet and it decides if it is Spam or not. On the odd occasion it gets it wrong you tell it so and it learns from the mistake. I've found on my blog I get the very occasional false positive, but it's never let a spam through yet.
By the way, this is also my last day with allpay.net :-D (comment added 2007-05-18 13:51:24)