Save the Forum - A regular clean sweep

With many visitors and a great deal of exposure, our Save The Train web site attracts unwelcome content providers - people who come on to our forum or blog and post articles and comments that are way off topic. Why do they do it? Primarily to promote their pharmaceutical products, loans and betting schemes to the search engines - to rank on the back of our good name and popularity. Unfortunately, such posts also dilute our content, lower our ranking and at times shock and offend some of our readers. How to "solve" the problem [on the forum]?

We COULD go for a manually authorised signup procedure and (at the levels we're looking at) the three moderators of the forum could cope with this. But it adds an extra hurdle for newcomers, who are likely to be put off by having to wait, perhaps a few hours, before they can make their first post.

We COULD use a captcha scheme where the new arrival has to retype a series of letters - great against the "autobots", but more and more of these signups are made by paid workers in low-wage parts of the world - kids there doing it for minimal pocket money - and a captcha does nothing to stop a human.

We COULD add a filter that refuses messages, as they're posted, if they match a pattern that we want to reject - but the posters would know straight away that their payload had not been placed, and would be prompted to look for alternatives.

So what's the solution? There's no "100% solution" that I know of, but I have implemented a "clean sweep" system that goes around the boards from time to time, deleting posts which meet certain criteria. It's run automatically under "crontab", so there's no need for any interaction on my / our administrator's part. It's been tuned to err on the side of safety - in other words, any genuine newcomer is highly unlikely to have his / her first post killed. And it means that our board-spammers leave thinking that they have successfully delivered their payload.

If anyone would like to use the algorithm on their own board ... here's my SQL that finds the rogue posts. It would, mind you, need individual tuning.
select id_msg, smf_messages.id_member, posts, totalTimeLoggedIn, membername
  from smf_messages
  left join smf_members on smf_messages.id_member = smf_members.id_member
  where posts < 2
    and (body like "%[url%[url%" or body like "%href%href%")
    and body not like "%train%"
    and body not like "%wilts%"
    and body not like "%station%"
    and body not like "%swindon%"
    and id_msg > 5000
  order by id_msg
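
To see what the query catches before pointing it at a live board, it can be tried against a few made-up rows. The following is only a sketch under assumptions: it uses Python's built-in sqlite3 in place of the forum's real database, a cut-down pair of tables holding just the columns the query touches, and invented members and messages. The helper names (find_rogue_posts, build_demo_db) are hypothetical, and string literals use single quotes for portability.

```python
import sqlite3

# Assumed miniature of the two SMF tables: only the columns the query uses.
SCHEMA = """
CREATE TABLE smf_members (
    id_member INTEGER PRIMARY KEY,
    membername TEXT,
    posts INTEGER,
    totalTimeLoggedIn INTEGER
);
CREATE TABLE smf_messages (
    id_msg INTEGER PRIMARY KEY,
    id_member INTEGER,
    body TEXT
);
"""

# The selection criteria from the post, laid out one condition per line.
ROGUE_POSTS_SQL = """
SELECT id_msg, smf_messages.id_member, posts, totalTimeLoggedIn, membername
FROM smf_messages
LEFT JOIN smf_members ON smf_messages.id_member = smf_members.id_member
WHERE posts < 2
  AND (body LIKE '%[url%[url%' OR body LIKE '%href%href%')
  AND body NOT LIKE '%train%'
  AND body NOT LIKE '%wilts%'
  AND body NOT LIKE '%station%'
  AND body NOT LIKE '%swindon%'
  AND id_msg > 5000
ORDER BY id_msg
"""


def find_rogue_posts(conn):
    """Return the ids of messages that match the clean-sweep criteria."""
    return [row[0] for row in conn.execute(ROGUE_POSTS_SQL)]


def build_demo_db():
    """Create an in-memory database with invented members and messages."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO smf_members VALUES (?, ?, ?, ?)", [
        (1, "regular", 250, 90000),   # established member
        (2, "newcomer", 1, 600),      # genuine first-time poster
        (3, "pillseller", 1, 40),     # spammer
        (4, "onelink", 1, 120),       # new member posting a single link
    ])
    conn.executemany("INSERT INTO smf_messages VALUES (?, ?, ?)", [
        # Two links, but from a member with many posts: kept.
        (5001, 1, "[url=http://a.example]a[/url] [url=http://b.example]b[/url]"),
        # Two links from a newcomer, but on topic (train, Swindon): kept.
        (5002, 2, "[url=http://a.example]map[/url] [url=http://b.example]times[/url] for the Swindon train"),
        # Two links, new member, nothing on topic: flagged.
        (5003, 3, "Cheap pills [url=http://x.example]here[/url] and [url=http://y.example]here[/url]"),
        # Only one link: kept.
        (5004, 4, "Hello, just one [url=http://x.example]link[/url]"),
        # Matches the pattern but is below the id_msg cut-off: kept.
        (4000, 3, "Old [url=http://x.example]x[/url] [url=http://y.example]y[/url]"),
    ])
    return conn


if __name__ == "__main__":
    print(find_rogue_posts(build_demo_db()))
```

Adding rows that resemble genuine first posts on your own board is a cheap way to check the "not like" keywords before any delete is wired up.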

Disadvantages?
* A few spam messages make it through and still need manual deletion
* Users will see occasional recent spam posts before they are deleted
* The "latest post" for each board isn't recalculated; a good clue to those of us "in the know" that we have trapped a spam post, but it may look like a "bug" to users
* Rare chance of deleting a genuine post.
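
The stale "latest post" pointer could in principle be refreshed as part of the sweep. The sketch below is not what the sweep described here does; it assumes a smf_boards table with id_board and id_last_msg columns and an id_board column on smf_messages (check the names against your own forum version), and it runs against an in-memory sqlite3 database with invented rows. The function names are hypothetical, and topic-level pointers and post counts are ignored.

```python
import sqlite3


def delete_and_refresh(conn, rogue_ids):
    """Delete the given messages, then reset each board's latest-post pointer."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "DELETE FROM smf_messages WHERE id_msg = ?",
            [(msg_id,) for msg_id in rogue_ids],
        )
        # Point every board at its highest remaining message id, or 0 if empty.
        conn.execute("""
            UPDATE smf_boards
            SET id_last_msg = COALESCE(
                (SELECT MAX(id_msg) FROM smf_messages
                 WHERE smf_messages.id_board = smf_boards.id_board), 0)
        """)


def build_demo_db():
    """Two boards (assumed column names), each currently ending in a spam post."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE smf_boards (id_board INTEGER PRIMARY KEY, id_last_msg INTEGER);
        CREATE TABLE smf_messages (id_msg INTEGER PRIMARY KEY, id_board INTEGER, body TEXT);
        INSERT INTO smf_messages VALUES (5001, 1, 'genuine post');
        INSERT INTO smf_messages VALUES (5003, 1, 'spam');
        INSERT INTO smf_messages VALUES (5005, 2, 'spam');
        INSERT INTO smf_boards VALUES (1, 5003);
        INSERT INTO smf_boards VALUES (2, 5005);
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    delete_and_refresh(conn, [5003, 5005])
    print(conn.execute(
        "SELECT id_board, id_last_msg FROM smf_boards ORDER BY id_board"
    ).fetchall())
```

Running the delete and the refresh in one transaction means a failure part-way through leaves the board pointers consistent with the messages that remain.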

(written 2007-05-17, updated 2007-05-18)

Commentator says ...
Alex: Akismet (http://akismet.com/) is a nice solution that we use on 24dash.com and I use on my blog. Basically, the comment/forum post/form entry is posted to Akismet and it decides if it is Spam or not. On the odd occasion it gets it wrong you tell it so and it learns from the mistake. I've found on my blog I get the very occasional false positive, but it's never let a spam through yet.

By the way, this is also my last day with allpay.net :-D
(comment added 2007-05-18 13:51:24)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2019: 404 The Spa • Melksham, Wiltshire • United Kingdom • SN12 6QL
PH: 01225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/1190_Sav ... sweep.html • PAGE BUILT: Sat May 27 16:49:10 2017 • BUILD SYSTEM: WomanWithCat