We've been discovered! Or rather ... our brochure request form has been discovered by "spam engines", just as the comment submission form on this diary has been.
These "spam engines" locate web forms, then complete them with information about online gaming, pharmaceutical products, and other goods and services that we're not interested in. Their submissions are characterised by a very high proportion of links, especially in text areas. I believe that they're hoping to find forms that will let them post information onto bulletin boards and other web sites.
How to deal with this nuisance? I've amended our information request form response script to compare the length of the text entered "raw" with the length of the text once "href" tags are stripped out. If it shrinks by nearly a third or more (a raw-to-stripped ratio above 1.4), it's probably spam. It's hard to be sure, so I'm now in a testing phase that simply marks the emails sent by the brochure request system rather than discarding anything.
Code (in Perl) to accumulate the full and stripped lengths, run on each field of the form:
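$full_length += length($value);       # length of the field as submitted
$value =~ s/<a\s+href[^>]+>/ /ig;     # replace each opening <a href ...> tag with a single space
$stripped_length += length($value);   # length of the field with link tags removed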
Code that evaluates whether or not the posting is spam:
$spamfactor = $full_length / $stripped_length;
if ($spamfactor > 1.4) {
    $extraword = "SPAM";
} else {
    $extraword = "OK";
}
Note that I have also initialised the $full_length and $stripped_length variables to 1, not 0, in case anyone (or any automaton) submits a blank form, which would otherwise cause a division by zero.
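For anyone wanting to try the idea outside our form handler, the two fragments above can be pulled together into one small script. This is only a sketch under assumptions: the subroutine name classify_submission, the hash of field names, and the optional threshold argument are hypothetical and are not taken from our live script; the regular expression, the 1.4 ratio and the initialisation to 1 are as described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not the live script): takes a reference to a hash of
# form fields and returns "SPAM" or "OK" using the length-ratio test above.
sub classify_submission {
    my ($fields, $threshold) = @_;
    $threshold = 1.4 unless defined $threshold;

    # Start both totals at 1, not 0, so a blank form cannot divide by zero.
    my ($full_length, $stripped_length) = (1, 1);

    for my $value (values %$fields) {
        my $text = defined $value ? $value : '';
        $full_length += length($text);

        # Replace each opening <a href ...> tag with a single space.
        (my $stripped = $text) =~ s/<a\s+href[^>]+>/ /ig;
        $stripped_length += length($stripped);
    }

    my $spamfactor = $full_length / $stripped_length;
    return $spamfactor > $threshold ? "SPAM" : "OK";
}

# Made-up sample submissions, for illustration only.
my %clean  = (name => "Jane Doe", comments => "Please send me a brochure.");
my %spammy = (
    name     => "x",
    comments => '<a href="http://example.com/aaaaaaaaaaaaaaaaaaaa">pills</a> '
              . '<a href="http://example.com/bbbbbbbbbbbbbbbbbbbbbb">poker</a>',
);

print classify_submission(\%clean),  "\n";   # prints OK
print classify_submission(\%spammy), "\n";   # prints SPAM
```

Working on a copy of each field, rather than modifying $value in place, leaves the original text intact for the email that is eventually sent. A genuine enquiry that happens to include a long link or two could still trip the test, which is why marking the emails rather than discarding them seems the safer choice for now.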
(written 2005-04-05 06:32:03)