
Information request forms, cleaning up spam

We've been discovered! Or rather, our brochure request form has been discovered by "spam engines", just as the comment submission form on this diary was.

These "spam engines" locate web forms, then complete them with information about online gaming, pharmaceutical products, and other goods and services that we're not interested in. The submissions are characterised by a very high proportion of links, especially in text areas. I believe the senders are hoping to find forms that will post their text onto bulletin boards and other web sites.

How to deal with this nuisance? I've amended our information request form response script to compare the length of the text as entered ("raw") with its length once the opening <a href ...> tags are stripped out. If the text shrinks by more than about 29% (a raw-to-stripped ratio above 1.4), it's probably spam. It's hard to be sure, so I'm now in a testing phase in which the script simply marks the emails sent by the brochure request system.

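Code (in Perl) to accumulate the full and stripped lengths - run on each field of the form

# $value holds the text submitted in the current field
$full_length += length($value);
# Replace each opening <a href ...> tag with a single space
$value =~ s/<a\s+href[^>]+>/ /ig;
$stripped_length += length($value);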
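
As a minimal sketch of how those lines might sit inside a loop over every field, assuming the form data has already been parsed into a hash: the %form hash, its field names and the link_lengths subroutine are illustrative and not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (full length, stripped length) summed over all fields.
sub link_lengths {
    my (%form) = @_;
    # Counters start at 1, as in the original script
    my ($full_length, $stripped_length) = (1, 1);
    for my $field (sort keys %form) {
        # Work on a copy so the submitted data itself is left untouched
        my $value = defined $form{$field} ? $form{$field} : '';
        $full_length += length($value);
        $value =~ s/<a\s+href[^>]+>/ /ig;   # each opening link tag becomes one space
        $stripped_length += length($value);
    }
    return ($full_length, $stripped_length);
}

# Illustrative data only
my %form = (
    name    => 'Test',
    comment => 'Buy <a href="http://example.com/">pills</a> now',
);
my ($full, $stripped) = link_lengths(%form);
print "full=$full stripped=$stripped\n";
```

Note that only the opening tag is removed; the link text and the closing </a> still count towards the stripped length.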

Code that evaluates whether or not the posting is spam

# Ratio of raw length to stripped length; link-heavy text scores higher
$spamfactor = $full_length / $stripped_length;
if ($spamfactor > 1.4) {
    $extraword = "SPAM";
} else {
    $extraword = "OK";
}
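
To see what the 1.4 threshold means in practice, here is a small worked example. It is only a sketch: the spam_verdict subroutine is a hypothetical wrapper around the test above, and the lengths passed to it are made-up figures.

```perl
use strict;
use warnings;

# Hypothetical wrapper: same 1.4 threshold and same verdict words as above.
sub spam_verdict {
    my ($full_length, $stripped_length) = @_;
    my $spamfactor = $full_length / $stripped_length;
    return $spamfactor > 1.4 ? "SPAM" : "OK";
}

# A ratio of 1.4 corresponds to the text shrinking by 1 - 1/1.4 of its raw length
printf "shrinkage at threshold: %.1f%%\n", 100 * (1 - 1 / 1.4);

# 300 characters in, 200 left after stripping: shrinks by a third, ratio 1.5
print spam_verdict(300, 200), "\n";

# 300 characters in, 250 left after stripping: shrinks by a sixth, ratio 1.2
print spam_verdict(300, 250), "\n";
```

In general, text that shrinks by a fraction s gives a ratio of 1 / (1 - s), so raising or lowering the threshold directly sets how link-heavy a submission must be before it is marked.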

Note that I have also initialised the $full_length and $stripped_length variables to 1, not 0, so that a blank form (submitted by a person or an automaton) doesn't cause a division by zero.
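
A short illustration of why the starting value matters, using the same variable names for the first case; the $full_zero, $stripped_zero and $ratio names are invented for this sketch.

```perl
use strict;
use warnings;

# Blank form with counters started at 1: nothing is added, so the ratio is 1/1
my ($full_length, $stripped_length) = (1, 1);
my $spamfactor = $full_length / $stripped_length;
print "blank form ratio: $spamfactor\n";

# The same blank form with counters started at 0 makes Perl die at run time;
# eval traps the error here so that it can be shown
my ($full_zero, $stripped_zero) = (0, 0);
my $ratio = eval { $full_zero / $stripped_zero };
print "starting from 0 fails: $@" unless defined $ratio;
```

Starting at 1 slightly lowers the ratio for very short submissions, which seems a reasonable price for not crashing on an empty one.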
(written 2005-04-05, updated 2006-06-05)

 



This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/268_Info ... -spam.html • PAGE BUILT: Thu Sep 18 15:30:25 2014 • BUILD SYSTEM: WomanWithCat