Spotting and stopping denial of service attacks

We had a problem when our web site access logs doubled in size in one day. It wasn't good selling on our part or a massive return to work after a public holiday; all the extra traffic came from a single location (in our case, in Slovakia).

HOW WE NOW PREVENT SUCH AGGRESSIVE BROWSING USING PHP

All of our pages use PHP and pull in a standard file of web helper routines (that's a chosen design feature that lets us make global changes very easily), so all I've had to do is to add a short section of code in there:

# Log current request

$nowsec = time();
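# Array keys must be quoted strings. REQUEST_URI is supplied by the
# visitor, so escape both values before they are placed in any SQL
$rip = mysql_real_escape_string($_SERVER['REMOTE_ADDR']);
$wanted = mysql_real_escape_string($_SERVER['REQUEST_URI']);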
# Our database connection is already open ...
$logit = "INSERT INTO recent (tstamp, remoteaddr, calledfor) ".
 "values (".
        $nowsec . ", ".
        "\"$rip\", ".
        "\"$wanted\") ";
@mysql_query($logit);

# How many requests in the test time period

$keeptime = 120;
$hurdle = 50;

$nn = $nowsec - $keeptime;
$q = @mysql_query("select count(tstamp) from recent ".
 "where remoteaddr = '$rip' and tstamp > $nn");
$res = @mysql_fetch_row($q);
$balloon = $res[0];

# $hurdle pages per $keeptime seconds - aggressive!

if ($balloon > $hurdle) {
        @mysql_query("INSERT INTO warned (tstamp, remoteaddr, ".
  "calledfor) values (".
                $nowsec . ", ".
                "\"$rip\", ".
                "\"$wanted\") ");
        sleep(10); # Keep 'em waiting!
        # $response = file($_SERVER['DOCUMENT_ROOT']."/dos.html");
        # print (join(" ",$response));
        # exit();
        }

        $q = @mysql_query("delete from recent where tstamp < $nn");

# Keep database creation commands here so that we
# have them to hand for when we port the software

/*
        $q = @mysql_query("create table recent ( tstamp bigint,".
                " remoteaddr text, calledfor text, ".
                " rid bigint primary key not null auto_increment)");
        $q = @mysql_query("create table warned ( tstamp bigint,".
                " remoteaddr text, calledfor text, ".
                " rid int primary key not null auto_increment)"); */

AN EXPLANATION

Our database connection is always opened when you call up a page on our web site, so the first thing we do is to add in a note of where every single request has come from. Normally, we take great care in web programming (and PHP is a great help here) to keep one request independent of any others. On things like auction sites, and in dynamic traffic monitoring too, requests have to share information, so we're using a common database table.

Once the access has been logged, we check to see how many other accesses have come from the same location (IP address) in a chosen time period, and we see if a limit has been reached. We've chosen, for testing, a limit of 50 pages in 2 minutes - see below.

When our chosen threshold is reached, we log the "violation" to a separate database table and take our immediate action to deal with the heavy traffic.

Finally (in all cases), we delete records older than our chosen time period from the table so that we don't end up with a monstrously growing database table that needs major work every so often.
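
On current PHP versions, where the old mysql_* functions are no longer available, the same four steps could be written with PDO prepared statements. What follows is a minimal sketch rather than the code this page runs: it assumes an already-open PDO connection passed in as $db, and the function name checkRequestRate() is made up for illustration. The table and column names are the ones from the listing above.

```php
<?php
// Sketch only: $db is assumed to be an already-open PDO connection, and
// checkRequestRate() is a hypothetical name, not part of the original code.
function checkRequestRate(PDO $db, string $ip, string $uri, int $now,
                          int $keeptime = 120, int $hurdle = 50): bool
{
    $cutoff = $now - $keeptime;

    // 1. Log the current request; placeholders keep the URI out of the SQL text
    $log = $db->prepare(
        "INSERT INTO recent (tstamp, remoteaddr, calledfor) VALUES (?, ?, ?)");
    $log->execute([$now, $ip, $uri]);

    // 2. Count requests from this address inside the time window
    $count = $db->prepare(
        "SELECT COUNT(*) FROM recent WHERE remoteaddr = ? AND tstamp > ?");
    $count->execute([$ip, $cutoff]);
    $balloon = (int) $count->fetchColumn();

    // 3. Over the hurdle? Record the violation in the second table
    $over = $balloon > $hurdle;
    if ($over) {
        $warn = $db->prepare(
            "INSERT INTO warned (tstamp, remoteaddr, calledfor) VALUES (?, ?, ?)");
        $warn->execute([$now, $ip, $uri]);
    }

    // 4. In all cases, prune rows that have aged out of the window
    $prune = $db->prepare("DELETE FROM recent WHERE tstamp < ?");
    $prune->execute([$cutoff]);

    return $over;
}
```

The function only reports whether the limit was passed; the caller decides what to do about it (sleep, alternative page, and so on), which keeps the choice of action separate from the counting.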

HOW TO CHOOSE AND SET THE LIMITS.

A tricky one! You want to catch people before they do too much harm, but be sure that you won't stop an important but slightly aggressive robot. The figures that we've chosen are slightly above the hit level we've been experiencing from the Google and MSN crawlers. Both of these tend to come through in fits and starts, so a limit of 30 hits in one minute was occasionally triggered, but a lower average rate (25 per minute) measured over 2 minutes seems OK.

Our visitor from Slovakia who provoked the writing of the code and article was grabbing a page every second for hours on end, and would clearly be trapped.

We also need to be aware that high traffic levels *can* be legitimate. Browsing our live web site from our own training centre, all requests appear to come from a single IP address; with a maximum class size of 7 trainees, plus three staff members, all browsing our site at the same time, the limiter would be hit if the average user called up more than one page every 24 seconds, consistently for 2 minutes.
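
That 24 second figure is simple arithmetic, and it's worth rerunning with your own numbers before settling on a limit. A small sketch (the function name is ours, purely for illustration, and it assumes everyone behind the shared address browses at the same steady rate):

```php
<?php
// Hypothetical helper: if $users people share one IP address, how many
// seconds per page, per person, puts them exactly on the limit?
function secondsPerPageAtLimit(int $users, int $hurdle, int $keeptime): float
{
    // Each user's share of the allowance is $hurdle / $users pages per window
    return $keeptime * $users / $hurdle;
}

// 7 trainees + 3 staff, 50 pages per 120 seconds
echo secondsPerPageAtLimit(10, 50, 120), "\n";   // prints 24
```

Anyone browsing faster than that, all together and for the whole two minutes, would trip the limiter.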

ACTION TO TAKE WHEN LIMIT'S HIT

In our example above, we've simply put a 10 second delay into the code - a minimalist response that's intended to slow down the high traffic generator without affecting the content that they see.

Alternative code, commented out in our example, generates an alternative response page that warns the user that he's triggered our limits; "what use is that to a spider" you may ask - well - if the spidering is following links, then it somewhat trims down the number of different links to follow and provides an element of traffic control.

Other actions (not shown in the sample code) include:

= emailing the server admin when the limit is hit
  (but beware the possibility of spamming yourself)
= "blacklisting" warned IP addresses so that they
  can't quickly step their hits back up
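
Both of those actions come down to the same question: "how long ago did this last happen?". A rough sketch of how they might look, assuming you can look up the newest tstamp in the warned table for an address and that you keep a note of when you last emailed yourself. The function names, the one hour ban and the 15 minute email gap are all assumptions for illustration, not settings we use:

```php
<?php
// Hypothetical helper: $lastWarned is the newest tstamp in the "warned"
// table for this address, or null if it has never been warned.
function isBlacklisted(?int $lastWarned, int $now, int $banSeconds = 3600): bool
{
    if ($lastWarned === null) {
        return false;
    }
    // Still inside the ban period counted from the most recent warning
    return ($now - $lastWarned) < $banSeconds;
}

// Hypothetical helper: only email the admin if we haven't done so recently,
// so that one aggressive visitor can't fill the mailbox.
function shouldEmailAdmin(?int $lastEmailSent, int $now, int $minGap = 900): bool
{
    return $lastEmailSent === null || ($now - $lastEmailSent) >= $minGap;
}
```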

We could also check the user agent - in other words the program that's being used to call up all the pages - and respond differently to known and welcomed crawlers that are getting a bit over-enthusiastic. It might even be a good idea to log who's checked the robots.txt file ...
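
As a sketch of the user agent idea: the function below gives a higher allowance to crawlers whose names appear in a short list. The list itself, the doubling, and the function name are assumptions for illustration only. Bear in mind that the user agent string is supplied by the visitor and is easy to forge, so this is a courtesy to well behaved robots rather than a security measure.

```php
<?php
// Sketch only: the crawler names and the doubled allowance are assumptions
// for illustration, not settings taken from our live code.
function hurdleForAgent(string $userAgent, int $defaultHurdle = 50): int
{
    $welcomed = ['Googlebot', 'msnbot'];
    foreach ($welcomed as $name) {
        // stripos() is case-insensitive and returns false when there is no match
        if (stripos($userAgent, $name) !== false) {
            return $defaultHurdle * 2;
        }
    }
    return $defaultHurdle;
}
```

In use, the result would replace the fixed $hurdle in the main listing, with something like hurdleForAgent($_SERVER['HTTP_USER_AGENT'] ?? '').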

BACK TO THE BEGINNING

So - what *was* the problem that triggered all this careful investigation and filtering?

Looking through our logs, it appears to be a user in a University in Slovakia who's using the Wget utility to grab a web page and everything it calls up too; I'm inclined to think that he / she found our site useful and decided to put a local copy on his / her own laptop for later use, but I may be wrong and it might be more sinister. In any case, he / she hasn't visited our robots.txt to see whether automata are welcome and where they may go, so I don't feel too bad about capping.

Did I try asking what was going on? Yes, but from web logs you can't identify the user and so I had to use the rather more blunt tool of writing to the sys admin at the site. As their web site isn't in my native tongue (and clicking on the word "English" on their front page gives a 404 error), I'm not holding out much hope that they'll even receive and understand my message.

Our immediate user will simply find our pages to be a bit slow and erratic if he continues to spider next week (I think he's off for the weekend as I write this on Sunday). If I choose to enable the alternative document that's commented out in my test code, he'll get:

<html>
<head><title>Well House Consultants -
probable aggressive spidering notice</title></head>
<body bgcolor=#FF9999 text=black>
<h1>Your IP address has requested more than 50 pages in
the last two minutes</h1>
<b>We welcome spiders to index our site, but request
that they are "polite" - that they check our robots.txt
file, and that they crawl gently so as not to burn up all
our bandwidth and deny access to others.<br><br>
You have been sent this page because over 50 pages were
requested from our server within 120 seconds from the
same IP address. If you're spidering us, please adjust
your spider so that it's more gentle in its actions. This
"incident" has been logged, and we'll be taking a closer
look at our records.<br><br>
If you feel that you should not have received this
message, please email me (graham@wellho.net). Thanks!</b>
</body>
</html>


See also Using MySQL from PHP




© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/solutions/php-spot ... tacks.html • PAGE BUILT: Wed Mar 28 07:47:11 2012 • BUILD SYSTEM: wizard