Spotting and stopping denial of service attacks

We had a problem when our web site access logs doubled in size in one day. It wasn't good selling on our part or a massive return to work after a public holiday; all the extra traffic came from a single location (in our case, in Slovakia).

HOW WE NOW PREVENT SUCH AGGRESSIVE BROWSING USING PHP

All of our pages use PHP and pull in a standard file of web helper routines (that's a chosen design feature that lets us make global changes very easily), so all I've had to do is to add a short section of code in there:

# Log current request

$nowsec = time();
$rip = $_SERVER['REMOTE_ADDR'];
$wanted = $_SERVER['REQUEST_URI'];
# Our database connection is already open; $db is assumed
# here to be its mysqli handle (the old mysql_* calls were
# removed in PHP 7). Bound parameters stop the requested
# URI, which the visitor controls, being run as SQL.
$logit = mysqli_prepare($db, "INSERT INTO recent ".
        "(tstamp, remoteaddr, calledfor) values (?, ?, ?)");
mysqli_stmt_bind_param($logit, "iss", $nowsec, $rip, $wanted);
mysqli_stmt_execute($logit);

# How many requests in the test time period

$keeptime = 120;
$hurdle = 50;

$nn = $nowsec - $keeptime;
$q = mysqli_prepare($db, "select count(tstamp) from recent ".
        "where remoteaddr = ? and tstamp > ?");
mysqli_stmt_bind_param($q, "si", $rip, $nn);
mysqli_stmt_execute($q);
mysqli_stmt_bind_result($q, $balloon);
mysqli_stmt_fetch($q);
mysqli_stmt_close($q);

# $hurdle pages per $keeptime seconds - aggressive!

if ($balloon > $hurdle) {
        $warn = mysqli_prepare($db, "INSERT INTO warned ".
                "(tstamp, remoteaddr, calledfor) values (?, ?, ?)");
        mysqli_stmt_bind_param($warn, "iss", $nowsec, $rip, $wanted);
        mysqli_stmt_execute($warn);
        sleep(10); # Keep 'em waiting!
        # $response = file($_SERVER['DOCUMENT_ROOT']."/dos.html");
        # print (join(" ",$response));
        # exit();
        }

# Tidy up: drop records older than the test time period
# ($nn is an integer worked out above, so it is safe inline)

mysqli_query($db, "delete from recent where tstamp < $nn");

# Keep database creation commands here so that we can
# have them to hand for when we port the software

/*
        mysqli_query($db, "create table recent ( tstamp bigint,".
                " remoteaddr text, calledfor text, ".
                " rid bigint primary key not null auto_increment)");
        mysqli_query($db, "create table warned ( tstamp bigint,".
                " remoteaddr text, calledfor text, ".
                " rid int primary key not null auto_increment)"); */

AN EXPLANATION

Our database connection is always opened when you call up a page on our web site, so the first thing we do is to add in a note of where every single request has come from. Normally, we have to take huge care in web programming (and PHP is a huge help) to avoid one request being linked with any others. On things like auction sites, and in dynamic traffic monitoring too, that's not the case so we're using a common database table.

Once the access has been logged, we check to see how many other accesses have come from the same location (IP address) in a chosen time period, and we see if a limit has been reached. We've chosen, for testing, a limit of 50 pages in 2 minutes - see below.

When our chosen threshold is reached, we log the "violation" to a separate database table and take our immediate action to deal with the heavy traffic.

Finally (in all cases), we delete records older than our chosen time period from the table so that we don't end up with a monstrously growing database table that needs major work every so often.
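
The log, count and purge steps can be tried out without a database. What follows is a minimal sketch rather than the code running on our site: it assumes a plain PHP array standing in for the "recent" table, the function name record_hit is made up for the example, and 192.0.2.1 is a documentation-only address.

```php
<?php
# Sketch only: an in-memory stand-in for the "recent" table.
# record_hit() is a hypothetical name, not part of our helper file.

function record_hit(array &$recent, string $rip, int $nowsec,
                int $keeptime = 120, int $hurdle = 50): bool {
        # 1. Log the current request
        $recent[] = ['tstamp' => $nowsec, 'remoteaddr' => $rip];

        # 2. Count requests from this address in the time period
        $nn = $nowsec - $keeptime;
        $balloon = 0;
        foreach ($recent as $row) {
                if ($row['remoteaddr'] === $rip && $row['tstamp'] > $nn) {
                        $balloon++;
                }
        }

        # 3. Purge records older than the time period (all addresses)
        $recent = array_values(array_filter($recent,
                function ($row) use ($nn) { return $row['tstamp'] >= $nn; }));

        # 4. Tell the caller whether the limit has been passed
        return $balloon > $hurdle;
}

# One page a second from a single address
$recent = [];
$tripped_at = null;
for ($t = 1; $t <= 200; $t++) {
        if (record_hit($recent, "192.0.2.1", 1000 + $t) && $tripped_at === null) {
                $tripped_at = $t;
        }
}
print "Limit first passed on request $tripped_at\n";
print "Rows still held: " . count($recent) . "\n";
```

At one page a second the limit is first passed on the 51st request, and because of the purge the array never holds much more than two minutes' worth of rows however long the loop runs.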

HOW TO CHOOSE AND SET THE LIMITS.

A tricky one! You want to catch people before they do too much harm, but be sure that you won't stop an important but slightly aggressive robot. The figures that we've chosen are slightly above the hit level we've been experiencing from the Google and MSN crawlers; both of these tend to come through in fits and starts, so that a limit of 30 hits per minute was occasionally triggering, but a lower limit (25 per minute) spread over 2 minutes seems OK.
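
To see why the wider window is kinder to a crawler that arrives in fits and starts, here is a small sketch. The hit times are invented for illustration (they are not taken from our logs) and peak_in_window is a hypothetical helper; it counts hits the same way as the query in our code, that is, those with a timestamp later than "now minus the window".

```php
<?php
# Sketch only: the hit times below are invented for illustration.

# Largest number of hits falling in any $window-second stretch
function peak_in_window(array $times, int $window): int {
        sort($times);
        $peak = 0;
        $start = 0;
        foreach ($times as $end => $t) {
                # Slide the start forward until it is inside the window
                while ($times[$start] <= $t - $window) {
                        $start++;
                }
                $peak = max($peak, $end - $start + 1);
        }
        return $peak;
}

# A bursty crawler: 35 pages at one a second, a quiet spell,
# then another 10 pages starting at second 90
$times = array_merge(range(0, 34), range(90, 99));

$per_minute = peak_in_window($times, 60);
$per_two = peak_in_window($times, 120);
print "Worst 60 seconds:  $per_minute hits\n";
print "Worst 120 seconds: $per_two hits\n";
```

With these made-up timings the worst minute holds 35 hits, which would trip a limit of 30 per minute, while the worst two minutes hold 45, which stays under 50 in 120 seconds.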

Our visitor from Slovakia who provoked the writing of the code and article was grabbing a page every second for hours on end, and would clearly be trapped.

We also need to be aware that high traffic levels *can* be legitimate. Browsing our live web site from our own training centre, all requests appear to come from a single IP address; with a maximum class size of 7 trainees, plus three staff members, all browsing our site at the same time, the limiter would be hit if the average user called up more than one page every 24 seconds, consistently for 2 minutes.
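
The 24 second figure comes from sharing the limit out between everyone behind the one address. A minimal sketch of the sum, with seconds_per_page as a hypothetical helper name:

```php
<?php
# Sketch only: seconds_per_page() is a hypothetical helper.

# How often can each person behind one shared address call up a page
# before the whole group reaches $hurdle pages in $keeptime seconds?
function seconds_per_page(int $people, int $hurdle, int $keeptime): float {
        $pages_each = $hurdle / $people;        # 50 / 10 = 5 pages each
        return $keeptime / $pages_each;         # 120 / 5 = 24 seconds
}

$people = 7 + 3;        # trainees plus staff
$gap = seconds_per_page($people, 50, 120);
print "$gap seconds per page each\n";
```

The same function gives a quick check on any other combination of limit and time period before it goes live.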

ACTION TO TAKE WHEN LIMIT'S HIT

In our example above, we've simply put a 10 second delay into the code - a minimalist response that's intended to slow down the high traffic generator without affecting the content that they see.

Alternative code, commented out in our example, generates an alternative response page that warns the user that he's triggered our limits; "what use is that to a spider" you may ask - well - if the spidering is following links, then it somewhat trims down the number of different links to follow and provides an element of traffic control.
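
If the warning page is switched on, it may also be worth telling automata in a way they can act on. The sketch below goes a step beyond our commented-out code and is an assumption, not something we have running: it pairs the page with the HTTP status 429 ("Too Many Requests") and a Retry-After header. The helper name limit_response is made up for the example.

```php
<?php
# Sketch only: limit_response() is a hypothetical helper, and the
# 429 status and Retry-After header are an addition to our example.

function limit_response(string $page, int $keeptime): array {
        return [
                'status'  => 429,       # "Too Many Requests"
                'headers' => ["Retry-After: $keeptime"],
                'body'    => $page,
        ];
}

# In the helper file this could stand in for the three commented-out lines:
#   $page = join("", file($_SERVER['DOCUMENT_ROOT']."/dos.html"));
#   $r = limit_response($page, $keeptime);
#   http_response_code($r['status']);
#   foreach ($r['headers'] as $h) { header($h); }
#   print $r['body'];
#   exit();

$r = limit_response("<html>...</html>", 120);
print $r['status'] . " " . $r['headers'][0] . "\n";
```

Building the response as an array first keeps the decision separate from the sending, so it can be checked from the command line without a web server.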

Other actions (not shown in the sample code) include:

= emailing the server admin when the limit is hit
  (but beware the possibility of spamming yourself)
= "blacklisting" warned IP addresses so that they
  can't quickly step their hits back up
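
Both of these can be sketched in a few lines. The function names, the arrays used in place of database tables, and the one hour figures are all assumptions made for the example.

```php
<?php
# Sketch only: both function names and the one-hour figures are
# assumptions made for this example.

# Email the admin at most once per $gap seconds for each address,
# so that a long-running spider doesn't fill the admin's inbox
function should_email(array &$lastmail, string $rip, int $nowsec,
                int $gap = 3600): bool {
        if (isset($lastmail[$rip]) && $nowsec - $lastmail[$rip] < $gap) {
                return false;   # already told about this one recently
        }
        $lastmail[$rip] = $nowsec;
        return true;
}

# An address stays blacklisted for $ban seconds after its last warning
function is_blacklisted(array $warned, string $rip, int $nowsec,
                int $ban = 3600): bool {
        return isset($warned[$rip]) && $nowsec - $warned[$rip] < $ban;
}

$lastmail = [];
$warned = ["192.0.2.1" => 1000];        # address => time of last warning

var_dump(should_email($lastmail, "192.0.2.1", 1000));   # first warning
var_dump(should_email($lastmail, "192.0.2.1", 1060));   # a minute later
var_dump(is_blacklisted($warned, "192.0.2.1", 2000));   # inside the hour
var_dump(is_blacklisted($warned, "192.0.2.1", 5000));   # over an hour on
```

On a live site the two arrays would be database tables (the "warned" table already holds what the blacklist check needs).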

We could also check the user agent - in other words the program that's being used to call up all the pages - and respond differently to known and welcomed crawlers that are getting a bit over-enthusiastic. It might even be a good idea to log who's checked the robots.txt file ...
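
A minimal sketch of the user agent check follows. The function name known_crawler, the short list of crawler names and the raised limit of 100 are assumptions for illustration; bear in mind that the user agent string is supplied by the visitor and is easily faked, so it should only ever loosen a limit a little.

```php
<?php
# Sketch only: known_crawler() and the list of names are assumptions;
# a user agent string is supplied by the visitor and can be faked.

function known_crawler(string $agent): ?string {
        $welcome = ["Googlebot", "msnbot"];
        foreach ($welcome as $name) {
                if (stripos($agent, $name) !== false) {
                        return $name;
                }
        }
        return null;
}

# On a live page the string comes from the request
$agent = $_SERVER['HTTP_USER_AGENT'] ?? "";

# Give welcomed crawlers a little more room (100 is an assumed figure)
$hurdle = known_crawler($agent) === null ? 50 : 100;

$samples = [
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Wget/1.10.2",
];
foreach ($samples as $s) {
        print (known_crawler($s) ?? "not on the list") . "\n";
}
```

The match is deliberately loose (a case-insensitive substring), since crawler version numbers change often.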

BACK TO THE BEGINNING

So - what *was* the problem that triggered all this careful investigation and filtering?

Looking through our logs, it appears to be a user in a University in Slovakia who's using the Wget utility to grab a web page and everything it calls up too; I'm inclined to think that he / she found our site useful and decided to put a local copy on his / her own laptop for later use, but I may be wrong and it might be more sinister. In any case, he / she hasn't visited our robots.txt to see whether automata are welcome and where they may go, so I don't feel too bad about capping.

Did I try asking what was going on? Yes, but from web logs you can't identify the user and so I had to use a rather more blunt tool of writing to the sys admin at the site. As their web site isn't in my native tongue (and clicking on the word "English" on their front page gives a 404 error), I'm not holding out much hope that they'll even receive and understand my message.

Our immediate user will have simply found our pages to be a bit slow and erratic if he continues to spider next week (I think he's off for the weekend as I write this on Sunday). If I choose to enable the alternative document that's commented out in my test code, he'll get:

<html>
<head><title>Well House Consultants -
probable aggressive spidering notice</title></head>
<body bgcolor=#FF9999 text=black>
<h1>Your IP address has requested more than 50 pages in
the last two minutes</h1>
<b>We welcome spiders to index our site, but request
that they are "polite" - that they check our robots.txt
file, and that they crawl gently so as not to burn up all
our bandwidth and deny access to others.<br><br>
You have been sent this page because over 50 pages were
requested from our server within 120 seconds from the
same IP address. If you're spidering us, please adjust
your spider so that it's more gentle in its actions. This
"incident" has been logged, and we'll be taking a closer
look at our records.<br><br>
If you feel that you should not have received this
message, please email me (graham@wellho.net). Thanks!</b>
</body>
</html>


See also Using MySQL from PHP

Please note that articles in this section of our web site were current and correct to the best of our ability when published, but by the nature of our business may go out of date quite quickly. The quoting of a price, contract term or any other information in this area of our website is NOT an offer to supply now on those terms - please check back via our main web site


© WELL HOUSE CONSULTANTS LTD., 2022: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/solutions/php-spot ... tacks.html • PAGE BUILT: Wed Mar 28 07:47:11 2012 • BUILD SYSTEM: wizard