We've had 45000 page requests in the last week from the University of Illinois, 17000 of which came within a period of a few hours yesterday. Not bad going? Had we been recommended to the whole University as a site worth seeing? Alas, no; all the requests were coming from a single host computer. That's quite remarkable, seeing that we haven't got anywhere near 17000 pages on our site, and it's also remarkable because those extra hits came after I had been in email contact with the University, who had promised to desist.

(Chart caption: uiuc is a quarter of our daily traffic, and that was just in a couple of hours.)
The university has apologised to me, and I owe an apology in turn to people who have had trouble with our site in the last few days - we've been working to counter this traffic, which was causing a denial of service to our regular users. It's been quite an interesting couple of days!
So - what happened?
An Overview of automated browsing
The web was designed for human browsers - visitors who pull up a page and then come back seconds or minutes later for another page. But the protocol used is a straightforward one, so it's very easy to write automata (robots and crawlers) that go methodically through a large number of pages at electronic speed.
Automata such as these come in various flavours:
* Search engines such as Google, Yahoo, MSN and others, which the site owner encourages for his own purposes
* Other engines such as the turnitin "bot", where a commercial outfit harvests all the content of a site for its own (or its customers') purposes - in this case to sell universities an anti-plagiarism service.
* Utilities which are intended to gather a series of pages for an individual so that the individual can pre-load a few pages over a slow line for more efficient browsing
* Automata written or run with the purpose of causing disruption or expense to the web site owner.
Good practice for automated browsers
Technically, it is very easy indeed to run an automaton - one you wrote yourself, or one that's out there already. But that doesn't mean it's good practice to do so, or that you'll be welcomed if you do. Automata should:
* check a file called robots.txt in the root directory of the domain's web site from time to time to see if they're welcome, and respect what it says.
* declare themselves to be automata (and say which one they are) via the user agent string that is usually sent with every request. This should include a URL in case the site they're visiting wants to know what they're up to.
* respect the bandwidth and resources of the sites they visit, and the needs of other users of those sites - in other words, not visit a lot of pages in quick succession, nor call up lots of pages in parallel.
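As a minimal sketch of those three rules, assuming Python 3.6 or later, a crawler can use the standard library's urllib.robotparser to honour robots.txt (including any Crawl-delay line), send an identifying User-Agent header, and fetch one page at a time with a pause between requests. The bot name, its URL and the five-second minimum gap are hypothetical choices for illustration.

```python
import time
import urllib.robotparser

# Hypothetical identity; the URL tells site owners who is visiting and why.
USER_AGENT = "ExampleResearchBot/0.1 (+http://www.example.com/bot.html)"
MIN_DELAY = 5.0  # assumed minimum gap, in seconds, between requests


def load_rules(robots_lines):
    """Parse the lines of a site's robots.txt file."""
    rules = urllib.robotparser.RobotFileParser()
    rules.parse(robots_lines)
    return rules


def polite_crawl(urls, rules, fetch, sleep=time.sleep):
    """Fetch permitted URLs one at a time; return the ones fetched.

    fetch(url, headers) is supplied by the caller so that this logic
    can be exercised without contacting a real server.
    """
    # Use the site's own Crawl-delay if it asks for a longer gap.
    gap = max(MIN_DELAY, rules.crawl_delay(USER_AGENT) or 0)
    fetched = []
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # robots.txt asks automata to keep out of this URL
        if fetched:
            sleep(gap)  # never in parallel, never in quick succession
        fetch(url, {"User-Agent": USER_AGENT})
        fetched.append(url)
    return fetched
```

In real use, fetch would wrap urllib.request.urlopen with a Request carrying those headers. It's passed in here so that the politeness logic can be tried out without touching anyone's server.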
Alas, malicious automata (they're in a tiny minority, thank goodness) don't respect these rules. And another minority - not quite so tiny - don't fully understand these rules and their effect on sites they visit. And so, these days, web site owners need to consider defences and safety nets.
If more than 100 pages are requested in 300 seconds (I think those are the figures; we change them from time to time) on our web site from a single location, we start to get worried, and our web server provides a delayed response - it holds the request for a few seconds to throttle back the visitor and to give others a chance to have a look in.
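The article doesn't show how the delay is implemented. As a minimal sketch of the same rule, assuming a single server process that keeps its counts in memory, a sliding window of request times per client address gives the "more than 100 in 300 seconds" test. The class name and the three-second delay are hypothetical.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300  # figures quoted in the article
MAX_REQUESTS = 100
DELAY_SECONDS = 3     # assumed value for "a few seconds"


class RequestThrottle:
    """Sliding-window count of recent requests per client address."""

    def __init__(self, window=WINDOW_SECONDS, limit=MAX_REQUESTS,
                 delay=DELAY_SECONDS, clock=time.monotonic):
        self.window = window
        self.limit = limit
        self.delay = delay
        self.clock = clock
        self.recent = defaultdict(deque)  # address -> request timestamps

    def delay_for(self, address):
        """Record a request from address; return seconds to hold it back."""
        now = self.clock()
        stamps = self.recent[address]
        stamps.append(now)
        # Forget requests that have slid out of the window.
        while stamps and now - stamps[0] > self.window:
            stamps.popleft()
        return self.delay if len(stamps) > self.limit else 0
```

A request handler would call time.sleep(throttle.delay_for(client_ip)) before building the page. Note that this only slows each request down; it does nothing to limit how many requests are waiting at the same time.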
So what happened in this case?
That's worked well in the past, but it seems that this latest attack (which the evidence tells me was research code written without sufficient knowledge of, or thought for, the effect it would have, rather than being malicious) was made from a cluster of parallel processes. If one was put into a delay, others simply jumped on as well, and we had so many concurrent visitors to the site that the queues couldn't cope. It was rather like an ambassador being sent along to discuss something in diplomatic terms and, when he's kept waiting for his turn, the troops being sent in behind. Sorry, University of Illinois, this host is banned. Bully-boy tactics such as these are not acceptable, especially from a centre of excellence in learning, which should know better.
(written 2006-03-17, updated 2008-05-04)
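One reason a delay can't cope with a cluster is that every held request still occupies a server slot while it waits. A different technique, not the one described above, is to count how many requests from one network are in progress at once and refuse any beyond a small cap straight away. Here's a minimal sketch in Python; the cap of four, the grouping by /24 (IPv4) or /64 (IPv6) network, and the class name are all assumptions.

```python
import ipaddress
import threading
from collections import Counter

MAX_IN_FLIGHT = 4  # assumed cap on simultaneous requests per network


class ConcurrencyLimiter:
    """Cap the number of requests in progress at once from one network."""

    def __init__(self, limit=MAX_IN_FLIGHT):
        self.limit = limit
        self.in_flight = Counter()
        self.lock = threading.Lock()

    @staticmethod
    def _key(address):
        # Group neighbouring addresses so that a cluster of machines on
        # one subnet shares a single allowance.
        ip = ipaddress.ip_address(address)
        prefix = 24 if ip.version == 4 else 64
        return ipaddress.ip_network(f"{ip}/{prefix}", strict=False)

    def try_acquire(self, address):
        """Return True if this request may proceed, False to refuse it."""
        key = self._key(address)
        with self.lock:
            if self.in_flight[key] >= self.limit:
                return False
            self.in_flight[key] += 1
            return True

    def release(self, address):
        """Mark one request from address as finished."""
        key = self._key(address)
        with self.lock:
            self.in_flight[key] -= 1
            if self.in_flight[key] <= 0:
                del self.in_flight[key]
```

A request handler would call try_acquire before doing any work, send an immediate refusal (for example an HTTP 503 response) if it returns False, and call release in a finally clause once the page has been sent.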
John Moylan: Ouch! So what did you do to counteract this? (comment added 2006-03-23 20:46:30)
Graham Ellis: We've simply denied GET requests to our web server from the system(s) concerned, denying by IP address in our .htaccess file. A bit crude, but it had to be a quick fix, and although I had a promise that they would desist, they were still attacking the following day. (comment added 2006-03-24 07:14:39)
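In Apache httpd, a block like this is usually written with Deny from lines (httpd 2.2) or Require not ip inside a RequireAll block (httpd 2.4). The check those rules make is simple. Here's a sketch of it in Python using the standard ipaddress module, with documentation-range addresses standing in for the real ones, which the comment doesn't give.

```python
import ipaddress

# Hypothetical entries: documentation ranges, not the addresses actually banned.
DENIED = [
    ipaddress.ip_network("192.0.2.15/32"),    # a single host
    ipaddress.ip_network("198.51.100.0/24"),  # a whole subnet
]


def is_denied(address, denied=DENIED):
    """Return True if a client address falls inside any denied network."""
    ip = ipaddress.ip_address(address)
    # An IPv6 address is never "in" an IPv4 network, so mixing is safe.
    return any(ip in net for net in denied)
```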