I can recall a colleague of mine (OK, if you're reading this Peter, yes you were the boss) using the term "Open Kimono" to describe certain approaches at certain times, and (truth be told) I wasn't sure if there was something a little naughty in the connotations that the term conjured up. Yet the term came back to me this morning when I was wondering whether to post up some recent experiences / comments from the growth curve we have been seeing in resource usage on our web server. But I think I'm OK to use the term ... the My Open Kimono Blog uses it, for example.
Why move the web site?
Here goes. We moved our main domain to a dedicated server six months ago. Traffic levels were such that our daily Apache httpd logs on a shared server in the USA were around 15 Mbytes each, and we had concerns about the lag time taken for traffic from the UK (our primary market) to make the round trip. We were also concerned that search engines were seeing us, with a ".net" top level domain, as being located and trading in the country in which our server was located rather than in the UK. And we had some security concerns with regard to the peaky load that others were putting on the server, and the possibility of PHP injection attacks into our scripts by others on the machine (or, rather, due to loopholes left by others sharing the system - see here).
The first problem, and a warning sign
Anyway ... there were a few of the teething troubles that were only to be expected as we learned our way into the new server, and the web site was transferred and live in a quite remarkably short time. But then it died in the middle of one night. And that technical story is told here. Finding an issue like this is rather like looking for a needle in a lot of hay - not even in a single haystack, as the potential issues are many and varied, and there can be just the one trigger.
But there was a serious latent issue. How could a single script's running - even if it caused 20 seconds of CPU time to be burned up - cause an ongoing problem, as it appeared to have done when it ran that night?
The Current Issue
Traffic has now risen; from a 15 Mbyte daily log file in July, it has grown in less than 6 months to peak at nearly 50 Mbytes per day ... and we have seen other occasions when the server's load average as reported by uptime (in effect, the length of the queue of work waiting to run) - usually between 0.2 and 0.8 - has swept majestically upwards to 40, 50 or more and has stuck there. A temporary cure has proven to be easy enough - just a stop of the httpd and mysql daemons, then a restart, and the whole thing has purred along sweetly until the next time.
So have httpd and / or MySQL been stuck in some sort of loop?
No - I don't think so. I think we have simply filled up the server's memory and it's been running on the backup of 'swap space', with more processes / threads of httpd and MySQL than can fit in the real memory fighting for that space, and with the disk 'thrashing' about. And more requests will be joining the queue, now quicker than completed ones are being peeled off. In other words, it's a self-perpetuating problem which, once it has started to occur, is likely to get progressively worse. Unlike a bus queue where you can see you've got a wait ... pop off and get a coffee and come back a bit later ... you have no such option on a web server ...
... and in effect it's made worse by the driver of each and every bus having to stop and re-organise the queue on every trip, thus cutting down the capacity for the queue to be handled at the very time it's most needed!
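To make that feedback loop concrete, here is a toy simulation in Python. It is only a sketch: the arrival rates, the capacity per tick and the point at which "swapping" starts are all made-up numbers, not measurements from our server. It shows the shape of the problem, where one burst pushes the queue past what fits in memory and the reduced throughput means it never drains, even though normal traffic is below normal capacity.

```python
def simulate(arrivals_per_tick, base_capacity=10, fits_in_memory=30):
    """Return the queue length after each tick.

    base_capacity:  requests completed per tick while everything fits in RAM.
    fits_in_memory: queue length beyond which the machine starts to swap.
    Both figures are illustrative assumptions.
    """
    queue = 0
    history = []
    for arrivals in arrivals_per_tick:
        queue += arrivals
        if queue <= fits_in_memory:
            capacity = base_capacity
        else:
            # Swapping: the bigger the excess, the less work gets done per tick.
            excess = queue - fits_in_memory
            capacity = max(1, base_capacity - excess // 5)
        queue = max(0, queue - capacity)
        history.append(queue)
    return history


# Steady traffic of 8 requests per tick, comfortably below the capacity of 10.
quiet = [8] * 40
# The same traffic with a single burst at tick 10 (think of the backup jobs).
burst = [8] * 10 + [60] + [8] * 29

quiet_history = simulate(quiet)
burst_history = simulate(burst)
print("quiet day, final queue:", quiet_history[-1])
print("burst day, final queue:", burst_history[-1])
```

On the quiet day the queue is cleared every tick. On the burst day the queue keeps growing after the burst has passed, which is why a restart (emptying the queue) is the only thing that brings it back.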
What evidence do I have that it's pure load rather than one particular script? Well - the problem was triggering just after 6 a.m., on some mornings - and that's the time an extra load (a server backup) gets added to the job queue - actually several jobs, including a database dump and some tars. Each runs perfectly well manually, at a quiet time, but if the server happens to be busy they'll tip it into an unrecoverable overdrive.
And then the problem triggered, it seemed, at around lunchtime and again between 4:30 and 6:00 in the afternoon - the busiest times on our server, with the UK and European traffic heavy just after noon, and then the UK traffic still very busy at the time the USA traffic was picking up too towards the end of the afternoon / early evening. And on Saturdays and Sundays, when our servers are notably quieter, it ran sweetly (this gave me false hope as I tried to fix the issues at the weekend!)
There's a technical article here in which I show a top report comparing our server when well behaved and when thrashing.
The remedies are more buses, more efficient buses, and taking measures to turn the very occasional person away when the queue is starting to get to the "needs marshalling" stage. We can also make sure that everyone in the queue really wants to travel!
How do those work in web server terms?
• More buses.
For the moment, let's put that one on the back burner. We could cross the palms of our WSP with more silver each month, but there's little point in purchasing something that's not needed.
• More efficient buses.
If we can get the buses to run trips more quickly, the same number of buses will handle more customers and will stop the queue bursting. There are quite significant elements of PHP in most of our pages, and quite a bit of MySQL too - indeed, most of our images are fed from a database.
I have reduced bookkeeping operations in our scripts so that they're run not on every page, but only randomly on around one page in five. A few excess records in "what happened in the last quarter hour" really don't matter.
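As a minimal sketch of that "one page in five" idea (in Python rather than the PHP our pages use, and with a hypothetical `do_bookkeeping` callback standing in for the real housekeeping), the sampling could look something like this:

```python
import random

BOOKKEEPING_FRACTION = 0.2  # roughly one page in five


def maybe_do_bookkeeping(do_bookkeeping, rng=random):
    """Run the housekeeping callback on a random sample of page loads."""
    if rng.random() < BOOKKEEPING_FRACTION:
        do_bookkeeping()
        return True
    return False


# Usage: count how often the housekeeping actually runs over 10,000 page loads.
runs = []
rng = random.Random(42)  # seeded only so that the demonstration is repeatable
for _ in range(10_000):
    maybe_do_bookkeeping(lambda: runs.append(1), rng)
fraction_run = len(runs) / 10_000
print(fraction_run)  # close to 0.2
```

Choosing at random, rather than counting every fifth request, means there is no shared counter to maintain between server processes.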
Various other smaller actions.
And the big one - I have added an index based on the URL to our 15,000 page stats database that we use to provide the relative importance map for Google, and the Google-like search results on our resources pages. It's probably significant that some of the problems only started to occur at around the time that these extra databases started to be collected!
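To illustrate why an index on the URL matters, here is a small Python sketch. It uses SQLite from the standard library as a stand-in for MySQL, and the table and column names (`page_stats`, `url`, `hits`) are hypothetical, not our real schema. It asks the database how it plans to look up one URL, before and after the index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_stats (url TEXT, hits INTEGER)")
conn.executemany(
    "INSERT INTO page_stats VALUES (?, ?)",
    [(f"/resources/page{n}.html", n % 97) for n in range(15000)],
)


def plan(sql, args):
    """Return the database's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, args).fetchall()
    return " ".join(row[-1] for row in rows)


query = "SELECT hits FROM page_stats WHERE url = ?"
target = ("/resources/page12345.html",)

before = plan(query, target)
conn.execute("CREATE INDEX idx_page_stats_url ON page_stats (url)")
after = plan(query, target)

print(before)  # a scan of the whole table
print(after)   # a search that uses idx_page_stats_url
```

Without the index every lookup reads all 15,000 rows; with it the database goes straight to the matching row. In MySQL the equivalent check is to put `EXPLAIN` in front of the query.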
• Fast track service desks
I have taken about a dozen images which are each served very frequently indeed and moved them to plain files, rather than serving them via PHP and MySQL.
• Limiting the queue
I don't want to turn people away - in fact I HATE doing it - but a very few dropped connections from time to time is far, far better than having the whole queue come to a screaming halt until the server's heartbeat is missed on our monitoring machine which screams for the administrator.
I have tuned our queues ... and there is a technical article that I've added to the site here that tells you about how I've done that.
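The settings themselves are in that article and I'm not reproducing them here. As a rough sketch of the kind of arithmetic involved, assuming the aim is to cap the number of server processes so that they all fit in real memory, something like the following Python would do. All of the figures are hypothetical.

```python
def max_workers(total_ram_mb, reserved_mb, per_worker_mb):
    """Largest number of server processes that fit in real memory."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable = total_ram_mb - reserved_mb
    return max(0, usable // per_worker_mb)


# Hypothetical machine: 1024 MB of RAM, 384 MB kept back for MySQL and the
# operating system, and each httpd process using about 20 MB.
limit = max_workers(1024, 384, 20)
print(limit)  # 32
```

A figure like this is a starting point for Apache's `MaxClients` directive (renamed `MaxRequestWorkers` in Apache 2.4). Requests beyond the limit wait or are dropped, instead of pushing the machine into swap.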
• Restricting to really wanted travellers
You may recall articles about libwww earlier on this blog. This sort of traffic, generated by automata, is very peaky and (in the case of the examples quoted) totally unwanted ... the articles linked just above tell you how I have turned away a great deal of that traffic at the front door, and how I have ensured that much of the rest of it is "fast track"ed as above.
• Cutting out needless journeys
Do you like this picture of Charlie, our cat, who's in the habit of coming up to say "good morning" to me when I'm checking my email, and to ask for a stroke and breakfast? It's a nice picture ... perhaps you will come back to this page again in a couple of minutes for another look? Well ... please keep the original copy and look at it again, as there is little point in me giving you exactly the same information, or doing exactly the same work, time after time. The web server can add cache and store headers onto pages (and if you're using PHP to serve images, this is a real "must") and you can also use facilities like memcached to save repeated expensive server calculation operations.
Here's part of our PHP script which manages our image database - the part where it tells the browser to keep the information it's been given for up to an hour, and not to keep asking for it. This is very important for images like your logo which will appear on every page!
# Send out image
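Only the opening comment of that listing appears above, so what follows is not our script. It is a minimal Python sketch of the headers such a script would need to send to get the "keep it for up to an hour" behaviour; the function name and its parameters are my own for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def cache_headers(max_age_seconds=3600, now=None):
    """Build response headers telling the browser to keep its copy for a while."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # Modern browsers honour Cache-Control; Expires covers older clients.
        "Cache-Control": f"max-age={max_age_seconds}",
        "Expires": format_datetime(expires, usegmt=True),
    }


# A fixed time is passed in here only so that the output is repeatable.
headers = cache_headers(now=datetime(2009, 2, 26, 12, 0, tzinfo=timezone.utc))
for name, value in headers.items():
    print(f"{name}: {value}")
```

In PHP the same two headers would be sent with `header()` calls before the image data is output.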
You will also have seen me talking about adding restrictions into our robots.txt file to avoid needless crawling of pages that really shouldn't be indexed, or where our scripts generate URL loops that can trip the spiders. See here for some past experiences, and there's a sample copy of our file here. I have added a few 'loop killers' since I wrote that example.
Have you ever seen a nice picture on someone else's web site and added a link to it on yours? It's called hot linking and if you link to an image on an obscure site from a very popular one, you can have a detrimental and sudden effect on that site. There are occasions where our web site suddenly gets hundreds or thousands of hits from out of the blue - and really it's theft of bandwidth and probably of images. We are monitoring / watching such images - you can use my monitor tool here and see what's a popular steal at the moment. And you can read about past comments I have made and technical ways to discourage the habit here.
Finally, you can cut out some excess traffic by telling people that pages are broken. You may recall past articles (possibly no longer around even here) showing how you can divert erroneous URL requests to your site search and return a good page. Fabulously useful technique for real visitors, but it's almost designed to set the search engines off in a feeding frenzy if they get a bad URL - especially if you suggest other searches. Take care with scripts like this ... and ensure that your automata users are sent "404" responses, while being much more helpful to the customer who has just guessed at a URL by serving him with useful guidance and content.
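Here is a minimal Python sketch of that split. The list of user agent markers is illustrative and far from complete, the function names are hypothetical, and returning a 404 status to people as well (with a helpful body) is my assumption for the sketch, not necessarily what our own scripts do.

```python
ROBOT_MARKERS = ("bot", "crawler", "spider", "libwww")  # illustrative only


def looks_like_robot(user_agent):
    """Crude guess at whether a request comes from an automaton."""
    agent = (user_agent or "").lower()
    return any(marker in agent for marker in ROBOT_MARKERS)


def respond_to_unknown_url(path, user_agent):
    """Return (status, body) for a URL that matches no real page."""
    if looks_like_robot(user_agent):
        # No suggested searches, so a crawler has nothing further to follow.
        return 404, "Not Found"
    words = path.strip("/").replace("-", " ").replace("/", " ")
    return 404, f"We could not find that page. Try our site search for: {words}"


print(respond_to_unknown_url("/php-course", "Googlebot/2.1"))
print(respond_to_unknown_url("/php-course", "Mozilla/5.0 (Windows NT 6.0)"))
```

The person gets guidance built from the URL they guessed at; the robot gets a plain "not there" with no new links to chase.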
I don't think I've reached the end of the story yet. Traffic will go on increasing and - at best - we've currently got something of a lid on it; occasional queues which will potentially get longer. Yes, I know there's a recession or depression on - but it's not depressed or recessed our traffic (perhaps people have more free time and spend more time browsing, quite apart from the fact that this is a rather good site!). So keep reading The Horse's Mouth and you'll see the story continue to unfold.
If you have found this article useful, please remember that we can help you with issues like this in relation to your own servers. We offer Linux / Unix Web Server courses and also a variety of PHP training and a MySQL course too. But in addition / as a starter, please feel free to ask! A day of help or advice may pay for itself a hundred times over - even if I can't come up with a complete solution, I can certainly give pointers and help look at your own, individual case. The easiest way to contact me is via this form and I'll be back to you within 24 hours. (written 2009-02-26)
Some other Articles
Database connection Pooling, SSL, and command line deployment - httpd and Tomcat
Sharing the load between servers - httpd and Tomcat
Invoker and cgi servlets on Tomcat 6
Train and Coach fares from London (and airports) to Melksham
Web Site Loading - experiences and some solutions shared
Effect on server when memory runs out and swapping starts
Tuning httpd / the supermarket checkout comparison
What a difference a MySQL Index made
How was my web site compromised?
A Presentation about our company - web and PHP