Here's a mystery for you.
Background
Over the past weekend, I was "fighting" server outages on another computer where - about once an hour - the httpd daemon appeared to be running away, either stuck in some sort of hole or under a denial of service attack. It was a tricky one to find, as the temporary fix I had in place was a "heartbeat" script that killed all existing connections and freshened up the server. And when the server was busy, it was so much like "treacle" that I couldn't run any Linux commands from a shell to see what was going on.
Mystery
I was aware from my heartbeat log of a total of around 20 seconds per hour during which the server was not accepting requests - that's 20 seconds out of 3600, or about 0.6% of the time. Yet I had a user who was telling me that in his experience, downtime was around 10%. Wow - that's some scary figure, isn't it?
Any ideas?
Turns out to be a case of how you gather your statistics!
Solution
My heartbeat script clicks in at the start of every minute and if there's a problem it tidies up, which takes 5 seconds. Having clicked in once, it then does a further precautionary clean at the following "top" of the minute, and perhaps, if it's not sure that load levels are dropping as they should, at the minute after that too. So in a bad hour, 4 outages of 5 seconds = 20 seconds.
It turns out that my user was running an automated script to check our server, again at the top of the minute. So he had synchronised his tests to our server in such a way that he always saw it during that brief clean up. Looking at his log activity later, I noticed that if he got a failure he had programmed in a second hit straight away to confirm it - so he was seeing 4/60 or 8/64 failures - that's 6.7% or 12.5% to report.
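To see how the two sets of figures can both be "right", here is a minimal sketch in Python that models one bad hour. It is not my heartbeat script or my user's checker. The function names, the choice of which four minutes had a clean up, and the 0.1 second gap before the confirming hit are all assumptions made for illustration. Only the 5 second clean up, the four clean ups per hour and the top-of-minute probing come from the story above.

```python
import random

CLEAN_SECONDS = 5               # length of one clean up (from the article)
CLEAN_MINUTES = (0, 1, 2, 3)    # assumed: which minutes of the hour had a clean up
RETRY_GAP = 0.1                 # assumed: delay before the user's confirming hit


def is_down(second_of_hour):
    """True if the server is mid clean up at this moment in the hour."""
    minute, offset = divmod(second_of_hour, 60)
    return int(minute) in CLEAN_MINUTES and offset < CLEAN_SECONDS


def true_downtime():
    """Fraction of the hour during which the server really was unavailable."""
    return len(CLEAN_MINUTES) * CLEAN_SECONDS / 3600


def synced_failure_rate(retry_on_failure):
    """Failure rate seen by a probe that fires at the top of every minute."""
    probes = failures = 0
    for minute in range(60):
        t = minute * 60
        probes += 1
        if is_down(t):
            failures += 1
            if retry_on_failure:
                # The confirming hit lands inside the same clean up window.
                probes += 1
                if is_down(t + RETRY_GAP):
                    failures += 1
    return failures / probes


def random_failure_rate(samples, seed=1):
    """Failure rate seen by probes at random moments in the hour."""
    rng = random.Random(seed)
    failures = sum(is_down(rng.uniform(0, 3600)) for _ in range(samples))
    return failures / samples


if __name__ == "__main__":
    print(f"true downtime:         {true_downtime():.2%}")
    print(f"synced, no retry:      {synced_failure_rate(False):.2%}")
    print(f"synced, with retry:    {synced_failure_rate(True):.2%}")
    print(f"random probe times:    {random_failure_rate(200_000):.2%}")
```

The synchronised probe always lands in the first few seconds of a minute, which is exactly where all of the downtime is concentrated, so it over-reports by a factor of ten or more. Probes taken at random moments come out close to the true figure.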
Lies, damned lies and statistics
This is an "object lesson" in being careful with statistics - at best, they're helpful and at worst they can give a totally incorrect picture. But I have to say that this example really took the biscuit!
Footnote - server issue solved. Availability now over 99.8% and the remaining outages in the last couple of days relate to me testing. (written 2008-05-28 06:47:30)
Associated topics are indexed under
A606 - Web Application Deployment - Apache httpd - log files and log tools
[1796] libwww-perl and Indy Library in your server logs? - (2008-09-13)
[1780] Server overloading - turns out to be feof in PHP - (2008-09-01)
[1761] Logging Cookies with the Apache httpd web server - (2008-08-20)
[1598] Every link has two ends - fixing 404s at the recipient - (2008-04-02)
[1503] Web page (http) error status 405 - (2008-01-12)
[1237] What proportion of our web traffic is robots? - (2007-06-19)
[376] What brings people to my web site? - (2005-07-13)