Be careful of misreading server statistics

Here's a mystery for you.

Background

Over the past weekend, I was "fighting" server outages on another computer where - about once an hour - the httpd daemon appeared to be running away in some sort of hole or denial of service attack. It was a tricky one to track down, as the temporary fix I had in place was a "heartbeat" script that killed all existing connections and freshened up the server. And when the server was busy, it was running so much like "treacle" that I couldn't run any Linux commands from a shell to see what was going on.

Mystery

I was aware from my heartbeat log of a total of around 20 seconds per hour during which the server was not accepting requests - that's about 0.5% of the time. Yet I had a user who was telling me that in his experience, downtime was around 10%. Wow - that's some scary figure, isn't it?

Any ideas?

Turns out to be a case of how you gather your statistics!

Solution

My heartbeat script kicks in at the start of every minute, and if there's a problem it tidies up, which takes 5 seconds. Having kicked in once, it then does a further precautionary clean at the following "top" of the minute and, if it's not sure that load levels are dropping as they should, at the minute after that too. So in a bad hour, 4 outages of 5 seconds = 20 seconds.
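As a quick check of that arithmetic, here is a minimal Python sketch of the time-based downtime figure. The function and constant names are mine, chosen for illustration; the only inputs taken from the story are the 5 second clean up and the 4 clean ups in a bad hour.

```python
# Time-based downtime for one "bad" hour, using the figures quoted above.
# All names here are illustrative; they are not taken from the heartbeat script.

SECONDS_PER_HOUR = 3600
CLEANUP_SECONDS = 5          # each clean up blocks requests for about 5 seconds
CLEANUPS_PER_BAD_HOUR = 4    # number of clean ups in a bad hour


def downtime_fraction(cleanups, cleanup_seconds, period_seconds=SECONDS_PER_HOUR):
    """Fraction of the period during which the server refuses requests."""
    return cleanups * cleanup_seconds / period_seconds


bad_hour = downtime_fraction(CLEANUPS_PER_BAD_HOUR, CLEANUP_SECONDS)
print(f"Downtime in a bad hour: {bad_hour:.2%}")   # 20 seconds out of 3600
```

Measured against the clock, then, the outage really is only a fraction of one percent of the hour.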

It turns out that my user was running an automated script to check our server, again at the top of the minute. So he had synchronised his tests to our server in such a way that every one of our clean ups landed exactly on one of his checks. Looking at his log activity later, I noticed that if he got a failure he had programmed in a second hit straight away to confirm it - so he was seeing 4/60 or 8/64 failures - that's 6.7% or 12.5% to report.
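The effect is easy to reproduce. The sketch below is only a model of the situation, not either of the real scripts: I have assumed the four clean ups fall in four particular minutes (the choice of minutes is arbitrary), that each blocks the first 5 seconds of its minute, and that the confirming retry arrives 1 second after a failed check.

```python
# Model: a monitor that probes once a minute, with an optional immediate retry.
# CLEANUP_MINUTES and the 1 second retry delay are assumptions for illustration.

CLEANUP_SECONDS = 5
CLEANUP_MINUTES = (10, 11, 12, 13)   # hypothetical minutes in which a clean up ran


def server_is_down(second_of_hour):
    """True if a request at this second of the hour hits a clean up."""
    minute, second = divmod(second_of_hour, 60)
    return minute in CLEANUP_MINUTES and second < CLEANUP_SECONDS


def probe_hour(offset, retry_on_failure):
    """Probe once a minute at `offset` seconds past the minute.

    Returns (failures, probes) as the monitoring script would count them.
    """
    probes = failures = 0
    for minute in range(60):
        t = minute * 60 + offset
        probes += 1
        if server_is_down(t):
            failures += 1
            if retry_on_failure:
                # Confirming hit straight away, assumed to be 1 second later
                probes += 1
                if server_is_down(t + 1):
                    failures += 1
    return failures, probes


# Probing at the top of the minute, in step with the clean ups
for retry in (False, True):
    failures, probes = probe_hour(0, retry)
    print(f"top of minute, retry={retry}: {failures}/{probes} = {failures / probes:.1%}")

# The same monitor shifted to 30 seconds past the minute never sees an outage
failures, probes = probe_hour(30, True)
print(f"30 seconds past, retry=True: {failures}/{probes} = {failures / probes:.1%}")

# What the clock says: seconds of the hour during which the server was down
down_seconds = sum(server_is_down(s) for s in range(3600))
print(f"actual downtime: {down_seconds}/3600 = {down_seconds / 3600:.2%}")
```

In this model the same server scores anywhere from 0% to 12.5% failures depending purely on when the monitor looks, which is why a check that runs at a random moment within each minute gives a much fairer picture than one locked to the top of the minute.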

Lies, damned lies and statistics

This is an "object lesson" in being careful with statistics - at best, they're helpful and at worst they can give a totally incorrect picture. But I have to say that this example really took the biscuit!

Footnote - server issue solved. Availability now over 99.8% and the remaining outages in the last couple of days relate to me testing.
(written 2008-05-28)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
A606 - Web Application Deployment - Apache httpd - log files and log tools
  [4491] Web Server Admin - some of those things that happen, and solutions - (2015-05-10)
  [4404] Which (virtual) host was visited? Tuning Apache log files, and Python analysis - (2015-01-23)
  [4307] Identifying and clearing denial of service attacks on your Apache server - (2014-09-27)
  [3984] 20 minutes in to our 15 minutes of fame - (2013-01-20)
  [3974] TV show appearance - how does it effect your web site? - (2013-01-13)
  [3670] Reading Google Analytics results, based on the relative populations of countries - (2012-03-24)
  [3554] Learning more about our web site - and learning how to learn about yours - (2011-12-17)
  [3491] Who is knocking at your web site door? Are you well set up to deal with allcomers? - (2011-10-21)
  [3447] Needle in a haystack - finding the web server overload - (2011-09-18)
  [3443] Getting more log information from the Apache http web server - (2011-09-16)
  [3087] Making the most of critical emails - reading behind the scene - (2010-12-16)
  [3027] Server logs - drawing a graph of gathered data - (2010-11-03)
  [3019] Apache httpd Server Status - monitoring your server - (2010-10-28)
  [3015] Logging the performance of the Apache httpd web server - (2010-10-25)
  [1796] libwww-perl and Indy Library in your server logs? - (2008-09-13)
  [1780] Server overloading - turns out to be feof in PHP - (2008-09-01)
  [1761] Logging Cookies with the Apache httpd web server - (2008-08-20)
  [1598] Every link has two ends - fixing 404s at the recipient - (2008-04-02)
  [1503] Web page (http) error status 405 - (2008-01-12)
  [1237] What proportion of our web traffic is robots? - (2007-06-19)
  [376] What brings people to my web site? - (2005-07-13)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2017: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/1656_Be- ... stics.html • PAGE BUILT: Sat Jun 11 12:16:26 2016 • BUILD SYSTEM: WomanWithCat