For 2023 (and 2024 ...) - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but due to upward compatibility most of our examples remain operational and even relevant, and you are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham - happy to meet with former delegates here - but do check ahead before coming round. We are far from inactive - rather, enjoying the times that we are retired but still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites - you can find an index [here].
Learning more about our web site - and learning how to learn about yours

There are quite a number of tools out there which will give you statistics about your web site - and quite a lot of people who will tell you various statistics about yours and theirs. But there's "Lies, damned lies and statistics", a phrase popularly attributed to Benjamin Disraeli. How do you really understand your traffic and site? I think you should look at it from lots of different directions, understand how the figures are reached, make incremental changes to your methodology to explore the feel of the site in more detail, and cross-compare multiple sites and multiple time periods.

We keep (Apache httpd) log files on our servers, look at them with certain tools on a regular basis and try out other things (and some new things) from time to time.

Here are some statistics from a demonstration program I wrote yesterday, and from an example written on a previous course but re-run with the same set of log files - which are from our main server for November 2011, and total over 800 Mbytes of input data.

Statistics and diagrams

  completed ac_20111104 8825 visitors
  completed ac_20111105 5900 visitors
  completed ac_20111106 6436 visitors
  completed ac_20111107 9562 visitors
  completed ac_20111108 10192 visitors
  completed ac_20111109 10114 visitors
  completed ac_20111110 9871 visitors
  completed ac_20111111 8862 visitors
  completed ac_20111112 6181 visitors
  completed ac_20111113 7002 visitors


  code 200 - count  3037207 -    95.36%
  code 206 - count     3459 -     0.11%
  code 226 - count        4 -     0.00%
  code 301 - count     3118 -     0.10%
  code 302 - count     2538 -     0.08%
  code 304 - count    30282 -     0.95%
  code 400 - count      347 -     0.01%
  code 403 - count    10750 -     0.34%
  code 404 - count    96625 -     3.03%
  code 405 - count       26 -     0.00%
  code 408 - count       65 -     0.00%
  code 416 - count        1 -     0.00%
  code 500 - count      259 -     0.01%
  code n/a - count      188 -     0.01%


  Sum of distinct hosts each day -      269544
  Number of distinct visiting hosts -   182849
  Total URLs requested -               3184869
  Total web pages requested -          2138893


The above statistics are from yesterday's program - source code [here].

The above diagrams are from a Python program that uses numpy and matplotlib from a prior private advanced Python Course, rerun on the same data that was used for the statistical tables. Source of that program [here].

Methodology

a) Analysis of log files. Both of my programs have read through each of the daily log files line by line, and extracted required data from each line. Part of the analysis for the statistical program differentiates between "primary URLs" - the sort of thing you would type into a browser - and "secondary URLs" - things like images, icons, style sheets and JavaScript which typically aren't fresh page requests from a visitor, but are called up from within other requests. We have very little Ajax traffic, and very few pages indeed with frames, so there was no need in my sample demonstration program to make allowances for the skew which they would add.
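As an illustration of the line-by-line approach, here is a minimal sketch. It is not the program linked above: it assumes the standard Apache "common" / "combined" log layout, and the list of suffixes that mark a secondary URL is my own hypothetical choice for the example.

```python
import re

# Assumed layout: Apache "common"/"combined" log format. Real files may differ.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical list of suffixes treated as "secondary" (called up from within a page).
SECONDARY_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js")


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def is_primary(url):
    """True for URLs a visitor might type in; False for images, styles, scripts."""
    path = url.split("?", 1)[0].lower()
    return not path.endswith(SECONDARY_SUFFIXES)


# Made-up sample line (documentation IP address, invented browser name).
sample = ('192.0.2.7 - - [04/Nov/2011:09:15:02 +0000] '
          '"GET /images/logo.png HTTP/1.1" 200 5120 "-" "ExampleBrowser"')
record = parse_line(sample)
print(record["host"], record["status"], is_primary(record["url"]))
```

Classifying by suffix is crude - a site with heavy Ajax or frame use would need something smarter - but it is enough to separate page requests from the images and style sheets they pull in.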

b) Elimination of parameters. Many of our pages can have parameters supplied via the "GET" method, and we have used regular expressions to trim those values off the end of the URLs when we came to count accesses to different pages. As a separate exercise, analysis of these strings could be very useful indeed.
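A sketch of the trimming step, assuming that everything from the first "?" onwards counts as parameters. The URLs are invented for the example and the regular expression is not necessarily the one in my program.

```python
import re
from collections import Counter

# Everything from the first "?" onwards is treated as GET parameters.
QUERY = re.compile(r"\?.*$")


def strip_parameters(url):
    """Remove the query string so that all calls to one page count together."""
    return QUERY.sub("", url)


# Hypothetical requests - two calls to the same script with different parameters.
requests = [
    "/net/search.php?q=python",
    "/net/search.php?q=lua",
    "/index.html",
]
page_counts = Counter(strip_parameters(url) for url in requests)
print(page_counts.most_common())
```

The standard library's urllib.parse.urlsplit would do the same job, and also hands back the query string separately for anyone wanting to analyse the parameters themselves.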

c) Graphics. The images all show the number of URL hits (primary and secondary) within each hour, joined to form a contour plot / heat map. A more technically accurate display would be a block diagram - a 3D histogram - as the data isn't really "sloping" in the way shown. Nevertheless, the displays are very effective in highlighting the way traffic increases and decreases during the day. Even on a site with traffic as high as ours, spikes can occur and there's a certain randomness. The third diagram is intended to help demonstrate underlying trends, but care should be taken in reading any significance into the figures. The maximum figure shown (7000) is certainly not the maximum number of requests made in an hour (9000).
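The data behind such a picture is just a grid of counts, one row per day and one column per hour. Here is a minimal sketch of building that grid using only the standard library. The timestamp format is the one assumed for Apache logs, the month name parsing assumes an English / C locale, and the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime

# Timestamps as they appear in the assumed Apache log format.
STAMP_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def hourly_grid(stamps):
    """Count hits per (day, hour) and return {day: [24 hourly counts]}."""
    counts = Counter()
    for stamp in stamps:
        when = datetime.strptime(stamp, STAMP_FORMAT)
        counts[(when.date(), when.hour)] += 1
    days = sorted({day for day, _ in counts})
    # A Counter returns 0 for hours in which nothing was logged.
    return {day: [counts[(day, hour)] for hour in range(24)] for day in days}


# Hypothetical timestamps
stamps = [
    "04/Nov/2011:09:15:02 +0000",
    "04/Nov/2011:09:48:30 +0000",
    "04/Nov/2011:14:01:11 +0000",
    "05/Nov/2011:09:00:00 +0000",
]
grid = hourly_grid(stamps)
for day, row in grid.items():
    print(day, row[9], row[14])
```

The rows can be handed to numpy and matplotlib as a two-dimensional array. Drawing them as bars rather than as a surface would give the block diagram mentioned above.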

d) Not sum of daily. One of the big myths ... is that 1000 unique visitors a day means 30,000 unique visitors a month. It doesn't; visitors come back to many web sites day after day, and for an average of 1,000 unique visitors per day you would hope that the "Unique visitors per month" figure was well below 30,000!
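A toy example with Python sets makes the difference concrete. The host names and days are invented; host "a" comes back every day.

```python
# Hypothetical hosts seen on three days; "a" returns every day.
daily_hosts = {
    "day1": {"a", "b", "c"},
    "day2": {"a", "d"},
    "day3": {"a", "b", "e"},
}

# Adding up each day's unique count treats every return visit as a new visitor.
sum_of_daily = sum(len(hosts) for hosts in daily_hosts.values())

# The union of the daily sets counts each host once for the whole period.
distinct_overall = len(set().union(*daily_hosts.values()))

print(sum_of_daily, distinct_overall)
```

Eight "daily uniques" turn out to be only five different hosts.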

e) Broken lines. Our analysis shows a few "n/a" status codes. The log file format that's used by httpd needs a bit more reverse engineering than I've used to get every line 100% right - but with no more than 7 lines in 100,000 having problems on a simplified algorithm, I've chosen to go with that.
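Here is a sketch of that trade-off: a deliberately simple pattern for the status code, with anything that doesn't fit tallied as "n/a" rather than stopping the run. The pattern and the sample lines are assumptions for illustration, not the code from my program.

```python
import re
from collections import Counter

# Simplified pattern: a three-digit status code straight after the quoted request.
STATUS = re.compile(r'" (\d{3}) ')


def status_of(line):
    """Return the status code as text, or "n/a" when the line does not fit."""
    match = STATUS.search(line)
    return match.group(1) if match else "n/a"


def report(lines):
    """Return one formatted row per status code, with its share of all lines."""
    tally = Counter(status_of(line) for line in lines)
    total = sum(tally.values())
    return [
        f"code {code} - count {count:8d} - {100 * count / total:8.2f}%"
        for code, count in sorted(tally.items())
    ]


# Hypothetical lines - the third is malformed on purpose.
lines = [
    '192.0.2.7 - - [04/Nov/2011:09:15:02 +0000] "GET /index.html HTTP/1.1" 200 5120',
    '192.0.2.8 - - [04/Nov/2011:09:15:03 +0000] "GET /missing.html HTTP/1.1" 404 310',
    '192.0.2.9 - - [04/Nov/2011:09:15:04 +0000] "-" garbled',
]
for row in report(lines):
    print(row)
```

Keeping the "n/a" row visible in the report means the size of the problem is always on show, so you can tell if a log format change suddenly makes it matter.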

Conclusions

1. Weekly Cycle. This is fantastic news for us. Look how the traffic during the week (Monday to Friday) is hovering around 10,000 unique daily visitors, but that's down to 6,000 to 7,000 at the weekend. Friday's lower figure (POETS day - Piss off Early; Tomorrow's Saturday) helps confirm work / business customer use. The lower figure on Friday, with Sunday higher than Saturday too, possibly reflects Muslim countries with a Friday / Saturday weekend, or possibly reflects UK habits of going out on Saturday and doing hobby things including computing on Sundays.
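The weekly pattern can be checked from the daily visitor figures listed earlier in this article. This sketch assumes that each log file name ends in its date as YYYYMMDD, which is how the names read.

```python
from datetime import datetime
from statistics import mean

# Daily visitor counts copied from the figures earlier in this article.
visitors = {
    "ac_20111104": 8825, "ac_20111105": 5900, "ac_20111106": 6436,
    "ac_20111107": 9562, "ac_20111108": 10192, "ac_20111109": 10114,
    "ac_20111110": 9871, "ac_20111111": 8862, "ac_20111112": 6181,
    "ac_20111113": 7002,
}

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

by_weekday = {name: [] for name in DAY_NAMES}
for filename, count in visitors.items():
    # Assumes the file name ends in the date as YYYYMMDD.
    day = datetime.strptime(filename[-8:], "%Y%m%d")
    by_weekday[DAY_NAMES[day.weekday()]].append(count)

for name in DAY_NAMES:
    print(f"{name} {mean(by_weekday[name]):8.1f}")

working = [c for name in DAY_NAMES[:5] for c in by_weekday[name]]
weekend = by_weekday["Sat"] + by_weekday["Sun"]
print(f"Mon-Fri mean {mean(working):.0f}, Sat-Sun mean {mean(weekend):.0f}")
```

With only ten days in the sample, each weekday is represented once or twice, so this confirms the shape of the week rather than measuring it precisely.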

2. Daily cycle. (From the graphics only). A very interesting demonstration of peak traffic during the UK working day, with a surprisingly early start (perhaps because India is about 5 hours ahead of the UK), and a busy evening (we also get considerable traffic from the USA as other analyses have shown).

3. Repeat Visitors. There were 183,000 unique visitors in the month. But there were 270,000 visitors if you add up the number of unique visitors each day. So that means 87,000 return visits. Bear in mind that I visit every day - so that's 29 repeats - it's NOT 87,000 different returning individuals, but it's still an interesting statistic!
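The arithmetic behind the rounded figures, using the totals from the statistics earlier in this article:

```python
# Figures copied from the statistics earlier in this article.
sum_of_daily_hosts = 269544
distinct_hosts = 182849

# Each host-day beyond a host's first appearance is a return visit.
return_visits = sum_of_daily_hosts - distinct_hosts
print(return_visits)  # 86695, the "87,000" quoted above

# Average number of different days on which each distinct host was seen.
print(f"{sum_of_daily_hosts / distinct_hosts:.2f}")
```

An average of under one and a half days per host fits the picture of a few very regular callers and a great many hosts seen on one day only.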

4. Images / Avatars / FGW. Here's an interesting piece of background. Our domain / server also hosts some images (and my avatar) used on the First Great Western Coffee Shop, and that's a busy site and active forum. This will account for some of the difference between the 2 million pages and the 3 million URL requests. Further analysis called for, I think.

5. 403 / 404 / 500 comments. 19 out of 20 accesses to the server returned a good page and response - code 200. Many other return values (206, 301, 302, 304) are perfectly acceptable in moderation. But what about the other codes? Common wisdom has it that you don't want any 400 or 500 series errors, but to some extent I disagree. There's nothing wrong in sending a search engine crawler a "404" page not found if a page has been withdrawn and not replaced, for example. The particular server that we've analysed for this report goes further, intentionally returning code 403, 404 and 500 to requests which are testing the security of our site / looking for holes - we're saying "Go away - that's not here", "You can't have that" and "broken" where appropriate to these nasties - in a (perhaps vain) hope that they'll stop knocking on the door.
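To see the balance between good responses, redirects and errors at a glance, the status counts listed earlier in this article can be grouped by their first digit. A short sketch:

```python
from collections import Counter

# Counts copied from the status code figures earlier in this article.
status_counts = {
    "200": 3037207, "206": 3459, "226": 4, "301": 3118, "302": 2538,
    "304": 30282, "400": 347, "403": 10750, "404": 96625, "405": 26,
    "408": 65, "416": 1, "500": 259, "n/a": 188,
}

by_class = Counter()
for code, count in status_counts.items():
    # "200" becomes "2xx" and so on; unparsed lines stay under "n/a".
    label = code[0] + "xx" if code.isdigit() else code
    by_class[label] += count

total = sum(status_counts.values())
for label, count in sorted(by_class.items()):
    print(f"{label} {count:8d} {100 * count / total:6.2f}%")
```

The grouping can't tell a deliberate 403 sent to a probing robot from one sent to a genuine visitor; that takes a look at which URLs were asked for.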

6. Staying power. Each visiting host made an average of about 17 requests (3,184,869 URLs from 182,849 distinct hosts). There's a lot more analysis possible here. Yet, interestingly, on our site we consider that a single page hit is often a success - someone lands from a search engine on a page that answers their question. Job done. Also marketing done - our name's out there and they may well remember how helpful we are in the future when they need a course.
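The averages come straight from the totals in the statistics earlier in this article:

```python
# Totals copied from the statistics earlier in this article.
distinct_hosts = 182849
total_urls = 3184869
total_pages = 2138893

print(f"URLs per host:  {total_urls / distinct_hosts:.1f}")
print(f"pages per host: {total_pages / distinct_hosts:.1f}")
```

These are means across every host, so they say nothing about how many hosts made just the single request described above. That would need a count of requests per host, which is part of the further analysis still to do.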

7. Monetise. An interesting suggestion has been made - that we should cash in / make money from our very heavy traffic - advertising, click-thru, agent sales, charging for use, building up a saleable email address database are all possible. We're very careful about venturing down these paths - we monetise via course and hotel room sales at present, and I suspect that the majority of users of our pages don't want to be added to lists from which they're barraged with emails. That is OUR. We may build more agency sales at some point, though.

8. Much more! Which pages? Parts of world? I have only just started to scratch the surface.

(written 2011-12-17)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
Y118 - Python - numpy, scipy and matplotlib
  [2990] What are numpy and scipy? - (2010-10-09)
  [2991] Loading and saving data - Python / numpy - (2010-10-09)
  [2992] Matplotlib - graphing in Python - teaching examples - (2010-10-10)
  [2993] Arrays v Lists - what is the difference, why use one or the other - (2010-10-10)
  [2997] 3D graphics - web site usage - simple matplotlib and python example - (2010-10-12)
  [4440] A first graph with Matplotlib in Python - (2015-02-22)
  [4445] Graphing presentations in Python - huge data, numpy and matplotlib - (2015-02-28)

Y107 - Python - Dictionaries
  [103] Can't resist writing about Python - (2004-10-29)
  [955] Python collections - mutable and imutable - (2006-11-29)
  [1144] Python dictionary for quick look ups - (2007-04-12)
  [1145] Using a list of keys and a list of values to make a dictionary in Python - zip - (2007-04-13)
  [2368] Python - fresh examples of all the fundamentals - (2009-08-20)
  [2915] Looking up a value by key - associative arrays / Hashes / Dictionaries - (2010-08-11)
  [2986] Python dictionaries - reaching to new uses - (2010-10-05)
  [2994] Python - some common questions answered in code examples - (2010-10-10)
  [3464] Passing optional and named parameters to python methods - (2011-10-04)
  [3488] Python sets and frozensets - what are they? - (2011-10-20)
  [3555] Football league tables - under old and new point system. Python program. - (2011-12-18)
  [3662] Finding all the unique lines in a file, using Python or Perl - (2012-03-20)
  [3934] Multiple identical keys in a Python dict - yes, you can! - (2012-11-24)
  [4027] Collections in Python - list tuple dict and string. - (2013-03-04)
  [4029] Exception, Lambda, Generator, Slice, Dict - examples in one Python program - (2013-03-04)
  [4409] Setting up and using a dict in Python - simple first example - (2015-01-30)
  [4469] Sorting in Python 3 - and how it differs from Python 2 sorting - (2015-04-20)
  [4661] Unique word locator - Python dict example - (2016-03-06)
  [4668] Sorting a dict in Python - (2016-04-01)

G902 - Well House Consultants - Web site techniques, utility and visibility
  [23] Skills and responsibilities - (2004-08-22)
  [32] Web design platoon - (2004-08-29)
  [98] No more 'Error 404' pages. Something better. - (2004-10-24)
  [109] URLs - a service and not a hurdle - (2004-11-04)
  [117] A case of case - (2004-11-14)
  [142] Colour for access - (2004-12-06)
  [165] Implementing an effective site search engine - (2005-01-01)
  [173] Data Mining - (2005-01-09)
  [179] The hunt for unique words - (2005-01-16)
  [182] Your personal Google ranking - (2005-01-19)
  [197] Allow for peak traffic on your web site - (2005-02-01)
  [202] Searching for numbers - (2005-02-04)
  [222] Who are all these visitors? - (2005-02-20)
  [259] Responding to spam - (2005-03-27)
  [261] Putting a form online - (2005-03-29)
  [268] Information request forms, cleaning up spam - (2005-04-05)
  [274] Our most popular resources - (2005-04-10)
  [276] An apology to Mr Boneparte - (2005-04-11)
  [278] Cover all the options - (2005-04-13)
  [284] The Iconish language - (2005-04-19)
  [288] Colour blindness for web developers - (2005-04-22)
  [311] Growth pains - (2005-05-14)
  [314] What language is this written in? - (2005-05-17)
  [320] Ordnance Survey - using a 'Get a map' - (2005-05-22)
  [322] More maps - (2005-05-23)
  [347] Frightening and from-friend viruses and spams - (2005-06-14)
  [348] Graveyard pages - (2005-06-15)
  [369] CMS - the minefield of Choices - (2005-07-05)
  [376] What brings people to my web site? - (2005-07-13)
  [414] Form Madness - (2005-08-14)
  [492] New Navigation Aid - Launch of My Wellho - (2005-11-11)
  [510] Dynamic Web presence - next generation web site - (2005-11-29)
  [528] Getting favicon to work - avoiding common pitfalls - (2005-12-14)
  [533] Bigger Box Campaign - (2005-12-18)
  [649] Denial of Service ''attack'' - (2006-03-17)
  [658] Keeping the visitors happy and browsing - (2006-03-26)
  [681] Mirroring a dynamic site - (2006-04-12)
  [718] Protecting images from theft - (2006-05-12)
  [732] Where is a web site visitor browsing from - (2006-05-24)
  [757] Horse and Python training - (2006-06-12)
  [767] Finding the language preference of a web site visitor - (2006-06-18)
  [800] Effective web campaign? - (2006-07-12)
  [893] Visibility - (2006-10-14)
  [916] Driving customers away - (2006-11-07)
  [976] Santa at the station - (2006-12-09)
  [994] Training on Cascading Style Sheets - (2006-12-17)
  [1015] Search engine placement - long term strategy and success - (2006-12-30)
  [1029] Our search engine placement is dropping. - (2007-01-11)
  [1055] Above the fold - (2007-01-28)
  [1104] Drawing dynamic graphs in PHP - (2007-03-09)
  [1177] Sorting out for a site map - (2007-05-05)
  [1184] Finding resources - some pointers - (2007-05-13)
  [1186] Two new pages / sites - (2007-05-14)
  [1198] From Web to Web 2 - (2007-05-21)
  [1207] Simple but effective use of mod_rewrite (Apache httpd) - (2007-05-27)
  [1212] What brought YOU to our web site? - (2007-06-01)
  [1237] What proportion of our web traffic is robots? - (2007-06-19)
  [1297] Stuffing content into a web page - easy maintainance - (2007-08-09)
  [1437] Above the fold with First Great Western - (2007-11-19)
  [1494] A time to update pictures - (2008-01-03)
  [1505] Script to present commonly used images - PHP - (2008-01-13)
  [1506] Ongoing Image Copyright Issues, PHP and MySQL solutions - (2008-01-14)
  [1513] Perl, PHP or Python? No - Perl AND PHP AND Python! - (2008-01-20)
  [1534] Where in the world / country is my visitor from? - (2008-02-07)
  [1541] Colour, Composition or Content - (2008-02-16)
  [1554] Online hotel reservations - Melksham, Wiltshire (near Bath) - (2008-02-24)
  [1610] PHP course dot co, dot uk - (2008-04-13)
  [1630] To provide external links, or not? - (2008-05-04)
  [1634] Kiss and Book - (2008-05-07)
  [1653] How do Google Ads work? - (2008-05-25)
  [1711] Rapid growth leads to server move - (2008-07-17)
  [1747] Who is watching you? - (2008-08-10)
  [1756] Ever had One of THOSE mornings? - (2008-08-16)
  [1793] Which country does a search engine think you are located in? - (2008-09-11)
  [1797] I have been working hard but I do not expect you noticed - (2008-09-14)
  [1833] Web Bloopers - good form design - avoiding pitfalls - (2008-10-11)
  [1856] A few of my favourite things - (2008-10-26)
  [1888] Find the link - (2008-11-16)
  [1955] How to avoid duplicating web page maintainance - (2008-12-20)
  [1961] Making our things easier to find - (2008-12-26)
  [1970] Plagarism - who is copying my pages? - (2009-01-02)
  [1982] Cooking bodies and URLs - (2009-01-08)
  [2056] Web Site Loading - experiences and some solutions shared - (2009-02-26)
  [2065] Static mirroring through HTTrack, wget and others - (2009-03-03)
  [2225] How important is a front page ranking on a search engine? - (2009-06-09)
  [2332] Formation, des langages Open Source - (2009-08-09)
  [2333] Formación, de los lenguajes de código abierto - (2009-08-09)
  [2334] Formazione, Open Source computer lingue - (2009-08-09)
  [2335] Ausbildung, die Open-Source-Sprachen - (2009-08-09)
  [2336] Formação, Open Source computador línguas - (2009-08-09)
  [2337] Opleiding, Open Source computertalen - (2009-08-09)
  [2338] Uddannelse, Open Source computer sprog - (2009-08-09)
  [2339] Opplæring, Open Source datamaskinen språk - (2009-08-09)
  [2340] ldning, Open Source dator språk - (2009-08-09)
  [2341] Koulutus, Open Source tietokone kielillä - (2009-08-09)
  [2389] Writing with our customers words - (2009-09-01)
  [2410] Removal of technical resources from this site - (2009-09-19)
  [2519] Status Page / breaks of service in early December - (2009-11-30)
  [2532] Analysing Google arrivals by country of origin - (2009-12-10)
  [2552] Web site traffic - real users, or just noise? - (2009-12-26)
  [2569] How to run a successful online poll / petition / survey / consultation - (2010-01-10)
  [2668] Is it worth it? - (2010-03-09)
  [2981] How to set up short and meaningfull alternative URLs - (2010-10-02)
  [3022] Retaining web site visitors - reducing the one page wonders - (2010-10-31)
  [3087] Making the most of critical emails - reading behind the scene - (2010-12-16)
  [3149] Looking back at www.wellho.net - (2011-01-28)
  [3197] Finding and diverting image requests from rogue domains - (2011-03-08)
  [3367] Google +1 - what is it? - (2011-07-22)
  [3426] Automed web site testing scripted in Ruby using watir-webdriver - (2011-09-09)
  [3491] Who is knocking at your web site door? Are you well set up to deal with allcomers? - (2011-10-21)
  [3532] Sharing the user experience - designing a form with the customer in mind - (2011-11-29)
  [3563] How big is a web page these days? Does the size of your pages matter? - (2011-12-26)
  [3589] Promoting a single one of your domains on the search engines - (2012-01-22)
  [3623] Some TestWise examples - helping use Ruby code to check your web site operation - (2012-02-24)
  [3734] QR codes with marketing logos embedded - (2012-05-16)
  [3744] Short Web Addresses for Melksham - (2012-05-30)
  [3745] Legal change - You need to obtain user consent if you use cookies on your website - (2012-06-01)
  [3776] Some traps it's so easy to fall into in designing your web site - (2012-06-23)
  [3896] An email marathon - (2012-10-15)
  [3974] TV show appearance - how does it effect your web site? - (2013-01-13)
  [4001] Helping search engines with appropriate 400 error codes - (2013-02-11)
  [4076] Web site - fully back! - (2013-04-29)
  [4115] More or less back - what happened to our server the other day - (2013-06-14)
  [4136] How do I post automatically from a PHP script to my Twitter account? - (2013-07-10)
  [4239] Facebook marketing - early experiences - (2014-01-19)
  [4376] Well House Consultants, Well House Manor, First Great Western Coffee shop, TransWilts / 2014 web site reports - (2015-01-01)
  [4401] Selecting RECENT and POPULAR news and trends for your web site users - (2015-01-19)
  [4474] Effect on external factors on traffic to our web sites - an update - (2015-04-26)
  [4492] Almost so wrong, but perhaps it's right for some? - (2015-05-11)

A606 - Web Application Deployment - Apache httpd - log files and log tools
  [1503] Web page (http) error status 405 - (2008-01-12)
  [1598] Every link has two ends - fixing 404s at the recipient - (2008-04-02)
  [1656] Be careful of misreading server statistics - (2008-05-28)
  [1761] Logging Cookies with the Apache httpd web server - (2008-08-20)
  [1780] Server overloading - turns out to be feof in PHP - (2008-09-01)
  [1796] libwww-perl and Indy Library in your server logs? - (2008-09-13)
  [3015] Logging the performance of the Apache httpd web server - (2010-10-25)
  [3019] Apache httpd Server Status - monitoring your server - (2010-10-28)
  [3027] Server logs - drawing a graph of gathered data - (2010-11-03)
  [3443] Getting more log information from the Apache http web server - (2011-09-16)
  [3447] Needle in a haystack - finding the web server overload - (2011-09-18)
  [3670] Reading Google Analytics results, based on the relative populations of countries - (2012-03-24)
  [3984] 20 minutes in to our 15 minutes of fame - (2013-01-20)
  [4307] Identifying and clearing denial of service attacks on your Apache server - (2014-09-27)
  [4404] Which (virtual) host was visited? Tuning Apache log files, and Python analysis - (2015-01-23)
  [4491] Web Server Admin - some of those things that happen, and solutions - (2015-05-10)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2024: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3554_Lea ... yours.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb