Training, Open Source Programming Languages

This is page http://www.wellho.net/mouth/3554_Lea ... yours.html

Our email: info@wellho.net • Phone: 01144 1225 708225

Learning more about our web site - and learning how to learn about yours

There are quite a number of tools out there which will give you statistics about your web site - and quite a lot of people who will tell you various statistics about yours and theirs. But, in the phrase Mark Twain attributed to Benjamin Disraeli, there are "Lies, damned lies and statistics". How do you really understand your traffic and your site? I think you should look at it from lots of different directions, understand how the figures are reached, make incremental changes to your methodology to explore the feel of the site in more detail, and cross-compare multiple sites and multiple time periods.

We keep (Apache httpd) log files on our servers, look at them with certain tools on a regular basis and try out other things (and some new things) from time to time.

Here are some statistics from a demonstration program I wrote yesterday, and from an example written on a previous course but re-run with the same set of log files - which are from our main server for November 2011, and total over 800 Mbytes of input data.

Statistics and diagrams

  completed ac_20111104 8825 visitors
  completed ac_20111105 5900 visitors
  completed ac_20111106 6436 visitors
  completed ac_20111107 9562 visitors
  completed ac_20111108 10192 visitors
  completed ac_20111109 10114 visitors
  completed ac_20111110 9871 visitors
  completed ac_20111111 8862 visitors
  completed ac_20111112 6181 visitors
  completed ac_20111113 7002 visitors


  code 200 - count  3037207 -    95.36%
  code 206 - count     3459 -     0.11%
  code 226 - count        4 -     0.00%
  code 301 - count     3118 -     0.10%
  code 302 - count     2538 -     0.08%
  code 304 - count    30282 -     0.95%
  code 400 - count      347 -     0.01%
  code 403 - count    10750 -     0.34%
  code 404 - count    96625 -     3.03%
  code 405 - count       26 -     0.00%
  code 408 - count       65 -     0.00%
  code 416 - count        1 -     0.00%
  code 500 - count      259 -     0.01%
  code n/a - count      188 -     0.01%


  Sum of distinct hosts each day -      269544
  Number of distinct visiting hosts -   182849
  Total URLs requested -               3184869
  Total web pages requested -          2138893


The above statistics are from yesterday's program - source code [here].

Contour plot of web site usage


Website usage - raw


Website usage - smoothed


The above diagrams come from a Python program, using numpy and matplotlib, that was written on a previous private advanced Python course and rerun on the same data that was used for the statistical tables. Source of that program [here].

Methodology

a) Analysis of log files. Both of my programs read through each of the daily log files line by line, and extracted the required data from each line. Part of the analysis in the statistical program differentiates between "primary URLs" - the sort of thing you would type into a browser - and "secondary URLs" - things like images, icons, style sheets and JavaScript, which typically aren't fresh page requests from a visitor but are called up from within other requests. We have very little Ajax traffic, and very few pages indeed with frames, so there was no need in my sample demonstration program to make allowances for the skew which they would add.
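A minimal sketch of this kind of line-by-line parsing, assuming Apache httpd's "combined" log format. The regular expression, the function names `parse_line` and `is_primary`, and the list of "secondary" file extensions are illustrative assumptions, not the code from the original program. A line that doesn't match returns None, which is one way of ending up with "n/a" entries.

```python
import re

# Simplified pattern for Apache httpd's "combined" log format (an assumption;
# it will not cope with every malformed or unusual line), e.g.
# 1.2.3.4 - - [04/Nov/2011:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical choice of file types counted as "secondary" URLs.
SECONDARY_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js")


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def is_primary(url):
    """True for page-like requests; False for images, icons, style sheets and scripts."""
    path = url.split("?", 1)[0].lower()
    return not path.endswith(SECONDARY_EXTENSIONS)


sample = ('1.2.3.4 - - [04/Nov/2011:10:15:32 +0000] '
          '"GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"')
record = parse_line(sample)
print(record["host"], record["status"], is_primary(record["url"]))  # 1.2.3.4 200 True
```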

b) Elimination of parameters. Many of our pages can have parameters supplied via the "GET" method, and we have used regular expressions to trim those values off the end of the URLs when we came to count accesses to different pages. As a separate exercise, analysis of these strings could be very useful indeed.
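A sketch of trimming GET parameters off with a regular expression before counting accesses per page. The function names and sample URLs are hypothetical; the standard library's `urllib.parse` is used to keep the parameters themselves for the separate analysis mentioned above.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Everything from the first "?" onwards is the GET query string.
QUERY_STRING = re.compile(r"\?.*$")


def strip_parameters(url):
    """Trim the query string so that /page?a=1 and /page?a=2 count as one page."""
    return QUERY_STRING.sub("", url)


def count_pages(urls):
    """Count accesses per page, ignoring any GET parameters."""
    return Counter(strip_parameters(url) for url in urls)


def query_parameters(url):
    """Keep the parameters for separate analysis, e.g. {'course': ['python']}."""
    return parse_qs(urlsplit(url).query)


# Hypothetical request URLs, for illustration only.
sample_urls = [
    "/resources/ex.php?course=python",
    "/resources/ex.php?course=perl",
    "/index.html",
]
for page, hits in count_pages(sample_urls).most_common():
    print(hits, page)
print(query_parameters(sample_urls[0]))
```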

c) Graphics. The images all show the number of URL hits (primary and secondary) within each hour, joined to form a contour plot / heat map. A more technically accurate display would be a block diagram - a 3D histogram - as the data doesn't really "slope" in the way shown. Nevertheless, the displays are very effective in highlighting the way traffic increases and decreases during the day. Even on a site with traffic as high as ours, spikes can occur and there's a certain randomness. The third (smoothed) diagram is intended to help demonstrate underlying trends, but care should be taken in reading any significance into the figures: smoothing spreads peaks out over neighbouring hours, so the maximum figure shown (7000) is well below the maximum number of requests actually made in an hour (9000).
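A sketch of how hourly hit counts might be gathered into a days-by-hours grid, the kind of array a contour plot or heat map is drawn from (with matplotlib, it could be passed to `contourf`). The original program used numpy and matplotlib; this version sticks to the standard library, and the 3 x 3 box average is an assumed smoothing method, not necessarily the one behind the diagram. It does show why a smoothed peak comes out lower than the real one.

```python
from datetime import date, datetime


def hourly_grid(timestamps, days):
    """Count hits per hour: one row per day in `days`, 24 columns (hour 0 to 23).

    `timestamps` are the bracketed times from the log, e.g. "04/Nov/2011:10:15:32 +0000".
    """
    row_of = {day: row for row, day in enumerate(days)}
    grid = [[0] * 24 for _ in days]
    for stamp in timestamps:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        row = row_of.get(when.date())
        if row is not None:          # ignore hits outside the chosen days
            grid[row][when.hour] += 1
    return grid


def smooth(grid):
    """Replace each cell by the mean of itself and its neighbours (a 3 x 3 box).

    Cells on the edge average over whichever neighbours exist.
    """
    rows, cols = len(grid), len(grid[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cells = [grid[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))]
            smoothed[r][c] = sum(cells) / len(cells)
    return smoothed


days = [date(2011, 11, 4), date(2011, 11, 5)]
stamps = ["04/Nov/2011:10:15:32 +0000", "04/Nov/2011:10:40:01 +0000",
          "05/Nov/2011:21:05:00 +0000"]
print(hourly_grid(stamps, days)[0][10])  # 2

# A single isolated spike: smoothing spreads it out and lowers the peak.
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 9
print(max(map(max, spike)), max(map(max, smooth(spike))))  # 9 1.0
```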

d) Not the sum of daily figures. One of the big myths ... is that 1,000 unique visitors a day means 30,000 unique visitors a month. It doesn't; visitors come back to many web sites day after day, so for an average of 1,000 unique visitors per day you would hope that the "unique visitors per month" figure was well below 30,000!
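The difference between the two figures comes down to counting distinct hosts per day versus distinct hosts over the whole period. A minimal sketch with Python sets, assuming the log has already been reduced to (day, host) pairs; the function name and data are hypothetical.

```python
from collections import defaultdict


def unique_visitor_counts(visits):
    """`visits` is an iterable of (day, host) pairs, one per request.

    Returns (sum of each day's distinct hosts, distinct hosts over the whole period).
    """
    hosts_by_day = defaultdict(set)
    for day, host in visits:
        hosts_by_day[day].add(host)
    sum_of_daily = sum(len(hosts) for hosts in hosts_by_day.values())
    whole_period = len(set().union(*hosts_by_day.values()))
    return sum_of_daily, whole_period


# Hypothetical data: host "a" visits on three days, host "b" on one.
visits = [(1, "a"), (1, "a"), (2, "a"), (3, "a"), (1, "b")]
print(unique_visitor_counts(visits))  # (4, 2)
```

Host "a" is counted once on each of its three days in the daily sum, but only once in the whole-period figure.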

e) Broken lines. Our analysis shows a few "n/a" status codes. The log file format used by httpd needs a bit more reverse engineering than I've done to get every line 100% right - but with fewer than 7 lines in 100,000 having problems with a simplified algorithm, I've chosen to go with that.
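A sketch of producing a status-code summary in the layout of the table above, assuming a parser that returns None for any line it can't read; those lines are reported as "n/a". The function names `tally` and `status_table` are hypothetical.

```python
from collections import Counter


def tally(statuses):
    """Count status codes; None (a line the parser couldn't read) is counted as "n/a"."""
    return Counter("n/a" if code is None else code for code in statuses)


def status_table(counts):
    """Format counts like the table above: codes in order, "n/a" sorting last."""
    total = sum(counts.values())
    return [f"code {code} - count {n:8d} - {100.0 * n / total:8.2f}%"
            for code, n in sorted(counts.items())]


for line in status_table(tally(["200", "200", "304", "404", None])):
    print(line)
```

Because "n" sorts after the digits, the "n/a" row always comes at the end, just as in the November figures.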

Conclusions

1. Weekly Cycle. This is fantastic news for us. Look how the traffic during the week (Monday to Friday) hovers around 10,000 unique daily visitors, but drops to 6,000 to 7,000 at the weekend. Friday's lower figure (POETS day - Piss off Early; Tomorrow's Saturday) helps confirm work / business customer use. The lower Friday figure, with Sunday higher than Saturday too, may also reflect Muslim countries with a Friday / Saturday weekend, or possibly UK habits of going out on Saturday and doing hobby things, including computing, on Sunday.
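The weekly pattern can be checked against the ten daily figures listed above using Python's `datetime`; this sketch simply groups them by day of the week and averages each group. Ten days is a small sample, so treat the result as illustrative rather than conclusive.

```python
from collections import defaultdict
from datetime import date

# Daily unique-visitor figures from the listing above (4 to 13 November 2011).
daily = {
    date(2011, 11, 4): 8825, date(2011, 11, 5): 5900, date(2011, 11, 6): 6436,
    date(2011, 11, 7): 9562, date(2011, 11, 8): 10192, date(2011, 11, 9): 10114,
    date(2011, 11, 10): 9871, date(2011, 11, 11): 8862, date(2011, 11, 12): 6181,
    date(2011, 11, 13): 7002,
}


def mean(values):
    return sum(values) / len(values)


# date.weekday() numbers Monday as 0 through Sunday as 6.
by_weekday = defaultdict(list)
for day, visitors in daily.items():
    by_weekday[day.weekday()].append(visitors)

names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
for weekday in sorted(by_weekday):
    print(names[weekday], mean(by_weekday[weekday]))
```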

2. Daily cycle. (From the graphics only.) A very interesting demonstration of peak traffic during the UK working day, with a surprisingly early start (perhaps because India is five and a half hours ahead of the UK in November), and a busy evening (we also get considerable traffic from the USA, as other analyses have shown).

3. Repeat Visitors. There were about 183,000 unique visitors in the month, but about 270,000 if you add up the number of unique visitors each day. So that means about 87,000 return visits. Bear in mind that I visit every day - so that's 29 repeats from me alone - it's NOT 87,000 different returning individuals, but it's still an interesting statistic!
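Written out with the totals from the statistics above, taking U_d as the number of distinct hosts on day d of November and U_month as the number of distinct hosts across the whole month (symbols introduced here for illustration), the return-visit figure R counts each host's days after its first:

```latex
R = \sum_{d=1}^{30} U_d - U_{\text{month}}
  = 269\,544 - 182\,849
  = 86\,695 \approx 87\,000
```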

4. Images / Avatars / FGW. Here's an interesting piece of background. Our domain / server also hosts some images (and my avatar) used on the First Great Western Coffee Shop, and that's a busy site and active forum. This will account for some of the difference between the 2 million pages and the 3 million URL requests. Further analysis called for, I think.

5. 403 / 404 / 500 comments. 19 out of 20 accesses to the server (95.36%) returned a good page and response - code 200. Several other return values (206, 301, 302, 304) are perfectly acceptable in moderation. But what about the other codes? Common wisdom has it that you don't want any 400 or 500 series errors, but to some extent I disagree. There's nothing wrong with sending a search engine crawler a "404" page not found if a page has been withdrawn and not replaced, for example. The particular server that we've analysed for this report goes further, intentionally returning codes 404, 403 and 500 to requests which are testing the security of our site / looking for holes - we're saying "Go away - that's not here", "You can't have that" and "broken" where appropriate to these nasties - in the (perhaps vain) hope that they'll stop knocking on the door.
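Grouping the November counts by status class (the first digit of the code, as HTTP defines them) makes the "19 out of 20" observation and the size of the 4xx share easy to see. A short sketch using the figures from the table above:

```python
# Status-code counts from the table above (November 2011).
counts = {
    "200": 3037207, "206": 3459, "226": 4, "301": 3118, "302": 2538,
    "304": 30282, "400": 347, "403": 10750, "404": 96625, "405": 26,
    "408": 65, "416": 1, "500": 259, "n/a": 188,
}

total = sum(counts.values())
by_class = {}
for code, n in counts.items():
    # Group by first digit: "2xx" success, "3xx" redirection / not modified,
    # "4xx" client errors, "5xx" server errors; unreadable lines stay "n/a".
    group = code[0] + "xx" if code[0].isdigit() else code
    by_class[group] = by_class.get(group, 0) + n

for group in sorted(by_class):
    print(f"{group}: {by_class[group]:8d} ({100.0 * by_class[group] / total:.2f}%)")
```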

6. Staying power. Each visiting host made, on average, about 17 requests. There's a lot more analysis possible here. Yet, interestingly, on our site we consider that a single page hit is often a success - someone lands from a search engine on a page that answers their question. Job done. Also marketing done - our name's out there and they may well remember how helpful we were when they need a course in the future.
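The "17 requests" figure is the month's total URL count divided by the number of distinct hosts. An average like this can hide a very uneven spread, so a sketch that also reports the median might be a useful next step; the function name `requests_per_host` and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, median


def requests_per_host(hosts):
    """`hosts` is the host field of every logged request.

    Returns (mean, median) requests per distinct host; a few very busy
    hosts can pull the mean well above the median.
    """
    per_host = list(Counter(hosts).values())
    return mean(per_host), median(per_host)


# The month's overall average from the totals above.
print(round(3184869 / 182849, 1))  # 17.4

# Hypothetical example: one busy host and three single-page visitors.
print(requests_per_host(["busy"] * 10 + ["a", "b", "c"]))  # (3.25, 1.0)
```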

7. Monetise. An interesting suggestion has been made - that we should cash in / make money from our very heavy traffic - advertising, click-thru, agent sales, charging for use, building up a saleable email address database are all possible. We're very careful about venturing down these paths - we monetise via course and hotel room sales at present, and I suspect that the majority of users of our pages don't want to be added to lists from which they're barraged with emails. That is OUR. We may build more agency sales at some point, though.

8. Much more! Which pages? Which parts of the world? I have only just started to scratch the surface.

(written 2011-12-17)

 




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3554_Lea ... yours.html • PAGE BUILT: Thu Sep 18 15:30:25 2014 • BUILD SYSTEM: WomanWithCat