For 2023 - we are now fully retired from IT training.
We have made many, many friends over 25 years of teaching about Python, Tcl, Perl, PHP, Lua, Java, C and C++ - and MySQL, Linux and Solaris/SunOS too. Our training notes are now very much out of date, but due to upward compatibility most of our examples remain operational and even relevant, and you are welcome to make use of them "as seen" and at your own risk.

Lisa and I (Graham) now live in what was our training centre in Melksham - happy to meet with former delegates here - but do check ahead before coming round. We are far from inactive - rather, enjoying the times that we are retired but still healthy enough in mind and body to be active!

I am also active in many other areas and still look after a lot of web sites - you can find an index here.
Unique visitors / visited pages per month
Dictionaries example from a Well House Consultants training course
More on Dictionaries [link]

This example is described in the following article(s):
   • Learning more about our web site - and learning how to learn about yours - [link]
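
The program below tallies everything with plain dictionaries, using `get` with a default of 0 so that a key seen for the first time starts counting from zero. A minimal sketch of that idiom in isolation follows; the `count_items` helper and the host addresses are made up for illustration and are not part of pax_month.py.

```python
def count_items(items):
    """Return a dictionary mapping each item to the number of times it occurs."""
    counter = {}
    for item in items:
        # get() returns 0 for a key that is not yet in the dictionary
        counter[item] = counter.get(item, 0) + 1
    return counter

# Hypothetical host addresses, standing in for fields[0] of each log line
hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]
hostcounter = count_items(hosts)

print(hostcounter["10.0.0.1"])   # number of requests from that host
print(len(hostcounter))          # number of distinct hosts
```

The length of the dictionary gives the number of distinct keys, which is how the program reports unique visiting hosts.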

Source code: pax_month.py Module: Y107
# Unique visitors / visited pages per month

import re

# Regular expression to split a web page name from parameters
url = re.compile(r'(/.*?)(?:\?|%3[fF])')

# Regular expression to spot secondary files
image = re.compile(r'\.(jpg|png|ico|gif|css|js|csp)')

statuscounter = {}
pagecounter = {}
hostcounter = {}

totalpages = 0
sumdailyhosts = 0
loglines = 0

# For each day of the month (November is 1 to 30)

for dom in range(1,31):

        # Reset daily counter each day

        dailyhostcounter = {}

        # Open the log file and parse it
        # Note - these log files are NOT in the course profile!

        fname = "logs/ac_201611%02d" % dom
        fh = open(fname,"r")
        for line in fh:

                fields = line.split()
                loglines += 1

                # Eliminate lines with 'broken' formats
                # This only accounts for a handful of lines each day.

                status = fields[8]
                if len(status) != 3 or status == "GET":
                        statuscounter["n/a"] = statuscounter.get("n/a",0) + 1
                        continue

                # Count statuses and unique hosts overall

                statuscounter[status] = statuscounter.get(status,0) + 1
                hostcounter[fields[0]] = hostcounter.get(fields[0],0) + 1
                dailyhostcounter[fields[0]] = dailyhostcounter.get(fields[0],0) + 1

                # Only analyse page by page for primary URLs such as .html and .php
                # Do not analyse style sheets, images, icon file, Javascript, etc

                if image.findall(fields[6]): continue

                totalpages += 1

                # Remove GET parameters if present from URL

                stuff = url.findall(fields[6])
                if stuff:
                        page = stuff[0]
                else:
                        page = fields[6]

                # Count by page

                pagecounter[page] = pagecounter.get(page,0) + 1

        # Daily log for running program ... on a busy site we could be
        # analysing huge amounts of data

        dailyhosts = len(dailyhostcounter.keys())
        sumdailyhosts += dailyhosts
        print("{} {} {} {}".format("completed",fname,dailyhosts,"visitors"))

# Sort pages based on number of accesses

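pagename = list(pagecounter.keys()) # exceptions make it work for both Python 2 and 3
try:
        # Python 2: comparison function, most accessed page first
        pagename.sort(lambda x,y: pagecounter[y] - pagecounter[x])
except TypeError:
        # Python 3: sort() only accepts a key function
        pagename.sort(key = lambda x: -pagecounter[x])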

# Output individual pages first as they're the longest report

for page in pagename:
        print("{} {}".format(pagecounter[page],page))

# Report on HTTP statuses

statuslist = list(statuscounter.keys())
for status in statuslist:
        percent = 100.0 * statuscounter[status] / loglines
        print("code {0:3s} - count {1:8d} - {2:8.2f}%".format(status,statuscounter[status],percent))

# And final summary

visitors = len(hostcounter.keys())

summary = "{0:35s} {1:8d}"
print(summary.format("Sum of distinct hosts each day -",sumdailyhosts))
print(summary.format("Number of distinct visiting hosts -",visitors))
print(summary.format("Total URLs requested -",loglines))
print(summary.format("Total web pages requested -",totalpages))

""" Sample output

WomanWithCat:y107 grahamellis$ python3 pax_month.py
completed logs/ac_20161101 9869 visitors
completed logs/ac_20161102 9477 visitors
completed logs/ac_20161103 9927 visitors
completed logs/ac_20161104 9845 visitors
completed logs/ac_20161105 9182 visitors
completed logs/ac_20161106 7532 visitors
completed logs/ac_20161107 8598 visitors
completed logs/ac_20161108 10210 visitors
completed logs/ac_20161109 10034 visitors
completed logs/ac_20161110 9275 visitors
completed logs/ac_20161111 9729 visitors
completed logs/ac_20161112 8801 visitors
completed logs/ac_20161113 7542 visitors
completed logs/ac_20161114 8001 visitors
completed logs/ac_20161115 9952 visitors
completed logs/ac_20161116 10065 visitors
completed logs/ac_20161117 10082 visitors
completed logs/ac_20161118 10084 visitors
completed logs/ac_20161119 9460 visitors
completed logs/ac_20161120 7678 visitors
completed logs/ac_20161121 8186 visitors
completed logs/ac_20161122 10197 visitors
completed logs/ac_20161123 9946 visitors
completed logs/ac_20161124 9774 visitors
completed logs/ac_20161125 9146 visitors
completed logs/ac_20161126 8551 visitors
completed logs/ac_20161127 7603 visitors
completed logs/ac_20161128 8150 visitors
completed logs/ac_20161129 9508 visitors
completed logs/ac_20161130 9876 visitors
1364901 /coffeeshop/index.php
519244 *
321087 /twapp/info.php
241494 /qr/img.php
153462 /resources/ex.php4
93126 /
78647 /resources/ex.php
65849 /net/alaska.php
[huge snip]
1 /mouth/2921_Doe
1 /info/9759_Bradford_on_Avon_rail_station_petition_in_final_phase.html
1 /info/136_Threat_to_service_at_Bedwyn.html
1 /overview/cao.html
1 /mouth/space-username-thinking.html
1 /resources/modules.html%20%20/Z501.html
1 /webstat/
1 /info/11032_Curious_workings_between_Bristol_amp_Bath.html
1 /dxyylc/md5.asp
1 /resources/A104.html
1 /forum/the-tcl-programming-language/expect-nc-smtp-server.html
1 /forum/programming-in-python-and-ruby/sound-control.html
1 /short/t00145
1 /forum//The-Tcl-
code "-" - count    59673 -     1.00%
code 200 - count  5559678 -    93.16%
code 206 - count    16737 -     0.28%
code 301 - count    18949 -     0.32%
code 302 - count    76638 -     1.28%
code 304 - count   123064 -     2.06%
code 315 - count       10 -     0.00%
code 400 - count      621 -     0.01%
code 403 - count     3020 -     0.05%
code 404 - count   108816 -     1.82%
code 405 - count       29 -     0.00%
code 408 - count       29 -     0.00%
code 410 - count      304 -     0.01%
code 417 - count        1 -     0.00%
code 500 - count       62 -     0.00%
code and - count        1 -     0.00%
code n/a - count      151 -     0.00%
Sum of distinct hosts each day -      276280
Number of distinct visiting hosts -   165369
Total URLs requested -               5967783
Total web pages requested -          4195511
WomanWithCat:y107 grahamellis$
"""
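
The per-line work in the program above (split the line into fields, skip secondary files, strip any GET parameters from the URL) can be tried on a single line. The sketch below reuses the two regular expressions from pax_month.py; the `parse` helper and the sample log line are hypothetical, and the line assumes the Apache combined log format, in which the host is field 0, the requested URL field 6 and the status field 8.

```python
import re

# Same two patterns as in pax_month.py
url = re.compile(r'(/.*?)(?:\?|%3[fF])')
image = re.compile(r'\.(jpg|png|ico|gif|css|js|csp)')

def parse(line):
    """Return (host, status, page); page is None for secondary files."""
    fields = line.split()
    host, request, status = fields[0], fields[6], fields[8]
    if image.findall(request):
        return host, status, None
    # findall returns the text before the first ? (or %3F), if there is one
    stuff = url.findall(request)
    page = stuff[0] if stuff else request
    return host, status, page

# Hypothetical log line, assuming the Apache combined log format
sample = ('192.0.2.10 - - [01/Nov/2016:00:00:01 +0000] '
          '"GET /resources/ex.php?from=Y107 HTTP/1.1" 200 5120 "-" "ExampleAgent/1.0"')

print(parse(sample))   # ('192.0.2.10', '200', '/resources/ex.php')
```

Returning a tuple keeps the parsing separate from the counting, so each part can be checked on its own before running a whole month of logs.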


Learn about this subject
This module and example are covered on the following public courses:
 * Learning to program in Python
 * Python Programming
Also available on on-site courses for larger groups

Books covering this topic
Yes. We have over 700 books in our library. Books covering Python are listed here and when you've selected a relevant book we'll link you on to Amazon to order.

Other Examples
This example comes from our "Dictionaries" training module. You'll find a description of the topic and some other closely related examples on the "Dictionaries" module index page.
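
As one closely related sketch, the standard library's `collections.Counter` is a dictionary subclass that does the same tallying and can also return its keys ordered by count. This is an alternative approach, not what pax_month.py does, and the page names below are invented for illustration.

```python
from collections import Counter

# Hypothetical page requests, standing in for the cleaned-up URL of each log line
requests = ["/", "/a.html", "/", "/b.html", "/a.html", "/"]

pagecounter = Counter()
for page in requests:
    pagecounter[page] += 1      # missing keys count from zero

# most_common() returns (key, count) pairs, highest count first
for page, count in pagecounter.most_common():
    print("{} {}".format(count, page))
```

With a Counter there is no need for a separate sort step or for `get` with a default, at the cost of hiding the dictionary mechanics that the original example sets out to teach.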

Full description of the source code
You can learn more about this example on the training courses listed on this page, on which you'll be given a full set of training notes.

Many other training modules are available for download (for limited use) from our download centre under an Open Training Notes License.

Other resources
• Our Solutions centre provides a number of longer technical articles.
• Our Opentalk forum archive provides a question and answer centre.
• The Horse's mouth provides a daily tip or thought.
• Further resources are available via the resources centre.
• All of these resources can be searched through our search engine.
• And there's a global index here.

Purpose of this website
This is a sample program, class demonstration or answer from a training course. Its main purpose is to provide an after-course service to customers who have attended our public, private or on-site courses, but the examples are made generally available under conditions described below.

Web site author
This web site is written and maintained by Well House Consultants.

Conditions of use
Past attendees on our training courses are welcome to use individual examples in the course of their programming, but must check the examples they use to ensure that they are suitable for their job. Remember that some of our examples show you how not to do things - check in your notes. Well House Consultants take no responsibility for the suitability of these example programs to customers' needs.

This program is copyright Well House Consultants Ltd. You are forbidden from using it for running your own training courses without our prior written permission. See our page on courseware provision for more details.

Any of our images within this code may NOT be reused on a public URL without our prior permission. For Bona Fide personal use, we will often grant you permission provided that you provide a link back. Commercial use on a website will incur a license fee for each image used - details on request.


© WELL HOUSE CONSULTANTS LTD., 2023: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/resources/ex.php • PAGE BUILT: Sun Oct 11 14:50:09 2020 • BUILD SYSTEM: JelliaJamb