For 2021 - online Python 3 training.

Our plans were to retire in summer 2020 and see the world, but Coronavirus has led us into a lot of lockdown programming in Python 3 and PHP 7.
We can now offer tailored online training, with small groups and real tutors, which works really well for groups of 4 to 14 delegates. Delegates can be anywhere in the world; the course language is English.

Please ask about private 'maintenance' training for Python 2, Tcl, Perl, PHP, Lua, etc.
Spike solution - Data extraction and reporting
This example is from a Well House Consultants training course.

This example is described in the following article(s):
   • Practical Extraction and Reporting - using Python and Extreme Programming

Source code: noisesource Module: Y201
# -*- coding: utf-8 -*-

# Spike solution - Data extraction and reporting

# --------------------------------------------------------------------------
# Note - a "spike solution" is a proof-of-concept program, without full error
# checking and perhaps a bit scruffy in how it runs and is structured. Under
# extreme programming techniques, code such as that shown below would then
# be refactored to make it robust and maintainable, and to allow the
# algorithms to be shared.
# --------------------------------------------------------------------------

# This program takes a SPAM SIGNUP LOG FILE (some sample lines are included
# below) and analyses its records by country and (if required) by region of
# country in order to generate a spam hotspot table.

# - - - - - - - - - - - - - - - - -

# The following external URLs may provide you with further information on
# aspects of this example

# http://www.iso.org/iso/iso-3166-1_decoding_table.html
# http://www.iso.org/iso/country_names_and_code_elements
# http://www.voidspace.org.uk/python/articles/urllib2.shtml
# http://docs.python.org/library/urllib2.html

spamdata = open("spamdata.txt","r").read()

# Sample data from spamdata.txt file:
# 1 RU solomonn test@ckaf.ru Thu, 13 Oct 2011 11:43:13 +0100 Moscow
# 1 CN Moncler outlet xxrrmmssrrmm@gmail.com Thu, 13 Oct 2011 11:45:35 +0100
# 1 RU WhithReutrich bflhaqoixpcx@gmail.com Thu, 13 Oct 2011 11:47:20 +0100 Saint Petersburg

# pick up -v switch from the command line to provide extra data by city / region

import sys
verbose = "-v" in sys.argv
if not verbose: sys.stderr.write("Note - -v flag lets you report by city\n")

import re
import os.path

# Go get the country codes if not mirrored on our server yet
# (could use a test on file time to refresh cache monthly)

if not os.path.exists("iso3166.html"):
        import urllib2
        ccodes = urllib2.urlopen("http://www.iso.org/iso/country_names_and_code_elements")
        isopage = ccodes.read()
        fho = open("iso3166.html","w")
        fho.write(isopage)      # save a local copy so later runs skip the download
        fho.close()

# Extract country codes from ISO database
# Beware - code will need revision if ISO page format changes!

fhi = open("iso3166.html","r")
iso_html = fhi.read()

row_re = re.compile(r"<tr>(.*?)</tr>",re.S)
column_re = re.compile(r"<td>(.*?)</td>",re.S)
country = re.compile(r"\b[A-Z]{2}\b")
lastword = re.compile(r'.*>(.*)')

clookup = {}
rows = row_re.findall(iso_html)
for row in rows:
        columns = column_re.findall(row)
        if len(columns) > 2:
                # print columns[2]; # for testing
                if country.findall(columns[2]):
                        lw = lastword.findall(columns[0])
                        clookup[columns[2]] = lw[0].title()

# Sample data row from ISO database
# <tr>^M
# <td><span class='sortSpan' style='display:none'>3</span>ALBANIA</td>^M
# <td><span class='sortSpan' style='display:none'>4</span>ALBANIE</td>^M
# <td>AL</td>^M
# </tr>^M

# Sort through spam signup request records to analyse and count

city = re.compile(r"\+0100\s(.*)")

bycountry = {}
bycity = {}

for record in spamdata.splitlines():
        if record.find("+0100") < 1: continue
        location = country.findall(record)
        if location:
                c = location[0]
                bycountry[c] = bycountry.get(c,0) +1
                # following code only needed for verbose mode!
                zone = city.findall(record)
                if zone:
                        ci = zone[0]
                        cx = bycity.get(c,{})
                        if not cx: bycity[c] = cx
                        cy = cx.get(ci,0)
                        bycity[c][ci] = cy + 1

# Uncomment next two lines for testing
# print bycountry
# print bycity

# Report, sorted by spammiest country and towns within

countries = sorted(bycountry, key=bycountry.get, reverse=True)

for cc in countries:
        zones = bycity.get(cc, {})      # {} if no city was ever recorded for this country
        cities = sorted(zones, key=zones.get, reverse=True)
        print cc, bycountry[cc], clookup.get(cc, "[unknown country]")
        for spamsource in cities:
                ss = spamsource
                if not ss: ss = "[unknown]"
                if verbose: print "\t", ss, zones[spamsource]

""" Sample outputs:

wizard:oct11 graham$ python noisesource
Note - -v flag lets you report by city
RU 41 Russian Federation
CN 38 China
DE 34 Germany
US 17 United States
UA 16 Ukraine
PL 9 Poland
LV 8 Latvia
GE 7 Georgia
FR 6 France
NL 5 Netherlands
AU 4 Australia
KR 4 Korea, Republic Of
DK 3 Denmark
IL 3 Israel
RO 2 Romania
GB 2 United Kingdom
SI 2 Slovenia
BR 1 Brazil
TW 1 Taiwan, Province Of China
NC 1 New Caledonia
CA 1 Canada
GG 1 Guernsey
IN 1 India
SG 1 Singapore
SE 1 Sweden
wizard:oct11 graham$ python noisesource -v
RU 41 Russian Federation
        [unknown] 17
        Saint Petersburg 7
        Moscow 4
        Velikiy Novgorod 3
        Orel 2
        Seversk 2
        Volgograd 2
        Kazan 1
        Chelyabinsk 1
        Zhirkov 1
        Stavropol 1
CN 38 China
        Beijing 18
        [unknown] 4
        Guangzhou 4
        Putian 3
        Shenyang 2
        Shanghai 2
        Jinan 2
        Nanjing 1
        Wuhan 1
        Qingdao 1
DE 34 Germany
        [unknown] 32
        Düsseldorf 1
        Winnenden 1
US 17 United States
        Clarks Summit 8
        Saint Louis 2
        Saint Paul 1
        Fort Worth 1
        Los Angeles 1
        Portland 1
        Florence 1
        Kenmore 1
        Kirkland 1
UA 16 Ukraine
        [unknown] 7
        Kiev 4
        Kiselëv 3
        Kherson 1
        Saltovka 1
PL 9 Poland
        Warsaw 5
        Cracow 4
LV 8 Latvia
        [unknown] 8
GE 7 Georgia
        Tbilisi 7
FR 6 France
        Paris 5
        [unknown] 1
NL 5 Netherlands
        [unknown] 3
        Didam 1
        Group 1
AU 4 Australia
        Sydney 2
        Adelaide 2
KR 4 Korea, Republic Of
        Seocho 2
        [unknown] 1
        Seoul 1
DK 3 Denmark
        Copenhagen 3
IL 3 Israel
        Rehovot 1
        Tel Aviv 1
        Petah Tiqwa 1
RO 2 Romania
        Iasi 2
GB 2 United Kingdom
        [unknown] 2
SI 2 Slovenia
        Slovenj Gradec 2
BR 1 Brazil
        Rio De Janeiro 1
TW 1 Taiwan, Province Of China
        Taipei 1
NC 1 New Caledonia
        Nouméa 1
CA 1 Canada
        Richmond 1
GG 1 Guernsey
        [unknown] 1
IN 1 India
        Bhopal 1
SG 1 Singapore
        Singapore 1
SE 1 Sweden
        Hässleholm 1
wizard:oct11 graham$
"""
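
The comment in the listing suggests testing the cached file's age so that iso3166.html is refreshed monthly. A minimal Python 3 sketch of that check follows; it is not part of the original spike, and the helper name cache_is_fresh and the 30-day limit are assumptions chosen for illustration.

```python
import os
import time

# Assumed refresh interval of about a month, in seconds.
MAX_AGE_SECONDS = 30 * 24 * 60 * 60


def cache_is_fresh(path, max_age=MAX_AGE_SECONDS):
    """Return True if path exists and was modified less than max_age seconds ago.

    cache_is_fresh is a hypothetical helper, not part of the original program.
    """
    if not os.path.exists(path):
        return False
    age = time.time() - os.path.getmtime(path)
    return age < max_age


if __name__ == "__main__":
    if cache_is_fresh("iso3166.html"):
        print("Using cached iso3166.html")
    else:
        print("iso3166.html missing or over a month old - fetch it again")
```

In the original program, the test if not os.path.exists("iso3166.html"): would become if not cache_is_fresh("iso3166.html"):, so a stale copy is downloaded again instead of being kept forever.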

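The country-code lookup relies on table rows laid out like the ALBANIA sample in the listing: English name, French name, then the two-letter code. The Python 3 sketch below applies the same regular-expression approach to an in-memory string, so it can be tried without downloading anything. The name parse_iso_rows is hypothetical, and the row layout is assumed to match that sample; as the listing warns, a change to the ISO page format would break it. Unlike the original, it requires the code cell to hold exactly two capital letters, and it falls back to the whole cell text instead of raising an IndexError when the name cell contains no tag.

```python
import re

ROW_RE = re.compile(r"<tr>(.*?)</tr>", re.S)
COLUMN_RE = re.compile(r"<td>(.*?)</td>", re.S)
CODE_RE = re.compile(r"[A-Z]{2}")
LAST_TEXT_RE = re.compile(r".*>(.*)", re.S)

# The sample row quoted in the listing (^M shown there is a carriage return).
SAMPLE_ROW = (
    "<tr>\r\n"
    "<td><span class='sortSpan' style='display:none'>3</span>ALBANIA</td>\r\n"
    "<td><span class='sortSpan' style='display:none'>4</span>ALBANIE</td>\r\n"
    "<td>AL</td>\r\n"
    "</tr>\r\n"
)


def parse_iso_rows(html):
    """Map two-letter codes to title-cased English country names.

    Assumes each <tr> holds English name, French name and code cells,
    in that order, as in the ALBANIA sample row.
    """
    lookup = {}
    for row in ROW_RE.findall(html):
        cells = [cell.strip() for cell in COLUMN_RE.findall(row)]
        if len(cells) > 2 and CODE_RE.fullmatch(cells[2]):
            # Keep only the text after the last tag (drops the sortSpan markup);
            # use the whole cell if it contains no tag at all.
            match = LAST_TEXT_RE.match(cells[0])
            name = match.group(1) if match else cells[0]
            lookup[cells[2]] = name.strip().title()
    return lookup


if __name__ == "__main__":
    print(parse_iso_rows(SAMPLE_ROW))  # {'AL': 'Albania'}
```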

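The counting and reporting stages build nested dictionaries by hand and sort them with comparison functions. In Python 3 the same tallies can be kept in collections.Counter objects, whose most_common() method returns entries in descending count order, with ties kept in first-seen order. The sketch below is a restructuring along those lines, not the original program. The names tally, report and SAMPLE are hypothetical, and the three records are the sample lines quoted in the listing. Unlike the listing, a record with nothing after the +0100 offset is counted under "[unknown]" instead of being left out of the city breakdown.

```python
import re
from collections import Counter, defaultdict

# Same patterns as the listing: the first two-capital word is taken as the
# country code, and anything after the "+0100" offset as the city.
COUNTRY_RE = re.compile(r"\b[A-Z]{2}\b")
CITY_RE = re.compile(r"\+0100\s(.*)")

# The three sample records quoted in the listing.
SAMPLE = [
    "1 RU solomonn test@ckaf.ru Thu, 13 Oct 2011 11:43:13 +0100 Moscow",
    "1 CN Moncler outlet xxrrmmssrrmm@gmail.com Thu, 13 Oct 2011 11:45:35 +0100",
    "1 RU WhithReutrich bflhaqoixpcx@gmail.com Thu, 13 Oct 2011 11:47:20 +0100 Saint Petersburg",
]


def tally(records):
    """Count signups per country, and per city within each country."""
    by_country = Counter()
    by_city = defaultdict(Counter)
    for record in records:
        if "+0100" not in record:
            continue  # not a signup record
        codes = COUNTRY_RE.findall(record)
        if not codes:
            continue
        code = codes[0]
        by_country[code] += 1
        zone = CITY_RE.findall(record)
        city = zone[0].strip() if zone else ""
        by_city[code][city or "[unknown]"] += 1
    return by_country, by_city


def report(by_country, by_city, names=None, verbose=False):
    """Return report lines: spammiest country first, then its cities if verbose."""
    names = names or {}
    lines = []
    for code, count in by_country.most_common():
        lines.append(f"{code} {count} {names.get(code, '')}".rstrip())
        if verbose:
            for city, n in by_city[code].most_common():
                lines.append(f"\t{city} {n}")
    return lines


if __name__ == "__main__":
    countries, cities = tally(SAMPLE)
    for line in report(countries, cities, names={"RU": "Russian Federation"}, verbose=True):
        print(line)
```

Counter removes the need for the get(..., 0) + 1 pattern, and defaultdict(Counter) creates each country's city table on first use, which is what the cx / bycity[c] = cx lines in the listing do by hand.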
Learn about this subject
This module and example are covered as required on private courses. Should you wish to cover this example and associated subjects, and you're attending a public course to cover other topics with us, please see our extra topic program.

Books covering this topic
Yes. We have over 700 books in our library. Books covering Python are listed here and when you've selected a relevant book we'll link you on to Amazon to order.

Other Examples
This example comes from our Y201 training module. You'll find a description of the topic and some other closely related examples on the Y201 module index page.

Full description of the source code
You can learn more about this example on the training courses listed on this page, on which you'll be given a full set of training notes.

Many other training modules are available for download (for limited use) from our download centre under an Open Training Notes License.

Other resources
• Our Solutions centre provides a number of longer technical articles.
• Our Opentalk forum archive provides a question and answer centre.
• The Horse's mouth provides a daily tip or thought.
• Further resources are available via the resources centre.
• All of these resources can be searched through our search engine.
• And there's a global index here.

Web site author
This web site is written and maintained by Well House Consultants.

Purpose of this website
This is a sample program, class demonstration or answer from a training course. Its main purpose is to provide an after-course service to customers who have attended our public, private or on-site courses, but the examples are made generally available under the conditions described below.

Conditions of use
Past attendees on our training courses are welcome to use individual examples in the course of their programming, but must check the examples they use to ensure that they are suitable for their job. Remember that some of our examples show you how not to do things - check in your notes. Well House Consultants take no responsibility for the suitability of these example programs to customers' needs.

This program is copyright Well House Consultants Ltd. You are forbidden from using it for running your own training courses without our prior written permission. See our page on courseware provision for more details.

Any of our images within this code may NOT be reused on a public URL without our prior permission. For bona fide personal use, we will often grant you permission provided that you provide a link back. Commercial use on a website will incur a license fee for each image used - details on request.


© WELL HOUSE CONSULTANTS LTD., 2022: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/resources/ex.php • PAGE BUILT: Sun Oct 11 14:50:09 2020 • BUILD SYSTEM: JelliaJamb