 
Practical Extraction and Reporting - using Python and Extreme Programming

"We seem to be getting a lot of signups from Germany" - so said my fellow administrator on the First Great Western Coffee Shop forum. At first glance that is something of a surprise, as this forum is "provided by a First Great Western Customer, for First Great Western customers" and First Great Western run train services from Paddington to the West of England and South Wales, with a secondary main line from Portsmouth to Cardiff, and regional, local suburban and rural trains on other lines within the same territory, with occasional services venturing as far "off piste" as Brighton. Nowhere near Germany. So why the interest?

Forums provide an opportunity for people to express their views, add their comments on to others, and post up their information. And as such they can provide a wonderful opportunity for people to get off topic messages onto publicly readable forums on the Internet. My mailbox contains adverts for pharmaceutical products, get-rich-quick schemes, books on Steve Jobs (this week), overseas graduate programs, crocuses, home security systems, dating services, airline tickets and more ... and given half a chance, these same people who, unsolicited, pester me by email would love to advertise on the forum and pester people there too.

To keep the wood visible amongst the trees, we limit signups on "The Coffeeshop" to those people who have a genuine interest, and who will post about the issues for which the forum exists. We still get plenty of requests for signup, but our vetting process is such that very few of the "spammers", or rather wannabe spammers, actually manage to get as far as posting. But it's wasteful of our time, and we're always looking to improve our tools to help us spot the spammers quickly; recently, I added in extra logging of signup requests to help us look at them in a "pageview" mode, and we've now come to the reporting requirement: to look at the data that's building up, to help keep us even better informed for the future.

So ... the specification for the program and of the requirement looks a bit woolly. And I decided to apply some of the techniques of "Extreme Programming" to the task - writing a short story as to what we wanted - "We would like to be able to count up how many spammers come from where so that we can tell which places are the worst / most likely" - and then tackling it through a spike solution where I wrote experimental code to see how an answer would look. I selected Python for the task (an excellent language for the job, and the language I've been teaching this week) ... and off I headed.

As I start coding, the story turns out to be to convert data such as:
  1 LV Haus finanzieren andrahartwick@gmail.com 91.224.246.15 Thu, 13 Oct 2011 06:26:34 +0100
  1 CN cabinet519 zhaominyu15@163.com 113.231.181.142 Thu, 13 Oct 2011 06:26:44 +0100 Shenyang

into results like:
  RU 41 Russian Federation
  CN 38 China
  DE 34 Germany
  US 17 United States
  UA 16 Ukraine
  PL 9 Poland
  LV 8 Latvia
  etc
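
As a rough illustration of that first story (and not the code linked from this article), here is a minimal sketch in Python 3 that pulls the two-letter country code out of each line with a regular expression and counts in a dictionary. The names LINE, count_by_country and report are made up for this example, and the pattern is only a guess at the layout based on the two sample lines above.

```python
import re

# The two sample lines from above; the field layout is inferred from them
SAMPLE = """\
  1 LV Haus finanzieren andrahartwick@gmail.com 91.224.246.15 Thu, 13 Oct 2011 06:26:34 +0100
  1 CN cabinet519 zhaominyu15@163.com 113.231.181.142 Thu, 13 Oct 2011 06:26:44 +0100 Shenyang
"""

# Assumed layout: a leading number, then a two-letter country code
LINE = re.compile(r"^\s*\d+\s+([A-Z]{2})\s")

def count_by_country(lines):
    """Return a dictionary of country code -> number of signup requests."""
    counts = {}
    for line in lines:
        found = LINE.match(line)
        if not found:
            continue  # skip anything that doesn't look like a data line
        code = found.group(1)
        counts[code] = counts.get(code, 0) + 1
    return counts

def report(counts):
    """Return report lines, busiest country first, ties in code order."""
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ["%s %d" % pair for pair in ordered]

counts = count_by_country(SAMPLE.splitlines())
for row in report(counts):
    print(row)
```

The country names in the third column of the report would come from a separate lookup of ISO codes, which this sketch leaves out.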


and then to expand that if necessary (in fact a separate "story") by zone:
  CN 38 China
      Beijing 18
      [unknown] 4
      Guangzhou 4
      Putian 3
      Shenyang 2
      Shanghai 2
      Jinan 2
      Nanjing 1
      Wuhan 1
      Qingdao 1
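
The by-zone breakdown can be sketched with a dictionary of dictionaries. Again this is a Python 3 illustration under assumptions rather than the linked program: it supposes that anything following the +0100 style offset at the end of a line is the city, and that a line with nothing there is counted as [unknown]. The names LINE, count_by_city and city_report are invented for the example.

```python
import re

SAMPLE = [
    "  1 CN cabinet519 zhaominyu15@163.com 113.231.181.142 Thu, 13 Oct 2011 06:26:44 +0100 Shenyang",
    "  1 LV Haus finanzieren andrahartwick@gmail.com 91.224.246.15 Thu, 13 Oct 2011 06:26:34 +0100",
]

# Assumed layout: country code near the start, optional city after the offset
LINE = re.compile(r"^\s*\d+\s+([A-Z]{2})\s.*\s[+-]\d{4}\s*(.*)$")

def count_by_city(lines):
    """Return {country code: {city: count}} for the lines that match."""
    cities = {}
    for line in lines:
        found = LINE.match(line)
        if not found:
            continue
        code = found.group(1)
        city = found.group(2).strip() or "[unknown]"
        per_country = cities.setdefault(code, {})
        per_country[city] = per_country.get(city, 0) + 1
    return cities

def city_report(cities):
    """Return report lines: each country total, then its cities indented."""
    lines = []
    for code in sorted(cities):
        lines.append("%s %d" % (code, sum(cities[code].values())))
        busiest_first = sorted(cities[code].items(),
                               key=lambda pair: (-pair[1], pair[0]))
        for city, number in busiest_first:
            lines.append("    %s %d" % (city, number))
    return lines

for row in city_report(count_by_city(SAMPLE)):
    print(row)
```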


Now that I have got to that point in my exploration of the data, if I needed more I would be refactoring - taking what I have learned and recoding it to make it maintainable. You can see the code [here] with some quite notable comments pointing out its shortcomings ready for the refactoring exercise if that ever comes (and if you want to run the program yourself, there's a data sample [here]).

I'm sharing this example on our web site under our "Data Munging in Python" heading - for even in its raw form it's a good example of some of the techniques commonly used ... in the source, you'll find coding samples of:
• Regular Expressions (to match patterns in data and extract from them)
• Command Line handling (we've used a -v option to select the verbose / by city report)
• Dictionaries (to keep count by countries as we read the data file)
• The urllib2 module (to read a web page from a remote server - the ISO country code lookup!)
• Checking whether a file exists (via os.path.exists)
• Routing non-data output to stderr (via sys.stderr)
• lambda (to provide single line functions)
• read (to slurp an entire file into a variable)
• title (to take a country name that's SHOUTED AT YOU and reduce it to more manageable speech!)
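
To give a flavour of how a few of those pieces fit together, here is a small Python 3 sketch. It is not the linked source, which dates from 2011 and uses the Python 2 urllib2 module. The function names are invented for the example, and the "NAME;CODE" layout of the country list is an assumption about what a downloaded lookup file might look like; the download itself is left out.

```python
import os.path
import sys

def parse_args(argv):
    """Return (verbose, filenames) from a list such as sys.argv[1:]."""
    verbose = "-v" in argv
    filenames = [arg for arg in argv if arg != "-v"]
    return verbose, filenames

def usable_files(filenames):
    """Keep the files that exist; complain about the rest on stderr."""
    good = []
    for name in filenames:
        if os.path.exists(name):
            good.append(name)
        else:
            # Not data, so it goes to stderr rather than into the report
            sys.stderr.write("Skipping %s - no such file\n" % name)
    return good

# A single line function: calm down a SHOUTED country name
tidy = lambda shouted: shouted.title()

def parse_country_list(text):
    """Build {code: name} from text in an ASSUMED 'NAME;CODE' layout."""
    names = {}
    for line in text.splitlines():
        name, separator, code = line.strip().rpartition(";")
        if separator and len(code) == 2:
            names[code] = tidy(name)
    return names

if __name__ == "__main__":
    verbose, filenames = parse_args(sys.argv[1:])
    for name in usable_files(filenames):
        with open(name) as handle:
            text = handle.read()  # slurp the whole file into a variable
        print("%s: %d lines (verbose=%s)" % (name, len(text.splitlines()), verbose))
```

One thing to watch with title is that it capitalises every word, so "KOREA, REPUBLIC OF" comes out as "Korea, Republic Of".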

Truly, so much of the power of any language comes not so much from the power of individual features, but rather from the power of using them in combination, and from researching, refactoring and reusing the code that uses those features.

(written 2011-10-14)

 



This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2024: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3479_Pra ... mming.html • PAGE BUILT: Sun Oct 11 16:07:41 2020 • BUILD SYSTEM: JelliaJamb