I like programming in Java, but I love programming in Python. It's been a real pleasure to get back to Python this morning. I'm teaching a private course in Cambridge this week, and a public Python course the following week. And here's a new example as I work my hand back in ...
Scenario - I need to read records from a whole folder of files and run a combined analysis of them. I'm looking at huge files - our server logs, which run to between 40 Mbytes and 65 Mbytes per day - and analysing a month or more of them at the same time.
I've written a class called dirStream, into the constructor of which I pass the name of the folder that holds the files. I then loop through the data returned by the stream, which can (optionally) be filtered down to only those records that match a particular pattern. The example here also calls another method to get the file name, and the line number within that file, where the record was found:
source = dirStream("logs")
for record in source.getRecord(lookfor):
    file,line = source.getWhere()
    print line,file,record
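To give a feel for the shape such a class can take, here is a minimal sketch rather than the real dirStream code. The name DirStreamSketch, the attribute names and the plain substring test for a match are all assumptions made for illustration. The reset to an empty file name and line number -1 at the end is chosen to mirror the status values in the sample run.

```python
import os

class DirStreamSketch(object):
    """Illustrative stand-in for dirStream (hypothetical, not the real class)."""

    def __init__(self, folder):
        # The constructor does real work: it builds a sorted list of the files
        self.files = sorted(
            os.path.join(folder, name)
            for name in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, name))
        )
        self.current_file = ""
        self.current_line = -1

    def getRecord(self, lookfor=None):
        # A generator method: only one line is held in memory at a time
        for path in self.files:
            self.current_file = path
            with open(path) as handle:
                for number, line in enumerate(handle, 1):
                    self.current_line = number
                    # Assumption: a simple substring test stands in for the pattern match
                    if lookfor is None or lookfor in line:
                        yield line.rstrip("\n")
        # Reset the position once every file has been consumed
        self.current_file = ""
        self.current_line = -1

    def getWhere(self):
        # Returns two values as a tuple: (file name, line number)
        return self.current_file, self.current_line
```

Because getRecord is a generator, the caller can ask getWhere for the current position between records, and a month of large log files never has to be loaded in one go.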
As this is my test harness, I've then exercised the other methods I've provided - firstly for a brief report:
report = source.getStatus()
for k in report.keys():
    print "{0:<20s} {1}".format(k,report[k])
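A dict hands back a whole set of named values in one go, but older Python versions make no promise about key order. A small sketch of one way to get a stable report follows; the helper name format_status is hypothetical, the dict holds just three of the values from the sample run, and print is written with parentheses so the snippet also runs under Python 3.

```python
# Hypothetical status values, shaped like the dict a getStatus() call might return
report = {
    "stream_status": "completed",
    "lines_read_so_far": 4011213,
    "lines_matched_so_far": 942,
}

def format_status(report):
    # Sort the keys so the report comes out in the same order on every run
    return ["{0:<20s} {1}".format(key, report[key]) for key in sorted(report)]

for line in format_status(report):
    print(line)
```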
And then for a full report on the number of records and matches in each input file:
for file_info in source.getReport():
    print "{1:8d} {2:8d} {0:s}".format(*file_info)
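The "*" in that format call deserves a second look. As a small illustration, assuming each entry is a (file name, lines read, lines matched) tuple, which is the order the format string and the output suggest, the expansion is equivalent to passing each element by hand:

```python
# Hypothetical entry in the order (file name, lines read, lines matched)
file_info = ("logs/ac_20150201", 156228, 27)

# The * spreads the tuple across positional arguments 0, 1 and 2 ...
expanded = "{1:8d} {2:8d} {0:s}".format(*file_info)

# ... which is the same as naming each element individually
by_hand = "{1:8d} {2:8d} {0:s}".format(file_info[0], file_info[1], file_info[2])

print(expanded)
```

Numbering the fields in the format string is what lets the file name sit first in the tuple yet print last on the line.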
Let's see that in action, searching for "Salisbury" references for the last 3 weeks:
python dirStream.py Salisbury
stream_status        completed
current_file_name
current_line_number  -1
searching_for        Salisbury
lines_read_so_far    4011213
lines_matched_so_far 942
total_number_files   21
searching            yes
current_file_number  21
And the detailed output:
  156228       27 logs/ac_20150201
  161144       55 logs/ac_20150202
  190542       22 logs/ac_20150203
  227646       44 logs/ac_20150204
  221454       67 logs/ac_20150205
  202896       45 logs/ac_20150206
  198114      104 logs/ac_20150207
  175836       56 logs/ac_20150208
  170156       34 logs/ac_20150209
  202743       62 logs/ac_20150210
  190289       52 logs/ac_20150211
  190397       56 logs/ac_20150212
  207429       44 logs/ac_20150213
  251313       31 logs/ac_20150214
  165796       25 logs/ac_20150215
  168314       13 logs/ac_20150216
  194138       65 logs/ac_20150217
  181487       15 logs/ac_20150218
  187665       65 logs/ac_20150219
  185631       31 logs/ac_20150220
  181995       29 logs/ac_20150221
The complete example's source code is available to you, with some comments, and wrapped so that you can make use of it too for this common "parse all the records in all the files in a directory" requirement.
Of note to delegates / learners - interesting Python things:
• Use of generator within a method
• A constructor that does more than just store incoming values
• A state holder (this.status_mode)
• Optional parameters
• Use of a dict to return a whole series of named status values
• use of "and" and "or" as a lazy "if" and "else"
• passing in multiple values to a format method using "*" to expand a list
• exception handling to cheaply pick up lack of command line selectors
• use of os.path.join to add in the appropriate file / folder separator character for the current OS
• conditional use of from to load extra code only if running the test programs
• A method that returns multiple values (a tuple)
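Several of those points fit in a few lines each. The following is a sketch only, not taken from the dirStream source: the function names pick_pattern, describe and log_path are hypothetical, and pprint simply stands in for whatever extra code a test harness might want to load.

```python
import os
import sys

def pick_pattern(argv):
    # Exception handling as a cheap test for a missing command line argument
    try:
        return argv[1]
    except IndexError:
        return None

def describe(lookfor):
    # "and" / "or" as a lazy if / else; beware that it falls through to the
    # "or" branch whenever the middle value is itself false (e.g. "" or 0)
    return lookfor and "yes" or "no"

def log_path(folder, name):
    # os.path.join supplies the separator that suits the current OS
    return os.path.join(folder, name)

if __name__ == "__main__":
    # Code only needed by the test harness is imported here, not at the top
    from pprint import pprint
    lookfor = pick_pattern(sys.argv)
    pprint({"searching": describe(lookfor),
            "example": log_path("logs", "ac_20150201")})
```

The and / or trick is compact, but the conditional expression ("yes" if lookfor else "no") avoids the pitfall noted in the comment and is usually the safer choice in new code.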
I think I said at the start - I love programming in Python (written 2015-02-22)