Object using generator - directory file traverser
File Handling example from a Well House Consultants training course
More on File Handling [link]
Source code: dirStream.py Module: Y110
# Object using generator - directory file traverser

""" This class (given a path name) traverses all the files in that
directory and records each record by record. File names are sorted
Asciibetically, but generators are used to avoid the application
growing huge on large data flows. Test harness run on 1 Gbyte of
data across 21 files """

import os
import os.path

class dirStream (object):

    def __init__(this,folder):
        this.folder = folder
        this.places = os.listdir(folder)
        this.places.sort()
        this.fh = None
        this.status = []
        this.status_mode = "waiting"
        this.searching = ""

    def getRecord(this,search = None):
        this.searching = search
        for k in xrange(len(this.places)):
            fullpath = os.path.join(this.folder,this.places[k])
            if not os.path.isfile(fullpath): continue
            this.status.append([fullpath,0,0])
            this.status_mode = "serving"
            for lyne in open(fullpath,"r"):
                this.status[-1][1]+=1
                if search == None or lyne.find(search) > -1:
                    this.status[-1][2]+=1
                    yield lyne
        this.status_mode = "completed"

    def getWhere(this):
        if this.status_mode == "serving":
            return (this.status[-1][0],this.status[-1][1])
        elif this.status_mode == "waiting":
            return ("",0)
        else:
            return ("",-1)

    def getStatus(this):
        v1,v2 = this.getWhere()
        linesread = 0
        linesmatched = 0
        for (f,n,m) in this.status:
            linesread += n
            linesmatched += m
        stat = {"stream_status": this.status_mode,
            "current_file_name": v1,
            "current_line_number": v2,
            "current_file_number": len(this.status),
            "lines_read_so_far": linesread,
            "lines_matched_so_far": linesmatched,
            "total_number_files": len(this.places),
            "searching": (this.searching == None) and "no" or "yes",
            "searching_for": this.searching
            }
        return stat

    def getReport(this):
        return this.status

if __name__ == "__main__":

    # Analyse data
    from sys import argv
    try:
        lookfor = argv[1]
    except:
        lookfor = None

    source = dirStream("logs")
    for record in source.getRecord(lookfor):
        file,line = source.getWhere()
        print line,file,record

    # brief report
    report = source.getStatus()
    for k in report.keys():
        print "{0:<20s} {1}".format(k,report[k])

    print "--------------------------------"

    # full report
    for file_info in source.getReport():
        print "{1:8d} {2:8d} {0:s}".format(*file_info)

""" Sample Output

trainee@kingston:~/py_sql$ python dirSteam.py Turbines
71099 logs/ac_20150218 sft042.sysms.net www.wellho.net - [17/Feb/2015:13:28:07 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1" 200 17283 "-" "Mozilla/5.0 (TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0"
71107 logs/ac_20150218 ec2-184-169-203-101.us-west-1.compute.amazonaws.com www.wellho.net - [17/Feb/2015:13:28:09 +0000] "HEAD /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1" 200 - "-" "Google-HTTP-Java-Client/1.17.0-rc (gzip)"
(etc) - some output snipped
2084 logs/ac_20150221 reth0-609.duedil-fw-01.lon.vorboss.net www.wellho.net - [20/Feb/2015:04:00:09 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.0" 200 24157 "http://www.wellho.net/" "Mozilla/5.0 (compatible; electricmonk/3.2.0 +https://www.duedil.com/our-crawler/)"
12542 logs/ac_20150221 static.55.174.46.78.clients.your-server.de www.wellho.net - [20/Feb/2015:05:56:17 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1" 200 23709 "-" "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +https://www.megaindex.ru/?tab=linkAnalyze)"
20609 logs/ac_20150221 static.118.27.76.144.clients.your-server.de www.wellho.net - [20/Feb/2015:07:21:24 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1" 200 23709 "-" "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +https://www.megaindex.ru/?tab=linkAnalyze)"
159388 logs/ac_20150221 msnbot-157-55-39-62.search.msn.com www.wellho.net - [20/Feb/2015:23:38:35 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1" 200 23256 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
stream_status        completed
current_file_name
current_line_number  -1
searching_for        Turbines
lines_read_so_far    4011213
lines_matched_so_far 70
total_number_files   21
searching            yes
current_file_number  21
--------------------------------
  156228        0 logs/ac_20150201
  161144        0 logs/ac_20150202
  190542        0 logs/ac_20150203
  227646        0 logs/ac_20150204
  221454        0 logs/ac_20150205
  202896        0 logs/ac_20150206
  198114        0 logs/ac_20150207
  175836        0 logs/ac_20150208
  170156        0 logs/ac_20150209
  202743        0 logs/ac_20150210
  190289        0 logs/ac_20150211
  190397        0 logs/ac_20150212
  207429        0 logs/ac_20150213
  251313        0 logs/ac_20150214
  165796        0 logs/ac_20150215
  168314        0 logs/ac_20150216
  194138        0 logs/ac_20150217
  181487       26 logs/ac_20150218
  187665       29 logs/ac_20150219
  185631       11 logs/ac_20150220
  181995        4 logs/ac_20150221
trainee@kingston:~/py_sql$
"""

Learn about this subject
This module and example are covered on the following public courses:
* Learning to program in Python
* Python Programming
Also available on on site courses for larger groups

Books covering this topic
Yes. We have over 700 books in our library. Books
covering Python are listed here and when you've selected a
relevant book we'll link you on to Amazon to order.
Other Examples
This example comes from our "File Handling" training module. You'll find a description of the topic and some
other closely related examples on the "File Handling" module index page.
Full description of the source code
You can learn more about this example on the training courses listed on this page,
on which you'll be given a full set of training notes.
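As a brief illustration of the generator technique the dirStream class relies on, here is a minimal sketch written for Python 3 (the example above is Python 2). The names stream_records and build_demo_folder, and the two small demo files, are hypothetical and are not part of the course example. The sketch only shows the idea of yielding one line at a time from the sorted files of a folder, so that no whole file is ever held in memory.

```python
import os
import tempfile


def stream_records(folder, search=None):
    """Yield (file_name, line_number, line) for each matching line in folder.

    Hypothetical helper: file names are sorted, and lines are produced
    lazily, one at a time, rather than being collected into a list.
    """
    for name in sorted(os.listdir(folder)):
        fullpath = os.path.join(folder, name)
        if not os.path.isfile(fullpath):
            continue  # skip subdirectories, as dirStream does
        with open(fullpath, "r") as handle:
            for number, line in enumerate(handle, start=1):
                if search is None or search in line:
                    yield name, number, line


def build_demo_folder():
    """Create a throwaway folder holding two small files (made-up data)."""
    folder = tempfile.mkdtemp()
    samples = {
        "b.log": "wind\nTurbines here\n",
        "a.log": "no match\nTurbines again\n",
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w") as handle:
            handle.write(text)
    return folder


demo_folder = build_demo_folder()

# list() is used here only so the small demo result can be inspected;
# on large data you would loop over the generator directly.
matches = list(stream_records(demo_folder, "Turbines"))
for name, number, line in matches:
    print(number, name, line.rstrip())
```

Because the function contains a yield, calling it returns a generator; nothing is read from disk until the caller starts looping, and each file is closed as soon as its last line has been served.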
Many other training modules are available for download (for limited use) from our download centre under an Open Training Notes License.

Other resources
• Our Solutions centre provides a number of longer technical articles.
• Our Opentalk forum archive provides a question and answer centre.
• The Horse's mouth provides a daily tip or thought.
• Further resources are available via the resources centre.
• All of these resources can be searched through our search engine.
• And there's a global index here.

Purpose of this website
This is a sample program, class demonstration or answer from a
training course. Its main purpose
is to provide an after-course service to customers who have attended our
public, private or
on site courses, but the examples are made
generally available under conditions described below.
Web site author
Conditions of use
Past attendees on our training courses are welcome to use individual
examples in the course of their programming, but must check
the examples they use to ensure that they are suitable for their
job. Remember that some of our examples show you how not to do
things - check in your notes. Well House Consultants take no responsibility
for the suitability of these example programs to customers' needs.
This program is copyright Well House Consultants Ltd. You are forbidden from using it for running your own training courses without our prior written permission. See our page on courseware provision for more details. Any of our images within this code may NOT be reused on a public URL without our prior permission. For bona fide personal use, we will often grant you permission provided that you provide a link back. Commercial use on a website will incur a license fee for each image used - details on request.
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho
PAGE: http://www.wellho.net/resources/ex.php • PAGE BUILT: Sun Oct 11 14:50:09 2020 • BUILD SYSTEM: JelliaJamb