For 2021 - online Python 3 training - see ((here)).

Our plans were to retire in summer 2020 and see the world, but Coronavirus has led us into a lot of lockdown programming in Python 3 and PHP 7.
We can now offer tailored online training - small groups, real tutors - works really well for groups of 4 to 14 delegates. Anywhere in the world; course language English.

Please ask about private 'maintenance' training for Python 2, Tcl, Perl, PHP, Lua, etc.
Object using generator - directory file traverser
File Handling example from a Well House Consultants training course
More on File Handling [link]

This example is described in the following article(s):
   • Loving programming in Python - and ready to teach YOU how - [link]

This example references the following resources:
http://datasift.com/bot.html
http://www.wellho.net/
https://www.duedil.com/our-crawler/
https://www.megaindex.ru/?tab=linkAnalyze
http://www.bing.com/bingbot.htm

Source code: dirStream.py Module: Y110

# Object using generator - directory file traverser

""" This class (given a path name) traverses all the files in that
directory and yields their contents record by record (line by line).
File names are sorted Asciibetically, and generators are used to stop
the application growing huge on large data flows. The test harness was
run on 1 Gbyte of data across 21 files. """


import os
import os.path

class dirStream (object):

        def __init__(this,folder):
                this.folder = folder
                this.places = os.listdir(folder)
                this.places.sort()
                this.fh = None
                this.status = []
                this.status_mode = "waiting"
                this.searching = None   # no search string until getRecord is called

        def getRecord(this,search = None):
                # Generator: yields one matching line at a time, so only
                # the current line is ever held in memory
                this.searching = search
                for place in this.places:
                        fullpath = os.path.join(this.folder,place)
                        if not os.path.isfile(fullpath): continue
                        # status entry is [file name, lines read, lines matched]
                        this.status.append([fullpath,0,0])
                        this.status_mode = "serving"
                        with open(fullpath,"r") as fh:
                                for lyne in fh:
                                        this.status[-1][1]+=1
                                        if search is None or search in lyne:
                                                this.status[-1][2]+=1
                                                yield lyne
                this.status_mode = "completed"
        def getWhere(this):
                if this.status_mode == "serving":
                        return (this.status[-1][0],this.status[-1][1])
                elif this.status_mode == "waiting":
                        return ("",0)
                else:
                        return ("",-1)
        def getStatus(this):
                v1,v2 = this.getWhere()
                linesread = 0
                linesmatched = 0
                for (f,n,m) in this.status:
                        linesread += n
                        linesmatched += m
                stat = {"stream_status": this.status_mode,
                        "current_file_name": v1,
                        "current_line_number": v2,
                        "current_file_number": len(this.status),
                        "lines_read_so_far": linesread,
                        "lines_matched_so_far": linesmatched,
                        "total_number_files": len(this.places),
                        "searching": "no" if this.searching is None else "yes",
                        "searching_for": this.searching
                        }
                return stat
        def getReport(this):
                return this.status

if __name__ == "__main__":

        # Analyse data - optional search string from the command line

        from sys import argv
        try:
                lookfor = argv[1]
        except IndexError:
                lookfor = None

        source = dirStream("logs")
        for record in source.getRecord(lookfor):
                filename,line = source.getWhere()
                # single-argument print() behaves the same in Python 2 and 3
                print("{0} {1} {2}".format(line,filename,record))

        # brief report

        report = source.getStatus()
        for k in report.keys():
                print("{0:<20s} {1}".format(k,report[k]))
        print("--------------------------------")

        # full report - lines read, lines matched, file name
        for file_info in source.getReport():
                print("{1:8d} {2:8d} {0:s}".format(*file_info))

""" Sample Output

trainee@kingston:~/py_sql$ python dirStream.py Turbines
71099 logs/ac_20150218 sft042.sysms.net www.wellho.net - [17/Feb/2015:13:28:07 +0000]
"GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1" 200 17283 "-" "Mozilla/5.0
(TweetmemeBot/4.0; +http://datasift.com/bot.html) Gecko/20100101 Firefox/31.0"

71107 logs/ac_20150218 ec2-184-169-203-101.us-west-1.compute.amazonaws.com www.wellho.net -
[17/Feb/2015:13:28:09 +0000] "HEAD /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1"
200 - "-" "Google-HTTP-Java-Client/1.17.0-rc (gzip)"

(etc) - some output snipped

2084 logs/ac_20150221 reth0-609.duedil-fw-01.lon.vorboss.net www.wellho.net -
[20/Feb/2015:04:00:09 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.0"
200 24157 "http://www.wellho.net/" "Mozilla/5.0 (compatible; electricmonk/3.2.0
+https://www.duedil.com/our-crawler/)"

12542 logs/ac_20150221 static.55.174.46.78.clients.your-server.de www.wellho.net -
[20/Feb/2015:05:56:17 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1"
200 23709 "-" "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +https://www.megaindex.ru/?tab=linkAnalyze)"

20609 logs/ac_20150221 static.118.27.76.144.clients.your-server.de www.wellho.net -
[20/Feb/2015:07:21:24 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1"
200 23709 "-" "Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +https://www.megaindex.ru/?tab=linkAnalyze)"

159388 logs/ac_20150221 msnbot-157-55-39-62.search.msn.com www.wellho.net -
[20/Feb/2015:23:38:35 +0000] "GET /mouth/4429_Wind-Turbines-beauty-or-menace-.html HTTP/1.1"
200 23256 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"

stream_status        completed
current_file_name
current_line_number  -1
searching_for        Turbines
lines_read_so_far    4011213
lines_matched_so_far 70
total_number_files   21
searching            yes
current_file_number  21
--------------------------------
  156228        0 logs/ac_20150201
  161144        0 logs/ac_20150202
  190542        0 logs/ac_20150203
  227646        0 logs/ac_20150204
  221454        0 logs/ac_20150205
  202896        0 logs/ac_20150206
  198114        0 logs/ac_20150207
  175836        0 logs/ac_20150208
  170156        0 logs/ac_20150209
  202743        0 logs/ac_20150210
  190289        0 logs/ac_20150211
  190397        0 logs/ac_20150212
  207429        0 logs/ac_20150213
  251313        0 logs/ac_20150214
  165796        0 logs/ac_20150215
  168314        0 logs/ac_20150216
  194138        0 logs/ac_20150217
  181487       26 logs/ac_20150218
  187665       29 logs/ac_20150219
  185631       11 logs/ac_20150220
  181995        4 logs/ac_20150221
trainee@kingston:~/py_sql$

"""
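
As a minimal sketch of the same generator technique in Python 3, the function below walks a folder in sorted order and yields matching lines one at a time. The names `stream_records`, `make_sample_logs` and `demo`, and the two tiny log files, are assumptions made up for illustration; they are not part of dirStream.py.

```python
import os
import tempfile


def stream_records(folder, search=None):
    """Yield (path, line_number, line) for each matching line, file by file.

    Hypothetical helper for illustration; not part of dirStream.py.
    """
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "r") as handle:
            for number, line in enumerate(handle, start=1):
                if search is None or search in line:
                    yield path, number, line.rstrip("\n")


def make_sample_logs(folder):
    """Write two tiny stand-in log files (made-up data) into folder."""
    samples = {
        "ac_1": ["GET /index.html\n", "GET /Turbines.html\n"],
        "ac_2": ["GET /Turbines.html\n", "GET /about.html\n"],
    }
    for name, lines in samples.items():
        with open(os.path.join(folder, name), "w") as handle:
            handle.writelines(lines)


def demo():
    """Return every match, plus the first match fetched lazily."""
    with tempfile.TemporaryDirectory() as folder:
        make_sample_logs(folder)
        matches = [(os.path.basename(p), n, text)
                   for p, n, text in stream_records(folder, "Turbines")]
        # next() pulls a single record; nothing beyond it is read
        gen = stream_records(folder, "Turbines")
        path, number, text = next(gen)
        gen.close()  # leaves the with-block, closing the open file
        first = (os.path.basename(path), number, text)
    return matches, first


if __name__ == "__main__":
    all_matches, first_match = demo()
    for name, number, text in all_matches:
        print(number, name, text)
    print("first:", first_match)
```

Because the generator is lazy, the `next()` call in `demo` returns the first match from `ac_1` without ever opening `ac_2`; this is the property that keeps memory use flat on large sets of log files.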

Learn about this subject
This module and example are covered on the following public courses:
 * Learning to program in Python
 * Python Programming
Also available as on-site courses for larger groups.

Books covering this topic
Yes. We have over 700 books in our library. Books covering Python are listed here and when you've selected a relevant book we'll link you on to Amazon to order.

Other Examples
This example comes from our "File Handling" training module. You'll find a description of the topic and some other closely related examples on the "File Handling" module index page.

Full description of the source code
You can learn more about this example on the training courses listed on this page, on which you'll be given a full set of training notes.

Many other training modules are available for download (for limited use) from our download centre under an Open Training Notes License.

Other resources
• Our Solutions centre provides a number of longer technical articles.
• Our Opentalk forum archive provides a question and answer centre.
• The Horse's mouth provides a daily tip or thought.
• Further resources are available via the resources centre.
• All of these resources can be searched through our search engine.
• And there's a global index here.

Purpose of this website
This is a sample program, class demonstration or answer from a training course. Its main purpose is to provide an after-course service to customers who have attended our public, private or on-site courses, but the examples are made generally available under conditions described below.

Web site author
This web site is written and maintained by Well House Consultants.

Conditions of use
Past attendees on our training courses are welcome to use individual examples in the course of their programming, but must check the examples they use to ensure that they are suitable for their job. Remember that some of our examples show you how not to do things - check in your notes. Well House Consultants take no responsibility for the suitability of these example programs to customers' needs.

This program is copyright Well House Consultants Ltd. You are forbidden from using it for running your own training courses without our prior written permission. See our page on courseware provision for more details.

Any of our images within this code may NOT be reused on a public URL without our prior permission. For Bona Fide personal use, we will often grant you permission provided that you provide a link back. Commercial use on a website will incur a license fee for each image used - details on request.


© WELL HOUSE CONSULTANTS LTD., 2021: 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/resources/ex.php • PAGE BUILT: Sun Oct 11 14:50:09 2020 • BUILD SYSTEM: JelliaJamb