| |||
Data extraction and reporting from multiple files
File and Directory Handling example from a Well House Consultants training course
More on File and Directory Handling [link]
Source code: rail Module: T209
# Scenario. We have a number of (published, government) tables of the number of tickets sold for # journeys to and from stations on the main railway network of Great Britain on an annual basis, but # the format of the file for each year varies - fields have been added and removed, and any numbers # over 999 have commas added to split up the numbers which is great for the human reader, but causes # programs that want to further analyse the data some problems. We also have a file that include the # location of stations in Ordnance Survey grid terms, postcode, latitude and longitude, etc. # # Our objective is to produce a single output file, one record per station, that includes fields for # the station's details at the start, and then a total of the number of entrances and exits year by # year. # Define functions used in main program # Add to a list, or add NULL if value does not exist proc nappend {list value} { upvar $list target set there 0 if {$value != ""} { set updo "info exists $value" catch {set there [uplevel $updo]} errst } if {$there} { upvar $value vn lappend target $vn } else { lappend target NULL } } # Sort by postocde, moving all unpostcoded records to the end proc pcs {a b} { global postcode global tlctable set rv 0 if {[catch {set at $postcode($tlctable($a))} errst]} {return 1} if {[catch {set bt $postcode($tlctable($b))} errst]} {return -1} return [string compare $at $bt] } # -------- Main Program # Definition of fields of interest in varying different annual file formats set filespec { {{station_usage_0910.csv 2009 23} {four 0} {tlc 1} {name 2} {thisyear 21}} {{station_usage_0809.csv 2008 23} {four 0} {tlc 1} {name 2} {thisyear 21}} {{station_usage_0708.csv 2007 23} {four 0} {tlc 1} {name 2} {thisyear 16}} {{station_usage_0607.csv 2006 23} {four 0} {tlc 1} {name 2} {thisyear 16}} {{station_usage_0506.csv 2005 16} {four 4} {tlc 1} {name 0} {thisyear 13}} {{station_usage_0405.csv 2004 16} {four 4} {tlc 1} {name 0} {thisyear 13}} } foreach year $filespec { # read the format for the year set yeardata [lindex $year 0] set wanted [lrange $year 1 end] set filename [lindex $yeardata 0] set yearnumber [lindex $yeardata 1] set minfields [lindex $yeardata 2] set fh [open $filename r] # read each record for the year while 1 { gets $fh record if {$record == ""} break set fields [split $record "\t"] # Eliminate short (header and / or damaged) records if {[llength $fields] < $minfields} continue # extract the fields that are wanted from the right columns for this year foreach pair $wanted { set f_name [lindex $pair 0] set f_col [lindex $pair 1] set $f_name [lindex $fields $f_col] } # Substitute - decomma regsub -all , $thisyear "" thisyear regsub -all {"} $four "" four # Extract - dequote regexp {"([[:alpha:]]{3})"} $tlc whole tlc regsub -all \" $name "" name if {[regexp ^....$ $four]} { # if {$four == "3333"} {puts "$tlc $four $yearnumber $thisyear $name"} # Save the passenger count for this station and year set station_annual("$four/$yearnumber") $thisyear # First time for this station - save name and three letter code if {[info exists nametable($four)]} continue set nametable($four) $name set tlctable($four) $tlc } } } # --- At this point we have read in the data for each year # Now read in the ancilliary data for each station set fh [open stationtables.tsv r] while {[gets $fh line] >= 0} { set fields [split $line "\t"] set tlc [lindex $fields 0] set osgrid($tlc) [lindex $fields end] set postcode($tlc) [lindex $fields 2] set lat($tlc) [lindex $fields 3] set long($tlc) [lindex $fields 4] } # Generate output file; get station order first set fho [open railstats.txt w] set tlcs [array names tlctable] set tlcs [lsort -command pcs $tlcs] foreach station $tlcs { set tlc $tlctable($station) # Initial fields on each line set line "" nappend line station nappend line tlc nappend line postcode($tlc) nappend line osgrid($tlc) nappend line lat($tlc) nappend line long($tlc) nappend line nametable($station) # Add annual data foreach yearnumber {2004 2005 2006 2007 2008 2009} { nappend line station_annual("$station/$yearnumber") } # Fields in output record: # 1 - 4 digit station code # 2 - Three letter code for station # 3 - Station's postcode # 4 - OS grid reference for station # 5 - Latitude of station # 6 - Longitude of station # 7 - Station name # 8 - Ticket sales - April 2004 - March 2005 for joureys to / from this station # 9 - Ticker sales, April 2005 to March 2006 # 10 - Ticker sales, April 2006 to March 2007 # 11 - Ticker sales, April 2005 to March 2008 # 12 - Ticker sales, April 2008 to March 2009 # 13 - Ticker sales, April 2009 to March 2010 # # 14 - April 2010 to March 2011 (added via Perl, 29.3.2012) # Turn into tab delimited line set newline [join $line "\t"] # Output the record provided that there are no problems with it if {[regexp "^...$" $tlc]} { puts $fho $newline } else { puts $newline } } Learn about this subject
This module and example are covered on the following public courses:
* Learning to program in Tcl * Tcl Programming * Tcl Programming Also available on on site courses for larger groups Books covering this topic
Yes. We have over 700 books in our library. Books
covering Tcl, Tk and Expect are listed here and when you've selected a
relevant book we'll link you on to Amazon to order.
Other Examples
This example comes from our "File and Directory Handling" training module. You'll find a description of the topic and some
other closely related examples on the "File and Directory Handling" module index page.
Full description of the source code
You can learn more about this example on the training courses listed on this page,
on which you'll be given a full set of training notes.
Many other training modules are available for download (for limited use) from our download centre under an Open Training Notes License. Other resources
• Our Solutions centre provides a number of longer technical articles.
• Our Opentalk forum archive provides a question and answer centre. • The Horse's mouth provides a daily tip or thought. • Further resources are available via the resources centre. • All of these resources can be searched through through our search engine • And there's a global index here. Purpose of this website
This is a sample program, class demonstration or answer from a
training course. It's main purpose
is to provide an after-course service to customers who have attended our
public private or
on site courses, but the examples are made
generally available under conditions described below.
Web site author
Conditions of use
Past attendees on our training courses are welcome to use individual
examples in the course of their programming, but must check
the examples they use to ensure that they are suitable for their
job. Remember that some of our examples show you how not to do
things - check in your notes. Well House Consultants take no responsibility
for the suitability of these example programs to customer's needs.
This program is copyright Well House Consultants Ltd. You are forbidden from using it for running your own training courses without our prior written permission. See our page on courseware provision for more details. Any of our images within this code may NOT be reused on a public URL without our prior permission. For Bona Fide personal use, we will often grant you permission provided that you provide a link back. Commercial use on a website will incur a license fee for each image used - details on request. | |||
PH: 01144 1225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho PAGE: http://www.wellho.net/resources/ex.php4 • PAGE BUILT: Sun Oct 11 14:50:09 2020 • BUILD SYSTEM: JelliaJamb |