Using Perl to generate multiple reports from a HUGE file, efficiently
If you want to extract two distinct reports from a large data source, there are a number of ways you could do it. The first two are not brilliant:
1. You could read the entire file into memory, and then traverse it several times in a loop. This is a poor solution if the data becomes huge - the footprint of the program becomes massive, it may start swapping on and off the disc, and indeed it may crash "out of memory".
2. You could read the file multiple times. This is hard going on the disc, and potentially very slow as disc access times can be significant.
The third solution, which I describe fully below, is MUCH better: read your data in record by record, just once, and store the data you need for each report in separate variables as you go along. There's a new example from this week's Perl course [here].
Processing a web access log file (30 MB but could be far bigger!) line by line:
while ($line = <FH>) {   # FH is an assumed filehandle name - the original was lost from the archived page
@parts = split(/\s+/,$line);
Inside the loop, I built up a string holding the fields I needed for huge URL reads (responses of more than 1,000,000 bytes, with the byte count in $parts[9]):
if ($parts[9] > 1000000) {
$huge .= "$parts[3] $parts[8] $parts[9] $parts[6]\n";
}
and another for requests that generated server errors (status 500 or above, in $parts[8]):
if ($parts[8] >= 500) {
$server .= "$parts[3] $parts[8] $parts[6]\n";
}
all within the same read loop - here's the end of the while loop:
}
Then - after the file reading was completed - I printed out the results:
print "$huge\n";
print "$server\n";
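As a rough sketch of how the fragments above fit together into one runnable script: the sample log lines below are made up, and the field positions assume the usual Apache-style access log layout (timestamp in field 3, URL in field 6, status in field 8, byte count in field 9). The lexical filehandle, the named fields and the guard for a "-" byte count are additions for illustration, not part of the course example.

```perl
use strict;
use warnings;

# Made-up sample lines in an Apache-style layout (an assumption);
# in real use you would open the log file itself instead.
my $sample = <<'LOG';
10.0.0.1 - - [09/Dec/2011:10:00:01 +0000] "GET /big.zip HTTP/1.1" 200 2500000
10.0.0.2 - - [09/Dec/2011:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120
10.0.0.3 - - [09/Dec/2011:10:00:03 +0000] "GET /cgi/form.pl HTTP/1.1" 500 312
10.0.0.4 - - [09/Dec/2011:10:00:04 +0000] "GET /old.html HTTP/1.1" 304 -
LOG

# Read from the in-memory string as if it were a file.
open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";

my ($huge, $server) = ('', '');

# One pass over the data, feeding both reports as we go.
while (my $line = <$fh>) {
    my @parts = split(/\s+/, $line);
    next if @parts < 10;              # skip short or malformed lines
    my ($when, $url, $status, $bytes) = @parts[3, 6, 8, 9];
    $bytes = 0 if $bytes eq '-';      # "-" is logged when no body was sent

    $huge   .= "$when $status $bytes $url\n" if $bytes > 1000000;
    $server .= "$when $status $url\n"        if $status >= 500;
}
close $fh;

print "Huge reads:\n$huge\n";
print "Server errors:\n$server\n";
```

Only the two report strings grow as the file is read, so memory use depends on the size of the reports rather than the size of the log.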
That same example has since been expanded to add a third report. I can (of course) add as many reports as I like to this, but in this third case I've used a list rather than a string to collect the data I need, within the same while loop that reads the whole file:
if ($line =~ /Trowbridge/) {
push @toon,"$parts[6] $parts[9]\n";
}
This has then allowed me to reorder (sort) the report data before sending it to the output:
@toon = sort(@toon);
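# Pass @toon to print as a list - interpolating it inside the string would put a space in front of every line after the first
print "Trowbridge, by page name:\n", @toon, "\n";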
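Collecting into a list also leaves the sort order open until the end. The sketch below is one possible variation, not the course example: it stores each match as a hypothetical [url, bytes] pair so that the same data can be sorted by page name (string comparison) or by size (numeric comparison). The sample lines are made up and assume the same Apache-style field positions as before.

```perl
use strict;
use warnings;

# Made-up sample lines (an assumption), laid out like a typical access log.
my $sample = <<'LOG';
10.0.0.1 - - [09/Dec/2011:10:00:01 +0000] "GET /Trowbridge/station.html HTTP/1.1" 200 19000
10.0.0.2 - - [09/Dec/2011:10:00:02 +0000] "GET /Melksham/index.html HTTP/1.1" 200 5120
10.0.0.3 - - [09/Dec/2011:10:00:03 +0000] "GET /Trowbridge/hotels.html HTTP/1.1" 200 12000
LOG

open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";

my @toon;                                  # one [url, bytes] pair per matching line
while (my $line = <$fh>) {
    next unless $line =~ /Trowbridge/;
    my @parts = split(/\s+/, $line);
    push @toon, [ $parts[6], $parts[9] ];
}
close $fh;

# String comparison on the page name.
my @by_name = map { "$_->[0] $_->[1]\n" } sort { $a->[0] cmp $b->[0] } @toon;

# Variation: largest responses first, using a numeric comparison.
my @by_size = map { "$_->[0] $_->[1]\n" } sort { $b->[1] <=> $a->[1] } @toon;

print "Trowbridge, by page name:\n", @by_name, "\n";
print "Trowbridge, by size:\n", @by_size, "\n";
```

The trade-off is that the list holds every matching record until the end of the run, so this suits reports that select a modest subset of a huge file.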
Example written on this week's Learning to program in Perl course.
(written 2011-12-09, updated 2011-12-17)
This is a page archived from The Horse's Mouth at
http://www.wellho.net/horse/ -
the diary and writings of Graham Ellis.
Every attempt was made to provide current information at the time the
page was written, but things do move forward in our business - new software
releases, price changes, new techniques. Please check back via
our main site for current courses,
prices, versions, etc - any mention of a price in "The Horse's Mouth"
cannot be taken as an offer to supply at that price.