If you want to extract two distinct reports from a large data source, there are a number of ways you could do it. The first two are not brilliant:
1. You could read the entire file into memory, and then traverse it several times in a loop. This is a poor solution if the data becomes huge - the footprint of the program becomes massive, it may start swapping on and off the disc, and indeed it may crash "out of memory".
2. You could read the file multiple times. This is hard going on the disc, and potentially very slow as disc access times can be significant.
The third solution, which I describe fully below, is MUCH better: you can read your data in record by record, just once, and store the data you need for each report in separate variables as you go along. There's a new example from this week's Perl course [here].
Processing a web access log file (30 Mb but could be far bigger!) line by line:
while ($line = <FH>) {  # FH: placeholder name for the already-opened log filehandle
@parts = split(/\s+/,$line);
I built up strings with the extracted data that I needed for huge URL reads
if ($parts[9] > 1000000) {
$huge .= "$parts[3] $parts[8] $parts[9] $parts[6]\n";
}
and for requests that generated server errors
if ($parts[8] >= 500) {
$server .= "$parts[3] $parts[8] $parts[6]\n";
}
all within the same read loop - here's the end of the while loop:
}
Then - after the file reading was completed - I printed out the results:
print "$huge\n";
print "$server\n";
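Pulled together, the two-report loop above can be sketched as a single runnable script. This is a minimal sketch rather than the course example itself: the sample log lines, the `build_reports` subroutine name and the in-memory filehandle are assumptions made so that it runs without a real log file, and the numeric checks are an added guard for lines where the size field is "-".

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a real web access log
my $sample = <<'LOG';
10.0.0.1 - - [09/Dec/2011:10:00:00 +0000] "GET /small.html HTTP/1.1" 200 512
10.0.0.2 - - [09/Dec/2011:10:00:05 +0000] "GET /big.zip HTTP/1.1" 200 2500000
10.0.0.3 - - [09/Dec/2011:10:00:09 +0000] "GET /broken.cgi HTTP/1.1" 500 300
LOG

# build_reports is an illustrative name, not part of the original example.
# It reads the filehandle once and returns both report strings.
sub build_reports {
    my ($fh) = @_;
    my ($huge, $server) = ('', '');
    while (my $line = <$fh>) {
        my @parts = split(/\s+/, $line);
        next unless @parts >= 10;    # skip short or malformed lines

        # Report 1: responses larger than 1,000,000 bytes
        if ($parts[9] =~ /^\d+$/ && $parts[9] > 1000000) {
            $huge .= "$parts[3] $parts[8] $parts[9] $parts[6]\n";
        }
        # Report 2: requests that generated a server error (5xx)
        if ($parts[8] =~ /^\d+$/ && $parts[8] >= 500) {
            $server .= "$parts[3] $parts[8] $parts[6]\n";
        }
    }
    return ($huge, $server);
}

# Open a filehandle on the in-memory string instead of a file on disc
open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my ($huge, $server) = build_reports($fh);
close $fh;

print "Huge reads:\n$huge\n";
print "Server errors:\n$server\n";
```

With a real log you would open the file by name instead of `\$sample`; the loop itself stays the same, and only the two report strings grow with the data, not the whole file.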
That same example has been expanded to include a third report. I can (of course) add as many reports as I like to this, but in this third case I've used a list instead of a string to collect the data I need, within the same while loop that reads the whole file:
if ($line =~ /Trowbridge/) {
push @toon,"$parts[6] $parts[9]\n";
}
This has then allowed me to reorder (sort) the report data before sending it to the output:
@toon = sort(@toon);
print "Trowbridge, by page name:\n@toon\n";
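The list-based report can be sketched in the same self-contained way. Again this is an illustration under stated assumptions: the sample lines (with "Trowbridge" appearing in the referrer field), the `collect_matching` subroutine name and the in-memory filehandle are all made up for the sketch.

```perl
use strict;
use warnings;

# Hypothetical sample data; "Trowbridge" appears in the referrer of two lines
my $sample = <<'LOG';
host1.example - - [09/Dec/2011:10:00:00 +0000] "GET /zebra.html HTTP/1.1" 200 900 "http://www.example.com/?q=Trowbridge"
host2.example - - [09/Dec/2011:10:00:02 +0000] "GET /other.html HTTP/1.1" 200 100 "-"
host3.example - - [09/Dec/2011:10:00:04 +0000] "GET /apple.html HTTP/1.1" 200 450 "http://www.example.com/?q=Trowbridge"
LOG

# collect_matching is an illustrative name. It gathers "page size" rows for
# lines matching the pattern, then returns them sorted as strings.
sub collect_matching {
    my ($fh, $pattern) = @_;
    my @rows;
    while (my $line = <$fh>) {
        next unless $line =~ $pattern;
        my @parts = split(/\s+/, $line);
        next unless @parts >= 10;    # skip short or malformed lines
        push @rows, "$parts[6] $parts[9]\n";
    }
    my @sorted = sort @rows;         # default sort: string order, so by page name
    return @sorted;
}

open(my $fh, '<', \$sample) or die "Cannot open sample data: $!";
my @toon = collect_matching($fh, qr/Trowbridge/);
close $fh;

# Printing the list as a separate argument avoids the single space that
# Perl puts between list elements interpolated inside double quotes
print "Trowbridge, by page name:\n", @toon, "\n";
```

Because each row starts with the page name, the default string sort orders the report by page. A string cannot be reordered this way once it has been built up, which is the reason for collecting this report in a list.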
Example written on this week's Learning to program in Perl course.
(written 2011-12-09, updated 2011-12-17)