Using Perl to generate multiple reports from a HUGE file, efficiently

If you want to extract two distinct reports from a large data source, there are a number of ways you could do it. The first two are not brilliant:

1. You could read the entire file into memory, and then loop over it once per report. This is a poor solution if the data becomes huge: the program's memory footprint becomes massive, it may start swapping to and from the disc, and it may even crash with an "out of memory" error.

2. You could read the file multiple times. This is hard going on the disc, and potentially very slow as disc access times can be significant.

The third solution, which I describe fully below, is MUCH better: you can read your data in record by record, just once, and store the data you need for each report in separate variables as you go along. There's a new example from this week's Perl course [here].

Processing a web access log file (30 MB, but it could be far bigger!) line by line:

  while ($line = <FH>) {    # FH: the already-opened log file (handle name assumed)
    @parts = split(/\s+/,$line);


I built up strings with the extracted data that I needed for huge URL reads (responses of more than 1,000,000 bytes)
    if ($parts[9] > 1000000) {
      $huge .= "$parts[3] $parts[8] $parts[9] $parts[6]\n";
    }


and for requests that generated server errors (status 500 or above)
    if ($parts[8] >= 500) {
      $server .= "$parts[3] $parts[8] $parts[6]\n";
    }


all within the same read loop - here's the end of the while loop:

  }
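
The field numbers used in the loop depend on the layout of the log file. As a quick check, assuming the usual Apache access log format and a made-up sample line (neither is taken from the course example), you can split one line and print each field with its index:

```perl
use strict;
use warnings;

# Made-up sample line in the usual Apache access log layout (an assumption)
my $line = '192.0.2.1 - - [09/Dec/2011:10:15:32 +0000] "GET /resources/big.zip HTTP/1.1" 200 2345678';

my @parts = split(/\s+/, $line);

# Show each field with its index so the numbers used in the loop can be checked
for my $i (0 .. $#parts) {
    print "$i: $parts[$i]\n";
}
```

With this layout, field 3 is the start of the timestamp, field 6 the requested URL, field 8 the status code and field 9 the byte count.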

Then - after the file reading was completed - I printed out the results:

  print "$huge\n";
  print "$server\n";
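
Put together, the pieces above might look something like the following. This is only a sketch: the sample records, the in-memory filehandle `$fh`, the report headings and the checks for non-numeric fields are my own assumptions rather than part of the course example, and a real run would open the log file itself.

```perl
use strict;
use warnings;

# Made-up sample records standing in for the real log file (an assumption)
my $log = <<'END';
192.0.2.1 - - [09/Dec/2011:10:15:32 +0000] "GET /big.zip HTTP/1.1" 200 2345678
192.0.2.2 - - [09/Dec/2011:10:15:40 +0000] "GET /index.html HTTP/1.1" 200 5120
192.0.2.3 - - [09/Dec/2011:10:16:02 +0000] "GET /broken.php HTTP/1.1" 500 312
END

# Read from the string as if it were a file
open(my $fh, '<', \$log) or die "Cannot open sample data: $!";

my ($huge, $server) = ('', '');

while (my $line = <$fh>) {
    my @parts = split(/\s+/, $line);
    next unless @parts >= 10;    # skip lines too short to hold all the fields

    # Report 1: responses of more than 1,000,000 bytes
    if ($parts[9] =~ /^\d+$/ && $parts[9] > 1000000) {
        $huge .= "$parts[3] $parts[8] $parts[9] $parts[6]\n";
    }

    # Report 2: server errors (status 500 or above)
    if ($parts[8] =~ /^\d+$/ && $parts[8] >= 500) {
        $server .= "$parts[3] $parts[8] $parts[6]\n";
    }
}
close $fh;

print "Huge downloads:\n$huge\n";
print "Server errors:\n$server\n";
```

The file is read only once, and at any moment the program holds just the current line plus the two report strings, so memory use grows with the size of the reports rather than the size of the log.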





That same example has since been expanded with a third report. I can (of course) add as many reports as I like, but in this third case I've used a list instead of a string to collect the data I need, within the same while loop that reads the whole file:

    if ($line =~ /Trowbridge/) {
      push @toon,"$parts[6] $parts[9]\n";
    }


This has then allowed me to reorder (sort) the report data before sending it to the output:

  @toon = sort(@toon);
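  # printed as a list: "@toon" inside the quotes would put a space before every line after the first
  print "Trowbridge, by page name:\n", @toon, "\n";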
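
A plain sort compares the lines as strings, which suits a report ordered by page name. If you wanted the report ordered by size instead, one option (my own variation, not part of the course example) is to collect each page and byte count as a pair and sort numerically. The sample pairs and the name `@by_size` below are made up for illustration:

```perl
use strict;
use warnings;

# Made-up (page, bytes) pairs, as might be collected in the read loop (an assumption)
my @toon = (
    ['/trowbridge/map.html',   20480],
    ['/trowbridge/index.html',  5120],
    ['/trowbridge/photo.jpg', 102400],
);

# Largest first: <=> compares numerically, where a plain sort compares as strings
my @by_size = sort { $b->[1] <=> $a->[1] } @toon;

print "Trowbridge, by size:\n";
print "$_->[0] $_->[1]\n" for @by_size;
```

In the read loop, the pairs would be collected with something like `push @toon, [$parts[6], $parts[9]];` in place of the string version.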


Example written on this week's Learning to program in Perl course.

(written 2011-12-09, updated 2011-12-17)

 




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3547_Usi ... ently.html • PAGE BUILT: Sun Mar 30 15:20:58 2014 • BUILD SYSTEM: WomanWithCat