Scenario. I have a lot of data, made up of large numbers of records, which I want to separate into groups. For example, an incoming web server log file which I want to split out and process visitor by visitor.
Using Perl, I can loop through my data line by line and store it in a hash - for example:
while ($lyne = <FH>) {
$lyne =~ /\S+/ ;
$all{$&} .= $lyne;
}
• In a web server log file, the IP address / name of the visiting server is the first non-space string on the line
• It's perfectly valid to do a regular expression match outside a condition - if you're working with an automatically generated data file that does not need any validation, this is acceptable practice too
• $& is the special variable that contains "the bit that matched" after a regular expression match in Perl. If the incoming strings are massive and there are lots of matches in a tight loop, using $& *may* be a bit inefficient.
• The .= operator adds on to the end of an existing string. If I had wanted a list of accesses (rather than a string containing them all), I could have pushed each record onto a list within the hash (but that would be at a later point in the course).
• Implicit reference to a variable such as the hash %all in my example will cause it to be created if it doesn't exist (the very first time through the loop), and each new element in that hash will similarly be created as necessary. In a longer program, declaring a local hash via my %all may be appropriate.
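As a rough sketch of the alternatives mentioned in the notes above, here is the same grouping done with a declared hash, the match placed inside a condition, capture brackets used in place of $&, and each record pushed onto a list within the hash. The sample records and the name %visits are made up for illustration; they are not taken from the original program.

```perl
use strict;
use warnings;

# Hypothetical sample records standing in for lines read from a log file
my @records = (
    "10.0.0.1 - - [15/Aug/2012:10:00:01] \"GET /index.html\"\n",
    "10.0.0.2 - - [15/Aug/2012:10:00:05] \"GET /about.html\"\n",
    "10.0.0.1 - - [15/Aug/2012:10:00:09] \"GET /perl.html\"\n",
);

my %visits;    # declared with "my"; elements are still created as needed

foreach my $lyne (@records) {
    # Match inside a condition, so a line with no non-space text is skipped
    next unless $lyne =~ /(\S+)/;
    # $1 holds the captured text, so there is no need for $&
    push @{ $visits{$1} }, $lyne;
}

# Each hash element now holds a list of records rather than one long string
foreach my $visitor (sort keys %visits) {
    my $count = scalar @{ $visits{$visitor} };
    print "$visitor made $count request(s)\n";
}
```

Holding a list per visitor makes it easy to count or sort that visitor's accesses later; the single string built with .= is simpler if all you want to do is print the records back out.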
Using the code above, I then output each of the members of the hash, so grouping records by visiting client and, within each visiting client, by date and time, since that's the order in which they are stored in the original file:
foreach $visitor (keys %all) {
print $all{$visitor},"\n";
}
• Note the extra \n. In Perl, you always need to think about your new lines. In this example, they're present on the records when read in, they are not removed with chop or chomp, so they are kept within the $all string as record delimiters. I've added the extra one in the output code just to provide a degree of separation between the blocks.
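A small sketch of that new line handling, using made-up data in place of a real log file. One addition that is not in the snippet above: keys returns the hash keys in no particular order, so this version sorts them to give repeatable output. The host names and the $report variable are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical grouped data: each value is a string of records that
# still carry the new lines they had when they were read in
my %all = (
    "beta.example.com"  => "beta.example.com GET /a\nbeta.example.com GET /b\n",
    "alpha.example.com" => "alpha.example.com GET /c\n",
);

my $report = "";
foreach my $visitor (sort keys %all) {
    # The records already end in "\n"; the extra "\n" leaves a blank
    # line between one visitor's block and the next
    $report .= $all{$visitor} . "\n";
}
print $report;
```

If the records had been chomped on the way in, the output loop would need to put a new line back after every record and not just after every block.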
The complete program that the snippets above are copied from is on our web site - [here].
(written 2012-08-15, updated 2012-08-18)