Filter Large Log Files

Posted by pr19939 (pr19939), 1 February 2005

Hi Ellis, I came across your following code for bulk filtering of data.

    %wanted = ( "members.aol.com" => 1,
                "www.geocities.com" => 1,
                "groups.yahoo.com" => 1,
                "home.earthlink.net" => 1 );
    # Parse the incoming data looking for matches
    while ($line = <>) {
        ($server) = ($line =~ m!//(.*?)/!);
        if ($wanted{$server}) {
            print $line;
        }
    }

I tried the above code, but had no luck. What I need is the opposite of the above code: I want to filter out the lines containing .gif and .jpg extensions from a 350,000-line logfile. Right now I use the index search method, and it takes 3 hours, so I would like to know the best approach. Time frame is my main concern.

Sample code:

    $fileArray;
    my $totlogfile = "$today-TotalLogFile";
    my $totlogfile1 = "today-TotalLogFile1";
    my $totlogfilebkup="TotalLogFileBkup";
    open(total,">$totlogfilebkup") || die("Could not open out file!$outfile"); #outfile is declared before
    opendir(DIR, "logfiles") or die "couldn't open logs";
    while ( defined ($filename = readdir(DIR)) ) {
        $index = index($filename,$yesterday);
        if ($index > -1) {
            $fileArray[$count] = $filename;
            $count = $count + 1;
            print "The log file name is $filename.\n";
            open(logfile,"$filename") || die("Couldx not open file! $logfilename"); #$logfilename declared
            while($line = <logfile>) {
                chomp($line);
                unless(( $line =~ /\.gif/i ) || ( $line =~ /\.jpg/i ) || ( $line =~ /\.jpeg/i ) || ( $line =~ /\.js/i ) || ( $line =~ /\.css/i ) || ( $line =~ /tickerServlet/i ) || ( $line =~ /nagios/i ) || ( $line =~ /statusservlet/i )) {
                    print total "$line\n";
                }
            }
            close logfile;
        }
    }
    closedir(DIR);
    close total;

Can you please help? Thanks in advance.

Posted by admin (Graham Ellis), 1 February 2005

I'm not surprised that it's slow ... but 3 hours??
I've just run some tests with my own log files and your filter; I have 700,000 lines in 31 log files for January, so that's twice the amount of data you have, and I ran the following as a "control" case - similar to your code but without the prints: Code:
That took 8.1 seconds.

By removing the ignore case option on the regular expressions, that time dropped to 4.5 seconds.
By removing the chomp (why are you chomping when you add the \n back on later?) it came down to 4.3 seconds.
By using separate tests rather than stringing them all together with a || it came down to 4.1 seconds.
By using index rather than regular expressions, it came down to 2.9 seconds.

Here's the final code I was running:

Code:
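A rough sketch of a filter along the lines described above, not the script that was actually timed: separate index tests, no chomp, no ignore-case matching, and only a count of kept lines rather than any printing. The sub names keep_line and count_kept and the sample lines are hypothetical, and the CPU time is read with Perl's built-in times.

```perl
use strict;
use warnings;

# Return true if the line should be kept. Each unwanted substring gets its
# own index() test; unlike the /i regular expressions, this is case-sensitive.
sub keep_line {
    my ($line) = @_;
    return 0 if index($line, '.gif') > -1;
    return 0 if index($line, '.jpg') > -1;
    return 0 if index($line, '.jpeg') > -1;
    return 0 if index($line, '.js') > -1;
    return 0 if index($line, '.css') > -1;
    return 0 if index($line, 'tickerServlet') > -1;
    return 0 if index($line, 'nagios') > -1;
    return 0 if index($line, 'statusservlet') > -1;
    return 1;
}

# Count the kept lines on a filehandle and measure the CPU time used.
sub count_kept {
    my ($fh) = @_;
    my ($user_before, $sys_before) = times;   # user and system CPU seconds so far
    my $kept = 0;
    while (my $line = <$fh>) {
        $kept++ if keep_line($line);
    }
    my ($user_after, $sys_after) = times;
    my $cpu = ($user_after - $user_before) + ($sys_after - $sys_before);
    return ($kept, $cpu);
}

# Hypothetical sample data, read through an in-memory filehandle.
my $sample = join '',
    "GET /index.html HTTP/1.0\n",
    "GET /logo.gif HTTP/1.0\n",
    "GET /style.css HTTP/1.0\n",
    "GET /about.html HTTP/1.0\n";
open my $fh, '<', \$sample or die "Could not open in-memory sample: $!";
my ($kept, $cpu) = count_kept($fh);
close $fh;
printf "%d lines kept, %.2f CPU seconds\n", $kept, $cpu;
```

Because the tests are case-sensitive, a name such as LOGO.GIF would slip through; whether that matters depends on the log data.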
As a final test, I took out all the record filtering and my script ran in 2.1 seconds ... that really shows how the time was used!

Suggestion - try out my changes in your program as appropriate. I don't know where your 3 hours is going though ... seems very odd. And use the times function (perhaps run it in several places) to help you find where the efficiency issue is.

Posted by pr19939 (pr19939), 8 February 2005

Hi Ellis, I tried your sample code. The first code worked fine and printed the number of lines satisfying the search criteria, but I could not find a way to write the output into a new file. I am not able to use two file handles (one for opening the log file to be read and the other for writing the output into a new file). The "next-if" structure did not work. Please advise. Thanks in advance.

Posted by admin (Graham Ellis), 8 February 2005

on 02/08/05 at 09:34:55, pr19939 wrote:
You can open a file for write using open (FHO,">abbcc.txt"); and write to it using print FHO; or similar.
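A minimal sketch of reading from one filehandle while writing to another, with "next if" skipping the unwanted lines. The sub name filter_file, the file names and the sample lines are hypothetical, and only two of the tests are shown; it illustrates the general shape rather than the exact code discussed in this thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Copy lines from $in_path to $out_path, skipping unwanted ones.
# Returns the number of lines written.
sub filter_file {
    my ($in_path, $out_path) = @_;
    open my $in,  '<', $in_path  or die "Could not open $in_path: $!";
    open my $out, '>', $out_path or die "Could not open $out_path: $!";
    my $written = 0;
    while (my $line = <$in>) {
        next if index($line, '.gif') > -1;   # skip this line, go on to the next
        next if index($line, '.jpg') > -1;
        print {$out} $line;                  # no chomp, so "\n" is still there
        $written++;
    }
    close $in;
    close $out or die "Could not close $out_path: $!";
    return $written;
}

# Build a small hypothetical log file in a temporary directory.
my $dir      = tempdir(CLEANUP => 1);
my $in_path  = "$dir/access.log";
my $out_path = "$dir/filtered.log";
open my $log, '>', $in_path or die "Could not create $in_path: $!";
print {$log} "GET /index.html\n", "GET /logo.gif\n",
             "GET /photo.jpg\n",  "GET /about.html\n";
close $log or die "Could not close $in_path: $!";

my $written = filter_file($in_path, $out_path);
print "$written lines written to $out_path\n";
```

The two handles are independent of each other, so the input file can stay open for reading for as long as the output file is open for writing.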
As for the "next-if" structure not working, that's a tricky one; my examples were written and tested against my log files. If you can tell me how it failed for you (compile error, didn't work at runtime, something else), and perhaps post up a line or two of the code that failed, I'll be much better able to offer specific advice.

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.