Speeding up your Perl code

On Friday morning - our Perl for Larger Project course - I was looking at coding efficiency / run speed with delegates. As an example, we took a data file from our web server logs - some 23 Mbytes of data, comprising about 121,000 lines, of which 1099 contained the word "melksham" in lower case.

The control case - [source] - took 1370 seconds (that's about 23 minutes) to run. I read the whole file into a long string scalar:
  open FH,"ac_20110723" or die;
  read FH,$buffer,-s "ac_20110723";

and then looked through it for "melksham" lines, which I reported:
  $count = 0;
  while ($buffer =~ /.*melksham.*/g) {
    print "$&\n" if ($debug);
    $count++ ;
    }

well - I didn't quite report them (I set a debug flag off) because I didn't want to skew my figures by the time taken to scroll the information past the user on the terminal.

I'm often asked about efficient coding ... whether
  $n = $n + 1;
is slower than
  $n += 1;
and if that is slower than
  $n++;
The answer is "yes, it probably gets very slightly more efficient to use one of the shorter forms", but in reality the difference is so slight that it makes no practical difference in most cases. Let's see how we can make a serious difference to our data file analysis example above.

1. Start regular expressions with a literal

All I did at first was to add \n at the start of my regular expression. And that meant that the regular expression handler wasn't trying to match at every start character in the string - it only had to start at each new line. So
  while ($buffer =~ /.*melksham.*/g) {
became
  while ($buffer =~ /\n.*melksham.*/g) {
See [full source]

Tiny code difference? Yes ... but my 1370 seconds runtime dropped to ... just 22 seconds. That's over 60 times faster!
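
A minimal sketch of the same change on a tiny, made-up buffer (the sample lines and the count_matches helper are illustrative assumptions, not part of the course example). It also shows one thing to watch for: a pattern that starts with \n cannot match on the very first line of the buffer, because no newline comes before it. Putting a newline on the front of the buffer is one way round that.

```perl
use strict;
use warnings;

# Hypothetical miniature stand-in for the log file held in one scalar.
my $buffer = join "\n",
    "GET /melksham/first.html",      # first line: no newline in front of it
    "GET /index.html",
    "GET /hotels/melksham.html",
    "GET /perl/course.html",
    "";                              # gives the buffer a trailing newline

# Count how many times a compiled pattern matches across the whole text.
sub count_matches {
    my ($text, $re) = @_;
    my $count = 0;
    $count++ while $text =~ /$re/g;
    return $count;
}

my $unanchored = count_matches($buffer, qr/.*melksham.*/);
my $literal    = count_matches($buffer, qr/\n.*melksham.*/);
my $padded     = count_matches("\n$buffer", qr/\n.*melksham.*/);

print "unanchored: $unanchored\n";
print "newline-led: $literal\n";
print "newline-led, padded buffer: $padded\n";
```

On this sample the newline-led pattern finds one match fewer than the original until the buffer is padded. Note too that the matched text now starts with the newline itself.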

2. Start regular expressions with a zero width assertion (anchor)

If you're not able to find an appropriate literal to which to key, an anchor is a good but perhaps slightly less efficient alternative:
  while ($buffer =~ /^.*melksham.*/gm) {
see [full source]

which cut down from 1370 to 27 seconds - not as good as the 22 seconds of our first experiment, but still rather good.
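
A small sketch of why the m modifier matters here, using a made-up three line buffer (the sample data is an assumption for illustration). With m, ^ can match at the start of every line within the string; without it, ^ only matches at the very start of the string, so at most the first line can be found.

```perl
use strict;
use warnings;

# Hypothetical miniature stand-in for the log file held in one scalar.
my $buffer = "GET /melksham/first.html\n"
           . "GET /index.html\n"
           . "GET /hotels/melksham.html\n";

# With /m, ^ matches at the start of each line in the buffer.
my $with_m = 0;
$with_m++ while $buffer =~ /^.*melksham.*/gm;

# Without /m, ^ matches only at the very start of the string.
my $without_m = 0;
$without_m++ while $buffer =~ /^.*melksham.*/g;

print "with /m: $with_m, without /m: $without_m\n";
```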

3. Don't use $' $` or $& - find an alternative

If you refer to one of these three variables anywhere in your code, every regular expression match that you perform saves out the three variables in case they're used. So in our example, that's some 23 Mbytes copied at every successful match. And my previous program never actually runs the line that makes the reference, because the debug flag is off. Ouch!

Match becomes:
  while ($buffer =~ /^(.*melksham.*)/gm) {
and my reference to $& becomes a reference to $1:
  print "$1\n" if ($debug);
see [full source]

and my 27 seconds drops to 7 seconds.
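
A short sketch of two ways to get at the matched text without mentioning $&, on a made-up buffer (the sample data and array names are assumptions for illustration). The first is the capture group used above. The second uses the p modifier and ${^MATCH}, which are available from Perl 5.10 onwards and were added as a cheaper stand-in for $&. The perlvar documentation also notes that the penalty was largely removed from Perl 5.20 by a copy-on-write mechanism, so the saving will depend on the Perl release in use.

```perl
use strict;
use warnings;

# Hypothetical miniature stand-in for the 23 Mbyte buffer.
my $buffer = "GET /melksham/first.html\n"
           . "GET /index.html\n"
           . "GET /hotels/melksham.html\n";

# Capture group: the matched line is picked up from $1.
my @by_capture;
while ($buffer =~ /^(.*melksham.*)/gm) {
    push @by_capture, $1;
}

# p modifier (Perl 5.10 and later): ${^MATCH} holds what $& would have held.
my @by_p_flag;
while ($buffer =~ /^.*melksham.*/gmp) {
    push @by_p_flag, ${^MATCH};
}

print "capture: @by_capture\n";
print "p flag:  @by_p_flag\n";
```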

4. Should we read line by line, into a list, or into a single string?

I replaced my read with a while loop that read the data line by line. Then I replaced that with a <> read into a list, which I processed with a foreach. See [here] and [here].

Remarkably ... the 7 seconds drops to less than a second with foreach, and to just 0.16 of a second with a while loop. Yes - that's code running about 8500 times faster than my control.
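
The linked sources aren't reproduced on this page, so what follows is only a sketch of what the two variants might look like. The sample data, the temporary file, the sub names and the simple /melksham/ test inside each loop are all assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample data, written to a temporary file so that the
# sketch runs on its own. The real example read a web server log.
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "GET /melksham/first.html\n",
             "GET /index.html\n",
             "GET /hotels/melksham.html\n";
close $out or die "Cannot close $filename: $!";

# Variant 1: while loop - only one line is held in memory at a time.
sub count_with_while {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /melksham/;
    }
    close $fh;
    return $count;
}

# Variant 2: read every line into a list, then step through it with foreach.
sub count_with_foreach {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @lines = <$fh>;
    close $fh;
    my $count = 0;
    foreach my $line (@lines) {
        $count++ if $line =~ /melksham/;
    }
    return $count;
}

print "while:   ", count_with_while($filename), "\n";
print "foreach: ", count_with_foreach($filename), "\n";
```

Because each line is now tested on its own, the pattern no longer needs the .* on either side, nor the g and m modifiers.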


5. Does replacing a regular expression with a string function make it quicker?

Sometimes, yes ... but in this case, replacing
  while ($buffer =~ /^(.*melksham.*)/gm) {
by
  if (index($buffer,"melksham") >= 0) {
didn't make any noticeable difference.

There are three factors at work:
a) The regex has become very simple, and Perl has probably optimised it anyway (a good lesson in encouraging you to use straightforward code)
b) The test machine I'm using (a MacBook Air) has a big chunk of memory rather than a disc, which probably does funny things to the stats. There's certainly no waiting on disc drives ....
c) Times are so short that they can't be measured reliably on a single run.
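
For reference, a minimal sketch of the two tests side by side when applied to one line at a time (the sample lines and sub names are assumptions for illustration). index returns -1 when the substring is absent, and like the regular expression without an i modifier it is case sensitive.

```perl
use strict;
use warnings;

# Hypothetical sample lines; the last one differs only in case.
my @lines = (
    "GET /melksham/first.html",
    "GET /index.html",
    "GET /Melksham/upper.html",
);

# Regular expression test: 1 if the line contains the word, 0 if not.
sub has_word_regex {
    my ($line) = @_;
    return $line =~ /melksham/ ? 1 : 0;
}

# String function test: index gives the offset of the substring, or -1.
sub has_word_index {
    my ($line) = @_;
    return index($line, "melksham") >= 0 ? 1 : 0;
}

for my $line (@lines) {
    printf "%-26s regex=%d index=%d\n",
        $line, has_word_regex($line), has_word_index($line);
}
```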

Using the Benchmark module and the timethese method from it, you can rerun, time, and average out a series of tests ... and that's good for continuing to optimise. See [here] for full source.
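
A minimal sketch of timethese in use, comparing the regular expression and index tests over a made-up list of lines (the data, the sub names and the repeat count of 200 are assumptions for illustration, and this is not the linked source). Each named piece of code is run the given number of times and the total time for each is reported.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Hypothetical data: 10,000 lines, of which every hundredth mentions melksham.
my @lines = map {
    $_ % 100 == 0 ? "GET /melksham/page$_.html" : "GET /other/page$_.html"
} 1 .. 10_000;

# Count matching lines with a regular expression.
sub count_regex {
    my $n = 0;
    for my $line (@lines) {
        $n++ if $line =~ /melksham/;
    }
    return $n;
}

# Count matching lines with the index string function.
sub count_index {
    my $n = 0;
    for my $line (@lines) {
        $n++ if index($line, "melksham") >= 0;
    }
    return $n;
}

# Run each sub 200 times and print the timings for each.
timethese(200, {
    regex => \&count_regex,
    index => \&count_index,
});
```

Timings will vary from machine to machine and from run to run, so it's worth repeating the whole test a few times before drawing conclusions.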

Perhaps with a little more effort ... that original chunk of code could be 10,000 times faster ... and that's with just a little thought using the techniques described above.
(written 2011-07-30)

 
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
P667 - Perl - Handling Huge Data
  [3375] How to interact with a Perl program while it is processing data - (2011-07-31)
  [2834] Teaching examples in Perl - third and final part - (2010-06-27)
  [2806] Macho matching - do not do it! - (2010-06-13)
  [2805] How are you getting on? - (2010-06-13)
  [2376] Long job - progress bar techniques (Perl) - (2009-08-26)
  [1924] Preventing ^C stopping / killing a program - Perl - (2008-12-05)
  [1920] Progress Bar Techniques - Perl - (2008-12-03)
  [1397] Perl - progress bar, supressing ^C and coping with huge data flows - (2007-10-20)
  [975] Answering ALL the delegate's Perl questions - (2006-12-09)
  [762] Huge data files - what happened earlier? - (2006-06-15)
  [639] Progress bars and other dynamic reports - (2006-03-09)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3374_Spe ... -code.html • PAGE BUILT: Sun Mar 30 15:20:58 2014 • BUILD SYSTEM: WomanWithCat