Fastest way to replace chars - Perl Programming

Posted by John_Moylan (jfp), 25 September 2002

Evening all.

Now this has puzzled me, in the sense that I never expected it to be faster... but.

I want to convert 2002-09-18 04:45:22 to 20020918044522
This has to be done on hundreds of thousands of lines so I thought I'd better benchmark it on 6000 lines.
(Can't test on more for now: corrupt mysqldump, I was lucky to salvage 6000 lines, but anyway.)

First I thought of using s/\D//g; in my method below.
This took 5 seconds (using Benchmark.pm)

But the three steps of
$date =~ tr/-//d;
$date =~ tr/ //d;
$date =~ tr/://d;
took only 4 seconds.

Is this to do with the regex engine overhead?
I was sure the 3 step process would be slower.
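For anyone wanting to repeat the comparison, here is a minimal sketch of how the two approaches might be timed with Benchmark.pm. The sample data (6000 copies of one date) and the sub names strip_regex and strip_tr3 are assumptions made for illustration, not the original test script, and timings will differ by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample data: 6000 copies of the date from the post.
my @lines = ('2002-09-18 04:45:22') x 6000;

# Regex version: delete every non-digit character.
sub strip_regex {
    my ($date) = @_;
    $date =~ s/\D//g;
    return $date;
}

# Three-step tr version, as in the post.
sub strip_tr3 {
    my ($date) = @_;
    $date =~ tr/-//d;
    $date =~ tr/ //d;
    $date =~ tr/://d;
    return $date;
}

# A negative count means "run each for at least that many CPU seconds".
cmpthese(-1, {
    regex => sub { strip_regex($_) for @lines },
    tr3   => sub { strip_tr3($_)   for @lines },
});
```

cmpthese prints a table of rates and relative percentages, which is easier to read than raw wall-clock seconds.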

Code:

sub DateToTimestamp {

    # the date is in the format of '2002-09-18 04:45:22'
    # but I want a timestamp of '20020918044522'
    # (no empty prototype on the sub: it takes arguments)

    my ($self, $date) = @_;

    $date =~ tr/-//d;    # delete hyphens
    $date =~ tr/ //d;    # delete spaces
    $date =~ tr/://d;    # delete colons

    print "$date\n";

    return $date;
}

Or have I missed something?

jfp

Posted by admin (Graham Ellis), 25 September 2002

I'm not at all surprised at the result. Regular expressions are very clever, and that cleverness does add some slowing down. On the other hand, tr simply builds up a 256 character translate table and blats the data through it.

If you think about it, even your simple regex has to look at each character against a list of (10) digits and loop internally to check that each character isn't one of them ....
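Since tr works from a single table, the three deletes could presumably be folded into one pass. A small sketch, assuming the input contains only digits, hyphens, spaces and colons (the variable names are made up for illustration):

```perl
use strict;
use warnings;

my $date = '2002-09-18 04:45:22';

# One pass: delete hyphen, space and colon.
# A '-' at the start of the search list is a literal hyphen, not a range.
(my $one_pass = $date) =~ tr/- ://d;

# Alternative: complement (c) the digit range and delete (d) the rest.
(my $digits_only = $date) =~ tr/0-9//cd;

print "$one_pass\n";      # 20020918044522
print "$digits_only\n";   # 20020918044522
```

Whether one tr beats three on real data is something to benchmark rather than assume.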

Question for you. Are the dashes, colons and spaces always at exactly the same character position in the string? If they are, I think you might find that it's even quicker to use unpack or a series of substrs, followed by a pack to reform the parts of the time and datestamp.
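A rough sketch of the fixed-position idea, assuming every line is exactly 'YYYY-MM-DD HH:MM:SS' (19 characters). Here join stands in for the pack step, as it is the simpler way to glue the pieces back together; no claim is made that this is faster without benchmarking it.

```perl
use strict;
use warnings;

my $date = '2002-09-18 04:45:22';

# unpack: take 4 chars, skip 1, then five times take 2 and skip the separator.
my $via_unpack = join '', unpack 'A4 x A2 x A2 x A2 x A2 x A2', $date;

# substr: pick out the six fields by their fixed offsets.
my $via_substr = substr($date, 0, 4)  . substr($date, 5, 2)
               . substr($date, 8, 2)  . substr($date, 11, 2)
               . substr($date, 14, 2) . substr($date, 17, 2);

print "$via_unpack\n";   # 20020918044522
print "$via_substr\n";   # 20020918044522
```

If a line is ever a different length or layout, both versions will quietly return the wrong characters, so a length check beforehand may be worth the cost.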

By the way - welcome to the rank of "Established Poster" - you're no longer a Newcomer! - G

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.