If you want to find the position of a match in an incoming string, check the length of $` (that's $PREMATCH if you've chosen to "use English;") to find where it starts, and add the length of $& (that's $MATCH) to find where it ends.
Let's say I want to find all the URLs referred to in a web page that's loaded into the variable $html. I could write:
push @section,[length($`),length($&),$1]
while ($html =~ m!(https?://[^ >"]+)!g);
and that will give me a list of 3-element lists containing start point, length and actual string matched. Here's the code to display that list:
foreach $element(@section) {
print (join(", ",@$element),"\n");
}
and here are some of the results from the source of our resources index:
5979, 36, http://www.wellho.net/forum/top.html
6967, 36, http://www.wellho.net/net/mouth.html
7059, 42, http://www.wellho.net/downloads/index.html
8369, 67, http://www.wellho.net/mouth/387_Training-course-plans-for-2006.html
9365, 43, http://www.trainingcenter.co.uk/travel.html
9516, 45, http://reiseauskunft.bahn.de/bin/query.exe/en
9599, 59, http://www.livedepartureboards.co.uk/ldb/summary.aspx?T=MKM
9861, 48, https://lightning.he.net/~wellho/net/secure.html
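To see how the recorded start point and length relate to the original string, here is a minimal, self-contained sketch. The sample string and its example.com / example.org URLs are made up for illustration and stand in for the real page; the pattern is the one used above. Each recorded offset and length is fed back into substr to confirm that it recovers the matched URL.

```perl
use strict;
use warnings;

# Hypothetical sample fragment, standing in for $html in the article.
my $html = '<a href="http://www.example.com/a.html">A</a> '
         . '<a href="https://www.example.org/b">B</a>';

my @section;
# Same pattern as above: $` is the text before the match, $& the match itself.
push @section, [length($`), length($&), $1]
    while $html =~ m!(https?://[^ >"]+)!g;

for my $element (@section) {
    my ($start, $length, $url) = @$element;
    # substr with the recorded offset and length should recover the match.
    my $check = substr($html, $start, $length);
    print join(", ", $start, $length, $url,
               ($check eq $url ? "ok" : "MISMATCH")), "\n";
}
```

The start point counts from zero, so a value of 9 means the match begins at the tenth character of the string.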
P.S. I loaded my whole web page into a single variable using the code
open (FH,"/Library/WebServer/live_html/resources/index.html");
undef $/;
$html = <FH>;
which is a nice little demo of changing (or removing) the delimiter character for reading from a file handle, via the $/ variable. Once $/ has been undef-fed, reading into a scalar slurps from the current pointer in the file right through to the end of file. (written 2006-02-02, updated 2009-11-29)
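As a hedged sketch of the same slurp in a form that runs anywhere, the example below writes a small temporary file (an assumption for the demo, replacing the index.html path above), checks that open succeeded, and confines the change to $/ to a block with "local" so the rest of the program still reads line by line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file so the sketch is self-contained; in the
# article the file would be the page's index.html instead.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close $tmp_fh or die "Cannot close $path: $!";

open my $fh, '<', $path or die "Cannot open $path: $!";
my $html;
{
    local $/;        # $/ is undef only inside this block
    $html = <$fh>;   # reads from the current position to end of file
}
close $fh;

print length($html), " characters read\n";
```

Outside the block $/ is back to its previous value, so any later reads from other file handles behave as usual.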
Commentator | says ... |
Dave Cross: | You should warn people that using $`, $& and $' is a potential performance hit as any use of one of those variables in a program means that Perl has to track all of those variables for every match in your program. You can get the same information without the performance implications by using @- and @+.
And, I know this is just a demonstration, but encouraging people to parse HTML using regexes is a really bad idea. It's a much better idea to use something like HTML::Parser (or one of its subclasses like, in this case, HTML::LinkExtor). (comment added 2006-02-02 06:52:23) |
Graham Ellis: | Thanks, Dave. Totally agree with your comments. However there can be so many "if"s and "but"s added to any example that it becomes hard to see the wood for the trees.
Yes, there are FAR better ways of parsing HTML but it was a nice example and, yes, $` and friends can be inefficient. So if you want to say where in a string a regular expression match is to be found, what would you use as a more efficient alternative? (comment added 2006-02-02 07:53:53) |
Dave Cross: | As I mentioned in my first comment, you can get the information using @- and @+.
push @section,[$-[0],$+[0] - $-[0],$1]
while ($html =~ m!(https?://[^ >"]+)!g);
One other point I forgot to mention earlier.
Special variables like $/ should only ever be changed using 'local' in a block - so that they regain their former value once you exit that block. You don't want to leave interesting values in those variables which might break the rest of your program.
So I'd write your example as:
open (FH,"/Library/WebServer/live_html/resources/index.html");
my $html;
{
local $/ = undef;
$html = <FH>;
}
Or, more idiomatically:
open (FH,"/Library/WebServer/live_html/resources/index.html");
my $html = do { local $/; <FH> };
(comment added 2006-02-02 10:42:31) |
Graham Ellis: | Don't you just love the way there are always half a dozen ways to do things in Perl. Truly a great language, but one that's biased toward being fantastic to use for the practitioner who's really deep into it.
Dave - many thanks for all the inputs / alternatives / caveats. I agree with 'em all ... (and note your @+ and @- comments that I overlooked yesterday). I hope we haven't frightened off the newcomer who asked what he felt was going to be answered by a single simple line! (comment added 2006-02-03 07:23:34) |
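For anyone who wants to try the @- and @+ suggestion from the comments, here is a minimal, self-contained sketch. The sample string and its example.com / example.org URLs are hypothetical; the pattern is the one from the article. $-[0] is the offset where the whole match starts and $+[0] is the offset just past its end, so their difference is the length of the match.

```perl
use strict;
use warnings;

# Hypothetical sample text, standing in for a real page held in $html.
my $html = 'See <a href="http://www.example.com/x">x</a> '
         . 'and https://www.example.org/y now';

my @section;
while ($html =~ m!(https?://[^ >"]+)!g) {
    my $start = $-[0];   # offset where the whole match begins
    my $end   = $+[0];   # offset just past the end of the match
    push @section, [$start, $end - $start, $1];
}

# Each row is: start point, length, matched string.
print join(", ", @$_), "\n" for @section;
```

This produces the same three-element rows as the version built on $` and $&, without mentioning those variables anywhere in the program.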