Perl Regular Expressions - finding the position and length of the match

If you want to find the position of a match in an incoming string, take the length of $` (that's $PREMATCH if you've chosen to "use English;") to find where it starts, and add the length of $& (that's $MATCH) to find where it ends.
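
As a minimal sketch of that arithmetic, here is a single match using the English names. The sample text and the variable names $text, $start and $end are made up for the illustration; they are not from the page being analysed.

```perl
use strict;
use warnings;
use English;   # provides $PREMATCH and $MATCH as aliases for $` and $&

# Hypothetical sample text, invented for this illustration
my $text = "Course details are at http://www.wellho.net/ for now";

my ($start, $end);
if ($text =~ m!https?://[^ >"]+!) {
    $start = length($PREMATCH);        # number of characters before the match
    $end   = $start + length($MATCH);  # offset just past the end of the match
    print "start $start, end $end, matched $MATCH\n";
}
```

The start is a zero-based offset, so substr($text, $start, $end - $start) gives back the matched text.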

Let's say I want to find all the URLs referred to in a web page that's loaded into the variable $html. I could write:


push @section,[length($`),length($&),$1]
while ($html =~ m!(https?://[^ >"]+)!g);


and that will give me a list of 3-element lists containing start point, length and actual string matched. Here's the code to display that list:


foreach $element(@section) {
print (join(", ",@$element),"\n");
}


and here are some of the results from the source of our resources index page:


5979, 36, http://www.wellho.net/forum/top.html
6967, 36, http://www.wellho.net/net/mouth.html
7059, 42, http://www.wellho.net/downloads/index.html
8369, 67, http://www.wellho.net/mouth/387_Training-course-plans-for-2006.html
9365, 43, http://www.trainingcenter.co.uk/travel.html
9516, 45, http://reiseauskunft.bahn.de/bin/query.exe/en
9599, 59, http://www.livedepartureboards.co.uk/ldb/summary.aspx?T=MKM
9861, 48, https://lightning.he.net/~wellho/net/secure.html


P.S. I loaded my whole web page into a single variable using the code

open (FH,"/Library/WebServer/live_html/resources/index.html");
undef $/;
$html = <FH>;

which is a nice little demo of changing (or removing) the input record separator for reading from a file handle, via the $/ variable. The separator can be a string of several characters, not just a single one. Once $/ has been undef-fed, reading into a scalar slurps from the current position in the file right through to the end of file.
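
To see the "from the current pointer" part in action, here is a small sketch that reads one line normally and then slurps the rest. It assumes an in-memory file handle opened on the hypothetical string $data rather than a real file, and it confines the change to $/ to a block with local, which is a choice made for this sketch rather than what the code above does.

```perl
use strict;
use warnings;

# Hypothetical three-line "file", held in memory so the sketch is self-contained
my $data = "line one\nline two\nline three\n";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";

my $first = <$fh>;    # $/ is still "\n", so this reads just the first line

my $rest = do {
    local $/;         # $/ is undef inside this block only
    <$fh>;            # reads from the current position to end of file
};
close($fh);

print "first: $first";
print "rest:  $rest";
```

After the do block has finished, $/ is back to its usual newline, so later line-by-line reads behave normally.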
(written 2006-02-02, updated 2009-11-29)

Commentator says ...
Dave Cross: You should warn people that using $`, $& and $' is a potential performance hit as any use of one of those variables in a program means that Perl has to track all of those variables for every match in your program. You can get the same information without the performance implications by using @- and @+.

And, I know this is just a demonstration, but encouraging people to parse HTML using regexes is a really bad idea. It's a much better idea to use something like HTML::Parser (or one of its subclasses like, in this case, HTML::LinkExtor).
(comment added 2006-02-02 06:52:23)
Graham Ellis: Thanks, Dave. Totally agree with your comments. However there can be so many "if"s and "but"s added to any example that it becomes hard to see the wood for the trees.

Yes, there are FAR better ways of parsing HTML but it was a nice example and, yes, $` and friends can be inefficient. So if you want to say where in a string a regular expression match is to be found, what would you use as a more efficient alternative?
(comment added 2006-02-02 07:53:53)
Dave Cross: As I mentioned in my first comment, you can get the information using @- and @+.

push @section,[$-[0],$+[0] - $-[0],$1]
while ($html =~ m!(https?://[^ >"]+)!g);

One other point I forgot to mention earlier.

Special variables like $/ should only ever be changed using 'local' in a block - so that they regain their former value once you exit that block. You don't want to leave interesting values in those variables which might break the rest of your program.

So I'd write your example as:

open (FH,"/Library/WebServer/live_html/resources/index.html");

my $html;
{
local $/ = undef;

$html = <FH>;
}

Or, more idiomatically:

open (FH,"/Library/WebServer/live_html/resources/index.html");

my $html = do { local $/; <FH> };

(comment added 2006-02-02 10:42:31)
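
The @- and @+ suggestion above can be tried out with a short self-contained sketch. The sample markup in $html and the names $start and $length are hypothetical, invented for the illustration; $-[0] is the offset where the whole match starts and $+[0] is the offset just past its end.

```perl
use strict;
use warnings;

# Hypothetical sample markup, invented for this illustration
my $html = '<a href="http://www.wellho.net/">home</a> <a href="https://example.com/x">x</a>';

my @section;
while ($html =~ m!(https?://[^ >"]+)!g) {
    my $start  = $-[0];           # offset where the whole match begins
    my $length = $+[0] - $-[0];   # offset just past the match, minus the start
    push @section, [$start, $length, $1];
}

# Same three-element layout as the article: start, length, matched text
print join(", ", @$_), "\n" for @section;
```

Each entry can be checked with substr($html, $start, $length), which should return the same text as the third element.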
Graham Ellis: Don't you just love the way there's always half a dozen ways to do things in Perl. Truly a great language, but one that's biased toward being fantastic to use for the practitioner who's really deep into it.

Dave - many thanks for all the inputs / alternatives / caveats. I agree with 'em all ... (and note your @+ and @- comments that I overlooked yesterday). I hope we haven't frightened off the newcomer who asked what he felt was going to be answered by a single simple line!
(comment added 2006-02-03 07:23:34)
Associated topics are indexed as below, or enter http://melksh.am/nnnn for individual articles
P212 - Perl - More on Character Strings
  [4452] Binary data handling - Python and Perl - (2015-03-09)
  [3927] First match or all matches? Perl Regular Expressions - (2012-11-19)
  [3707] Converting codons via Amino Acids to Proteins in Perl - (2012-04-25)
  [3650] Possessive Regular Expression Matching - Perl, Objective C and some other languages - (2012-03-12)
  [3630] Serialsing and unserialising data for storage and transfer in Perl - (2012-02-28)
  [3546] The difference between dot (a.k.a. full stop, period) and comma in Perl - (2011-12-09)
  [3411] Single and double quotes strings in Perl - what is the difference? - (2011-08-30)
  [3332] DNA to Amino Acid - a sample Perl script - (2011-06-24)
  [3322] How much has Perl (and other languages) changed? - (2011-06-10)
  [3100] Looking ahead and behind in Regular Expressions - double matching - (2010-12-23)
  [3059] Object Orientation in an hour and other Perl Lectures - (2010-11-18)
  [2993] Arrays v Lists - what is the difference, why use one or the other - (2010-10-10)
  [2877] Further more advanced Perl examples - (2010-07-19)
  [2874] Unpacking a Perl string into a list - (2010-07-16)
  [2834] Teaching examples in Perl - third and final part - (2010-06-27)
  [2801] Binary data handling with unpack in Perl - (2010-06-10)
  [2657] Want to do a big batch edit? Nothing beats Perl! - (2010-03-01)
  [2379] Making variables persistant, pretending a database is a variable and other Perl tricks - (2009-08-27)
  [2230] Running a piece of code is like drinking a pint of beer - (2009-06-11)
  [1947] Perl substitute - the e modifier - (2008-12-16)
  [1735] Finding words and work boundaries (MySQL, Perl, PHP) - (2008-08-03)
  [1727] Equality and looks like tests - Perl - (2008-07-29)
  [1510] Handling Binary data (.gif file example) in Perl - (2008-01-17)
  [1336] Ignore case in Regular Expression - (2007-09-08)
  [1305] Regular expressions made easy - building from components - (2007-08-16)
  [1251] Substitute operator / modifiers in Perl - (2007-06-28)
  [1230] Commenting a Perl Regular Expression - (2007-06-12)
  [1222] Perl, the substitute operator s - (2007-06-08)
  [943] Matching within multiline strings, and ignoring case in regular expressions - (2006-11-25)
  [928] C++ and Perl - why did they do it THAT way? - (2006-11-16)
  [737] Coloured text in a terminal from Perl - (2006-05-29)
  [608] Don't expose your regular expressions - (2006-02-15)
  [597] Storing a regular expression in a perl variable - (2006-02-09)
  [583] Remember to process blank lines - (2006-01-31)
  [453] Commenting Perl regular expressions - (2005-09-30)




This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2019: 404 The Spa • Melksham, Wiltshire • United Kingdom • SN12 6QL
PH: 01225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/586_Perl ... match.html • PAGE BUILT: Sat May 27 16:49:10 2017 • BUILD SYSTEM: WomanWithCat