Perl Regular Expressions - finding the position and length of the match


If you want to find the position of a match in an incoming string, take the length of $` (that's $PREMATCH if you've chosen to "use English;") to find where the match starts, and add the length of $& (that's $MATCH) to find where it ends.
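
Here is a minimal sketch of that arithmetic on a single match. The sample string and the variable names ($text, $start, $length, $end) are invented for illustration and are not part of the original example:

```perl
use strict;
use warnings;

# Hypothetical sample string, invented for this illustration.
my $text = "See http://www.wellho.net/ for details";

my ($start, $length, $end);
if ($text =~ m!https?://[^ >"]+!) {
    $start  = length($`);          # characters before the match
    $length = length($&);          # characters in the match itself
    $end    = $start + $length;    # offset just past the match
    print "start $start, length $length, end $end\n";
}
```

With those three numbers, substr($text, $start, $length) should hand back the matched text itself.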

Let's say I want to find all the URLs referred to in a web page that's loaded into the variable $html. I could write:



push @section,[length($`),length($&),$1]
        while ($html =~ m!(https?://[^ >"]+)!g);

and that will give me a list of 3-element lists containing start point, length and actual string matched. Here's the code to display that list:



foreach $element (@section) {
        print join(", ",@$element),"\n";
        }

and here are some of the results from the source of our resources index:



5979, 36, http://www.wellho.net/forum/top.html
6967, 36, http://www.wellho.net/net/mouth.html
7059, 42, http://www.wellho.net/downloads/index.html
8369, 67, http://www.wellho.net/mouth/387_Training-course-plans-for-2006.html
9365, 43, http://www.trainingcenter.co.uk/travel.html
9516, 45, http://reiseauskunft.bahn.de/bin/query.exe/en
9599, 59, http://www.livedepartureboards.co.uk/ldb/summary.aspx?T=MKM
9861, 48, https://lightning.he.net/~wellho/net/secure.html

P.S. I loaded my whole web page into a single variable using the code



open (FH,"/Library/WebServer/live_html/resources/index.html");
undef $/;
$html = <FH>;

which is a nice little demo of changing (or removing) the delimiter character for reading from a file handle, via the $/ variable. Once $/ has been undef-fed, reading into a scalar slurps from the current pointer in the file right through to the end of file.
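
A hedged sketch of the same slurp with the change to $/ confined to a block, so the separator goes back to its previous value afterwards. The temporary file stands in for the real page, and the names $tmp, $path and $fh are invented for this illustration:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the real page: write a small file to read back.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "<a href=\"http://www.wellho.net/\">home</a>\nsecond line\n";
close $tmp or die "Cannot close $path: $!";

open(my $fh, '<', $path) or die "Cannot open $path: $!";
my $html = do {
    local $/;      # undef only inside this block; restored on exit
    <$fh>;         # a scalar read now returns the rest of the file
};
close $fh;

print length($html), " characters read\n";
print "separator restored\n" if defined $/ && $/ eq "\n";
```

The lexical file handle and the "or die" checks are additions to the original three lines, chosen so that a missing file is reported rather than silently giving an empty $html.
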
(written 2006-02-02, updated 2009-11-29)

Commentator	says ...
Dave Cross:	You should warn people that using $`, $& and $' is a potential performance hit as any use of one of those variables in a program means that Perl has to track all of those variables for every match in your program. You can get the same information without the performance implications by using @- and @+. And, I know this is just a demonstration, but encouraging people to parse HTML using regexes is a really bad idea. It's a much better idea to use something like HTML::Parser (or one of its subclasses like, in this case, HTML::LinkExtor). (comment added 2006-02-02 06:52:23)
Graham Ellis:	Thanks, Dave. Totally agree with your comments. However there can be so many "if"s and "but"s added to any example that it becomes hard to see the wood for the trees. Yes, there are FAR better ways of parsing HTML but it was a nice example and, yes, $` and friends can be inefficient. So if you want to say where in a string a regular expression match is to be found, what would you use as a more efficient alternative? (comment added 2006-02-02 07:53:53)
Dave Cross:	As I mentioned in my first comment, you can get the information using @- and @+.

push @section,[$-[0],$+[0] - $-[1],$1] while ($html =~ m!(https?://[^ >"]+)!g);

One other point I forgot to mention earlier. Special variables like $/ should only ever be changed using 'local' in a block - so that they regain their former value once you exit that block. You don't want to leave interesting values in those variables which might break the rest of your program. So I'd write your example as:

open (FH,"/Library/WebServer/live_html/resources/index.html");
my $html;
{ local $/ = undef; $html = <FH>; }

Or, more idiomatically:

open (FH,"/Library/WebServer/live_html/resources/index.html");
my $html = do { local $/; <FH> };

(comment added 2006-02-02 10:42:31)
Graham Ellis:	Don't you just love the way there's always half a dozen ways to do things in Perl. Truly a great language, but one that's biased toward being fantastic to use for the practitioner who's really deep into it. Dave - many thanks for all the inputs / alternatives / caveats. I agree with 'em all ... (and note your @+ and @- comments that I overlooked yesterday). I hope we haven't frightened off the newcomer who asked what he felt was going to be answered by a single simple line! (comment added 2006-02-03 07:23:34)
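
To round off the thread, here is a runnable sketch of the @- and @+ approach discussed above. The sample markup is invented for illustration, and the length is taken as $+[0] - $-[0], the end of the whole match minus its start. Subtracting $-[1] instead gives the same number here only because the capture group spans the entire match.

```perl
use strict;
use warnings;

# Hypothetical sample markup, invented for this illustration.
my $html = '<a href="http://www.wellho.net/">home</a> '
         . '<a href="https://example.com/x.html">x</a>';

my @section;
while ($html =~ m!(https?://[^ >"]+)!g) {
    # $-[0] is the offset where the whole match starts,
    # $+[0] is the offset just past its end.
    push @section, [ $-[0], $+[0] - $-[0], $1 ];
}

# Print start, length and matched string for each URL found.
foreach my $element (@section) {
    print join(", ", @$element), "\n";
}
```

As far as I know, the perlvar documentation says that the cost of $`, $& and $' was largely removed in Perl 5.20, and that from Perl 5.10 the /p modifier with ${^PREMATCH} and ${^MATCH} is another way round it. On an older Perl, @- and @+ remain the safer habit.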



This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.
