Help please with Perl + regular expressions

Posted by Ritchie (Ritchie), 8 January 2004
I'm not the most skilled in Perl, and here is my problem:

I need to search through an HTML page and find all occurrences of something like:

<a href="/dev/tester.cgi?area=uk&template=3&name=yours&question=tricky">

and replace them to become

<a href="/dev/test/area/uk/template/3/name/yours/question/tricky">

Where the path and script name are in a variable: $script_name = "/dev/tester.cgi"
and the replacement path in a variable: $short_script_name = "/dev/test"
There may be more or fewer entries in the query string, but they will always be in name/value pairs.

I have come up with this:

$htmlpage =~ s|<a href="($script_name)/([?])\w+([=])\w+([&])\w+"></a>|$short_script_name / / /| /ge;

Does that look as though it will work ??




Posted by admin (Graham Ellis), 9 January 2004
The way to find out whether something works is to try it on a test data set. Looking at your code, I think it's more complex than it needs to be (and I think the "e" modifier is wrong); I have a few other concerns too. Here is a sample piece of code that might help you:

Code:
$sampledata = <<"END";
This is some sample data in which we are going to make som changes
<a href="/dev/tester.cgi?area=uk&template=3&name=yours&question=tricky">
and here the link </a>.
Let's see how this goes
END

$script_name = "/dev/tester.cgi";
$short_name = "/dev/test";

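# \Q...\E stops the "." in the script name being treated as a regex wildcard
$sampledata =~ s/\Q$script_name\E/$short_name/g;
# move one name=value pair at a time onto the path until none are left
while ($sampledata =~ s!\Q$short_name\E((/\w+)*)[?&](\w+)=(\w+)
               !$short_name$1/$3/$4!x) { };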

print $sampledata;


And that runs and produces:

Code:
This is some sample data in which we are going to make som changes
<a href="/dev/test/area/uk/template/3/name/yours/question/tricky">
and here the link </a>.
Let's see how this goes


Which as far as I can see is what you're looking for.
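
To see what the while loop is doing, it may help to record the string after every pass. The following is only a sketch on a single hypothetical link: the variable names follow the code above, the @steps array is added purely for illustration, and the inner group is written as non-capturing, so the pairs come out as $2 and $3 rather than $3 and $4.

```perl
use strict;
use warnings;

# Hypothetical one-line sample, already past the first substitution
# (script name replaced by the short name).
my $short_name = "/dev/test";
my $link = '<a href="/dev/test?area=uk&template=3&name=yours&question=tricky">';

my @steps;   # illustration only: the string after each pass of the loop

# Each pass moves exactly one name=value pair from the query string
# onto the end of the path; the loop stops when nothing matches.
while ($link =~ s!\Q$short_name\E((?:/\w+)*)[?&](\w+)=(\w+)!$short_name$1/$2/$3!) {
    push @steps, $link;
}

print "$_\n" for @steps;
```

With four pairs in the query string the loop should make four successful passes, and the last line printed should be the fully converted link.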

Tips

a) Don't try to do it all in one regular expression - two simple ones are much easier than one complex one

b) The "g" modifier isn't going to work for you as you have a series of overlapping matches - so I've used a while loop.
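
Tip (b) can be checked with a small experiment. This is a sketch under the assumption that the same pattern is used both ways; the sample link and the variable names $with_g and $with_loop are made up for the comparison. With "g", Perl carries on searching after the end of the previous match and does not look again at text it has already replaced, so the short name is never found a second time.

```perl
use strict;
use warnings;

my $short_name = "/dev/test";
my $start = '<a href="/dev/test?area=uk&template=3&name=yours">';

# Attempt 1: a single substitution with the "g" modifier.
my $with_g = $start;
$with_g =~ s!\Q$short_name\E((?:/\w+)*)[?&](\w+)=(\w+)!$short_name$1/$2/$3!g;

# Attempt 2: the same substitution without "g", repeated until it fails.
my $with_loop = $start;
1 while $with_loop =~ s!\Q$short_name\E((?:/\w+)*)[?&](\w+)=(\w+)!$short_name$1/$2/$3!;

print "g modifier: $with_g\n";
print "while loop: $with_loop\n";
```

Only the first pair should be converted in the "g" version, while the loop version should convert all three.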

Posted by Ritchie (Ritchie), 9 January 2004
on 01/09/04 at 13:49:37, Graham Ellis wrote:
Tips

a) Don't try to do it all in one regular expression - two simple ones are much easier than one complex one

b) The "g" modifier isn't going to work for you as you have a series of overlapping matches - so I've used a while loop.


Graham
Many thanks for your reply. I will attempt to see how that works in my scenario.
I agree with what you say about doing it in one go. Since my post I have refined the script to look like......

Code:
sub buildpage {

   @short_name = split(/\./, $long_script_name);                      # make a short version of the script name by stripping the extension off
   $short_script_name = $short_name[0]."/";                           # add a slash to replace the ?
   $html =~ s/href="\Q$long_script_name\E(.*?)">/makenewURL($1)/ge;   # replace all URLs calling the script

}

sub makenewURL {
   my ($query) = @_;

   $query =~ s/\?//g;                                            # get rid of ? (this is to accommodate the form URL - could've gone in next line otherwise)
   $query =~ s/[&=]/\//g;                                        # turn & and = into /
   $newstring = "href=\"".$short_script_name.$query."\">";       # make the new URL
   return $newstring;
}


That works, but I will try yours too, once I understand it.

Ciao.
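
For anyone reading the archive, here is a self-contained sketch of the callback approach in the post above, using lexical variables and sample data so that it can be run on its own. The helper name query_to_path and the two-link sample are hypothetical, and stripping the extension gives "/dev/tester" rather than the "/dev/test" of the original question. The "e" modifier makes Perl evaluate the replacement as code, which is what allows a sub to be called there; in the very first attempt in this thread the replacement was plain text, which is presumably why "e" looked wrong.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "?a=b&c=d" into "a/b/c/d".
sub query_to_path {
    my ($query) = @_;
    $query =~ s/^\?//;                        # drop the leading ?
    return join "/", split /[&=]/, $query;    # names and values become path parts
}

my $long_script_name = "/dev/tester.cgi";
(my $short_script_name = $long_script_name) =~ s/\.[^.\/]*$//;   # strip the extension

my $html = '<a href="/dev/tester.cgi?area=uk&template=3">one</a> '
         . '<a href="/dev/tester.cgi?name=yours&question=tricky">two</a>';

# /e evaluates the replacement as Perl code, so it can call a sub;
# /g is safe here because each link is matched once, as a whole.
$html =~ s{href="\Q$long_script_name\E(\?[^"]*)"}
          {'href="' . $short_script_name . '/' . query_to_path($1) . '"'}ge;

print "$html\n";
```

Because the whole query string is captured in one go and rewritten inside the sub, the matches do not depend on each other, so a single pass with "g" is enough here and no while loop is needed.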




This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.

