Help please with Perl + regular expressions

Posted by Ritchie (Ritchie), 8 January 2004
I'm not the most skilled Perl programmer, and here is my problem:

I need to search through an HTML page and find all occurrences of something like:

<a href="/dev/tester.cgi?area=uk&template=3&name=yours&question=tricky">

and replace them to become

<a href="/dev/test/area/uk/template/3/name/yours/question/tricky">

where the path and script name are in a variable, $script_name = "/dev/tester.cgi",
and the replacement path is in a variable, $short_script_name = "/dev/test".
There may be more or fewer entries in the query string, but they will always be name=value pairs.
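The transformation described above can be sketched as a small Perl function that works on a single URL rather than a whole page. This is a minimal sketch, not code from the thread: the function name `path_style_url` is hypothetical, and it assumes raw `&` separators (not `&amp;`) and that every entry really is a name=value pair.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn one query-string URL
# into the path-style form described above.
sub path_style_url {
    my ($url, $script_name, $short_script_name) = @_;

    # Split "script?query" at the first "?"; leave other URLs untouched.
    my ($path, $query) = split /\?/, $url, 2;
    return $url unless defined($query) && $path eq $script_name;

    # Each name=value pair contributes two path segments.
    my @segments = map { split /=/, $_, 2 } split /&/, $query;
    return join '/', $short_script_name, @segments;
}

my $in = "/dev/tester.cgi?area=uk&template=3&name=yours&question=tricky";
print path_style_url($in, "/dev/tester.cgi", "/dev/test"), "\n";
# prints /dev/test/area/uk/template/3/name/yours/question/tricky
```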

I have come up with this:

$htmlpage =~ s|<a href="($script_name)/([?])\w+([=])\w+([&])\w+"></a>|$short_script_name / / /| /ge;

Does that look as though it will work?




Posted by admin (Graham Ellis), 9 January 2004
The way to find out whether something works is to try it out on a test data set. Looking at your code, I think it's more complex than it needs to be (and I think the "e" modifier is wrong); I have a few other concerns too. Here is a sample piece of code that might help you:

Code:
$sampledata = <<"END";
This is some sample data in which we are going to make som changes
<a href="/dev/tester.cgi?area=uk&template=3&name=yours&question=tricky">
and here the link </a>.
Let's see how this goes
END

$script_name = "/dev/tester.cgi";
$short_name = "/dev/test";

$sampledata =~ s/$script_name/$short_name/g;
while ($sampledata =~ s!$short_name((/\w+)*)[?&](\w+)=(\w+)
               !$short_name$1/$3/$4!x) { };

print $sampledata;


And that runs and produces:

Code:
This is some sample data in which we are going to make som changes
<a href="/dev/test/area/uk/template/3/name/yours/question/tricky">
and here the link </a>.
Let's see how this goes


As far as I can see, that is what you're looking for.
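One detail worth checking in the code above: $script_name is interpolated straight into the first pattern, so the "." in "tester.cgi" is a regex metacharacter that matches any character. That is harmless with this sample data, but wrapping the variable in \Q...\E (the inline form of Perl's quotemeta) makes it match literally. A small sketch comparing the two, using an illustrative non-matching name "/dev/testerXcgi":

```perl
use strict;
use warnings;

my $script_name = "/dev/tester.cgi";

# Interpolated as-is, "." is a regex metacharacter matching any character.
my $loose  = qr/$script_name/;
# \Q...\E quotes every metacharacter, so the name must match literally.
my $strict = qr/\Q$script_name\E/;

for my $candidate ("/dev/tester.cgi", "/dev/testerXcgi") {
    printf "%-16s loose:%s strict:%s\n", $candidate,
        ($candidate =~ $loose  ? "match" : "no"),
        ($candidate =~ $strict ? "match" : "no");
}
```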

Tips

a) Don't try to do it all in one regular expression - two simple ones are much easier than one complex one

b) The "g" modifier isn't going to work for you as you have a series of overlapping matches (each pair can only be converted once the one before it has been rewritten) - so I've used a while loop.
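Tip b) can be seen directly by running the same substitution both ways on a shortened, illustrative sample URL. With /g, Perl finds matches in the original string and resumes scanning after each one, so only the first pair is converted; repeating a single substitution converts one pair per pass until nothing is left to match. The `1 while ...` form below is just a compact equivalent of the `while (...) { };` loop in the reply.

```perl
use strict;
use warnings;

my $short_name = "/dev/test";
my $url        = "/dev/test?area=uk&template=3&name=yours";   # illustrative sample

# The substitution from the reply, applied once with /g. Scanning resumes
# after the first match, and the next pair would need a match starting at
# "/dev/test" again, so it is never found.
(my $with_g = $url) =~ s!$short_name((/\w+)*)[?&](\w+)=(\w+)!$short_name$1/$3/$4!g;

# Re-running a single substitution until it fails lets each pass start
# again from the beginning of the (already partly rewritten) string.
my $with_loop = $url;
1 while $with_loop =~ s!$short_name((/\w+)*)[?&](\w+)=(\w+)!$short_name$1/$3/$4!;

print "with /g:    $with_g\n";
print "while loop: $with_loop\n";
```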

Posted by Ritchie (Ritchie), 9 January 2004

Graham
Many thanks for your reply. I will try it out in my scenario.
I agree with what you say about not trying to do it all in one go. Since my post I have refined the script to look like this:

Code:
sub buildpage {

   # make a short version of the script name by stripping the extension off
   @short_name = split(/\./, $long_script_name);
   $short_script_name = $short_name[0] . "/";     # add a slash to replace the ?
   # replace all URLs calling the script; \Q...\E makes the "." in the name match literally
   $html =~ s/href="\Q$long_script_name\E(.*?)">/makenewURL($1)/ge;

}

sub makenewURL {
   my ($query) = @_;

   $query =~ s/&/\//g;        # tidy up &
   $query =~ s/\?//g;         # get rid of ? (this is to accommodate the form URL - could've gone in next line otherwise)
   $query =~ s/\&|=/\//g;     # get rid of & = and make /
   my $newstring = "href=\"" . $short_script_name . $query . "\">";   # make the new URL
   return $newstring;
}


That works, but I will try yours too once I understand it.

Ciao.
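For comparison, the /e-callback approach in the post above can also be written with split and join instead of three chained substitutions. This is a hypothetical sketch, not code from the thread: `rewrite_links` and `path_from_query` are illustrative names, and it assumes double-quoted href attributes, raw `&` separators (not `&amp;`), and that the replacement path (e.g. "/dev/test") is passed in directly rather than derived by stripping the extension. Because each link is matched independently, a single /g pass is safe here, unlike the overlapping case in tip b).

```perl
use strict;
use warnings;

# Hypothetical helper: ("/dev/test", "area=uk&template=3") -> "/dev/test/area/uk/template/3"
sub path_from_query {
    my ($short_script_name, $query) = @_;    # copies $1 before any other regex runs
    return join '/', $short_script_name, split /[&=]/, $query;
}

# Hypothetical counterpart to buildpage/makenewURL.
sub rewrite_links {
    my ($html, $script_name, $short_script_name) = @_;

    # /e runs the replacement as Perl code for each match; /g visits every
    # link. The matches do not overlap, so one pass is enough.
    $html =~ s{href="\Q$script_name\E\?([^"]*)"}
              {'href="' . path_from_query($short_script_name, $1) . '"'}ge;
    return $html;
}

my $page = '<a href="/dev/tester.cgi?area=uk&template=3">one</a> '
         . '<a href="/dev/tester.cgi?name=yours&question=tricky">two</a>';
print rewrite_links($page, "/dev/tester.cgi", "/dev/test"), "\n";
```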




This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho