Help please with Perl + regular expressions

Posted by Ritchie (Ritchie), 8 January 2004
I'm not the best skilled in Perl, and here is my problem....

I need to search through an HTML page and find all occurrences of something like:

<a href="/dev/tester.cgi?area=uk&template=3&name=yours&question=tricky">

and replace them to become

<a href="/dev/test/area/uk/template/3/name/yours/question/tricky">

The path and script name are held in a variable: $script_name = "/dev/tester.cgi"
and the replacement path in another variable: $short_script_name = "/dev/test"
There may be more or fewer entries in the query string, but they will always be in name=value pairs.

I have come up with this:

$htmlpage =~ s|<a href="($script_name)/([?])\w+([=])\w+([&])\w+"></a>|$short_script_name / / /| /ge;

Does that look as though it will work?




Posted by admin (Graham Ellis), 9 January 2004
The way to find out whether something works is to try it on a test data set. Looking at your code, I think it's more complex than it needs to be, and I think the "e" modifier is wrong: it tells Perl to evaluate the replacement as code, and your replacement is plain text. I have a few other concerns too. Here is a sample piece of code that might help you:

Code:
$sampledata = <<"END";
This is some sample data in which we are going to make som changes
<a href="/dev/tester.cgi?area=uk&template=3&name=yours&question=tricky">
and here the link </a>.
Let's see how this goes
END

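$script_name = "/dev/tester.cgi";
$short_name = "/dev/test";

# Step 1: swap the script name for the short name (\Q..\E keeps the "." literal)
$sampledata =~ s/\Q$script_name\E/$short_name/g;
# Step 2: convert one name=value pair per pass until none are left
while ($sampledata =~ s!\Q$short_name\E((/\w+)*)[?&](\w+)=(\w+)
               !$short_name$1/$3/$4!x) { };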

print $sampledata;


And that runs and produces:

Code:
This is some sample data in which we are going to make som changes
<a href="/dev/test/area/uk/template/3/name/yours/question/tricky">
and here the link </a>.
Let's see how this goes


Which as far as I can see is what you're looking for.
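
For anyone who wants to see what each part of the loop's pattern is doing, here is a minimal sketch of the same substitution laid out with comments. The test string and the names $link, $step and $passes are made up for illustration, and the inner group is non-capturing here, so the name and value land in $2 and $3 rather than $3 and $4.

```perl
use strict;
use warnings;

my $short_name = "/dev/test";
# Hypothetical test link in which the script name has already been shortened
my $link = '<a href="/dev/test?area=uk&template=3">';

# One step of the loop. Comments inside the pattern avoid dollar signs,
# because Perl would try to interpolate them.
my $step = qr{
    \Q$short_name\E   # the shortened script path, matched literally
    ( (?:/\w+)* )     # group 1: segments converted on earlier passes
    [?&]              # the separator in front of the next pair
    (\w+) = (\w+)     # groups 2 and 3: the name and value of that pair
}x;

# Each successful substitution converts exactly one pair
my $passes = 0;
$passes++ while $link =~ s{$step}{$short_name$1/$2/$3};

print "$link\n";             # <a href="/dev/test/area/uk/template/3">
print "passes: $passes\n";   # passes: 2
```

Group 1 is what lets the loop make progress: on the second pass it swallows the "/area/uk" produced by the first pass, so the match can reach the next "&".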

Tips

a) Don't try to do it all in one regular expression - two simple ones are much easier than one complex one

b) The "g" modifier isn't going to work for you as you have a series of overlapping matches - so I've used a while loop.
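
Tip (b) can be checked with a small experiment. This is only a sketch: the test string and the names $original, $with_g and $with_loop are made up for illustration. It runs the same pattern once with "g" and once in a loop.

```perl
use strict;
use warnings;

my $short_name = "/dev/test";
# Hypothetical link with three pairs, script name already shortened
my $original = '<a href="/dev/test?area=uk&template=3&name=yours">';
my $pattern  = qr{\Q$short_name\E((?:/\w+)*)[?&](\w+)=(\w+)};

# Single pass with "g": the search resumes after "uk", and the short
# name never appears again from there on, so only one pair is converted
(my $with_g = $original) =~ s{$pattern}{$short_name$1/$2/$3}g;

# Loop: every pass rescans from the start of the string
my $with_loop = $original;
1 while $with_loop =~ s{$pattern}{$short_name$1/$2/$3};

print "g:    $with_g\n";     # g:    <a href="/dev/test/area/uk&template=3&name=yours">
print "loop: $with_loop\n";  # loop: <a href="/dev/test/area/uk/template/3/name/yours">
```

Every pair after the first needs a match that begins at the same short name, which is why a single left-to-right "g" pass cannot pick them up.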

Posted by Ritchie (Ritchie), 9 January 2004
on 01/09/04 at 13:49:37, Graham Ellis wrote:
Tips

a) Don't try to do it all in one regular expression - two simple ones are much easier than one complex one

b) The "g" modifier isn't going to work for you as you have a series of overlapping matches - so I've used a while loop.


Graham
Many thanks for your reply. I will see how that works in my scenario.
I agree with what you say about not trying to do it all in one go. Since my post I have refined the script to look like this:

Code:
sub buildpage {

   # make a short version of the script name by stripping the extension off
   @short_name = split(/\./, $long_script_name);
   # add a slash to replace the ?
   $short_script_name = $short_name[0]."/";
   # replace all URLs calling the script (\Q..\E keeps the "." literal)
   $html =~ s/href="\Q$long_script_name\E(.*?)">/makenewURL($1)/ge;

}

sub makenewURL {
   my ($query) = @_;

   $query =~ s/&/\//g;       # tidy up &
   $query =~ s/\?//g;        # get rid of ? (this is to accommodate the form URL - could've gone in next line otherwise)
   $query =~ s/\&|=/\//g;    # get rid of & = and make /
   $newstring = "href=\"".$short_script_name.$query."\">";   # make the new URL
   return $newstring;
}


That works, but I will try yours too, once I understand it.

Ciao.




This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.

© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho