Importing a regular expression from a file

Posted by meanroy (meanroy), 7 August 2006
I have a script which is used to remove spam from a Wiki.
The collection of spam phrases (and spammer URLs) has grown unwieldy.
Here's a snippet to show how I use the regex.
Code:
     
     $_ = $returned_text;
     # check for spam
     # match upper or lower case
     if (/\.ro              # anything from Romania
         |\.ru              # anything from Russia
         |<div\ style       # anyone trying to make a div
         |hot-girls         # porn site
         /ix) {
         # Rewrite the page without the spam
     }

I've not been able to figure out how to keep the banned phrases and URLs in a file and import them.

I found this reference from Perl Regular Expressions - detailed manual but it doesn't seem quite what I need.
Quote:
`(?imsx-imsx)'
            One or more embedded pattern-match modifiers.
            This is particularly useful for dynamic patterns,
            such as those read in from a configuration file,
            read in as an argument, are specified in a table
            somewhere, etc.  Consider the case that some of
            which want to be case sensitive and some do not.
            The case insensitive ones need to include merely
            `(?i)' at the front of the pattern.  For example:

                $pattern = "foobar";
                if ( /$pattern/i ) { }

                # more flexible:

                $pattern = "(?i)foobar";

I may be missing some trivial way to do this but I can't seem to find it.
Do you have any suggestion?

Roy.
===========
Oopsie...
It looks like the perlfaq6 manpage may help though I'm having trouble with putting more than one expression in the file.
The example
Code:
   chomp($pattern = <STDIN>);
   if ($line =~ /$pattern/) { }

only does one line. I'll keep hacking but if you could clarify this I'd appreciate it.

Posted by admin (Graham Ellis), 8 August 2006
I think you're getting there.  I suggest 1 RE per line, read in and chomp each in turn.
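
A minimal sketch of the one-RE-per-line idea, in case it helps. The file name spam_patterns.txt and the sub name is_spam are made up for illustration, and the patterns are read from an in-memory string here only so the example runs on its own. Note that without the /x modifier the space in "<div style" needs no backslash.

```perl
use strict;
use warnings;

# Stand-in for the contents of a pattern file (hypothetical name: spam_patterns.txt).
my $pattern_file = <<'END';
\.ro
\.ru
<div style
hot-girls
END

# Reading from a string keeps the sketch self-contained; for a real file use
#   open my $fh, '<', 'spam_patterns.txt' or die "Cannot open: $!";
open my $fh, '<', \$pattern_file or die "Cannot open pattern data: $!";

my @patterns;
while (my $line = <$fh>) {
    chomp $line;                   # remove the trailing newline
    next if $line =~ /^\s*$/;      # skip blank lines
    push @patterns, qr/$line/i;    # compile once, case-insensitive
}
close $fh;

# Hypothetical helper: true if any pattern matches the text.
sub is_spam {
    my ($text) = @_;
    for my $re (@patterns) {
        return 1 if $text =~ $re;
    }
    return 0;
}

print is_spam('Visit http://example.ru/ now') ? "spam\n" : "clean\n";
print is_spam('A normal wiki edit')            ? "spam\n" : "clean\n";
```

Each line of the file is treated as a regular expression, so a literal dot in a URL still has to be written as \. in the file.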

Posted by meanroy (meanroy), 8 August 2006
Yes, that seems to work, but at the expense of (relatively) a lot of time spent reading the file.
Maybe I should bring the file into a hash and sequence through the hash, or into an array of strings and step through the array?
Roy
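
One possible way to keep the cost down, sketched under the assumption that the phrases have already been loaded into an array once at startup: join them into a single alternation and compile it with qr//, so each page needs only one match. The names @phrases and $spam_re and the sample pages are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical pattern list, as it might look after being read from the file once.
my @phrases = ('\.ro', '\.ru', '<div style', 'hot-girls');

# Wrap each alternative in (?:...) so joining with | cannot change its meaning.
my $joined  = join '|', map { "(?:$_)" } @phrases;
my $spam_re = qr/$joined/i;    # compiled once, case-insensitive

my @pages = ('Buy now at cheap.ru', 'Minutes of the August meeting', '<DIV STYLE="x">');
for my $page (@pages) {
    printf "%-32s %s\n", $page, ($page =~ $spam_re ? 'spam' : 'clean');
}
```

An array is enough here; a hash would only pay off if something needed to be looked up by key.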

- edit -
I've gone back to the "Owl" book, Mastering Regular Expressions, since after skipping lightly past this particular little problem, I'm up against more regex madness.
After I've figured out things a bit I'll post some code. (I hope!)

Posted by meanroy (meanroy), 10 August 2006
I've been able to get a file of regexes to work somewhat, though I have more to do, but I can't seem to get past this peculiarity:
Quote:
#!perl
# regextrysplit.pl
use warnings;
use strict;
my @MatchFoundArray;   # declare empty; = "" would create a one-element list
my $WikiHistory = "* [Any garbled crap2345n&%]i";
print "WikiHistory contains:$WikiHistory\n";   # make sure I know what's in there
# the capturing parentheses make split return the separator too
@MatchFoundArray = split(m/(\* \[)/, $WikiHistory);
my $indexnum = 0;
foreach my $item (@MatchFoundArray) {
    print "Item", $indexnum++, " is $item\n";
}


I expected to see only two elements printed.
Instead I find:
Quote:
WikiHistory contains:* [Any garbled crap2345n&%]i
Item0 is
Item1 is * [
Item2 is Any garbled crap2345n&%]i

I've spent a lot of time messing with this and searching everywhere, trying to figure it out.

What the heck is going on?

Roy
Oh Duh!
"split produces a list of the things either side of the match. If the match occurs at the very start of the data, then a null field is produced. This is Item0."
I confess I didn't figure it out myself; I asked on PerlMonks.
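
For anyone hitting the same thing, here is a small sketch of the three cases. It also shows why Item1 appears: the capturing parentheses in the pattern make split return the separator itself. The variable names @with_sep, @no_sep and @fields are just for illustration.

```perl
use strict;
use warnings;

my $WikiHistory = '* [Any garbled crap2345n&%]i';

# Capturing parentheses make split return the separator as well.
my @with_sep = split /(\* \[)/, $WikiHistory;   # ('', '* [', 'Any garbled crap2345n&%]i')

# Without the parentheses the separator is dropped,
# but the leading empty field remains.
my @no_sep = split /\* \[/, $WikiHistory;       # ('', 'Any garbled crap2345n&%]i')

# grep { length } discards the empty fields.
my @fields = grep { length } split /\* \[/, $WikiHistory;

print scalar(@with_sep), " ", scalar(@no_sep), " ", scalar(@fields), "\n";   # prints "3 2 1"
```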



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY