Importing a regular expression from a file

Posted by meanroy (meanroy), 7 August 2006

I have a script which is used to remove spam from a Wiki.
The collection of spam phrases (and spammer URLs) has grown unwieldy.
Here's a snippet to show how I use the regex.
Code:

$_ = $returned_text;
# check for spam
# match upper or lower case
if (/\.ro #anything from Romania
|\.ru # anything from russia
|<div\ style # anyone trying to make a div
|hot-girls # porn site
/ix) {
# Rewrite the page without the spam
}

I've not been able to figure out how to keep the banned phrases and URLs in a file and import them.

I found this reference in "Perl Regular Expressions - detailed manual", but it doesn't seem to be quite what I need.
Quote:

`(?imsx-imsx)'
One or more embedded pattern-match modifiers.
This is particularly useful for dynamic patterns, such as those read in
from a configuration file, read in as an argument, are specified in a
table somewhere, etc. Consider the case that some of which want to be
case sensitive and some do not. The case insensitive ones need to
include merely `(?i)' at the front of the pattern. For example:

$pattern = "foobar";
if ( /$pattern/i ) { }

# more flexible:

$pattern = "(?i)foobar";
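(A small sketch of what the quoted passage describes, not part of the manual itself. The two pattern strings and the test string 'FooBar' are illustrative assumptions. It shows that a pattern carrying its own `(?i)' matches case-insensitively even though the match operator has no /i flag.)

```perl
use strict;
use warnings;

# Patterns as they might appear in a file: the first carries its own (?i).
my @patterns = ('(?i)foobar', 'foobar');

for my $pattern (@patterns) {
    # No /i flag here; case sensitivity comes from the pattern string itself.
    my $result = 'FooBar' =~ /$pattern/ ? 'matches' : 'does not match';
    print "$pattern $result FooBar\n";
}
```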

I may be missing some trivial way to do this but I can't seem to find it.
Do you have any suggestion?

Roy.
===========
Oopsie...
It looks like the perlfaq6 manpage may help, though I'm having trouble putting more than one expression in the file.
The example
Code:

chomp($pattern = <STDIN>);
if ($line =~ /$pattern/) { }

only handles one line. I'll keep hacking, but if you could clarify this I'd appreciate it.

Posted by admin (Graham Ellis), 8 August 2006

I think you're getting there. I suggest 1 RE per line, read in and chomp each in turn.
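
A minimal sketch of that approach, under some assumptions: the file name spam_patterns.txt and the two helper subs are hypothetical, and lines starting with '#' are treated as comments (so a pattern that itself starts with '#' would need escaping). Each line is chomped and the lot is joined into one alternation that is compiled once with qr//.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name; one regular expression per line.
my $pattern_file = 'spam_patterns.txt';

# Read the patterns, skipping blank lines and lines starting with '#'.
sub load_patterns {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    my @patterns;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;
        push @patterns, $line;
    }
    close $fh;
    return @patterns;
}

# Join the patterns into one case-insensitive alternation, compiled once.
sub build_spam_regex {
    my @patterns = @_;
    return unless @patterns;   # an empty pattern would match everything
    my $alternation = join '|', map { "(?:$_)" } @patterns;
    return qr/$alternation/i;
}

if (-e $pattern_file) {
    my $spam_re = build_spam_regex(load_patterns($pattern_file));
    my $returned_text = 'visit hot-girls today';
    if (defined $spam_re && $returned_text =~ $spam_re) {
        print "Spam found\n";   # rewrite the page without the spam here
    }
}
```

Note that the file is read without the /x modifier, so the trailing "# anything from Romania" style comments from the original snippet would have to be dropped or kept on separate comment lines.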

Posted by meanroy (meanroy), 8 August 2006

Yes, that seems to work, but at the expense of spending a (relatively) long time reading the file.
Maybe I should bring the file into a hash and sequence through it, or into an array of strings and step through the array?
Roy
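
One way the array idea could look, as a sketch only: the pattern strings below are taken from the snippet at the top of the thread and stand in for lines already read from the file, and first_spam_match is a hypothetical helper. Each pattern is compiled once with qr//, so the file is read a single time and every page is then checked against the in-memory array. A plain array seems sufficient here, since the patterns are only stepped through and never looked up by key.

```perl
use strict;
use warnings;

# Stand-ins for lines read from the pattern file, already chomped.
my @pattern_strings = ('\.ro', '\.ru', '<div\ style', 'hot-girls');

# Compile each pattern once, up front; qr// returns a reusable regex object.
my @compiled = map { qr/$_/i } @pattern_strings;

# Return the first pattern string that matches the text, or undef if none do.
sub first_spam_match {
    my ($text) = @_;
    for my $i (0 .. $#compiled) {
        return $pattern_strings[$i] if $text =~ $compiled[$i];
    }
    return;
}

for my $page ('Buy now at example.ru', 'An ordinary wiki edit') {
    my $hit = first_spam_match($page);
    print defined $hit ? "Spam ($hit): $page\n" : "Clean: $page\n";
}
```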

- edit -
I've gone back to the "Owl" book, Mastering Regular Expressions, since after skipping lightly past this particular little problem, I'm up against more regex madness.
After I've figured out things a bit I'll post some code. (I hope!)

Posted by meanroy (meanroy), 10 August 2006

I've been able to get a file of regexes to work somewhat, though I have more to do, but I can't seem to get past this peculiarity:
Quote:

#!perl
# regextrysplit.pl
use warnings;
use strict;
my @MatchFoundArray = "";
my $WikiHistory ="* [Any garbled crap2345n&%]i";
print "WikiHistory contains:$WikiHistory\n"; # make sure I know whats in there
@MatchFoundArray = split(m/(\* \[)/, $WikiHistory );
my $indexnum = 0;
foreach my $MatchFoundArray (@MatchFoundArray) {
print "Item", $indexnum++," is $MatchFoundArray\n";
}

I expected to see only two elements printed.
Instead I find:
Quote:

WikiHistory contains:* [Any garbled crap2345n&%]i
Item0 is
Item1 is * [
Item2 is Any garbled crap2345n&%]i

I've spent a lot of time messing with this and searching everywhere, trying to figure it out.

What the heck is going on?

Roy
Oh Duh!
"split produces a list of the things either side of the match. If the match occurs at the very start of the data, then a null field is produced. This is Item0."
I confess I didn't figure it out myself; I asked on PerlMonks.
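
For the record, a short sketch of the behaviour using the same string as above. The capturing parentheses make split return the separator as well, and the match at the very start produces the empty leading field. Filtering with grep is just one possible way to get the two elements I expected.

```perl
use strict;
use warnings;

my $WikiHistory = "* [Any garbled crap2345n&%]i";

# Capturing parentheses make split return the separator too, and a match
# at the very start of the string yields an empty leading field.
my @with_empty = split /(\* \[)/, $WikiHistory;

# Dropping empty fields leaves the separator and the text after it.
my @without_empty = grep { length } @with_empty;

print scalar(@with_empty), " fields before filtering\n";
print scalar(@without_empty), " fields after filtering\n";
```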

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.