Importing a regular expression from a file

Posted by meanroy (meanroy), 7 August 2006
I have a script which is used to remove spam from a Wiki.
The collection of spam phrases (and spammer URLs) has grown unwieldy.
Here's a snippet to show how I use the regex.
     $_ = $returned_text;
     # check for spam
     # match upper or lower case
     if (/\.ro              # anything from Romania
         |\.ru              # anything from Russia
         |<div\ style       # anyone trying to make a div
         |hot-girls         # porn site
         /ix) {
           # Rewrite the page without the spam
     }

I've not been able to figure out how to keep the banned phrases and URLs in a file and import them.

I found this passage in "Perl Regular Expressions - detailed manual", but it doesn't seem quite what I need.
            One or more embedded pattern-match modifiers.
            This is particularly useful for dynamic patterns,
            such as those read in from a configuration file,
            read in as an argument, are specified in a table
            somewhere, etc.  Consider the case that some of
            which want to be case sensitive and some do not.
            The case insensitive ones need to include merely
            `(?i)' at the front of the pattern.  For example:

                $pattern = "foobar";
                if ( /$pattern/i ) { }

                # more flexible:

                $pattern = "(?i)foobar";
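
As a rough illustration of the passage quoted above, the sketch below keeps two patterns as plain strings, the way they might be stored in a file. Only the second carries `(?i)`, so only that one ignores case. The pattern strings and test texts are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical patterns as they might appear in a file: the first is
# case-sensitive, the second opts in to case-insensitive matching.
my @patterns = ('Foobar', '(?i)hot-girls');

for my $text ('foobar', 'Foobar', 'HOT-GIRLS') {
    for my $pattern (@patterns) {
        # The embedded (?i) travels with the pattern string itself,
        # so no /i flag is needed on the match operator.
        printf "%s matches /%s/\n", $text, $pattern
            if $text =~ /$pattern/;
    }
}
```

With these inputs, 'foobar' matches nothing, because the first pattern is case-sensitive.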

I may be missing some trivial way to do this but I can't seem to find it.
Do you have any suggestion?

It looks like the perlfaq6 manpage may help, though I'm having trouble putting more than one expression in the file.
The example
   chomp($pattern = <STDIN>);
   if ($line =~ /$pattern/) { }

only does one line. I'll keep hacking but if you could clarify this I'd appreciate it.

Posted by admin (Graham Ellis), 8 August 2006
I think you're getting there.  I suggest 1 RE per line, read in and chomp each in turn.
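
A minimal sketch of the "one RE per line" idea, not a definitive implementation. The helper names `read_patterns` and `is_spam` are made up for illustration, as is the convention of skipping blank lines and lines starting with `#`. An in-memory string stands in for the real pattern file so that the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: read one pattern per line from an open filehandle,
# skipping blank lines and lines starting with '#'.
sub read_patterns {
    my ($fh) = @_;
    my @patterns;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # ignore comments and blank lines
        push @patterns, qr/$line/i;         # compile once, case-insensitive
    }
    return @patterns;
}

# Hypothetical helper: true if any of the patterns matches the text.
sub is_spam {
    my ($text, @pats) = @_;
    for my $re (@pats) {
        return 1 if $text =~ $re;
    }
    return 0;
}

# Stand-in for the real pattern file (an assumption for this sketch).
my $file_contents = "\\.ro\n\\.ru\n<div style\nhot-girls\n";
open my $fh, '<', \$file_contents or die "Cannot open in-memory file: $!";
my @patterns = read_patterns($fh);
close $fh;

print is_spam('Visit HOT-GIRLS today', @patterns) ? "spam\n" : "clean\n";
```

For a real file, the `open` would take a filename in place of the reference to `$file_contents`.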

Posted by meanroy (meanroy), 8 August 2006
Yes, that seems to work, but at the expense of (relatively) a lot of time spent reading the file.
Maybe I should bring the file into a hash and sequence through the hash, or into an array of strings and step through the array?

- edit -
I've gone back to the "Owl" book, Mastering Regular Expressions, since after skipping lightly past this particular little problem, I'm up against more regex madness.
After I've figured out things a bit I'll post some code. (I hope!)
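
One possible way to avoid re-reading the file for every page, sketched under assumptions: the patterns are loaded into an array once, then joined into a single compiled regex. The array contents and sample texts below are invented for the example; in practice the array would be filled from the pattern file.

```perl
use strict;
use warnings;

# Assumed contents of the pattern file, already read into an array
# (one regular expression per element, newlines removed).
my @phrases = ('\.ro', '\.ru', '<div\s+style', 'hot-girls');

# Wrap each pattern in (?:...) so an alternation inside one pattern
# cannot leak into its neighbours, then join them with '|'.
my $combined = join '|', map { "(?:$_)" } @phrases;
my $spam_re  = qr/$combined/i;    # compiled once, reused for every page

for my $text ('Cheap pills at pharma.RU', 'Notes from the last meeting') {
    my $verdict = $text =~ $spam_re ? 'spam' : 'clean';
    print "$verdict: $text\n";
}
```

An array is enough here because the patterns are only ever scanned in order; a hash would help only if each pattern needed to be looked up by name.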

Posted by meanroy (meanroy), 10 August 2006
I've been able to get a file of regexes to work somewhat, though I have more to do, but I can't seem to get past this peculiarity:
use warnings;
use strict;
my @MatchFoundArray;
my $WikiHistory = "* [Any garbled crap2345n&%]i";
print "WikiHistory contains:$WikiHistory\n";   # make sure I know what's in there
@MatchFoundArray = split(m/(\* \[)/, $WikiHistory);
my $indexnum = 0;
foreach my $MatchFound (@MatchFoundArray) {
       print "Item", $indexnum++, " is $MatchFound\n";
}

I expected to see only two elements printed.
Instead I find:
WikiHistory contains:* [Any garbled crap2345n&%]i
Item0 is
Item1 is * [
Item2 is Any garbled crap2345n&%]i

I've spent a lot of time messing with this and searching everywhere, trying to figure it out.

What the heck is going on?

Oh Duh!
"split produces a list of the things either side of the match. If the match occurs at the very start of the data, then a null field is produced. This is Item0."
I confess I didn't figure it out myself; I asked on PerlMonks.
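
The behaviour described above can be sketched as follows, using the same string as in the post. Two things contribute to the three items: the separator matches at the very start, which yields an empty leading field, and the capturing parentheses make `split` return the separator as well. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $WikiHistory = '* [Any garbled crap2345n&%]i';

# Capturing parentheses: the separator itself is returned, and the
# empty field before the leading match is kept.
my @with_capture = split /(\* \[)/, $WikiHistory;
# ('', '* [', 'Any garbled crap2345n&%]i')

# No capture group: only the fields either side of the separator.
my @without_capture = split /\* \[/, $WikiHistory;
# ('', 'Any garbled crap2345n&%]i')

# Drop empty fields if they are not wanted.
my @non_empty = grep { length } @with_capture;
# ('* [', 'Any garbled crap2345n&%]i')

print scalar(@with_capture), " ", scalar(@without_capture), " ",
      scalar(@non_empty), "\n";
```

Filtering with `grep { length }` is one way to get the two elements originally expected.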

This page is a thread posted to the opentalk forum and archived here for reference.


© WELL HOUSE CONSULTANTS LTD., 2024: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • SKYPE: wellho