
November 25, 2006

Matching within multiline strings, and ignoring case in regular expressions

Regular expressions are powerful matching tools, and you can specify almost anything within them. But certain facilities naturally apply to the regular expression as a whole rather than to parts of the match, and these are specified in a different way in each language / implementation.

For example, in what is commonly known as multiline mode, you may want ^ and $ to match not only at the start / end of the string as a whole, but also at embedded newlines. You can specify multiline mode as follows:

In Tcl, using the -lineanchor option
In Perl, with the /m modifier on the end of your regex
In Python, by adding re.M or re.MULTILINE to your compile

Here's an example, in Tcl, looking for embedded lines containing just ABC:

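# Two samples: only the first has a line consisting of just ABC
set samples [list "Hello world\nABC\nThis matches" \
    "Another test\nABCD\nNo match" ]

foreach sample $samples {
    # -lineanchor lets ^ and $ match at embedded newlines
    puts [regexp -lineanchor {^ABC$} $sample]
}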
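
For comparison, here is a rough Python equivalent of the Tcl example above. It is only a sketch: the variable names (samples, line_is_abc, results) are made up for illustration, and it prints 1 or 0 simply to mirror what Tcl's regexp returns.

```python
import re

samples = [
    "Hello world\nABC\nThis matches",
    "Another test\nABCD\nNo match",
]

# re.MULTILINE lets ^ and $ match at embedded newlines as well as
# at the very start and end of the whole string.
line_is_abc = re.compile(r"^ABC$", re.MULTILINE)

# Mirror Tcl's regexp return value: 1 for a match, 0 for no match.
results = [1 if line_is_abc.search(sample) else 0 for sample in samples]

for result in results:
    print(result)
```

Without re.MULTILINE, the first sample would not match either, because ^ would only anchor at the very start of the string.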


Other facilities often added onto your regular expression as modifiers include:

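a) The ability to have "." (the dot) match any character at all, including the newline character, which it excludes by default in Perl and Python. This is sometimes known as single line mode. In Tcl the dot matches newline by default, and it is the -linestop option that turns this off.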

In Tcl, leave off the -linestop option
In Perl, add /s
In Python, add re.S or re.DOTALL to your compile

b) The ability to ignore case in the match

In Perl, /i
In Python, re.I or re.IGNORECASE
In Tcl, use the -nocase option, or start the regex with (?i)

c) The ability to add white space and comments into your expression

In Perl, /x
In Python, re.VERBOSE
In Tcl, use the -expanded option, or start the regex with (?x)
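
As a quick illustration of the three modifiers a) to c), here is a short Python sketch. The sample strings and variable names are invented for the example; only the re flags themselves come from the lists above.

```python
import re

# a) re.DOTALL: "." also matches the newline character.
text = "start\nmiddle\nend"
without_dotall = re.search(r"start.*end", text)
with_dotall = re.search(r"start.*end", text, re.DOTALL)
print(without_dotall is None)    # the default "." stops at the newline
print(with_dotall is not None)   # with DOTALL the match spans lines

# b) re.IGNORECASE: upper and lower case letters are treated alike.
case_sensitive = re.search(r"hello", "Hello World")
case_insensitive = re.search(r"hello", "Hello World", re.IGNORECASE)
print(case_sensitive is None)
print(case_insensitive.group(0))

# c) re.VERBOSE: white space and # comments inside the pattern are ignored.
date_pattern = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
""", re.VERBOSE)
date_parts = date_pattern.search("Posted on 2006-11-25").groups()
print(date_parts)
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE, when you need more than one at once.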

Posted by gje at November 25, 2006 05:48 AM

Well House Consultants Ltd. Copyright 2008