Say what you mean in regular expressions

Posted by admin (Graham Ellis), 6 November 2002
You need to be accurate with regular expressions and consider all eventualities. Here's a sample Perl program that I received (together with a message inviting me to "run them on your own systems, adapt them to meet your requirements"):

Code:
#!/bin/perl -w
# Regular expression examples

print "Please enter a word beginning with y or Y: ";
$answer = <STDIN>;
chop $answer;
if ($answer =~ /^[yY].*/)  {
       print "Correct!\n";
} else {
       print "Incorrect!\n";
}

print "Please enter a two-digit number: ";
chop ($answer = <STDIN>);
if ($answer =~ /^[0-9][0-9]$/) {
       print "Correct!\n";
} else {
       print "Incorrect!\n";
}

print "Please enter a string not starting with a number: ";
chop ($answer = <STDIN>);
if ($answer =~ /^[^0-9].*/)  {
       print "Correct!\n";
} else {
       print "Incorrect!\n";
}

print "Please enter any number: ";
chop ($answer = <STDIN>);
if ($answer =~ /^[0-9]*$/)  {
       print "Correct!\n";
} else {
       print "Incorrect!\n";
}


Now this is actually a teaching example, so I'm guessing that the prompt is supposed to accurately reflect the pattern being looked for ... but ...

Expression: /^[yY].*/
Said to be looking for: a word beginning with y or Y
Actually matches: Any input that starts with y or Y - a word, a sentence, just the letter Y ...

Expression: /^[0-9][0-9]$/
Said to be looking for: a two-digit number
Actually matches: a two-digit number - CORRECT (ish) - but an unsigned two-digit number, so you can't enter -54 for example

Expression: /^[^0-9].*/
Said to be looking for: a string not starting with a number
Actually matches: A string starting with a non-numeric character. This will fail to match a blank line, and yet a blank line doesn't start with a number.

Expression: /^[0-9]*$/
Said to be looking for: any number
Actually matches: A blank line, or any whole non-negative number provided that there isn't a "+" sign on the front of it.
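
The four findings above can be checked with a few lines of Perl. This is only a sketch: the @tests table, the verdict subroutine and the sample inputs are my own inventions for illustration, not part of the program I was sent, and each input is assumed to have already had its trailing newline removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry: the prompt wording, the original pattern, and some
# sample inputs (hypothetical test data, chosen to probe the edges).
my @tests = (
    [ 'a word beginning with y or Y',        qr/^[yY].*/,
      [ 'yes', 'Y', 'yes we have no bananas' ] ],
    [ 'a two-digit number',                  qr/^[0-9][0-9]$/,
      [ '54', '-54' ] ],
    [ 'a string not starting with a number', qr/^[^0-9].*/,
      [ 'abc', '' ] ],
    [ 'any number',                          qr/^[0-9]*$/,
      [ '42', '', '+42' ] ],
);

# Report a match in the same words as the original program.
sub verdict {
    my ($pattern, $input) = @_;
    return $input =~ $pattern ? 'Correct!' : 'Incorrect!';
}

for my $test (@tests) {
    my ($prompt, $pattern, $inputs) = @$test;
    print "Prompt: $prompt\n";
    printf "  %-26s %s\n", "\"$_\"", verdict($pattern, $_) for @$inputs;
}
```

Run as it stands, the whole sentence is accepted as a "word", -54 is rejected as a two-digit number, and the empty string is rejected by the third pattern but accepted by the fourth.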

Quite apart from the unexpected results you'll get from three out of the four regular expressions that were being used to teach the subject, there are some style issues:

1. What is the point in ending some of these expressions with .*? None that I can see; it's redundant, because .* can match nothing at all and so never changes whether the pattern matches.

2. Using \d or [[:digit:]] is preferable to using [0-9], as 0-9 relies on the fact that the digit characters happen to come in sequence in the ASCII set.  What's going to happen if our trainee is using Unicode for his real work?  There, \d will also match digits from other scripts, which [0-9] will not.
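
Both style points can be seen in a short sketch. The subroutine names and sample strings are mine, ARABIC-INDIC DIGIT FOUR (U+0664) is just one example of a non-ASCII digit, and the last line assumes a Perl new enough to have the /a modifier (5.14 or later).

```perl
use strict;
use warnings;

# Point 1: with only a ^ anchor, a trailing .* cannot change the outcome.
sub with_dot_star    { my ($s) = @_; return $s =~ /^[yY].*/ ? 1 : 0 }
sub without_dot_star { my ($s) = @_; return $s =~ /^[yY]/   ? 1 : 0 }

for my $input ('yes', 'Y', 'no', '') {
    printf "%-6s with .*: %d   without: %d\n",
        "'$input'", with_dot_star($input), without_dot_star($input);
}

# Point 2: \d and [0-9] part company once the data is not plain ASCII.
my $digit = "\x{0664}";    # ARABIC-INDIC DIGIT FOUR, an example non-ASCII digit
printf "\\d     matches: %s\n", ($digit =~ /^\d$/    ? 'yes' : 'no');
printf "[0-9]  matches: %s\n", ($digit =~ /^[0-9]$/ ? 'yes' : 'no');
printf "\\d /a  matches: %s\n", ($digit =~ /^\d$/a   ? 'yes' : 'no');
```

The two columns in the first half always agree. In the second half only the plain \d accepts the Arabic-Indic digit, so which form to use comes down to whether such digits should count as numbers in your data.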

Other questions come to my mind too; I might use the i modifier rather than state [yY] if I were recoding the exercise.  I might follow each ^ anchor with \s* and precede each $ with \s*, and I might take the repeated [0-9] element and follow it with a {2} count.  Then again, the whole thing screams for a subroutine - but to be fair to the original author, the course it comes from probably hasn't covered subroutines by this point.
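
As a rough illustration of those suggestions, here is one way the exercise might be recoded. It is a sketch rather than a finished answer: the subroutine name check and the sample answers are invented, and the answers are hard-coded rather than read from STDIN so that the script runs unattended.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare one answer with one pattern and word the result.
sub check {
    my ($answer, $pattern) = @_;
    return $answer =~ $pattern ? "Correct!" : "Incorrect!";
}

# prompt, pattern, sample answer (the answers are invented test data)
my @exercises = (
    [ 'a word beginning with y or Y',        qr/^\s*y/i,         '  Yellow' ],
    [ 'a two-digit number',                  qr/^\s*\d{2}\s*$/,  ' 42 '     ],
    # \s is excluded from the class as well; with plain [^0-9] the \s*
    # could give back a leading space and let " 5" through.
    [ 'a string not starting with a number', qr/^\s*[^\s\d]/,    ' 5'       ],
    # Still accepts blank input, exactly as the original pattern does.
    [ 'any number',                          qr/^\s*\d*\s*$/,    '2002'     ],
);

for my $exercise (@exercises) {
    my ($prompt, $pattern, $answer) = @$exercise;
    print "Please enter $prompt: '$answer' -> ", check($answer, $pattern), "\n";
}
```

Note the third pattern: adding \s* in front of a negated class is not as innocent as it looks, which is one more reason to test each expression against the awkward inputs as well as the obvious ones.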





This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.

