Say what you mean in regular expressions

Posted by admin (Graham Ellis), 6 November 2002

You need to be accurate with regular expressions and consider all eventualities. Here's a sample Perl program that I received (together with a message inviting me to "run them on your own systems, adapt them to meet your requirements"):

Code:
Now this is actually a teaching example, so I'm guessing that the prompt is supposed to accurately reflect the pattern being looked for ... but ...

Expression: /^[yY].*/
Said to be looking for: a word beginning with y or Y
Actually matches: Any input that starts with a y or Y - a word, a sentence, just the letter Y ...

Expression: /^[0-9][0-9]$/
Said to be looking for: a two-digit number
Actually matches: a two-digit number - CORRECT (ish) - but only an unsigned two-digit number, so you can't enter -54, for example

Expression: /^[^0-9].*/
Said to be looking for: a string not starting with a number
Actually matches: A string starting with a non-numeric character. This will fail to match a blank line, and yet a blank line doesn't start with a number.

Expression: /^[0-9]*$/
Said to be looking for: any number
Actually matches: A blank line, or any whole positive number (or zero), provided that there isn't a "+" sign on the front of it.

Quite apart from the unexpected results you'll get from three out of the four regular expressions that were being used to teach the subject, there are some style issues:

1. What is the point in ending some of these expressions with .* ? None that I can see; it's redundant.
2. Using \d or [[:digit:]] is preferable to using [0-9], as 0-9 relies on the fact that the digit characters happen to come in sequence in the ASCII set. What's going to happen if our trainee is using Unicode for his real work?

Other questions come to my mind too. I might use the i modifier rather than state [yY] if I were recoding the exercise. I might follow each ^ anchor with \s* and precede each $ with \s*, and I might take the repeated [0-9] element and follow it with a {2} count. Then again, the whole thing screams for a subroutine - but to be fair to the original author, the course it comes from probably hasn't covered subroutines by this point.

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.
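
The tightening suggested in the post (the i modifier, \s* inside each anchor, \d with a {2} count, and a subroutine) might look something like the following in Perl. This is only a minimal sketch and not the original course code: the %pattern table, the matches subroutine, the labels, and the decision to accept an optional leading sign are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table of tightened patterns; the labels are illustrative
# prompts, worded to say what each pattern really accepts.
my %pattern = (
    'word beginning with y or Y'       => qr/^\s*y\w*\s*$/i,
    'two-digit number (optional sign)' => qr/^\s*[-+]?\d{2}\s*$/,
    'string not starting with a digit' => qr/^(?!\d)/,
    'whole number (optional sign)'     => qr/^\s*[-+]?\d+\s*$/,
);

# Return 1 if $input matches the pattern stored under $label, else 0.
sub matches {
    my ($label, $input) = @_;
    return $input =~ $pattern{$label} ? 1 : 0;
}

# A few probes, including the awkward cases raised in the post.
for my $case (
    [ 'word beginning with y or Y',       'Yes'        ],
    [ 'word beginning with y or Y',       'yes please' ],
    [ 'two-digit number (optional sign)', '-54'        ],
    [ 'string not starting with a digit', ''           ],
    [ 'whole number (optional sign)',     ''           ],
    [ 'whole number (optional sign)',     '+12'        ],
) {
    my ($label, $input) = @$case;
    printf "%-34s %-14s %s\n", $label, "'$input'",
        matches($label, $input) ? 'match' : 'no match';
}
```

With these patterns a whole sentence no longer passes as "a word", -54 is accepted as a two-digit number, an empty line counts as not starting with a digit, and an empty line is no longer accepted as a number. Whether a sign or surrounding white space should be allowed is a design choice; the point is that the prompt and the pattern should say the same thing.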