Pattern Matching - a primer on Regular Expressions

PATTERN MATCHING (OR HOW TO DO A LOOK LIKE)

You can test a string against a pattern (known as a REGULAR EXPRESSION) if you want to say "does this string look like this pattern?".

Regular expressions comprise a number of elements (of 6 basic types I'll tell you about in a minute) and are matched from left to right ... the regular expression is compared against the string element by element and if it's still "yes, that matched" when the comparison gets to the end, you have a match.

Using PHP's "ereg" function as an example ...

if (ereg("ham",$teststring)) { ... says look for the string "h", "a", "m" within $teststring, and return a true value if it occurs and a false value if it does not occur. All letters and digits (so including h, a and m) are "literals" - the first of the basic types that you can put in a regular expression.
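Here's a minimal sketch of that literal match in runnable form. It assumes a current PHP, where ereg is no longer available, so it uses preg_match instead; the slashes around the pattern are the delimiters that preg_match needs, and the sample strings are made up for illustration.

```php
<?php
// Sketch only: preg_match() stands in for ereg(), which was removed in PHP 7.0.
// The slashes are PCRE delimiters, not part of the pattern itself.
$samples = ['hamster', 'Birmingham', 'bacon'];   // hypothetical test strings
$results = [];

foreach ($samples as $teststring) {
    // preg_match() returns 1 if the pattern is found, 0 if it is not
    $results[$teststring] = (preg_match('/ham/', $teststring) === 1);
    echo $teststring, ': ', $results[$teststring] ? 'match' : 'no match', "\n";
}
```

The letters h, a and m must appear together and in that order, but they can be anywhere in the string.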

THE SIX BASIC TYPES ARE:
 -> literals
 -> character groups
 -> anchors (a.k.a. zero width assertions)
 -> counts
 -> groupings
 -> alternations
And we'll look at them one by one.

1. Literals. A character specified in the regular expression is matched exactly against the same character in the teststring. All letters and digits that appear in a regular expression (unless within some other type) are literals, as are many of the special characters such as % ! @ & - _ = < > / , : " ' and ; (this is NOT a complete list). If you want other special characters to match exactly, you must precede them with a \ (to say "I really want a ...") and remember that you should use single, not double, quoted strings (PHP) for your regular expression to avoid the double quote operator picking up the backslash!

Example:

if (ereg('@hotmail\.com',$teststring)) { ... will match and perform the block if the $teststring variable contains "@hotmail.com". The \ is needed before the "." as "." is NOT one of the special characters that's taken as a literal. Note that this example WOULD match "rupert@hotmail.com.au" as it contains the required sequence of characters!
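To see what the backslash buys you, here's a small sketch comparing the escaped and unescaped dot. It assumes preg_match (with / delimiters) rather than ereg, and the addresses are invented for the test.

```php
<?php
// Sketch only: preg_match() is used in place of ereg(); addresses are made up.
$escaped   = '/@hotmail\.com/';  // \. means "a real full stop"
$unescaped = '/@hotmail.com/';   // .  means "any one character"

$addresses = [
    'rupert@hotmail.com',
    'rupert@hotmail.com.au',
    'rupert@hotmailxcom',
];

$withEscape = [];
$withoutEscape = [];
foreach ($addresses as $teststring) {
    $withEscape[$teststring]    = (preg_match($escaped, $teststring) === 1);
    $withoutEscape[$teststring] = (preg_match($unescaped, $teststring) === 1);
    printf("%-24s escaped: %-3s unescaped: %s\n",
        $teststring,
        $withEscape[$teststring] ? 'yes' : 'no',
        $withoutEscape[$teststring] ? 'yes' : 'no');
}
```

Only the unescaped pattern accepts "rupert@hotmailxcom", which is why the backslash matters.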

2. Character Groups. Written between square brackets, these match one character from $teststring against any one character from the group. So [aeiouAE] would match a lower case a, e, i, o, u or a capital A or E. You can use a "-" within a character group to specify a range of characters, and use a ^ directly after the [ to match any character EXCEPT the one(s) listed. There are other character groups too (once again, I'm giving you the concept) - note especially that "." matches any one character.

Example:

if (ereg('c[aeiou][^t]',$teststring)) { .... will match
 -> a letter c
 -> a lower case vowel (a, e, i, o or u)
 -> and any character which is NOT a lower case t.
So it WILL match can, cog and cup but NOT bog, cat, cot or cut. It WILL also match acorn as this contains the sequence you're looking for WITHIN the string.
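The word list above can be checked with a short sketch. It assumes preg_grep (the preg family's "filter an array by pattern" function) in place of a loop of ereg calls.

```php
<?php
// Sketch only: preg_grep() returns the array entries that match the pattern.
$words = ['can', 'cog', 'cup', 'bog', 'cat', 'cot', 'cut', 'acorn'];

// c, then a lower case vowel, then any character that is not a lower case t
$matching = array_values(preg_grep('/c[aeiou][^t]/', $words));

echo implode(' ', $matching), "\n";
```

With these words the sketch should list can, cog, cup and acorn, and leave out the rest.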

3. Anchors. By default, regular expression matches are made anywhere within the teststring - the previous example matched "acorn", for example. If you apply anchors - you use ^ to indicate "start of string" and $ to indicate "end of string", for example - then you can limit your match to the start or end ... and if you do both, you're specifying a regular expression that matches the whole string.

Example:

if (ereg('^c.t$',$teststring)) { .... will match a string that starts with a c, followed by any other character, followed by a t. And at that point the teststring must end - in other words, the test string has to be 3 characters long. This will match cat cot cxt and even c*t. It will NOT match Scot, cats or scattergram.
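Here's the same anchored pattern as a sketch you can run, again assuming preg_match rather than ereg. The pattern is in single quotes so that PHP leaves the $ alone.

```php
<?php
// Sketch only: ^ ties the match to the start of the string, $ to the end.
$pattern = '/^c.t$/';
$words = ['cat', 'cot', 'cxt', 'c*t', 'Scot', 'cats', 'scattergram'];

$anchored = [];
foreach ($words as $teststring) {
    $anchored[$teststring] = (preg_match($pattern, $teststring) === 1);
    echo $teststring, ': ', $anchored[$teststring] ? 'match' : 'no match', "\n";
}
```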

4. Counts. Each literal, character group (and anchor) that you've seen so far matches once against the teststring. By adding a count AFTER any of these elements, you can specify that you want it to match a different number of times. The counts that you'll find used time and time again are:
 ? previous item occurs 0 or 1 times ("perhaps a")
 + previous item occurs 1 or more times ("some")
 * previous item occurs 0 or more times ("perhaps some")
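To compare the three counts side by side, here's a small sketch. The patterns and sample strings (ac, abc, abbc) are made up for illustration, and preg_match is assumed in place of ereg.

```php
<?php
// Sketch only: each count is applied to the letter b, between an a and a c.
$patterns = [
    '?' => '/^ab?c$/',   // perhaps a b
    '+' => '/^ab+c$/',   // some b's
    '*' => '/^ab*c$/',   // perhaps some b's
];
$samples = ['ac', 'abc', 'abbc'];

$table = [];
foreach ($patterns as $count => $pattern) {
    foreach ($samples as $teststring) {
        $table[$count][$teststring] = (preg_match($pattern, $teststring) === 1);
        printf("%s  %-5s %s\n", $count, $teststring,
            $table[$count][$teststring] ? 'match' : 'no match');
    }
}
```

The ? rejects abbc (too many), the + rejects ac (too few), and the * accepts all three.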

Example:

if (ereg('^https?://',$teststring)) { ... will match a teststring starting with http; that MAY be followed by an "s". Then the following characters (whether or not there was an s) will be ://. As there is no $ anchor at the end, the match will be successful whatever else follows in the teststring.
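A sketch of that test, assuming preg_match in place of ereg. The # characters are used as the pattern delimiters here (my choice, not something from the example above) so that the slashes in :// don't need escaping, and the example.com URLs are placeholders.

```php
<?php
// Sketch only: # is the PCRE delimiter, so / can appear unescaped in the pattern.
$pattern = '#^https?://#';
$urls = [
    'http://example.com',
    'https://example.com/page',
    'httpss://example.com',       // two s's - the ? allows at most one
    'ftp://example.com',
    'see http://example.com',     // http is not at the start
];

$isWeb = [];
foreach ($urls as $teststring) {
    $isWeb[$teststring] = (preg_match($pattern, $teststring) === 1);
    echo $teststring, ': ', $isWeb[$teststring] ? 'match' : 'no match', "\n";
}
```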

5. Groupings. If you want your counts to apply to more than one character, you can use round brackets around the section to which the count applies.

Example:

if (ereg('^https?://(www\.)?wellho\.net',$teststring)) { ... will match a test string starting with http:// or https://; that may be followed by www. (either all 4 of those characters or none of them) and it will then be followed by wellho.net. (The "." in wellho.net is escaped too, otherwise it would match any character in that position.)
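Here's a runnable sketch of the grouped count, assuming preg_match with # delimiters instead of ereg. The dot in wellho.net is escaped in this sketch so that it only matches a real full stop, and the test URLs are invented.

```php
<?php
// Sketch only: (www\.)? makes the whole four character group optional.
$pattern = '#^https?://(www\.)?wellho\.net#';
$urls = [
    'http://wellho.net',
    'https://www.wellho.net/index.html',
    'http://ww.wellho.net',          // only part of the group - not allowed
    'http://www.wellho.network',     // still matches: nothing limits what follows
];

$onSite = [];
foreach ($urls as $teststring) {
    $onSite[$teststring] = (preg_match($pattern, $teststring) === 1);
    echo $teststring, ': ', $onSite[$teststring] ? 'match' : 'no match', "\n";
}
```

Notice that the last URL gets through, because the pattern says nothing about what comes after wellho.net.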

6. Alternation. The "|" character in a regular expression means "or" over a wider scope than the character grouping - [http][ftp] would match any letter h or t or p followed by any letter f or t or p, but (http|ftp) would match either "http" or "ftp". Note that it's sensible to group the alternatives with round brackets if you're not sure of how far the | will go.

Example:

if (ereg('^https?://(www\.)?wellho\.net(/|$)',$teststring)) { ... will match exactly what the previous example matched ... EXCEPT that it must either be followed by a further /, or end at that point.
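A sketch of the alternation in use, assuming preg_match with # delimiters rather than ereg. It also checks the [http][ftp] versus (http|ftp) point made above; the test strings are made up.

```php
<?php
// Sketch only: (/|$) means "either a slash, or the end of the string".
$pattern = '#^https?://(www\.)?wellho\.net(/|$)#';
$urls = [
    'http://www.wellho.net',
    'http://www.wellho.net/',
    'https://wellho.net/solutions/',
    'http://www.wellho.network',
];

$exactSite = [];
foreach ($urls as $teststring) {
    $exactSite[$teststring] = (preg_match($pattern, $teststring) === 1);
    echo $teststring, ': ', $exactSite[$teststring] ? 'match' : 'no match', "\n";
}

// Two character groups match two single characters, not two words
$groupsMatchHt    = (preg_match('/^[http][ftp]$/', 'ht') === 1);
$groupsMatchFtp   = (preg_match('/^[http][ftp]$/', 'ftp') === 1);
$altMatchesFtp    = (preg_match('/^(http|ftp)$/', 'ftp') === 1);
```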

I hope those examples help you in your first steps with regular expressions - you are limited only by your imagination in what you can do, and there are many many more elements that I haven't introduced you to within the basic types. We do run a complete course on regular expressions ;-) ...

SOME FURTHER NOTES:

No partial matches - in other words, if a match fails then you get a false back rather than a message to tell you that "it matched but only up to this point".
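As a sketch of what "just a false back" looks like with preg_match (assumed here in place of ereg): a failed match gives 0 and leaves nothing in the optional third parameter, even though "cat" at the start of the string looked promising.

```php
<?php
// Sketch only: "cats" gets as far as c, a, t and then fails on the $ anchor.
$matches = ['left over from earlier'];
$result = preg_match('/^c.t$/', 'cats', $matches);

var_dump($result);          // 0 means "no match"
var_dump(count($matches));  // nothing is reported about how far it got
```

With preg_match a 0 means the pattern did not match, whereas false means something went wrong with the pattern itself, so it's worth comparing with === if you need to tell them apart.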

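Different flavours - regular expression handlers and functions come in a number of different flavours; PHP has two of them (ereg, which I've used here, and preg). At the level I've got to so far, most of the features are common ground. Note that the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so current PHP code needs the preg functions.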

Language Syntax - different languages use different syntax and different functions to call up regular expressions.

Case - the examples shown above are case sensitive. In PHP, eregi is a case insensitive alternative and other languages also provide a way of ignoring case.
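In the preg flavour, ignoring case is done with an i modifier after the closing delimiter rather than with a separate function. A minimal sketch, with a made-up test string:

```php
<?php
// Sketch only: the i after the closing delimiter switches on case insensitivity.
$teststring = 'HAMSTER';   // hypothetical test string

$caseSensitive   = (preg_match('/ham/', $teststring) === 1);
$caseInsensitive = (preg_match('/ham/i', $teststring) === 1);

echo 'without i: ', $caseSensitive ? 'match' : 'no match', "\n";
echo 'with i:    ', $caseInsensitive ? 'match' : 'no match', "\n";
```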

Captures - having matched, you sometimes want to refer to the part of the teststring that matched specific parts of the regular expression. In order to capture part of the incoming string, you should use a set of grouping brackets to indicate the 'interesting bit'. How you can refer back to it later is function / language specific.
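As one example of how a language hands the captures back, here's a sketch using preg_match, which fills an array passed as its third parameter. The pattern and URL are invented for illustration; element 0 is the whole match and elements 1 onwards are the bracketed groups, counted by their opening bracket.

```php
<?php
// Sketch only: three sets of grouping brackets give three captures.
$pattern = '#^(https?)://(www\.)?([^/]+)#';
$teststring = 'https://www.wellho.net/solutions/';

$matches = [];
if (preg_match($pattern, $teststring, $matches) === 1) {
    echo 'whole match: ', $matches[0], "\n";
    echo 'scheme:      ', $matches[1], "\n";
    echo 'www part:    ', $matches[2], "\n";
    echo 'host:        ', $matches[3], "\n";
}
```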


© WELL HOUSE CONSULTANTS LTD., 2019: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01225 708225 • FAX: 01225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/solutions/php-patt ... sions.html • PAGE BUILT: Wed Mar 28 07:47:11 2012 • BUILD SYSTEM: wizard