Pattern Matching - a primer on Regular Expressions

PATTERN MATCHING (OR HOW TO DO A LOOK LIKE)

You can test a string against a pattern (known as a REGULAR EXPRESSION) if you want to ask "does this string look like this pattern?".

Regular expressions comprise a number of elements (of 6 basic types I'll tell you about in a minute) and are matched from left to right ... the regular expression is compared against the string element by element and if it's still "yes, that matched" when the comparison gets to the end, you have a match.

Using PHP's "ereg" function as an example ...

if (ereg("ham",$teststring)) { ... says look for the characters "h", "a", "m" in sequence within $teststring, and return a true value if they occur and a false value if they do not. All letters and digits (so including h, a and m) are "literals" - the first of the basic types that you can put in a regular expression.
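The ereg functions were removed in PHP 7.0, so as a minimal sketch the same literal test is shown here with preg_match instead. This is a swapped-in function, not the one used in the text; note that preg patterns need delimiters (the / characters) around them. The helper name contains_ham is hypothetical.

```php
<?php
// Hypothetical helper: true if $teststring contains the literal sequence "ham".
// preg_match() returns 1 on a match and 0 on no match (false on error).
function contains_ham(string $teststring): bool
{
    return preg_match('/ham/', $teststring) === 1;
}

var_dump(contains_ham('a hamster'));  // bool(true)
var_dump(contains_ham('HAM radio'));  // bool(false): matching is case sensitive
var_dump(contains_ham('h a m'));      // bool(false): the letters must be adjacent
```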

THE SIX BASIC TYPES ARE:
 -> literals
 -> character groups
 -> anchors (a.k.a. zero width assertions)
 -> counts
 -> groupings
 -> alternations
And we'll look at them one by one.

1. Literals. A character specified in the regular expression is matched exactly against the same character in the teststring. All letters and digits that appear in a regular expression (unless within some other type) are literals, as are many of the special characters such as % ! @ & - _ = < > / , : " ' and ; (this is NOT a complete list). If you want other special characters to match exactly, you must precede them with a \ (to say "I really want a ...") and remember that you should use single not double quoted strings (PHP) for your regular expression to avoid the double quote operator picking up the backslash!

Example:

if (ereg('@hotmail\.com',$teststring)) { ... will match and perform the block if the $teststring variable contains "@hotmail.com". The \ is needed before the "." as "." is NOT one of the special characters that's taken as a literal. Note that this example WOULD match "rupert@hotmail.com.au" as it contains the required sequence of characters!
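To see what the backslash buys you, here is a small sketch using preg_match (ereg itself is no longer available from PHP 7.0 onwards, so this is a substitute rather than the call shown above). The helper name mentions_hotmail and the test addresses are made up for illustration.

```php
<?php
// Hypothetical helper: does the string contain "@hotmail.com"?
function mentions_hotmail(string $teststring): bool
{
    // Single quotes keep the backslash intact; \. matches a literal dot only.
    return preg_match('/@hotmail\.com/', $teststring) === 1;
}

var_dump(mentions_hotmail('rupert@hotmail.com.au')); // bool(true): sequence found within the string
var_dump(mentions_hotmail('rupert@hotmailxcom'));    // bool(false): x is not a dot

// Without the backslash, "." matches any one character, so this succeeds:
var_dump(preg_match('/@hotmail.com/', 'rupert@hotmailxcom')); // int(1)
```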

2. Character Groups. Written between square brackets, these match one character from $teststring against any one character from the group. So [aeiouAE] would match a lower case a, e, i, o, u or a capital A or E. You can use a "-" within a character group to specify a range of characters (so [a-z] is any lower case letter), and use a ^ directly after the [ to match any character EXCEPT the one(s) listed. There are other character groups too (once again, I'm giving you the concept) - note especially that a "." outside square brackets matches any one character.

Example:

if (ereg('c[aeiou][^t]',$teststring)) { .... will match
 -> a letter c
 -> a lower case vowel (a, e, i, o or u)
 -> and any character which is NOT a lower case t.
So it WILL match can, cog and cup but NOT bog, cat, cot or cut. It WILL also match acorn as this contains the sequence you're looking for WITHIN the string.
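The word list above can be checked in one go. This sketch assumes preg_grep (which returns the array entries that match a pattern) as a stand-in for calling the removed ereg function in a loop; the variable names are illustrative only.

```php
<?php
// c, then a lower case vowel, then any character that is NOT a lower case t.
$pattern = '/c[aeiou][^t]/';
$words   = ['can', 'cog', 'cup', 'acorn', 'bog', 'cat', 'cot', 'cut'];

// preg_grep() keeps the original keys, so array_values() renumbers the result.
$matched = array_values(preg_grep($pattern, $words));

print_r($matched); // can, cog, cup and acorn
```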

3. Anchors. By default, regular expression matches are made anywhere within the teststring - the previous example matched "acorn" for example. If you apply anchors - you use ^ to indicate "start of string" and $ to indicate "end of string" for example - then you can limit your match to the start or end ... and if you do both, you're specifying a regular expression that matches the whole string.

Example:

if (ereg('^c.t$',$teststring)) { .... will match a string that starts with a c, followed by any one character, followed by a t. And at that point the teststring must end - in other words, the test string has to be exactly 3 characters long. This will match cat, cot, cxt and even c*t. It will NOT match Scot, cats or scattergram.
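A quick way to see the effect of the anchors is to run the same words through the pattern with and without them. This is a sketch using preg_grep from the preg family (the ereg family is gone from current PHP); the variable names are illustrative only.

```php
<?php
$words = ['cat', 'cot', 'cxt', 'c*t', 'Scot', 'cats', 'scattergram'];

// Anchored at both ends: the whole string must be c, any one character, t.
$whole = array_values(preg_grep('/^c.t$/', $words));

// No anchors: the sequence may appear anywhere within the string.
$anywhere = array_values(preg_grep('/c.t/', $words));

print_r($whole);          // cat, cot, cxt and c*t
echo count($anywhere);    // 7: every word in the list contains c, something, t
```

One detail specific to the preg flavour: by default $ also matches just before a final newline, so input read from a file or form may need trimming first if that matters.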

4. Counts. Each literal, character group (and anchor) that you've seen so far matches once against the teststring. By adding a count AFTER any of these elements, you can specify that you want it to match a different number of times. The counts that you'll find used time and time again are:
 ? previous item occurs 0 or 1 times ("perhaps a")
 + previous item occurs 1 or more times ("some")
 * previous item occurs 0 or more times ("perhaps some")

Example:

if (ereg('^https?://',$teststring)) { ... will match a teststring starting with http; that MAY be followed by an "s". Then the following characters (whether or not there was an s) will be ://. As there is no anchor at the end, the match will be successful whatever else follows in the teststring.
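The three counts can be tried out as follows. This is a sketch with preg_match rather than the ereg call in the text (ereg was removed in PHP 7.0); the helper name starts_with_http and the ab+c / ab*c patterns are made up to show + and * alongside ?.

```php
<?php
// Hypothetical helper: does $s start with http:// or https:// ?
// Using # as the delimiter avoids having to escape the slashes.
function starts_with_http(string $s): bool
{
    return preg_match('#^https?://#', $s) === 1;
}

var_dump(starts_with_http('http://www.wellho.net'));   // bool(true)
var_dump(starts_with_http('https://www.wellho.net'));  // bool(true)
var_dump(starts_with_http('httpss://www.wellho.net')); // bool(false): ? allows at most one s
var_dump(starts_with_http('see http://wellho.net'));   // bool(false): ^ ties the match to the start

// + needs at least one b; * is happy with none at all.
var_dump(preg_match('/^ab+c$/', 'ac'));    // int(0)
var_dump(preg_match('/^ab*c$/', 'ac'));    // int(1)
var_dump(preg_match('/^ab+c$/', 'abbbc')); // int(1)
```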

5. Groupings. If you want your counts to apply to more than one character, you can use round brackets around the section to which the count applies.

Example:

if (ereg('^https?://(www\.)?wellho\.net',$teststring)) { ... will match a test string starting with http:// or https://; that may be followed by www. (either all 4 of those characters or none of them) and it will then be followed by wellho.net. The "." in wellho\.net is escaped so that it matches only a real dot rather than any character.
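Here is a sketch of the grouped, optional "www." using preg_match in place of the removed ereg function. The helper name is hypothetical, and the dot in wellho\.net is escaped in this version so that it matches only a literal dot.

```php
<?php
// Hypothetical helper: URL on wellho.net, with or without the www. prefix.
function starts_with_wellho(string $url): bool
{
    // (www\.)? makes the whole 4-character group optional, not just the dot.
    return preg_match('#^https?://(www\.)?wellho\.net#', $url) === 1;
}

var_dump(starts_with_wellho('http://wellho.net'));              // bool(true)
var_dump(starts_with_wellho('https://www.wellho.net/net/'));    // bool(true)
var_dump(starts_with_wellho('http://ww.wellho.net'));           // bool(false): all of www. or none of it
var_dump(starts_with_wellho('http://www.wellho.network'));      // bool(true): nothing limits what follows
```

The last case shows why a pattern like this usually wants something after the host name to say where it must stop.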

6. Alternation. The "|" character in a regular expression means "or" over a wider scope than the character grouping - [http][ftp] would match any letter h or t or p followed by any letter f or t or p, but (http|ftp) would match either "http" or "ftp". Note that it's sensible to group the alternatives with round brackets if you're not sure of how far the | will go.

Example:

if (ereg('^https?://(www\.)?wellho\.net(/|$)',$teststring)) { ... will match what the previous example matched ... EXCEPT that the wellho.net must either be followed by a further /, or the string must end at that point. The "." in wellho\.net is escaped so that it matches only a real dot.
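The difference between alternation and character groups, and the effect of the (/|$) ending, can be sketched like this. preg_match is used as a substitute for ereg (removed in PHP 7.0), the helper name is hypothetical, and the dot in wellho\.net is escaped here.

```php
<?php
// Hypothetical helper: the host must be followed by a / or by the end of the string.
function is_wellho_url(string $url): bool
{
    return preg_match('#^https?://(www\.)?wellho\.net(/|$)#', $url) === 1;
}

var_dump(is_wellho_url('http://www.wellho.net'));         // bool(true): string ends after the host
var_dump(is_wellho_url('http://wellho.net/solutions/'));  // bool(true): a / follows the host
var_dump(is_wellho_url('http://www.wellho.network'));     // bool(false): neither / nor end

// Character groups pick single characters; alternation picks whole sequences.
var_dump(preg_match('/^[http][ftp]$/', 'hf'));  // int(1): one of h,t,p then one of f,t,p
var_dump(preg_match('/^(http|ftp)$/', 'hf'));   // int(0)
var_dump(preg_match('/^(http|ftp)$/', 'ftp'));  // int(1)
```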

I hope those examples help you in your first steps with regular expressions - you are limited only by your imagination in what you can do, and there are many many more elements that I haven't introduced you to within the basic types. We do run a complete course on regular expressions ;-) ...

SOME FURTHER NOTES:

No partial matches - in other words, if a match fails then you get a false back rather than a message to tell you that "it matched but only up to this point".

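Different flavours - regular expression handlers and functions come in a number of different flavours; PHP has two of them (ereg, which I've used here, and preg). At the level I've got to so far, most of the features are common ground. Note that the ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so current PHP code should use the preg functions.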

Language Syntax - different syntax and different calling functions are used for regular expressions in different languages.

Case - the examples shown above are case sensitive. In PHP, eregi is a case insensitive alternative and other languages also provide a way of ignoring case.
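In the preg flavour, ignoring case is done with an "i" modifier after the closing delimiter rather than with a separate function. A minimal sketch (the helper name is hypothetical, and preg_match is used because eregi is no longer available from PHP 7.0 onwards):

```php
<?php
// Hypothetical helper: look for "ham" in any mix of upper and lower case.
function contains_ham_any_case(string $teststring): bool
{
    // The trailing i makes the whole pattern case insensitive.
    return preg_match('/ham/i', $teststring) === 1;
}

var_dump(contains_ham_any_case('HAM radio'));  // bool(true)
var_dump(contains_ham_any_case('Hamster'));    // bool(true)
var_dump(contains_ham_any_case('h a m'));      // bool(false)
```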

Captures - having matched, you sometimes want to refer to the part of the teststring that matched specific parts of the regular expression. In order to capture part of the incoming string, you should use a set of grouping brackets to indicate the 'interesting bit'. How you can refer back to it later is function / language specific.
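As one illustration of the language specific part, PHP's preg_match takes an optional third argument that is filled with the captured text: element 0 is the whole match and elements 1, 2, ... are the bracketed groups in order. The pattern and the helper name split_url below are made up for this sketch.

```php
<?php
// Hypothetical helper: pull the scheme and host name out of a web address.
function split_url(string $url): ?array
{
    // Group 1 captures http or https; group 2 captures everything up to the next /.
    if (preg_match('#^(https?)://([^/]+)#', $url, $matches) !== 1) {
        return null;   // no match, so there is nothing to capture
    }
    return ['scheme' => $matches[1], 'host' => $matches[2]];
}

print_r(split_url('https://www.wellho.net/solutions/'));
// scheme => https, host => www.wellho.net

var_dump(split_url('ftp://example.com/'));  // NULL
```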



Please note that articles in this section of our web site were current and correct to the best of our ability when published, but by the nature of our business may go out of date quite quickly. The quoting of a price, contract term or any other information in this area of our website is NOT an offer to supply now on those terms - please check back via our main web site


At Well House Consultants, we provide training courses on subjects such as Ruby, Lua, Perl, Python, Linux, C, C++, Tcl/Tk, Tomcat, PHP and MySQL. We're asked (and answer) many questions, and answers to those which are of general interest are published in this area of our site.

© WELL HOUSE CONSULTANTS LTD., 2021: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/solutions/php-patt ... sions.html • PAGE BUILT: Wed Mar 28 07:47:11 2012 • BUILD SYSTEM: wizard