Ruby Regular Expressions

If you want to ask whether a string "looks like" some pattern, without being able to give an explicit string to compare it against, then you're probably looking for a regular expression.


Regular expressions come in various flavours, and the flavour used in Ruby is "Perl Style" (i.e. it's similar to Perl, Python and the preg functions in PHP, and differs from Tcl and the ereg functions in PHP).

In summary, a regular expression comprises elements such as:

Literals - exact matches
 - letters, digits and some special characters match exactly
 - special characters can be matched when preceded by a \

Character groups - any one character from a selection
 - [abcyh] any character from the list
 - [^abcyh] any character not in the list
 - [A-Z0-4] any capital letter or digit 0 through 4
 - \s \d \w any space, digit or word character
 - \S \D \W any non-space, non-digit or non-word character
 - . any character at all (may not match \n)

Counts - apply to previous literal, character or group
 - {2,6} 2 to 6 occurrences of previous item
 - {4,} 4 or more occurrences of previous item
 - {5} exactly 5 occurrences of previous item
 - + 1 or more of previous item
 - * 0 or more of previous item
 - ? 0 or 1 of previous item

Anchors (a.k.a. Zero width assertions)
 - ^ match here at start of string or line
 - $ match here at end of string or line
 - \b match here at word boundary

Groups and alternation
 - ( .... ) grouping for capture and counting
 - | alternation; "either / or"
 - \1 \2 references back to previous groups.

There are more options - but those are the common ones.
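
As a rough illustration of the elements listed above, the sketch below tries one small pattern from each category against a sample string and reports the first piece of text that matches. The sample strings, the SAMPLES constant and the first_match helper are all made up for this illustration.

```ruby
# Each entry: what is being shown, a pattern, and a made-up sample string.
SAMPLES = [
  ["literal with an escaped .", /3\.14/,            "pi is roughly 3.14"],
  ["character groups",          /[aeiou][^aeiou]/,  "melksham"],
  ["\\d \\s \\w shorthands",    /\d\s\w/,           "SN12 6QL"],
  ["count {2,6}",               /\d{2,6}/,          "call 01225 now"],
  ["anchors \\b and $",         /\b\w+$/,           "hello world"],
  ["grouping and alternation",  /(cat|dog)s?/,      "three dogs"],
  ["back reference \\1",        /(\w)\1/,           "balloon"]
]

# Hypothetical helper: the first substring of text that matches
# pattern, or nil if nothing matches.
def first_match(pattern, text)
  text[pattern]
end

SAMPLES.each do |label, pattern, text|
  found = first_match(pattern, text)
  puts "#{label}: #{pattern.inspect} on #{text.inspect} gives #{found.inspect}"
end
```

Indexing a string with a regular expression, as first_match does, is just a compact way of getting at the matched text; the =~ operator would do equally well here.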


The =~ operator is the 'match' operator, so I can ask whether a string matches a regular expression. The index method on a string will also accept a regular expression and use it to search, returning the position at which the match starts. The variable $& contains the matched string, and $1, $2 etc. contain the matched capture groups.


places = ["Training in Melksham and elsewhere",
   "We are at SN12 6QL (HQ) and SN12 7NY (training centre)",
   "And can train you at even if you're at HS7 5LZ"]

# Matching to see whether or not it fits the pattern
places.each do |place|
        if place =~ /\b[A-Z]{1,2}\d\w?\s+\d[A-Z]{2}\b/
                puts %Q!There's a postcode in "#{place}"!
        end
end

# Making use of the matched string
places.each do |place|
        if place =~ /\b([A-Z]{1,2}\d\w?)\s+\d[A-Z]{2}\b/
                puts %Q!We found #{$&} sorted via #{$1}!
        end
end

# More careful extraction - global matching to regular
# expressions is not brilliant except in very recent
# releases, but the index method on string does very well.
# sf is the offset at which the next search starts.
places.each do |place|
        sf = 0
        while sfn = place.index(/\b(([A-Z]{1,2}\d\w?)\s+\d[A-Z]{2})\b/,sf)
                sf = sfn+1
                puts %Q!We found #{$&}!
        end
end

Here is the result of running that program:

earth-wind-and-fire:~/ruby/r109 grahamellis$ ruby rex1.rb
There's a postcode in "We are at SN12 6QL (HQ) and SN12 7NY (tr ..."
There's a postcode in "And can train you at even if you're at HS7 5LZ"
We found SN12 6QL sorted via SN12
We found HS7 5LZ sorted via HS7
We found SN12 6QL
We found SN12 7NY
We found HS7 5LZ
earth-wind-and-fire:~/ruby/r109 grahamellis$
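
For comparison, the scan method on a string collects every non-overlapping match in a single call, so it can stand in for the index loop when only the matched text is wanted. This is an alternative technique, not the one used in rex1.rb; the POSTCODE constant and the postcodes_in method are names made up for this sketch.

```ruby
# Same postcode pattern as in the examples above, without capture groups
# so that scan returns the whole of each match.
POSTCODE = /\b[A-Z]{1,2}\d\w?\s+\d[A-Z]{2}\b/

# Hypothetical helper: every postcode-like string in text, as an array
# (empty if there are none).
def postcodes_in(text)
  text.scan(POSTCODE)
end

places = ["Training in Melksham and elsewhere",
   "We are at SN12 6QL (HQ) and SN12 7NY (training centre)",
   "And can train you at even if you're at HS7 5LZ"]

places.each do |place|
  postcodes_in(place).each do |code|
    puts %Q!We found #{code}!
  end
end
```

One difference worth knowing: the index loop restarts one character after the start of each match, so it could in principle report overlapping matches, whereas scan carries on from the end of each match.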

Ruby also allows you to produce compiled regular expression objects and MatchData objects, and to use those for more efficient and more sophisticated matching. Methods such as split can use regular expressions too.

DATA.read.each_line do |host|
        print host
        stuff = host.split(/[\s,]+/)
        # First field is the address, the rest are names for it
        ip = stuff.shift
        stuff.each do |name|
                print "#{ip} may be called #{name}\n"
        end
end

__END__
earth
fire, sea pickle
wind blows


earth-wind-and-fire:~/ruby/r109 grahamellis$ ruby rex2.rb
earth
may be called earth
fire, sea pickle
may be called fire
may be called sea
may be called pickle
wind blows
may be called wind
may be called blows
earth-wind-and-fire:~/ruby/r109 grahamellis$
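
The regular expression objects and MatchData objects mentioned earlier might be used along the following lines. This is a minimal sketch rather than a definitive recipe: it reuses the postcode pattern from rex1.rb with a second capture group added, and the POSTCODE_RE constant, the describe_postcode method and the hash keys are all made up for illustration.

```ruby
# Build the Regexp object once and reuse it. The pattern is given as a
# single-quoted string so the backslashes reach Regexp.new untouched.
POSTCODE_RE = Regexp.new('\b([A-Z]{1,2}\d\w?)\s+(\d[A-Z]{2})\b')

# Hypothetical helper: details of the first postcode found in text,
# or nil if there is none.
def describe_postcode(text)
  m = POSTCODE_RE.match(text)      # a MatchData object, or nil
  return nil unless m
  { whole:       m[0],             # same as $&
    first_part:  m[1],             # same as $1
    second_part: m[2],             # same as $2
    before:      m.pre_match }     # text ahead of the match
end

p describe_postcode("Write to us at SN12 6QL today")
p describe_postcode("No postcode here")
```

Holding the result in a MatchData object rather than relying on $& and $1 means the details survive even if another match is run afterwards.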

Although we've been using / delimiters for regular expressions, you can if you prefer use %r! through ! (or replace the ! with any other special character) in much the same way as %Q and %q.
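
A short sketch of the alternative delimiters follows. %r is at its most useful when the pattern itself contains a /, since the slashes then need no escaping. The path is pieced together from the directory and file name seen in the shell prompt above, and the constant names are made up for this illustration.

```ruby
path = "~/ruby/r109/rex1.rb"         # sample value, for illustration only

WITH_SLASHES = /\/ruby\/(\w+)\//     # every / in the pattern must be escaped
WITH_BANGS   = %r!/ruby/(\w+)/!      # same pattern with ! as the delimiter
WITH_BRACES  = %r{/ruby/(\w+)/}      # bracket pairs work as delimiters too

# All three capture the directory name that follows /ruby/
[WITH_SLASHES, WITH_BANGS, WITH_BRACES].each do |re|
  puts "#{re.inspect} captured #{path[re, 1]}"
end
```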

See also Programming in Ruby - Course


