Ruby - a teaching example showing many of the language features in a short but useful program
Although the main publicity and driver for the Ruby language has been the Rails web framework (see previous article here), it's an excellent data manipulation language too. It offers many of the short and efficient coding techniques that you would have available to you in Perl, and adds a neat object oriented design that's been built in from its conception, making it easy to code and - importantly - easy to debug and maintain.
On the course that's just finished, my delegates were going to be using Ruby primarily within Rails, but also for some substantial data manipulation work away from the web. That's a good approach, as it allows them to reuse code across the two different (overnight batch and web interactive) environments. And it's also excellent for data munging - reformatting and filtering - which very often means short tasks and, in some environments, tasks that will change in what they need to do.
Here's a question that I set my delegates:
"Given a file of staff members' names and their skills - such as
morris Perl Java PHP Tcl/Tk
nigel PHP Python Java Perl
orpheus MySQL Ruby,Tcl/Tk XML
peter PHP Java Perl
produce a sorted list of skills, with a list of team members' names alongside each skill." Here's a part of the report:
Python: adam barry harry hazel ken leane nigel olivia rupert
Ruby: barbara charles cherry ed florence hazel ivan jenny kerry
len margaret nina orpheus petra que
If you want to try this question for yourself before I do a step by step answer, you may download the data from [here]
Solution
Open the file, creating a file handle object. Each time you run the
gets method on this file handle object, you'll get the next line back and you'll get back a
nil when you have run out of data:
fh = File.new "requests.xyz"
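To see the "nil when you run out of data" behaviour without needing the data file, here's a minimal sketch. It assumes a StringIO object (from Ruby's standard library) standing in for the file handle, and the helper name read_all_lines and the two sample lines are my own, purely for illustration.

```ruby
require "stringio"

# Hypothetical helper: call gets repeatedly until it returns nil,
# collecting each line as it comes back.
def read_all_lines(handle)
  lines = []
  while (line = handle.gets)
    lines.push line
  end
  lines
end

# A StringIO stands in for File.new "requests.xyz" here, so that the
# sketch runs without the data file. It responds to gets as a File does.
fh = StringIO.new("morris Perl Java\nnigel PHP Python\n")
p read_all_lines(fh)   # both lines, each still ending in "\n"
p fh.gets              # nil - there's no more data
```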
We'll need a table to store our skills, and alongside each skill a list of people with that skill. We'll use a hash, as that's a key / value pair table in Ruby, where the key can be just about anything.
Hash.new would create a new, empty hash. However, there's a shorthand for that which is
{}, and that's what I've used in my example:
langs = {}
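If you'd like to convince yourself that the two forms are equivalent, and that keys really can be just about anything, here's a short sketch. The method name mixed_key_hash and the keys in it are made up for the demonstration.

```ruby
# Both forms give a new, empty hash.
p Hash.new == {}     # true
p({}.empty?)         # true

# Hypothetical example: keys can be strings, numbers, even arrays.
def mixed_key_hash
  table = {}
  table["Ruby"] = ["orpheus"]
  table[42]     = "a number as a key"
  table[[1, 2]] = "an array as a key"
  table
end

p mixed_key_hash.keys.length   # 3
```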
We want to loop through all the data in the incoming file, so I've written a line of code which sets up a loop that keeps reading lines of data while they're available, using the file handle already opened. Once we get a
nil back, the loop will exit.
while fh.gets
You'll notice that I've not explicitly saved the line that was read into a variable. That's because I'm using a feature sometimes known as
topicalisation. In many circumstances, where you don't give Ruby an explicit variable name, it will
assume you mean a special global variable called
$_. So in this case, each line in turn is read into that variable.
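Here's a small sketch of that in action: a bare gets in the loop condition, with each line then picked up from $_. Again I'm assuming a StringIO in place of the real file, and lines_via_topic is just a name I've chosen for the helper.

```ruby
require "stringio"

# Hypothetical helper: read lines with a bare gets, as in the article,
# and pick each one up from $_ rather than from a named variable.
def lines_via_topic(handle)
  seen = []
  while handle.gets
    seen.push $_      # gets has left the line it just read in $_
  end
  seen
end

p lines_via_topic(StringIO.new("peter PHP Java Perl\n"))
# ["peter PHP Java Perl\n"]
```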
All lines read will end with a new line, which you'll want to remove. You could use
chop or
chomp. These methods both run on a string object ... and if you don't specify that object, they'll run on
$_ instead (that's how Ruby 1.8 behaves; from Ruby 1.9 onwards you name the object yourself, as in $_.chomp!). So
chomp!
removes the trailing new line from
$_.
chomp returns a new string - but you can use the alternative
chomp! as I have done, which alters the incoming string object in situ - i.e. I'm altering the value in
$_ by removing any new line on the end of it.
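The difference between chomp, chomp! and chop is easiest to see side by side. In this sketch I've given every call an explicit receiver, which behaves the same whichever Ruby you're running; as far as I know the receiver-less chomp! used in the article is only available in Ruby 1.8. The sample line is taken from the data above.

```ruby
# dup gives us a copy of the literal that we're free to modify
line = "nigel PHP Python Java Perl\n".dup

copy = line.chomp     # a new string without the new line
p copy                # "nigel PHP Python Java Perl"
p line                # "nigel PHP Python Java Perl\n" - unchanged

line.chomp!           # alters line itself, in situ
p line                # "nigel PHP Python Java Perl"

# chop removes the last character whatever it is, so on a string that
# has already lost its new line it eats real data:
p line.chop           # "nigel PHP Python Java Per"

# chomp! returns nil when there was nothing for it to remove
p line.chomp!         # nil
```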
My data file has a series of space delimited fields on each line, but if you look at the raw data carefully you'll find that there are multiple spaces sometimes, and sometimes a comma appears as well as, or instead of, space. We'll deal with that by splitting (our string in
$_ as we've not given an object on which the method is to run) at a "regular expression" - i.e. at a pattern:
fields = split(/[, ]+/)
Regular expressions are usually written between slashes, and contain a number of elements, each of which may be followed by a count of the number of times that element is to occur. In this example,
[, ] specifies that we're looking for a comma or a space (it's a character group when you come to learn about regular expressions), and
+ specifies that we're looking for one or more occurrences of that group. Finally,
split returns an array of strings that we've saved into the variable called fields.
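To show why the pattern matters, this sketch splits one awkward line both ways. The line is adapted from the sample data with an extra space and comma added by me, and split_fields is a hypothetical wrapper around the same split call, written with an explicit receiver.

```ruby
# Hypothetical wrapper around the article's split, with the string
# passed in rather than taken from $_.
def split_fields(line)
  line.split(/[, ]+/)
end

line = "orpheus MySQL  Ruby,Tcl/Tk, XML"

p split_fields(line)
# ["orpheus", "MySQL", "Ruby", "Tcl/Tk", "XML"]

# Splitting at white space alone leaves the commas stuck to the data:
p line.split(" ")
# ["orpheus", "MySQL", "Ruby,Tcl/Tk,", "XML"]
```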
Shifting an array returns the first element to us, and moves all the others up. So:
name = fields.shift
will strip off the first element (that's the staff member's name) and save it into a new variable, reducing the length of the array by one. Perfect!
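A quick sketch of shift on its own, using the first line of the sample data already split into fields. The helper name name_and_skills is my own.

```ruby
# Hypothetical helper: take the name off the front and hand back both
# the name and what's left of the array.
def name_and_skills(fields)
  name = fields.shift
  [name, fields]
end

fields = ["morris", "Perl", "Java", "PHP", "Tcl/Tk"]
name, skills = name_and_skills(fields)
p name            # "morris"
p skills          # ["Perl", "Java", "PHP", "Tcl/Tk"]
p fields.length   # 4, down from 5 - shift altered the array itself

# shift on an empty array returns nil rather than raising an error
p([].shift)       # nil
```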
We're left with the fields array containing purely the list of skills (languages) and so we can loop through that list of skills in turn. This is the
for loop - taking each member of an array into a separate variable in turn (strictly, copying a reference to the object in each member of the array):
for lang in fields
We're going to build up our table of languages in the hash we created at the top of our program. If the current language is one we've just found for the first time in this run of the program, we need to create an empty array within the hash:
langs[lang] ||= []
We're using the "lazy or" operator
||, in its assignment form ||=, to set up and assign an empty array object into the hash element in question only if there's nothing there already, thus creating that element. Languages such as Perl will usually assume an empty array (ok - it's called a
list in Perl!) in such a circumstance through what's known as
autovivification, but in Ruby you have to specifically create an object before you can modify it. To some extent, that's a side effect of the underlying object oriented basis of Ruby, but it also provides very practical assistance during the development phase of programs where variable name mis-spellings and failures to initialise are quickly rooted out.
Now that we've ensured that there is a member of the hash for the language skill of the currently named person, we can simply add their name onto the end of that list.
langs[lang].push name
No need to look at a count of how many names there are already, as the
push method simply says "add onto the end".
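Here are those two lines pulled out into a sketch you can run on their own, together with a look at what happens if you leave the ||= line out. The method name add_skill is hypothetical, chosen just for this illustration.

```ruby
# Hypothetical helper wrapping the two lines from the article.
def add_skill(langs, lang, name)
  langs[lang] ||= []      # create the list only if there isn't one yet
  langs[lang].push name   # then add the name onto the end
end

langs = {}
add_skill(langs, "Perl", "morris")
add_skill(langs, "Perl", "nigel")
add_skill(langs, "Tcl/Tk", "morris")
p langs["Perl"]     # ["morris", "nigel"]
p langs["Tcl/Tk"]   # ["morris"]

# Without the ||= line there's no autovivification to rescue us: the
# lookup of a missing key gives nil, and nil has no push method.
begin
  {}["Perl"].push "morris"
rescue NoMethodError => e
  puts "Without ||= first: #{e.class}"
end
```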
And that's about it for the code to set up the hash of lists. All that remains is for us to close the loop through the skills for each person:
end
and to close the loop that reads in all the lines of the file.
end
The purist will be asking "should we now close the input file?". Maybe, but Ruby will do it for us automatically when we exit the program after a few more lines anyway.
Having read in all the lines, let's output our new report, language by language.
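We want to take the languages in some sort of order - and alphabetic seems appropriate. However, a hash isn't held in key order (in Ruby 1.8 it appears jumbled up to the human eye, and from Ruby 1.9 it keeps the order in which the keys were added) and it can't be sorted in situ (I spend a few minutes showing you why on our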
Ruby courses). So we'll use the
keys method to give us an array of the keys. This
can be sorted, so we'll run the
sort method on it, and we'll run the following loop with each member of the resulting array in turn:
for lang in langs.keys.sort
Within each member of my hash, I have an array of people's names, and I want those names sorted in alphabetic order too. In this case, I can sort them
in situ using
sort! rather than
sort (see the pattern here - like
chomp and
chomp! earlier):
langs[lang].sort!
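The sketch below shows both sorts on a tiny hash that I've made up from a few of the sample names: sorting the array of keys, and the difference between sort and sort! on a list of names.

```ruby
langs = {
  "Ruby" => ["orpheus", "barbara"],
  "PHP"  => ["peter", "morris", "nigel"]
}

# keys gives an array, and an array can be sorted
p langs.keys.sort       # ["PHP", "Ruby"]

# sort returns a new, sorted array and leaves the original alone ...
p langs["Ruby"].sort    # ["barbara", "orpheus"]
p langs["Ruby"]         # ["orpheus", "barbara"]

# ... whereas sort! reorders the array held in the hash, in situ
langs["PHP"].sort!
p langs["PHP"]          # ["morris", "nigel", "peter"]
```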
And having sorted I can now output my results. And it turns out that the formatting I've decided to do makes this the longest line of the program:
puts "#{lang}: #{langs[lang].join " "}".gsub(/(.{60,}?)\s/,"\\1\n ")
So - what's all that about? I'm outputting the language, followed by a list of the names of all the people who know the language, that list being constructed from an array of names using
join. The result, though, can be a very long line indeed. So I've used a regular expression to find the first white space character after the 60th character, and replace it with a new line and some space to split the line up. Because I used
gsub rather than
sub, the substitution is then repeated after the next block of 60 characters, and so on until the whole string has been divided up in this way. If we just split every 60 characters, this would be messy with names being split between lines, but by looking forward for the next space we're generating really neat output.
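To try the line wrapping separately from the rest of the program, here's the same substitution in a small sketch. The helper name wrap_report_line is hypothetical; the names are the Ruby entry from the sample report near the top of this article.

```ruby
# Hypothetical helper holding the same substitution as the article's
# puts line, so that it can be tried on any string.
def wrap_report_line(text)
  # (.{60,}?) - at least 60 characters, but as few as possible ...
  # \s        - ... up to the next white space character
  # "\\1\n "  - put back what was captured, then a new line and a space
  text.gsub(/(.{60,}?)\s/, "\\1\n ")
end

names = %w[barbara charles cherry ed florence hazel ivan jenny kerry
           len margaret nina orpheus petra que]
puts wrap_report_line("Ruby: #{names.join(' ')}")
# Ruby: barbara charles cherry ed florence hazel ivan jenny kerry
#  len margaret nina orpheus petra que
```

The 60th character falls in the middle of "kerry", so the pattern carries on to the space after that name before breaking the line, and no name is split in two.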
And, finally, we need to close our output loop:
end
That's a very long explanation of a short piece of code.
"Could I write code like that?" you may ask. In time, and with practice, most people can. Once you're trained, skilled and experienced, code such as this only takes a few minutes to write in Ruby. Indeed, I'll go so far as to say that it's unlikely it could be quicker in another language. It would be significantly slower in Java and much slower in C, but then very large Java systems will probably be more maintainable in their future, and C programs can always run faster if you invest enough time into writing them.
Here's the final code:
fh = File.new "requests.xyz"
langs = {}
while fh.gets
  chomp!
  fields = split(/[, ]+/)
  name = fields.shift
  for lang in fields
    langs[lang] ||= []
    langs[lang].push name
  end
end
for lang in langs.keys.sort
  langs[lang].sort!
  puts "#{lang}: #{langs[lang].join " "}".gsub(/(.{60,}?)\s/,"\\1\n ")
end
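If you're on a Ruby newer than 1.8, the receiver-less chomp! and split above are, as far as I know, no longer available. Here's one possible rewrite of the same logic with explicit receivers and iterator blocks. It's a sketch rather than the course answer: the method name skills_report is my own, and I've had it take any object that yields lines so that it can be tried on an array as well as on an open file.

```ruby
# Hypothetical method: takes any object that yields lines from each
# (an open File, or an array of strings) and returns the report lines.
def skills_report(lines)
  # The block given to Hash.new runs on the first lookup of a missing
  # key and stores an empty array there - doing the job of ||= [].
  langs = Hash.new { |hash, key| hash[key] = [] }

  lines.each do |line|
    # First field is the name; the splat collects the rest as skills.
    name, *skills = line.chomp.split(/[, ]+/)
    skills.each { |lang| langs[lang].push name }
  end

  langs.keys.sort.map do |lang|
    "#{lang}: #{langs[lang].sort.join(' ')}".gsub(/(.{60,}?)\s/, "\\1\n ")
  end
end

sample = ["morris Perl Java PHP Tcl/Tk\n", "nigel PHP Python Java Perl\n",
          "orpheus MySQL Ruby,Tcl/Tk XML\n", "peter PHP Java Perl\n"]
puts skills_report(sample)

# With the real data file, assuming it's in the current directory:
#   File.open("requests.xyz") { |fh| puts skills_report(fh) }
```

Passing a block to File.open also closes the file for you as soon as the block finishes, which answers the purist's question from earlier.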
Complete source code (without the comments!) and full sample output also available
[here].
(written 2012-06-09, updated 2012-06-16)