Shell - Grep - Sed - Awk - Perl - Python - which to use when?

Last week, I found myself teaching a Multi-vendor Advanced Unix Data Tools and Techniques course as a guest presenter. The tools that the course 'majored' on are grep, sed, awk and Perl. Being an advanced course, it assumed a fundamental knowledge of other utilities, so reference was made to them without introduction (cut, sort, and head came up amongst others), and discussion in class also extended to Python. Shells covered were Bash and Ksh.

The objective of the course was to familiarise delegates with the advanced features of the data handling tools, so that they could make maximum use of those tools thereafter in their work - typically in the handling and filtering of large flows of data. And the $64,000 question that umbrellas over the whole course is "which tool should we use when?"

Let's go through the tools, one at a time, and see how each fits into a pattern ...

Grep is an excellent data filter. Taking incoming flows of information (typically newline-delimited records), grep gives you the ability to look within each line for a pattern, in isolation from other lines. Matching lines can simply be output, or a count can be output (-c option), or non-matching lines can be output (-v option). Line numbers can be reported (-n option), multiple input files can be handled, etc. It's a flexible tool that can perform literal matches (fgrep, or grep -F), basic regular expression matches (grep) and extended regular expression matches (egrep, or grep -E), but it cannot itself perform inline edits or substitutions, select by line number ranges, or match across multiple lines. That's where you move on to ...
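To make the "each line in isolation" idea concrete, here's a minimal sketch of grep-style filtering written in Python. The function name and the sample records are made up for illustration; it's not how grep itself is implemented, just the same pattern of thought.

```python
import re

def grep(pattern, lines, invert=False):
    """Yield (line_number, line) pairs, testing each line in isolation."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        matched = regex.search(line) is not None
        if matched != invert:          # invert=True behaves like grep -v
            yield number, line

# Made-up sample records, one per line
sample = [
    "TLK The Lakes",
    "MKM Melksham",
    "LAK Lakenheath",
    "SWI Swindon",
]

matches = list(grep(r"Lake", sample))                 # like: grep -n Lake
count = sum(1 for _ in grep(r"Lake", sample))         # like: grep -c Lake
others = list(grep(r"Lake", sample, invert=True))     # like: grep -v Lake

for number, line in matches:
    print(f"{number}:{line}")
print(count)
```

Each line is tested on its own, with no memory of the lines before it - which is exactly why this style of filter can't handle line ranges or edits.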

Sed. A stream editor - in other words, a tool which takes an incoming flow of data, just like grep, but has rather more capability in what it can do with each line. Like grep, sed can select lines based on them matching a regular expression, but it can also select based on line numbers, and on ranges of lines.
  sed -n 3,/AL/p railstats.xyz
says "output only from the third line to the first line thereafter that contains AL". And sed, while it's running, can look ahead to the following line (if you have data with continuation lines, for example), and it can even store data into and recover it from a "hold buffer". As well as just outputting lines / records, sed can edit lines, substituting the matching section of a line with something else - perhaps a literal string, or perhaps something that's based on the incoming match via a backreference. Sed even has a labelling, looping and conditional check capability within its commands. But is it the best tool for more complex edits? That's where you move on to ...
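The range selection and the backreference substitution described above can be sketched in Python as follows. This is an illustration under assumptions: the function name and the sample lines are invented, and the substitution shown is just one example of using a backreference, not one taken from the course.

```python
import re

def print_range(lines, start, end_pattern):
    """Mimic sed -n 'START,/PATTERN/p': from line START to the next match."""
    end = re.compile(end_pattern)
    selected = []
    for number, line in enumerate(lines, start=1):
        if number < start:
            continue
        selected.append(line)
        # Like sed, only test the end pattern on lines after the start line
        if number > start and end.search(line):
            break
    return selected

# Made-up sample data
sample = ["AAA Alpha", "BAL Balham", "CCC Gamma",
          "DDD Delta", "SAL Salford", "EEE Echo"]

selected = print_range(sample, 3, r"AL")

# A substitution using backreferences, similar in spirit to
# sed 's/^\([A-Z]*\) \(.*\)$/\2 (\1)/'
swapped = [re.sub(r"^([A-Z]+) (.*)$", r"\2 (\1)", line) for line in selected]

print(selected)
print(swapped)
```

Note that the "BAL" record on line 2 doesn't end the range, because the range hasn't started yet when that line goes by.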

Awk. Like sed, awk takes an incoming flow of data and looks at each line in turn, selecting lines by pattern match or by line number - and in awk's case on all sorts of other criteria too, such as the content of an individual field. Each line that matches an awk pattern can have a whole block of code run on it, with conditionals, loops, and variables in the classic "programming" way; the "K" in the name awK is the initial of Brian Kernighan - co-author with Dennis Ritchie of the book "The C Programming Language" - so it's no coincidence that awk has similarities to (and the powers of) C, and with C being very much a mainstream language, awk's syntax looks very familiar indeed to most programmers. In awk programs, you can even have arrays and user defined functions. Here's an example of awk in use - finding all lines with "Lake" in the 7th field of a data file, and reporting a number of the fields:
    wizard:graham graham$ awk 'BEGIN{count=0;FS="\t";OFS="|";};
        $7 ~ /Lake/{count++;print NR,count,$2,$3,$12,$7};
        END{OFS=" ";print "Matched",count,"of",NR}' railstats.xyz
    86|1|TLK|B94 5SE|10884|The Lakes
    1026|2|LAK|IP27 9AD|536|Lakenheath
    1238|3|OXN|LA9 7HG|350292|Oxenholme Lake District
    1792|4|LKE|PO36 8PJ|67162|Lake
    Matched 4 of 2539
    wizard:graham graham$
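
For comparison, here's a rough sketch of the same report in Python. The railstats.xyz file isn't reproduced here, so the sample records are built by a hypothetical helper (make_record) that fills the unused fields with placeholders; only the values in fields 2, 3, 7 and 12 come from the awk output above.

```python
def lake_report(lines):
    """Report records whose 7th tab-separated field contains 'Lake'."""
    report = []
    count = 0
    total = 0
    for total, line in enumerate(lines, start=1):    # total plays the role of NR
        fields = line.rstrip("\n").split("\t")       # FS="\t"
        fields += [""] * (12 - len(fields))          # short records get empty fields, as in awk
        if "Lake" in fields[6]:                      # $7 ~ /Lake/
            count += 1
            report.append("|".join(                  # OFS="|"
                [str(total), str(count), fields[1], fields[2], fields[11], fields[6]]))
    report.append(f"Matched {count} of {total}")
    return report

def make_record(code, postcode, name, usage):
    """Hypothetical helper: build a 12-field record with placeholder padding."""
    fields = ["x"] * 12
    fields[1], fields[2], fields[6], fields[11] = code, postcode, name, usage
    return "\t".join(fields)

sample = [
    make_record("AAA", "ZZ1 1ZZ", "Anytown", "100"),       # made-up non-matching record
    make_record("TLK", "B94 5SE", "The Lakes", "10884"),
    make_record("LKE", "PO36 8PJ", "Lake", "67162"),
]

report = lake_report(sample)
for row in report:
    print(row)
```

It's noticeably longer than the awk version, which is rather the point - for a quick field-based report, awk gets you there in three lines.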

Awk programs can be stored in files so that you don't have to type them out on the command line every time. But they are inherently based on line by line processing, and they lack mainstream facilities to pull in central library code that you want to use in lots of different scripts. That's where you move on to ...

Perl. The "Practical Extraction and Report Language". Perl's a very feature-rich programming language indeed, with a very easy to run interface - in other words, you can simply put your script into a file and say "go run this Perl program" without worrying about compile and load cycles or anything like that. You can put your Perl program on the command line (-e option), but you'll rarely do so. Code is very commonly shared between Perl scripts / Perl programs via use statements, with a very complete library structure allowing access to common code that's distributed with Perl, your own code that you want to share between your Perl programs, and a central resource (the CPAN) of code modules that other people have written and shared. However, Perl is so feature rich that it's not easy to learn, and it's often very hard to read code that's been written by others - especially by others for whom maintainable programming isn't a passion. You can do just about anything in computing / data terms with Perl. But if you want to work in a team, each maintaining each other's code and / or with longer term projects on the same data, you may want to go for something that's "object oriented" through and through. That's where you move on to ...

Python. The "Advanced Unix Tools" course had only an appendix in the notes on Python, and that's fair enough because learning Python requires a complete course; we've moved so far from grep after all. Python's a superb data tool, and much more. If you're working with the same data but manipulating it in different ways, if you're working in a team, then it should be a very serious candidate to be your "tool of choice". You'll notice that this paragraph is not ending with "that's where you move on". You don't move on - for the most complex of data manipulation tasks, Python is my tool of choice.
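
As a small sketch of what "the same data, manipulated in different ways" might look like, here's a class that reads a record once and is then shared by several different queries. The class name, the meanings given to the fields, and the make_line helper are all assumptions made for illustration; only the sample values in fields 2, 3, 7 and 12 are taken from the awk output earlier.

```python
from dataclasses import dataclass

@dataclass
class Station:
    code: str        # assumed meaning of field 2
    postcode: str    # assumed meaning of field 3
    name: str        # assumed meaning of field 7
    usage: int       # assumed meaning of field 12

    @classmethod
    def from_line(cls, line):
        """Build a Station from one tab-separated record."""
        fields = line.rstrip("\n").split("\t")
        return cls(fields[1], fields[2], fields[6], int(fields[11]))

def busiest(stations, how_many=1):
    """One way of using the data: rank by the usage figure."""
    return sorted(stations, key=lambda s: s.usage, reverse=True)[:how_many]

def named_like(stations, text):
    """Another way of using the same data: filter by name."""
    return [s for s in stations if text in s.name]

def make_line(code, postcode, name, usage):
    """Hypothetical helper: a 12-field record with placeholder padding."""
    fields = ["x"] * 12
    fields[1], fields[2], fields[6], fields[11] = code, postcode, name, str(usage)
    return "\t".join(fields)

lines = [
    make_line("AAA", "ZZ1 1ZZ", "Anytown", 100),     # made-up record
    make_line("TLK", "B94 5SE", "The Lakes", 10884),
    make_line("OXN", "LA9 7HG", "Oxenholme Lake District", 350292),
]
stations = [Station.from_line(line) for line in lines]

print(busiest(stations)[0].name)
print([s.code for s in named_like(stations, "Lake")])
```

The parsing lives in one place, so a colleague adding a third query doesn't need to know which field number is which.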
(written 2012-10-22, updated 2012-10-23)



This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2019: 404 The Spa • Melksham, Wiltshire • United Kingdom • SN12 6QL
PH: 01225 708225 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/3902_She ... when-.html • PAGE BUILT: Sat May 27 16:49:10 2017 • BUILD SYSTEM: WomanWithCat