extracting from an ascii file

Posted by ali (ali), 1 April 2004

I need your help with the following. I have a file in this form:

i15  4.5  p  o16 3.5  p  i15  5.5  p  o15  8.5  p  2.0

I need to change it into this form:

i15   4.5   p  2.0
o16  3.5   p  2.0
i15   5.5   p  2.0
o15  8.5   p  2.0

Using some form of regex I could only get
i15 4.5 p 2.0

All other occurrences are ignored, although I used the g option in the search-and-replace pattern. My file has lots of lines in this form, with different numbers of fields. For each line I only get the first occurrence. I appreciate your help.


Posted by admin (Graham Ellis), 2 April 2004
I would write something like

@info = ($incoming =~ /([io]\d+\s+\S+)/g);
($pval) = ($incoming =~ /(p\s+\d\S+)/);   # list context, so $pval gets the captured text, not just 1
foreach $io (@info) {
       print "$io $pval\n";
}

I'm guessing a bit there about your incoming data - you've only posted one sample line of data and so it's hard to see what the pattern is in general, so you may need to modify it.

First line - gets all the I or O fields followed by another field
Second line - gets the P value (only interested in the P that's followed by a number)
loop - adds the second line match onto the end of all the first line matches
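
As a self-contained sketch of the three steps above, the following wraps them in a subroutine and runs it on the single sample line from the first post. The subroutine name split_fields is made up for illustration. The patterns are the ones described above, and they assume that the trailing value is the only "p" followed by a number.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn one input line
# into a list of output lines.
sub split_fields {
    my ($incoming) = @_;

    # In list context, /g returns every captured "i.. value" or "o.. value" pair.
    my @info = ($incoming =~ /([io]\d+\s+\S+)/g);

    # The parentheses around $pval force list context, so we get the
    # captured text rather than a true/false match result.
    my ($pval) = ($incoming =~ /(p\s+\d\S+)/);
    return () unless defined $pval;

    # Append the trailing p value to each i/o field.
    return map { "$_ $pval" } @info;
}

my $sample = "i15  4.5  p  o16 3.5  p  i15  5.5  p  o15  8.5  p  2.0";
print "$_\n" for split_fields($sample);
```

Because the first match uses /g, it does not matter how many i/o fields a line contains.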

Posted by ali (ali), 3 April 2004
Hi Graham,

Thanks for your reply. Here are a few lines of the actual input:

o16z 27.72 3   p i13z 28.08 1   p o13z 27.37 4   p i16z 28.07 2   p i12z 28.35 2
  p 25.630                      
o16z 29.40 2   s i13z 29.99 1   s o13z 29.91 2   s i16z 9.17 4   s i12z 30.34 2
 s 25.630                      
o16z 59.70 1   p i20z 60.63 0   p i16z 60.87 1   p i18z 60.88 0   p i13z 61.01 0
  p i12z 61.25 0   p i19z 61.69 1   p 58.490      
o16z 61.07 1   s i20z 62.55 1   s i16z 62.88 1   s i18z 0.00 4   s i13z 63.27 2
 s i12z 63.59 1   s i19z 64.37 2   s 58.490

Those are 4 lines of the file. On every second line there is "s" instead of "p".

I tried to do what you told me to do, but I couldn't get any output. If the name of the file is "input", for example, how should I read the file in scalar context to use @info =($incoming =~ ..... ) ?
thanks again.

Posted by John_Moylan (jfp), 3 April 2004
Hello Ali

This seems to work, though your second post was not too clear to my wine-riddled brain.

I've used __DATA__ for the example code and just put it all in an array and foreached each line.

my @dataArray = <DATA>;

# The regex seems to have a recurring pattern
# So let's write the pattern once and repeat it 4 times
my $regex = '([io]\d\d\s{1,3}\d\.\d)\s+?[ps]\s+?' x 4;

foreach my $line (@dataArray) {

   # Strip the newline so that the last 6 characters are the trailing "p  2.0" field
   chomp $line;

   # This bit seems to be common at the end, so let's grab the last 6 chars first to simplify things
   # make sure there's no trailing whitespace though... perhaps a regex would be better
   my $lastBit = substr($line, -6);

   my ($first, $second, $third, $fourth) = $line =~ m/^$regex/;

   # You print/format it how you like, this is to prove it works
   print "$first $lastBit\n$second $lastBit\n$third $lastBit\n$fourth $lastBit\n\n";
}

__DATA__
i15  4.5  p  o16 3.5  p  i15  5.5  p  o15  8.5  p  2.0
i15  4.5  p  o16 3.5  s  i15  5.5  p  o15  8.5  s  2.0

Output:

i15  4.5 p  2.0
o16 3.5 p  2.0
i15  5.5 p  2.0
o15  8.5 p  2.0

i15  4.5 s  2.0
o16 3.5 s  2.0
i15  5.5 s  2.0
o15  8.5 s  2.0

I think the regex is probably hugely inelegant (Graham, comments please) and should really have an anchor at the start ^ and finish $ to speed things up.

note: I've made an edit...not tested, hope it still works

Posted by admin (Graham Ellis), 4 April 2004
Hi, folks ... I'm still a little concerned that I don't really understand the data.  Ali - the second example you provide has "z" characters which aren't there in the first example - I'm guessing that you do want these retained in the output?  I'm also guessing that the line that starts with a # is a comment? Jfp - your sample answer looks fine, save for the fact that we don't really know how many fields there are per line (it may be 4, it may be 5, it may vary), so some global match procedure may be a better option and we can then loop through the matches.

Anyway - I've doctored my original program to suit the new data format and here it is, complete with reading in of the data:

while ($incoming = <DATA>) {
    @info = ($incoming =~ /([io]\d+z\s+\S+)/g);
    ($pval) = ($incoming =~ /([ps]\s+\d\S+)/);
    foreach $io (@info) {
        print "$io $pval\n";
    }
}
__DATA__
o16z 27.72 3   p i13z 28.08 1  p o13z 27.37 4  p i16z 28.07 2  p i12z 28.35 2 p 25.630
o16z 29.40 2  s i13z 29.99 1  s o13z 29.91 2  s i16z 9.17 4  s i12z 30.34 2  s 25.630
o16z 59.70 1  p i20z 60.63 0  p i16z 60.87 1  p i18z 60.88 0  p i13z 61.01 0 p i12z 61.25 0  p i19z 61.69 1  p 58.490
o16z 61.07 1  s i20z 62.55 1  s i16z 62.88 1  s i18z 0.00 4  s i13z 63.27 2  s i12z 63.59 1  s i19z 64.37 2  s 58.490

Which gives the following output

o16z 27.72 p 25.630
i13z 28.08 p 25.630
o13z 27.37 p 25.630
i16z 28.07 p 25.630
i12z 28.35 p 25.630
o16z 29.40 s 25.630
i13z 29.99 s 25.630
o13z 29.91 s 25.630
i16z 9.17 s 25.630
i12z 30.34 s 25.630
o16z 59.70 p 58.490
i20z 60.63 p 58.490
i16z 60.87 p 58.490
i18z 60.88 p 58.490
i13z 61.01 p 58.490
i12z 61.25 p 58.490
i19z 61.69 p 58.490
o16z 61.07 s 58.490
i20z 62.55 s 58.490
i16z 62.88 s 58.490
i18z 0.00 s 58.490
i13z 63.27 s 58.490
i12z 63.59 s 58.490
i19z 64.37 s 58.490

As regards reading from the file into a scalar, you need to open the file on a file handle and read in with a loop such as my "while" loop - I have just embedded the data in the program in this case to give you a start.   If you're (a) needing to use Perl to do some quite complex data munging and (b) not familiar with the basics such as reading files, have you considered a training course?   I'm quite happy to provide answers here, but it could be a very long and frustrating process.  Have a look at - next course runs mid-April!
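
A minimal sketch of reading from a named file rather than from embedded data, assuming the file is called "input" as in Ali's question. The subroutine name reformat_file is made up for illustration; the two patterns are the ones from the program above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read a file line by line
# and return the reformatted lines.
sub reformat_file {
    my ($path) = @_;
    my @out;

    # Open the file for reading on a lexical file handle.
    open(my $fh, '<', $path) or die "Cannot open $path: $!";

    # Each pass through the loop reads one line into the scalar $incoming.
    while (my $incoming = <$fh>) {
        chomp $incoming;
        my @info = ($incoming =~ /([io]\d+z\s+\S+)/g);
        my ($pval) = ($incoming =~ /([ps]\s+\d\S+)/);
        next unless defined $pval;    # skip lines with no trailing p/s value
        push @out, "$_ $pval" for @info;
    }
    close $fh;
    return @out;
}

# "input" is the file name mentioned earlier in the thread.
if (-e 'input') {
    print "$_\n" for reformat_file('input');
}
```

Reading one line at a time keeps memory use small even for a large file, and only the open call changes compared with the DATA version.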

Posted by ali (ali), 5 April 2004
Hi Graham and jfp,

Thanks for your replies.
Indeed Graham is right. The number of fields differs from line to line, and I want to keep the lines with #1010 or #... unchanged. Jfp's program worked for a fixed number of fields. I edited Graham's code and it worked perfectly:

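while (<>) {

    # $1 is the first remaining separator, $2 the trailing "p value" (or "s value")
    while (/(\s+[ps]\s+).*([ps]\s+\d\S+)/g ) {
        ($p, $q) = ($1, $2);
        s/$p/ $q\n/;
    }
    print $_;     # also prints the lines which did not match
}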


I used your nice patterns in my program. I put the file name on the command line and ran the code.
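
For comparison, here is a sketch of the same filter built on the global-match approach from earlier in the thread instead of the in-place substitution. The subroutine name expand_line is made up for illustration, and it assumes that the lines to keep unchanged start with "#". The optional "z" lets it accept both sample formats posted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the output
# lines for one input line.
sub expand_line {
    my ($line) = @_;
    chomp $line;

    # Assumption: lines such as "#1010" are kept exactly as they are.
    return ($line) if $line =~ /^#/;

    my @info = ($line =~ /([io]\d+z?\s+\S+)/g);
    my ($pval) = ($line =~ /([ps]\s+\d\S+)/);

    # Anything that does not fit the pattern is also passed through untouched.
    return ($line) unless @info && defined $pval;

    return map { "$_ $pval" } @info;
}

# Only runs when file names are given on the command line;
# <> then reads each named file in turn.
if (@ARGV) {
    while (my $line = <>) {
        print "$_\n" for expand_line($line);
    }
}
```

Unlike the substitution version, this drops the third column (the single digit after each value), as in the output Graham posted.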

And Graham,

In fact I studied the book "Learning Perl" by Schwartz and I can write simple programs. The other day I was a little confused. But usually I spend a lot of time on even a simple program.

Thanks again for your help.

This page is a thread posted to the opentalk forum and archived here for reference.
