merging two files - Perl Programming

Posted by ali (ali), 12 April 2004

Hi Graham,

I am done with the first part of the problem. However, I have another problem. I want to merge two files into one file. From the first file I want to keep all the lines. From the second file I just need to keep selected lines.

FILE1

#1001
i12z ---------- p
o12z----------p
#1004
i12z------------p
i13z------------p
i14z------------s
i15z------------s
#1010

FILE2
1990 1 2 ------------------ 999
1990 1 2 -------------------1000
1990 1 2 --------------------1001
1990 1 2 --------------------1002
1990 1 2 --------------------1003
1990 1 2 ---------------------1004

the merged file should look like this

1990 1 2 --------------------1001
i12z ---------- p
o12z----------p
1990 1 2 ---------------------1004
i12z------------p
i13z------------p
i14z------------s
i15z------------s
...

I opened both files for reading and a merged file for writing. Here is what I did:

my ($file1, $file2) = @ARGV;

open (FILE1, $file1) or die "Can't open $file1: $!\n";
open (FILE2, $file2) or die "Can't open $file2: $!\n";
open (MERGE, ">merged") or die "Can't open merged file: $!\n";

my $line1 = <FILE1>;
my $line2 = <FILE2>;

while (defined ($line1) || defined ($line2)) {
if ($line1 =~ /^\#(\d+)\s+/) {
$tmp=$1;
$line1=<FILE1>;
}

if ($line1 =~ /^[io]\d+/){
print MERGE $line1;
$line1 = <FILE1>;
next;
}

@array= split /\s/, $line2;
$last= $array[$#array];
if ($last != $tmp) {
$line2 =<FILE2>;
next;
}
else {
print MERGE $line2;
$line2=<FILE2>;
}
}

----------
The problem is that it only prints the lines from FILE1.
I don't know whether using next in this way is correct or not.
Please help me. Thanks, Ali

Posted by admin (Graham Ellis), 13 April 2004

Try starting with something like:

Code:

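open (FH1,$ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
open (FH2,$ARGV[1]) or die "Can't open $ARGV[1]: $!\n";

# Index each line of the second file by the number at its end
while (<FH2>) {
/(\d+)$/ and $f2{$1}=$_;
}

# Copy the first file, adding the matching line at each #number line
while (<FH1>) {
(/#(\d+)/) and print $f2{$1};
print;
}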

I've syntax checked that, but not run it ... I'm not exactly sure of your specification. You say that you want every line from input file one on the output, and yet your sample shows only lines that start with an i or an o from file 1 on the output, so I'm not sure what you're really trying to achieve. Anyway - please feel free to try my code and alter it as necessary.

To answer your question about next ... it will jump you up to the while statement and re-test the conditions.
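As a small illustration of that behaviour (a sketch with made-up data, not Ali's files), next skips the rest of the loop body and goes straight back to re-test the while condition:

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my @queue = (1, 2, 3, 4, 5, 6);
my @kept;

while (defined(my $n = shift @queue)) {
    # Odd number: skip the rest of the body and re-test the while condition.
    next if $n % 2;
    push @kept, $n;
}

print "@kept\n";
```

Only the even numbers reach the push, because every odd number hits next first.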

Some further comments on your code

a) Typically, if you're reading in a file and want every line from it to appear in your output, you'll only read from that file handle in one place in your code. It's a common mistake to put in several reads ... it's usually wrong because each time you do you get the NEXT line and that means that data tends to get skipped over.

b) If the second file contains a number of keyed lines that you want to merge in, it's a much better technique to read them into a hash that you can then look up, rather than trying to read the second file until you get to the matching line. Using the "read until" technique, all the keys you need must be present and must be present in the right order, so the technique is very prone to data errors, and the coding is much more complicated too. You would need to add a loop (at least) to your original code in this area if you have to do it this way for some reason.
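Points a) and b) can be put together in a rough sketch like the one below. The helper name merge_files and the shortened sample data are made up for illustration, and it assumes that a "#number" line should be replaced by the matching line from the second file (and dropped if there is no match), which is one reading of the sample output rather than a confirmed specification.

```perl
use strict;
use warnings;

# merge_files() is a hypothetical helper name, not something from the thread.
# It takes two open filehandles and returns the merged lines.
sub merge_files {
    my ($fh1, $fh2) = @_;

    # Point b): load the keyed lines from the second file into a hash,
    # keyed on the number at the end of each line.
    my %by_key;
    while (my $line = <$fh2>) {
        $by_key{$1} = $line if $line =~ /(\d+)\s*$/;
    }

    # Point a): read the first file in exactly one place.
    my @out;
    while (my $line = <$fh1>) {
        if ($line =~ /^#(\d+)/) {
            # Assumption: a "#NNNN" line is replaced by the matching line
            # from the second file, and dropped if there is no match.
            push @out, $by_key{$1} if exists $by_key{$1};
            next;
        }
        push @out, $line;
    }
    return @out;
}

# Sample data shortened from the question; in-memory files keep it runnable.
my $file1 = "#1001\ni12z p\no12z p\n#1004\ni13z s\n#1010\n";
my $file2 = "1990 1 2 1000\n1990 1 2 1001\n1990 1 2 1004\n";

open my $fh1, '<', \$file1 or die "Can't open in-memory file1: $!\n";
open my $fh2, '<', \$file2 or die "Can't open in-memory file2: $!\n";

my @merged = merge_files($fh1, $fh2);
print @merged;
```

Because the lookup goes through a hash, the keys in the second file do not have to be in any particular order, and a key with no match (#1010 here) is simply skipped instead of derailing the loop.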

Ali, I'm going to suggest further study once again. I look at your programs (and I read your comment elsewhere that you take a long time to get simple programs running) and they're shouting out for training-course-type help. I know I have a vested interest in this - I spend my paid time training people and looking at programs, so I may be a bit biased - BUT you won't find me making this suggestion to too many people. From looking at your programs / work, you would benefit particularly from this approach. I do understand that you're probably not based in our part of the world and might need to find something more local.

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.