reading multiple lines - Perl Programming
reading multiple lines

Posted by deep (deep), 24 November 2007
Hello Folks,
I want to read one line at a time from File1, read lines from File2, do some processing and write to a file.
Let me try using an example.
My File 1 looks like :
881.5372        221.3915        4       3       SIFLFKK A=0     C=0     D=0     E=0     F=2     G=0     H=0     I=1     K=2     L=1     M=0     N=0     P=0     Q=0     R=0     S=1     T=0     V=0     W=0     Y=0
856.4441        286.4886        3       2       TSLFSFR A=0     C=0     D=0     E=0     F=2     G=0     H=0     I=0     K=0     L=1     M=0     N=0     P=0     Q=0     R=1     S=2     T=1     V=0     W=0     Y=0
File 2 looks like:
7      8      881.5372      221.3915      4      3      XP_001418543.1      SIFLFKK      A=0      C=0      D=0      E=0      F=2      G=0      H=0      I=1      K=2      L=1      M=0      N=0      P=0      Q=0      R=0      S=1      T=0      V=0      W=0      Y=0
10      15      856.4441      286.4886      3      3      XP_001418056.1      DIYYRK      A=0      C=0      D=1      E=0      F=0      G=0      H=0      I=1      K=1      L=0      M=0      N=0      P=0      Q=0      R=1      S=0      T=0      V=0      W=0      Y=2
272      282      1285.6006      322.4073      4      3      XP_001422002.1      REHMVEMGLNA      A=1      C=0      D=0      E=2      F=0      G=1      H=1      I=0      K=0      L=1      M=2      N=1      P=0      Q=0      R=1      S=0      T=0      V=1      W=0      Y=0
38      87      5277.4026      1760.1414      3      2      XP_001420476.1      YSAALVDTNGCYASQTLEVEVSWTCETSTNTAVAAAFIAFAAFCAYSFGR      A=11      C=3      D=1      E=3      F=4      G=2      H=0      I=1      K=0      L=2      M=0      N=2      P=0      Q=1      R=1      S=5      T=6      V=4      W=1      Y=3
2449      2462      1496.8094      375.2095      4      3      XP_001417092.1      TPQRPGAPVNVSFK      A=1      C=0      D=0      E=0      F=1      G=1      H=0      I=0      K=1      L=0      M=0      N=1      P=3      Q=1      R=1      S=1      T=1      V=2      W=0      Y=0
584      613      3135.4741      1568.7442      2      3      XP_001421161.1      AANMLSWAVNMAATKIGGPDDAHEPVDLQN      A=6      C=0      D=3      E=1      F=0      G=2      H=1      I=1      K=1      L=2      M=2      N=3      P=2      Q=1      R=0      S=1      T=1      V=2      W=1      Y=0
115      170      5985.1843      2993.5993      2      3      YP_636257.1      ELTWITGVIMAVCTVSFGVTGYSLPWDQVGYWAVKIVTGVPDAIPVVGPAIVELL      A=4      C=1      D=2      E=2      F=1      G=6      H=0      I=5      K=1      L=4      M=1      N=0      P=4      Q=1      R=1      S=2      T=5      V=11      W=3      Y=2

File1 is smaller than File2.
What I need to do is:
Read the first line from File1 and do processing on ALL the lines in File2.
Read the second line and again do some processing on all the lines in File2.

I have been able to take one line from File1 and one line from File2, do processing, then read the second line from File1 and the second from File2 (in this case I made the two files have an equal number of lines).
I have been able to take the first line from File1 and process all the lines in File2. However, I am just not able to proceed any further, basically reading all the lines in File1 until eof and processing all the lines in File2 for each one.
Below is the code. This is WORKING CODE that works for one line. The code is huge as I am doing lots of processing. If anyone can point out where I am going wrong or can suggest a few things, it would be of great help.
Thanks Guys


open INFILE1,"<$File1";
open INFILE2,"<$File2";

while ($line1 = <INFILE1>)
{
     @nonplant_fields = split /\s+/,$line1;

     while ($line2 = <INFILE2>)
     {
          @plantfields = split /\s+/,$line2;

          $count = 0;
          $total_count = 0;
          $P_count = 0;
          $total_P_count = 0;
          $NP_count = 0;
          $total_NP_count = 0;

          $nonplant_pep_seq = $nonplant_fields[4];
          $nonplant_pep_seq_length = length ($nonplant_pep_seq);

          $plant_pep_seq = $plantfields[7];
          $plant_pep_seq_length = length ($plant_pep_seq);

          DO: for ($i=5; $i<=24; $i++)
          {
               $NP_numbers = $nonplant_fields[$i];
               #if ($line2 =~ /([ACDEFGHIKLMNPQRSTVWY]\=\d)+/)

               # \d+ rather than \d, so a two-digit count such as A=11 is captured whole
               if ($NP_numbers =~ /\d+/)
               {
                    $nonplant_AA_count = $&;
               }

               for ($j=$i+3; $j<=27; $j++)
               {
                    if ($plantfields[$j] =~ /\d+/)
                    {
                         $plant_AA_count = $&;
And it goes on for a while... it's a big script. Hopefully the snapshot can give you an idea of what I am doing wrong. Thanks.



Posted by admin (Graham Ellis), 24 November 2007
You need to rewind your second file each time you read a line from the first file, or store the second file in a list and keep traversing it from there.   I've put a sample up at

http://www.wellho.net/mouth/1442_Reading-a-file-multiple-times-file-pointers.html

to demonstrate the principle.
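
As a rough illustration of the two options (rewinding the second file, or holding it in a list), here is a minimal sketch. It is not the sample from the linked page; the sub names `pairs_by_rewinding` and `pairs_from_list`, the temporary files, and the toy data are all made up for the example, and the "processing" is reduced to joining the two lines.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical stand-ins for File1 and File2 (toy data, not the real files).
my ($fh1, $file1) = tempfile(UNLINK => 1);
print {$fh1} "a1\na2\n";
close $fh1;
my ($fh2, $file2) = tempfile(UNLINK => 1);
print {$fh2} "b1\nb2\nb3\n";
close $fh2;

# Option 1: rewind the inner file with seek before every inner pass.
sub pairs_by_rewinding {
    my ($outer_name, $inner_name) = @_;
    my @pairs;
    open my $outer, '<', $outer_name or die "Cannot open $outer_name: $!";
    open my $inner, '<', $inner_name or die "Cannot open $inner_name: $!";
    while (my $line1 = <$outer>) {
        chomp $line1;
        # Back to byte 0; without this the inner loop only runs once.
        seek($inner, 0, 0) or die "Cannot rewind $inner_name: $!";
        while (my $line2 = <$inner>) {
            chomp $line2;
            push @pairs, "$line1-$line2";
        }
    }
    close $inner;
    close $outer;
    return @pairs;
}

# Option 2: read the inner file into a list once and traverse the list.
sub pairs_from_list {
    my ($outer_name, $inner_name) = @_;
    my @pairs;
    open my $inner, '<', $inner_name or die "Cannot open $inner_name: $!";
    chomp(my @inner_lines = <$inner>);
    close $inner;
    open my $outer, '<', $outer_name or die "Cannot open $outer_name: $!";
    while (my $line1 = <$outer>) {
        chomp $line1;
        push @pairs, "$line1-$_" for @inner_lines;
    }
    close $outer;
    return @pairs;
}

my @rewound = pairs_by_rewinding($file1, $file2);
my @listed  = pairs_from_list($file1, $file2);
print scalar(@rewound), " pairs by rewinding, ", scalar(@listed), " from the list\n";
```

The list version reads File2 from disk only once, so it tends to suit a second file that fits comfortably in memory; the seek version keeps memory use flat at the cost of rereading the file on every pass.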

Posted by deep (deep), 24 November 2007
Thanks a lot Graham!! Tried everything, just forgot to reopen the files. Thanks again.

Posted by KevinAD (KevinAD), 24 November 2007
You could also use Tie::File I would think, but I am not sure how efficient it would be. Maybe use seek() too.
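
For reference, the Tie::File idea might look roughly like the sketch below, which ties the second file to an array and walks that array once per line of the first file. The temporary files and toy data are made up for the example, the file is tied read-only as a precaution, and nothing here says how it would perform on large files.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical stand-ins for File1 and File2 (toy data).
my ($fh1, $file1) = tempfile(UNLINK => 1);
print {$fh1} "a1\na2\n";
close $fh1;
my ($fh2, $file2) = tempfile(UNLINK => 1);
print {$fh2} "b1\nb2\nb3\n";
close $fh2;

# Each element of @file2_lines is one line of File2, fetched on demand.
# Tie::File removes the line ending from each record by default.
tie my @file2_lines, 'Tie::File', $file2, mode => O_RDONLY
    or die "Cannot tie $file2: $!";

my @pairs;
open my $in1, '<', $file1 or die "Cannot open $file1: $!";
while (my $line1 = <$in1>) {
    chomp $line1;
    # No rewinding needed: the array can be traversed as often as required.
    for my $i (0 .. $#file2_lines) {
        my $line2 = $file2_lines[$i];
        push @pairs, "$line1-$line2";
    }
}
close $in1;
untie @file2_lines;

print scalar(@pairs), " combinations processed\n";
```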

Posted by deep (deep), 26 November 2007
Thanks Kevin, I did try using "seek"; I think I was messing it up somewhere. I am wondering whether it would be possible to use $. to get the current line number instead of "tell". Each time I open the file, will $. be reset, or will it remember the last line read from the same file?



Posted by KevinAD (KevinAD), 26 November 2007
on 11/26/07 at 04:51:04, deep wrote:
Thanks Kevin, I did try using "seek"; I think I was messing it up somewhere. I am wondering whether it would be possible to use $. to get the current line number instead of "tell". Each time I open the file, will $. be reset, or will it remember the last line read from the same file?




I am pretty sure $. is reset when a file is opened (or maybe closed). You could try experimenting and see how it behaves.

Posted by admin (Graham Ellis), 26 November 2007
$. is the number of lines read since a file was last opened (hideously non-OO: it refers to the latest file opening), whereas tell gives the number of bytes into a file, which is what you need for seek.
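
A small experiment along those lines, as a sketch only: it records a byte offset with tell, jumps back to it with seek, and checks $. before and after a close and reopen. The temporary file and its three lines are made up for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical three-line test file.
my ($tmp, $name) = tempfile(UNLINK => 1);
print {$tmp} "first\nsecond\nthird\n";
close $tmp;

open my $in, '<', $name or die "Cannot open $name: $!";
my $first = <$in>;
my $offset_after_first = tell($in);   # a byte offset, the unit seek expects
my $second = <$in>;
my $lines_before_seek = $.;           # a line count: two lines read so far

# Jump back to the saved byte offset and read the second line again.
seek($in, $offset_after_first, 0) or die "Cannot seek in $name: $!";
my $second_again = <$in>;
close $in;                            # an explicit close resets $.

open $in, '<', $name or die "Cannot reopen $name: $!";
my $line = <$in>;
my $lines_after_reopen = $.;          # counting has started again from 1
close $in;

print "lines read before seek: $lines_before_seek\n";
print "lines read after reopen: $lines_after_reopen\n";
```

A line number cannot be handed to seek directly, since lines vary in length; to return to a given line, the offset has to be saved with tell when that line is first reached.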

Posted by deep (deep), 26 November 2007
Thanks Kevin, Graham for the suggestion(s).
I am having a bit of a problem with the script. I keep getting an "Out of memory" error. I think I can narrow it down to the way I am processing the file:

while ($line2 = <INFILE2>)
     {
           @plantfields = split /\s+/,$line2;
           $plant_prot_name = $plantfields[6];
           
           open INFILE1,"<$File1";
           while ($line1 = <INFILE1>)
           {
                 @nonplant_fields = split /\s+/,$line1;
                 push (@nonplant_pep_seq,$nonplant_fields[4]);


Can you suggest some way I can optimize this part? I am looking into using "tell" and "seek". Hopefully it might do the trick.
I have designed it this way because I am comparing and capturing many sets of data on the fly from both files. I am at the rapid prototyping stage, so at the moment I am not concentrating much on performance/optimization. Just trying to get the logic right and see the results.

                 

Posted by admin (Graham Ellis), 27 November 2007
@nonplant_pep_seq seems to be cumulative, i.e. building up all the time, but isn't dependent on the first file in any way. It's the only memory hog I can see that you have, and I don't see the point of doing it lots of times... unless you are adding other things in there in the code you're not showing us.

Basically, you're blowing up a balloon until it bursts. Better to start letting the air out into another file.
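
One way to read that advice, as a sketch under stated assumptions: build @nonplant_pep_seq from File1 once, before the loop over File2, and print each result to an output file as it is produced instead of pushing it onto an array. The field positions (index 4 in File1, indexes 6 and 7 in File2) follow the posted code; the shortened sample lines, the output file, and the same/different comparison are placeholders for the real processing.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Shortened sample lines from the thread; only the leading fields are kept.
my ($fh1, $file1) = tempfile(UNLINK => 1);
print {$fh1} "881.5372 221.3915 4 3 SIFLFKK\n",
             "856.4441 286.4886 3 2 TSLFSFR\n";
close $fh1;
my ($fh2, $file2) = tempfile(UNLINK => 1);
print {$fh2} "7 8 881.5372 221.3915 4 3 XP_001418543.1 SIFLFKK\n",
             "10 15 856.4441 286.4886 3 3 XP_001418056.1 DIYYRK\n";
close $fh2;
my ($out_fh, $out_file) = tempfile(UNLINK => 1);   # hypothetical results file

# Read File1 ONCE, outside the File2 loop, so the array stays at a fixed
# size instead of growing by a full copy of File1 for every File2 line.
my @nonplant_pep_seq;
open my $in1, '<', $file1 or die "Cannot open $file1: $!";
while (my $line1 = <$in1>) {
    my @nonplant_fields = split ' ', $line1;   # split on runs of whitespace
    push @nonplant_pep_seq, $nonplant_fields[4];
}
close $in1;

# Stream File2 and write each result out straight away.
my $written = 0;
open my $in2, '<', $file2 or die "Cannot open $file2: $!";
while (my $line2 = <$in2>) {
    my @plantfields = split ' ', $line2;
    my ($plant_prot_name, $plant_pep_seq) = @plantfields[6, 7];
    for my $np_seq (@nonplant_pep_seq) {
        # Placeholder comparison standing in for the real processing.
        my $verdict = $np_seq eq $plant_pep_seq ? 'same' : 'different';
        print {$out_fh} join("\t", $plant_prot_name, $plant_pep_seq, $np_seq, $verdict), "\n";
        $written++;
    }
}
close $in2;
close $out_fh;

print "$written result lines written to $out_file\n";
```

Memory use then depends on the size of File1 alone, however many lines File2 has.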

Posted by deep (deep), 27 November 2007
I see your point Graham, thanks. Later in the code all I am doing is manipulation of the data. I am emptying the array before the next read. Hopefully this will do the job.



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference. To jump to the archive index please follow this link.

© WELL HOUSE CONSULTANTS LTD., 2013: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho