Reading multiple lines after a pattern match

Posted by deep (deep), 11 July 2009
Hello People,
Need some assistance/guidance.
OUTLINE:
Two files (File1 and File2)
File1 has some ids such as
009463_3922_1827
897654_8764_5432
File2 has things along the lines of:
Query= 009463_3922_1827 length=252
        (252 letters)

Sequences producing significant alignments:                      (bits) Value

ref|NZ_ACCL02000008.1| Bryantella formatexigens DSM 14469 B_form...   153   2e-37

Query: 243   cccgcacacg 252
                     |||||||||
Sbjct: 89219 accgcacacg 89228
More stufff here

Query= 009525_3967_2963 length=249 uaccno=FIFOXZ216JYL81
        (249 letters)
AND MORE STUFF HERE
-----------
PROBLEM:
Finding the ids stored in File1 within File2 is trivial.
What I also need to capture is the rest of each matching block.
For example:
This part of the code prints the line where the match is found (Query= 009463_3922_1827 length=252 uaccno=FIFOXZ216JUM5H):
while ($line2=<INFILE2>)
       {

               if ($line2 =~ /$line1/)
               {
                       print $line2;
               }
       }

Now, how can I get the lines below this (Query= 009463_3922_1827 length=252 uaccno=FIFOXZ216JUM5H) line?
For example, everything up to
Query= 009525_3967_2963 length=249 uaccno=FIFOXZ216JYL81
        (249 letters)
>>>>>>>>>>>>>>>
a) One idea I can think of is using seek/tell.
Would this be an efficient way, and how far should I seek? The while loop reads one line at a time, so I would somehow have to buffer everything until I see the next Query=..... pattern.
How do I find the number of bytes up to that point?

b) Using read()
How do I find the number of bytes after the pattern match?

c) Using a lookahead to read ahead after the pattern match: /ID (?=SOMETHING)/
I tried this with until, but it's not working. Maybe my regex is incorrect.


If anyone can give me a push in the right direction (pseudocode etc.), it would be much appreciated.

I do not want to read the whole of either file into an array, as the files are big.
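
The line-at-a-time idea from (a) can be done without seek, tell or read() by keeping a flag that is switched on or off at every "Query= " header. The sketch below is only one possible way to do it. It assumes each block starts with a line beginning "Query= " followed by the id, and that File1 holds one id per line. The sub name print_matching_blocks, the %wanted hash and the $keep flag are made up for illustration, and in-memory strings stand in for File1 and File2.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every block of File2 whose "Query= <id>"
# header names an id listed in File1 to the output filehandle.
sub print_matching_blocks {
    my ($ids_fh, $blast_fh, $out_fh) = @_;

    # Load the wanted ids into a hash for fast lookup
    my %wanted;
    while (my $id = <$ids_fh>) {
        chomp $id;
        $wanted{$id} = 1 if length $id;
    }

    my $keep = 0;    # true while we are inside a wanted block
    while (my $line = <$blast_fh>) {
        # Every "Query= " header starts a new block (assumed format)
        if ($line =~ /^Query=\s*(\S+)/) {
            $keep = exists $wanted{$1};
        }
        print {$out_fh} $line if $keep;
    }
    return;
}

# Small in-memory stand-ins for File1 and File2
my $file1 = "009463_3922_1827\n897654_8764_5432\n";
my $file2 = <<'END';
Query= 009463_3922_1827 length=252
        (252 letters)
Query: 243   cccgcacacg 252
Query= 009525_3967_2963 length=249 uaccno=FIFOXZ216JYL81
        (249 letters)
END

open(my $ids_fh,   '<', \$file1) or die "Cannot open ids: $!";
open(my $blast_fh, '<', \$file2) or die "Cannot open results: $!";
print_matching_blocks($ids_fh, $blast_fh, \*STDOUT);
```

Only the id list is held in memory. File2 is read once, one line at a time, so its size should not matter. Note that the alignment lines starting "Query:" do not match the header pattern, because that pattern requires "Query=".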

Posted by admin (Graham Ellis), 19 July 2009
How long is file2?   I would tend to read the whole thing into a string and split it at the query delimiter, making up a hash of all the queries.

Here's some code:

Code:
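# Read the whole file into a single string
open (FH,"deep.txt") or die "Cannot open deep.txt: $!";
read(FH,$all,-s "deep.txt");
close FH;

# The brackets in the pattern make split return the delimiters as well
@parts = split(/(Query = \d+)/,$all);

foreach $element(@parts) {
     print $ec++,":::",$element,"\n";
     }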


and it converts this data:

Code:
Title
Query = 1234 this is more
and so we go one etc
Query detaile etc
more stuff
Query = 7763 a further demo
lines of stuff
we keep on going
Query = 9966 last one
See how it goes
It's GONE!


Into a list of keys and values:

Code:
Dorothy-2:jul09 grahamellis$ perl deep.pl
0:::Title

1:::Query = 1234
2::: this is more
and so we go one etc
Query detaile etc
more stuff

3:::Query = 7763
4::: a further demo
lines of stuff
we keep on going

5:::Query = 9966
6::: last one
See how it goes
It's GONE!
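
To go from that alternating list to the hash mentioned above, the pattern can capture just the id, so that split returns preamble, id, body, id, body and so on. What follows is a rough sketch, not a finished program. It assumes the header format from the original question ("Query= " followed by the id, rather than the "Query = 1234" of the demo data). The sub name blocks_by_id is made up for illustration, and a string stands in for the slurped file.

```perl
use strict;
use warnings;

# Hypothetical helper: split the slurped text of File2 into a hash
# mapping each query id to the text that follows it.
sub blocks_by_id {
    my ($all) = @_;

    # Capturing only the id gives: preamble, id, body, id, body ...
    my @parts = split /^Query=\s*(\S+)/m, $all;
    shift @parts;                      # drop the text before the first header
    push @parts, '' if @parts % 2;     # last header had nothing after it
    my %blocks = @parts;
    return \%blocks;
}

# In-memory stand-in for the contents of File2
my $all = <<'END';
BLASTN header
Query= 009463_3922_1827 length=252
        (252 letters)
More stuff here
Query= 009525_3967_2963 length=249
        (249 letters)
END

my $blocks = blocks_by_id($all);
for my $id ('009463_3922_1827', '897654_8764_5432') {
    if (exists $blocks->{$id}) {
        print "Query= $id", $blocks->{$id};
    } else {
        print "$id not found\n";
    }
}
```

If the same id appears twice, the later block replaces the earlier one. This approach also holds the whole of file2 in memory, so it only suits files that fit comfortably.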







This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference. To jump to the archive index please follow this link.

