Reading multiple lines after a pattern match

Posted by deep (deep), 11 July 2009

Hello people, I need some assistance/guidance.

OUTLINE: There are two files, File1 and File2. File1 has some IDs such as:

    009463_3922_1827
    897654_8764_5432

File2 has things along the lines of:

    Query= 009463_3922_1827 length=252
    (252 letters)
    Sequences producing significant alignments: (bits) Value
    ref|NZ_ACCL02000008.1| Bryantella formatexigens DSM 14469 B_form... 153 2e-37
    Query: 243 cccgcacacg 252
    |||||||||
    Sbjct: 89219 accgcacacg 89228
    More stuff here
    Query= 009525_3967_2963 length=249 uaccno=FIFOXZ216JYL81
    (249 letters)
    AND MORE STUFF HERE

PROBLEM: Finding the IDs stored in File1 within File2 is trivial. What I also need to capture is the rest of each matching record. For example, this part of the code gives me the line where it found the match, Query= 009463_3922_1827 length=252 uaccno=FIFOXZ216JUM5H:

    while ($line2 = <INFILE2>) {
        if ($line2 =~ /$line1/) {
            print $line2;
        }
    }

Now how can I get to the other lines below that line? For example, everything up to the next record:

    Query= 009525_3967_2963 length=249 uaccno=FIFOXZ216JYL81 (249 letters)

A few ideas I can think of:

a) Using seek/tell. Would this be an efficient way, and how far should I seek? The while loop reads one line at a time, so somehow I need to buffer everything until I see the next Query= pattern. How do I find the number of bytes until then?

b) Using read(). How do I find the number of bytes after the pattern match?

c) Using a lookahead after the pattern match, /ID (?=SOMETHING)/. I tried this with until, but it's not working. Maybe my regex is incorrect.

If anyone can just give me a push in the right direction (pseudocode etc.), it would be much appreciated. I am not going to read the whole files in or copy their contents into arrays, as the files are big.

Posted by admin (Graham Ellis), 19 July 2009

How long is file2?
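If file2 turns out to be too large to hold in memory, the lines after a match can still be collected in a single pass, without seek/tell or read(): treat each line that begins with "Query=" as the start of a new record, and keep a flag saying whether that record's ID appears in file1. The following is a minimal Perl sketch of that idea, not code posted in the thread. It assumes every record header starts at the beginning of a line with "Query=" followed by the ID; the names load_ids and print_wanted_records are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every whitespace-separated ID in file1
# into a hash, so each lookup while scanning file2 is a single hash probe.
sub load_ids {
    my ($fh) = @_;
    my %want;
    while (my $line = <$fh>) {
        $want{$_} = 1 for split ' ', $line;
    }
    return \%want;
}

# Hypothetical helper: stream file2 one line at a time. Assumes every
# record starts with a line beginning "Query=" followed by the ID.
# $keep records whether the current record was asked for; every line
# is printed while it is set, up to the next "Query=" line.
sub print_wanted_records {
    my ($in, $out, $want) = @_;
    my $keep = 0;    # lines before the first "Query=" are skipped
    while (my $line = <$in>) {
        if ($line =~ /^Query=\s*(\S+)/) {
            $keep = exists $want->{$1};
        }
        print {$out} $line if $keep;
    }
    return;
}

# Usage: perl extract.pl file1 file2 > matches.txt
if (@ARGV == 2) {
    my ($file1, $file2) = @ARGV;
    open my $ids_fh,   '<', $file1 or die "Cannot open $file1: $!";
    open my $blast_fh, '<', $file2 or die "Cannot open $file2: $!";
    print_wanted_records($blast_fh, \*STDOUT, load_ids($ids_fh));
}
```

Only the IDs from file1 are held in memory, so file2 can be as large as needed. Alignment lines such as "Query: 243 ..." do not reset the flag, because the pattern requires "Query=" at the start of the line.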
I would tend to read the whole thing into a string and split it at the query delimiter, making up a hash of all the queries. Here's some code: Code:
and it converts this data: Code:
Into a list of keys and values: Code:
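The reply's own code is not shown here. As an illustration of the approach it describes (read the whole of file2 into one string, split it at the query delimiter, and build a hash keyed by query ID), a minimal Perl sketch might look like the following. split_queries is a hypothetical name, and the sketch assumes each record starts with "Query=" at the beginning of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: split a whole report (already in one string) at
# each line that starts with "Query=" and return a hash of ID => record.
sub split_queries {
    my ($text) = @_;
    my %query;
    # The zero-width lookahead splits *before* each "Query=" line, so the
    # delimiter stays at the front of its record; /m lets ^ match after
    # every newline.
    for my $chunk (split /^(?=Query=)/m, $text) {
        # The report header before the first "Query=" has no ID; skip it.
        next unless $chunk =~ /^Query=\s*(\S+)/;
        $query{$1} = $chunk;
    }
    return \%query;
}

# Usage: perl split_report.pl file1 file2 > matches.txt
if (@ARGV == 2) {
    my ($file1, $file2) = @ARGV;
    my $text = do {
        local $/;    # undefined record separator: read the whole file at once
        open my $fh, '<', $file2 or die "Cannot open $file2: $!";
        <$fh>;
    };
    my $queries = split_queries($text);
    open my $ids_fh, '<', $file1 or die "Cannot open $file1: $!";
    while (my $line = <$ids_fh>) {
        for my $id (split ' ', $line) {
            print $queries->{$id} if exists $queries->{$id};
        }
    }
}
```

Once the hash is built, each ID from file1 is a single lookup. The cost is that all of file2 must fit in memory at once, which is why the length of file2 matters when choosing between this and a line-by-line scan.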
This page is a thread posted to the opentalk forum
at www.opentalk.org.uk and
archived here for reference.
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho