How to get just the DNA sequence

Posted by revtopo (revtopo), 19 July 2007

Hi all,

I have been trying to use BioPerl to get DNA sequences from EMBL (a database) with the code below. It works for most of the accession IDs, but some IDs in the list return the entire genomic DNA sequence. Does anyone have an idea how to deal with this? The code that extracts the DNA sequences is:

    sub parsing {
        my ($uni_acc, $uni_rec, $embl_xref, $line, @lines, @embl_xref);
        while (<>) {
            $uni_rec = $_;
            @lines = split /\n/, $uni_rec;
            foreach $line (@lines) {
                if ($line =~ /^AC\s+(\w+)/) {
                    $uni_acc = $1;
                }
                elsif ($line =~ /^DR\s+EMBL;\s+(\w+)/) {
                    $embl_xref = $1;
                    #print "$uni_acc\t $embl_xref\n";
                    push(@embl_xref, $embl_xref);
                    #print "@embl_xref";
                    next;
                }
            }
        }
        return (@embl_xref);
    }

    sub getdata {
        my @embl_xref = &parsing;
        # $srs is set up elsewhere; the post does not show it.
        $srs->get_set_chunk_size(20);       # get the records in convenient numbers
        $srs->get_records_with_accessions   # get the records for the accession numbers
        (
            -db         => 'embl',
            -AccNumbers => \@embl_xref,
            -file       => 'embl_data.dbi'
        );
        return ($srs);
    }

Thanks,

Posted by KevinAD (KevinAD), 19 July 2007

I am not sure how you expect meaningful help with this question unless a person is very familiar with the data you are working with. Having said that, I will take a guess: maybe you need to refactor your regexp:

    elsif ($line =~ /^DR\s+EMBL;\s+(\w+)/)

It looks like (\w+) is too greedy and is capturing more data than you think it should.

This page is a thread posted to the opentalk forum
at www.opentalk.org.uk and
archived here for reference.
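
One way to test the regexp guess from the thread is to run the pattern against a few sample lines and look at everything the DR line carries, not only the first accession. The sketch below is not the original poster's code. It assumes the input is a UniProt-style flat file whose DR EMBL lines have the form `DR   EMBL; accession; protein_id; status; molecule_type.`. The sub name `parse_embl_xrefs` and the sample accessions are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collect EMBL cross-references from flat-file lines.
# Returns a list of hash refs, one per DR EMBL line.
sub parse_embl_xrefs {
    my (@lines) = @_;
    my ($uni_acc, @xrefs);
    for my $line (@lines) {
        if ($line =~ /^AC\s+(\w+)/) {
            $uni_acc = $1;    # first accession on the AC line
        }
        elsif ($line =~ /^DR\s+EMBL;\s*([^;]+);\s*([^;]+);\s*([^;]+);\s*([^;.]+)/) {
            # Assumed field order: accession; protein_id; status; molecule type
            push @xrefs, {
                uniprot    => $uni_acc,
                embl_acc   => $1,
                protein_id => $2,
                status     => $3,
                mol_type   => $4,
            };
        }
    }
    return @xrefs;
}

# Made-up sample lines, for illustration only.
my @sample = (
    'AC   P00001;',
    'DR   EMBL; X00001; CAA00001.1; -; mRNA.',
    'DR   EMBL; AB000001; BAA00001.1; -; Genomic_DNA.',
);

for my $x (parse_embl_xrefs(@sample)) {
    print join("\t", @{$x}{qw(uniprot embl_acc protein_id mol_type)}), "\n";
}
```

Since `\w` does not match `;`, the original `(\w+)` stops at the end of the first accession in lines shaped like these. If the real data follows this layout, printing the molecule type next to each accession may help show which cross-references point at a genomic record, which could be where the whole-genome sequences come from.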