to make work faster??

Posted by revtopo (revtopo), 30 October 2007

Hi all,

I am very confused. I have some code which counts the number of matches of RNA sequences against a genome, but it takes days to look through the complete genome. I have attached the code below. Please suggest any way of improving the efficiency.

    #!/usr/bin/perl -w
    use DBI;
    use strict;
    use Bio::SeqIO;
    # use Bio:[...]   (module name garbled in the archive)
    use GD::Graph::bars;

    $| = 1;    # unbuffered output

    my $window_size = 500;
    my $step_size   = 500;
    my $connect;
    my $database = 'Cloned_RNA_2_dev';
    my $user     = 'root';
    my $pass     = '';
    my $host     = 'localhost';
    my %sequences;
    my $total_length;
    my $length;

    open(RESULT, ">result.csv") or die "Cannot open result.csv: $!";

    my $dsn = qq(DBI:mysql:database=$database;host=$host);
    # DBI attribute names are case-sensitive: PrintError, not printError
    $connect = DBI->connect($dsn, $user, $pass, {PrintError => 1}) or die $DBI::errstr;

    # querying the database
    my $query = qq/select count(c.Rna_sequence_sequence)
        from Cloned_rna c, Hsp h
        where c.Rna_sequence_sequence=h.Rna_sequence_sequence
        and c.Project_idproject =72
        and h.Sequence_db_idSequence_db =1133
        and h.accession =?
        and h.hit_start >= ?
        and h.hit_end<=?
        group by h.accession/;
    my $sql = $connect->prepare($query);

    my %window_counts;
    print RESULT "\"Chromosome\"\t\"start\"\t\"end\"\t\"counts\"\n";

    my $genome = Bio::SeqIO->new(
        -file   => '/home/shared-projects/sequence-dbs/TAIR7/TAIR7_nuclear_genome.fna',
        -format => 'fasta',
    );

    while (my $seq = $genome->next_seq()) {
        $length = $seq->length();
        my $id = $seq->id();
        #print RESULT "Windows on chromosome $id\n";
        warn "Getting windows on chromosome $id\n";

        # walking along the genome: one query per window
        for (my $i = 1; $i <= ($length - $window_size); $i += $step_size) {
            my $window_start = $i;
            my $window_end   = $i + $window_size - 1;
            my $sucess = ($sql->execute($id, $window_start, $window_end));
            die "query failed!\n $query \n" unless $sucess;
            my $window_id = "$id,$window_start";
            if (my @result = $sql->fetchrow_array()) {
                ($window_counts{$window_id}) = @result;    # has the counts as the value of the hash
                warn "\$window_counts{$window_id}='$window_counts{$window_id}'\n";
                print RESULT "$id\t$window_start\t$window_end\t$window_counts{$window_id}\n";
            }
            # exit if $i > 10000;
        }
    }

Any help in improving this?

Posted by george_Ball (george), 30 October 2007

Well, I'm no database guru, but I'd probably bet money that the problem is in the database rather than in this code, which does not seem to do anything sophisticated.

Of course the way to attack this is to use a profiler, such as Dev:[...] He also discusses a DBI:[...]

Posted by KevinAD (KevinAD), 30 October 2007

How big is the genome file?

Posted by revtopo (revtopo), 30 October 2007

Can't say the exact length of the genome, but chromosome 1 is 3056785 and chromosome 2 is 1908764, and chromosomes 3, 4 and 5 are of similar length to chromosomes 1 and 2.

Posted by KevinAD (KevinAD), 31 October 2007

Do you really need to do this?

    my $sucess=($sql->execute($id, $window_start, $window_end));
    die "query failed!\n $query \n" unless $sucess

Posted by revtopo (revtopo), 31 October 2007

Yes, of course, to know which chromosome they are parsed from and the positions.

Posted by george_Ball (george), 31 October 2007

I have no background in biology or BioPerl, so I have no understanding of the numbers you give here... But look at the code, and you see nested loops.

What is the *actual* value (or at least the order of magnitude) of the number of iterations through the while loop (i.e. how many times will $seq = $genome->next_seq() return a non-undef value)?

What is the *actual* value (or at least the order of magnitude) of ($length-$window_size) / $step_size?

If we take the first of these as A and the second as B, then you are issuing the select statement to the database A*B times. If these are very large numbers then you are hitting the database a *very* large number of times. That may well be the cause of your problem, but it's impossible to say without having some idea what these numbers are.

And, as I said earlier, problems like this are best approached through use of a profiling tool.

Posted by KevinAD (KevinAD), 31 October 2007

I'm in the same boat as george here. If there is a forum on the bioperl site you might want to start asking these questions over there, where there will hopefully be members with similar issues who understand the types of files you are working with.

This page is a thread posted to the opentalk forum
at www.opentalk.org.uk and
archived here for reference.
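
One way to act on george's A*B observation is to stop querying once per window. The sketch below is not from the thread. It assumes the hits could be fetched once (for example with a single select of h.accession, h.hit_start and h.hit_end under the same project and database conditions) and then binned in Perl. The helper name count_hits_per_window and the sample rows are made up for illustration. It also assumes hit_start <= hit_end and that step size equals window size, as in the posted script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count the hits that lie entirely
# inside one non-overlapping window. Windows are 1-based: [1..W], [W+1..2W], ...
sub count_hits_per_window {
    my ($hits, $window_size) = @_;
    my %counts;
    for my $hit (@$hits) {
        my ($accession, $start, $end) = @$hit;
        my $first = int(($start - 1) / $window_size);    # window index holding hit_start
        my $last  = int(($end - 1) / $window_size);      # window index holding hit_end
        # A hit crossing a window boundary fits in no window, so the
        # per-window query in the thread would not count it either.
        next unless $first == $last;
        my $window_start = $first * $window_size + 1;
        $counts{"$accession,$window_start"}++;
    }
    return \%counts;
}

# Made-up rows standing in for the result of one database fetch
# (accession, hit_start, hit_end); real data would come from DBI.
my @hits = (
    ['Chr1',   10,   30],
    ['Chr1',  480,  499],
    ['Chr1',  495,  515],    # crosses the 500/501 boundary, not counted
    ['Chr1',  501,  521],
    ['Chr2', 1001, 1021],
);

my $window_size = 500;
my $counts = count_hits_per_window(\@hits, $window_size);

# Same tab-separated layout as result.csv in the posted script
for my $key (sort keys %$counts) {
    my ($accession, $window_start) = split /,/, $key;
    my $window_end = $window_start + $window_size - 1;
    print join("\t", $accession, $window_start, $window_end, $counts->{$key}), "\n";
}
```

This replaces A*B round trips with one pass over the fetched rows. Whether it is actually faster depends on how many hit rows there are and on the indexes in the database, so profiling is still the first step. Note that the posted loop stops at $length - $window_size, so the sketch may report a window at the very end of a chromosome that the original skips.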