any other way?

Posted by rick (rick), 14 November 2007

Hi Graham, many thanks for your wonderful help. I have a file containing an amino acid sequence, like:

MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIIEMIPER
NEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPPTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVK
IRRRVDTNPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELQDCKIAPLMVAYMLERE

I need to count each individual character, so I used this script:

    #!/usr/bin/perl -w
    open(FILE,"<FASTA1.fa") or die "cant open file:$!\n";
    $sequence = '';
    @data=<FILE>;
    foreach $line (@data){
        if ($line =~m/^\s*$/) { next; }       # skip blank lines
        elsif($line =~m/^\s*#/) { next; }     # skip comment lines
        elsif($line =~m/^>/) { next; }        # skip FASTA header lines
        else { $sequence .= $line; }
    }
    $sequence =~ s/\s//g;
    @sq=split(//,$sequence);
    $count_of_A = 0; $count_of_C = 0; $count_of_G = 0; $count_of_T = 0;
    $count_of_Q=0; $count_of_S=0; $count_of_R=0; $count_of_W=0;
    $count_of_Y=0; $count_of_I=0; $count_of_H=0; $count_of_D=0;
    $count_of_E=0; $count_of_M=0; $count_of_N=0; $count_of_P=0;
    $count_of_V=0; $count_of_F=0; $count_of_K=0; $count_of_L=0;
    $errors = 0;
    foreach $char(@sq) {
        if ( $char eq 'A' ) { ++$count_of_A; }
        elsif ( $char eq 'C' ) { ++$count_of_C; }
        elsif ( $char eq 'G' ) { ++$count_of_G; }
        elsif ( $char eq 'T' ) { ++$count_of_T; }
        elsif ( $char eq 'S' ) { ++$count_of_S; }
        elsif ( $char eq 'D' ) { ++$count_of_D; }
        elsif ( $char eq 'P' ) { ++$count_of_P; }
        elsif ( $char eq 'V' ) { ++$count_of_V; }
        elsif ( $char eq 'L' ) { ++$count_of_L; }
        elsif ( $char eq 'I' ) { ++$count_of_I; }
        elsif ( $char eq 'M' ) { ++$count_of_M; }
        elsif ( $char eq 'F' ) { ++$count_of_F; }
        elsif ( $char eq 'Y' ) { ++$count_of_Y; }
        elsif ( $char eq 'W' ) { ++$count_of_W; }
        elsif ( $char eq 'H' ) { ++$count_of_H; }
        elsif ( $char eq 'K' ) { ++$count_of_K; }
        elsif ( $char eq 'R' ) { ++$count_of_R; }
        elsif ( $char eq 'Q' ) { ++$count_of_Q; }
        elsif ( $char eq 'N' ) { ++$count_of_N; }
        elsif ( $char eq 'E' ) { ++$count_of_E; }
        else {
            print "!!!!!!!! Error - I don\'t recognize this char: $char\n";
            ++$errors;
        }
    }
    print "A = $count_of_A\n";
    print "C = $count_of_C\n";
    print "G = $count_of_G\n";
    print "T = $count_of_T\n";
    print "Q = $count_of_Q\n";
    print "S = $count_of_S\n";
    print "R = $count_of_R\n";
    print "W = $count_of_W\n";
    print "Y = $count_of_Y\n";
    print "I = $count_of_I\n";
    print "H = $count_of_H\n";
    print "D = $count_of_D\n";
    print "E = $count_of_E\n";
    print "L = $count_of_L\n";
    print "N = $count_of_N\n";
    print "P = $count_of_P\n";
    print "V = $count_of_V\n";
    print "F = $count_of_F\n";
    print "K = $count_of_K\n";
    print "M = $count_of_M\n";
    print "errors = $errors\n";
    close FILE;

This works fine, but it is really long. Is there any way to do the same work with a much smaller script?

Posted by KevinAD (KevinAD), 14 November 2007

I am not sure this will be any faster than your existing code, but you can try:

Code:
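One shorter approach, offered here as a sketch and not as the code originally posted at this point in the thread, is to keep the counts in a single hash keyed by letter in place of twenty separate scalar counters. The sub name count_residues and the sample input string are made up for illustration; in the real script the sequence would be built from the file as before.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count every character
# of a sequence string and return a reference to a letter => count hash.
sub count_residues {
    my ($sequence) = @_;
    $sequence =~ s/\s//g;                    # drop spaces and line ends
    my %count;
    $count{$_}++ for split //, $sequence;    # one hash slot per letter seen
    return \%count;
}

# Illustrative input only; a real run would read the FASTA file.
my $counts = count_residues("MERIKELRDL\nMSQSR");
for my $aa (sort keys %$counts) {
    print "$aa = $counts->{$aa}\n";
}
```

With this layout an unexpected character simply gets its own key, so checking the keys against the 20 amino acid letters afterwards would stand in for the error counter.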
I used Data: (image not preserved)

Posted by deep (deep), 24 November 2007

You can use the wonderful "substr". Here is partial code that I have used to do the same thing you are doing.

    if(/[ARNDCQEGHILKMFPSTWYVJU]+[\n\r]+/) #regex to capture peptide sequences
    {
        $pep_seq = $&;
        #print "$&";
        $len = length($&); #length of peptides
        #print "==>$len";
        #print "Len==>$len";
        for ($i=0;$i<$len;$i++) #Loop that uses length of the peptides as sentinel value.
        {
            $temp = substr $&,$i,1; #use of perl function substr to capture each and every AA in the current peptide sequence
            # compare $temp with each of the 20 AA here
        }
    }

Ignore the commented-out parts; that is just the way I test my scripts. But this should get you going. My regex might be different from what you need, so you can change it accordingly.

Oops, just realized: you will need to compare "temp" with the 20 AA. I "had" to do it this way as I am doing various processing on each AA.

Posted by KevinAD (KevinAD), 24 November 2007

This can slow down your perl scripts:

    $pep_seq = $&;

For reasons I am not sure of, using $& (or even worse, $` or $') can slow down perl scripts. If they are used in one regexp, Perl is forced to set them for all regexps, and this can have an impact on performance.

Posted by admin (Graham Ellis), 25 November 2007

on 11/24/07 at 22:48:58, KevinAD wrote:
My understanding is that if you match against a string that is (say) 8 Mbytes in length, then every single match in your program will write 8 Mbytes of temporary variables if there is any mention of $& $' or $`.

See longer article: http://www.wellho.net/mouth/1444_Using-English-can-slow-you-right-down-.html

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference. To jump to the archive index please follow this link.
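As a follow-up to the $& discussion in the thread, the substr loop can take its text from a capture group ($1), so that $& is never mentioned anywhere in the program. This is only a sketch: the sub name count_line and the sample line are hypothetical, and the character class is the one from deep's post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): walk one line of peptide
# sequence with substr, reading the match from $1 and never from $&.
sub count_line {
    my ($line, $count) = @_;
    if ($line =~ /([ARNDCQEGHILKMFPSTWYVJU]+)/) {
        my $pep_seq = $1;                      # captured run of residues
        for my $i (0 .. length($pep_seq) - 1) {
            my $aa = substr($pep_seq, $i, 1);  # one residue at a time
            $count->{$aa}++;
        }
    }
    return $count;
}

# Illustrative input only.
my %count;
count_line("MERIKEL\n", \%count);
print "$_ = $count{$_}\n" for sort keys %count;
```

Because the capture stops before the line ending, the loop length no longer includes the trailing newline characters that the original regex pulled into $&.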
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho