any other way?

Posted by rick (rick), 14 November 2007
Hi Graham, many thanks for your wonderful help.

I have a file containing amino acid sequences, like this:

MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIIEMIPER
NEQGQTLWSKTNDAGSDRVMVSPLAVTWWNRNGPPTSTVHYPKVYKTYFEKVERLKHGTFGPVHFRNQVK
IRRRVDTNPGHADLSAKEAQDVIMEVVFPNEVGARILTSESQLTITKEKKEELQDCKIAPLMVAYMLERE

I need to count each individual character, so I used this script:

#!/usr/bin/perl -w
open(FILE,"<FASTA1.fa") or die "cant open file:$!\n";
$sequence = '';
@data=<FILE>;
foreach $line (@data) {
    if ($line =~ m/^\s*$/) {
        next;
    } elsif ($line =~ m/^\s*#/) {
        next;
    } elsif ($line =~ m/^>/) {
        next;
    } else {
        $sequence .= $line;
    }
}

$sequence =~ s/\s//g;

@sq=split(//,$sequence);
$count_of_A = 0;
$count_of_C = 0;
$count_of_G = 0;
$count_of_T = 0;
$count_of_Q=0;
$count_of_S=0;
$count_of_R=0;
$count_of_W=0;
$count_of_Y=0;
$count_of_I=0;
$count_of_H=0;
$count_of_D=0;
$count_of_E=0;
$count_of_M=0;
$count_of_N=0;
$count_of_P=0;
$count_of_V=0;
$count_of_F=0;
$count_of_K=0;
$count_of_L=0;
$errors     = 0;

foreach $char(@sq) {

   if     ( $char eq 'A' ) {
       ++$count_of_A;
   } elsif ( $char eq 'C' ) {
       ++$count_of_C;
   } elsif ( $char eq 'G' ) {
       ++$count_of_G;
   } elsif ( $char eq 'T' ) {
       ++$count_of_T;
   }
     elsif ( $char eq 'S' ) {
       ++$count_of_S;
   }
     elsif ( $char eq 'D' ) {
       ++$count_of_D;
   }
     elsif ( $char eq 'P' ) {
       ++$count_of_P;
   }
     elsif ( $char eq 'V' ) {
       ++$count_of_V;
   }
     elsif ( $char eq 'L' ) {
       ++$count_of_L;
   }
     elsif ( $char eq 'I' ) {
       ++$count_of_I;
   }
     elsif ( $char eq 'M' ) {
       ++$count_of_M;
   }
     elsif ( $char eq 'F' ) {
       ++$count_of_F;
   }
     elsif ( $char eq 'Y' ) {
       ++$count_of_Y;
   }
     elsif ( $char eq 'W' ) {
       ++$count_of_W;
   }
     elsif ( $char eq 'H' ) {
       ++$count_of_H;
   }
     elsif ( $char eq 'K' ) {
       ++$count_of_K;
   }
     elsif ( $char eq 'R' ) {
       ++$count_of_R;
   }
     elsif ( $char eq 'Q' ) {
       ++$count_of_Q;
   }
     elsif ( $char eq 'N' ) {
       ++$count_of_N;
   }
     elsif ( $char eq 'E' ) {
       ++$count_of_E;
   }
      else {
       print "!!!!!!!! Error - I don\'t recognize this char: $char\n";
       ++$errors;
   }
                  }

print "A = $count_of_A\n";
print "C = $count_of_C\n";
print "G = $count_of_G\n";
print "T = $count_of_T\n";
print "Q = $count_of_Q\n";
print "S = $count_of_S\n";
print "R = $count_of_R\n";
print "W = $count_of_W\n";
print "Y = $count_of_Y\n";
print "I = $count_of_I\n";
print "H = $count_of_H\n";
print "D = $count_of_D\n";
print "E = $count_of_E\n";
print "L = $count_of_L\n";
print "N = $count_of_N\n";
print "P = $count_of_P\n";
print "V = $count_of_V\n";
print "F = $count_of_F\n";
print "K = $count_of_K\n";
print "M = $count_of_M\n";
print "errors = $errors\n";
     
close FILE;


This works fine, but it is really large. Is there any way to do the same job with a much smaller script?

Posted by KevinAD (KevinAD), 14 November 2007
I am not sure this will be any faster than your existing code, but you can try:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my @SEQ = qw(A C D E F G H I K L M N P Q R S T V W Y);
my %counts = map{ $_ => 0} @SEQ;
my %errors = ();
open(FILE,"<FASTA1.fa") or die "cant open file:$!\n";
while ( <FILE> ) {
  if (m/^\s*$/ ) {
     next;
  }
  elsif ( m/^\s*#/ ) {
     next;
  }
  elsif ( m/^>/ ) {
     next;
  }
  else {
     while (/(.)/g) {
        my $c = $1;
        if ( exists $counts{$c} ) {
           $counts{$c}++;
        }
        else {
           $errors{$c}++;
        }
     }
  }
}

close FILE;
print Dumper \%counts, \%errors;


I used Data::Dumper only for convenience; you can print the results however you want.
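
As one possible way of printing the results without Data::Dumper, here is a minimal sketch that counts into a hash and prints one line per letter. The helper name count_residues is made up for this illustration, and the sample string is just the first line of the data from the original question; it does not skip header or comment lines the way the scripts above do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count every character in a string of sequence data.
# count_residues is a hypothetical helper name, not from the posts above.
sub count_residues {
    my ($sequence) = @_;
    my %counts;
    $counts{$_}++ for split //, $sequence;
    return %counts;
}

# First line of the sample data from the original question.
my $sample = 'MERIKELRDLMSQSRTREILTKTTVDHMAIIKKYTSGRQEKNPALRMKWMMAMKYPITADKRIIEMIPER';
my %counts = count_residues($sample);

# Print one "letter = count" line per residue, in alphabetical order.
for my $aa (sort keys %counts) {
    print "$aa = $counts{$aa}\n";
}
```

Because the hash keys are created as letters are seen, a letter that never occurs is simply not printed; pre-filling the hash with zeros, as in the code above, would list all 20.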

Posted by deep (deep), 24 November 2007
You can use the wonderful "substr".
Here is some partial code I have used to do the same thing you are doing.
if (/[ARNDCQEGHILKMFPSTWYVJU]+[\n\r]+/) # regex to capture peptide sequences
          {
           $pep_seq = $&;
           #print "$&";
           $len = length($&); # length of the peptide (the match includes the trailing line ending)
           #print "==>$len";

           #print "Len==>$len";

           for ($i=0;$i<$len;$i++) # loop that uses the length of the peptide as the sentinel value
           {
             $temp = substr $&,$i,1; # use the Perl function substr to capture each AA in the current peptide sequence
             # (partial code: the processing of $temp goes here)
           }
          }


Ignore the commented-out parts; that's just the way I test my scripts. But this should get you going. My regex might be different from what you need, so you can change it accordingly.

Oops, I just realized you will need to compare "temp" with the 20 AA. I "had" to do it this way as I am doing various processing on each AA.
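
To make the "compare temp with the 20 AA" step concrete, here is a rough sketch of how the substr loop might be finished off with a lookup hash. The sub name tally_with_substr and the test string 'MERIKELX' are invented for this example; the list of 20 letters is the one from KevinAD's post.

```perl
use strict;
use warnings;

# The 20 standard amino acid letters, as listed in KevinAD's post above.
my %is_aa = map { $_ => 1 } qw(A C D E F G H I K L M N P Q R S T V W Y);

# tally_with_substr is a hypothetical name used only for this sketch.
sub tally_with_substr {
    my ($pep_seq) = @_;
    my (%counts, %errors);
    my $len = length $pep_seq;
    for (my $i = 0; $i < $len; $i++) {
        my $temp = substr $pep_seq, $i, 1;   # one character at position $i
        if ($is_aa{$temp}) { $counts{$temp}++ }
        else               { $errors{$temp}++ }
    }
    return (\%counts, \%errors);
}

my ($counts, $errors) = tally_with_substr('MERIKELX');
print "$_ = $counts->{$_}\n" for sort keys %$counts;
print "unrecognised $_ = $errors->{$_}\n" for sort keys %$errors;
```

The hash lookup replaces a chain of twenty comparisons, and anything outside the list (the X here) ends up in the errors tally instead.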

Posted by KevinAD (KevinAD), 24 November 2007
This can slow down your perl scripts:

$pep_seq = $&;

For reasons I am not sure of, using $& (or even worse: $` or $') can slow down Perl scripts. If any of them is used in one place, Perl is forced to maintain them for every regexp in the program, and this can have an impact on performance.
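
One common way around this is to put capturing parentheses in the pattern and read $1 instead of $&. The following is only a sketch using the regex from the post above; first_peptide is a made-up helper name. Note that, unlike $&, the captured text here leaves out the trailing line ending.

```perl
use strict;
use warnings;

# first_peptide is a hypothetical helper name for this sketch.
# The parentheses capture the run of residue letters into $1,
# so $& is never mentioned anywhere in the program.
sub first_peptide {
    my ($line) = @_;
    if ($line =~ /([ARNDCQEGHILKMFPSTWYVJU]+)[\n\r]+/) {
        return $1;
    }
    return;
}

my $pep_seq = first_peptide("MERIKELRDLMSQ\n");
print "peptide: $pep_seq, length: ", length($pep_seq), "\n";
```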



Posted by admin (Graham Ellis), 25 November 2007
on 11/24/07 at 22:48:58, KevinAD wrote:
This can slow down your perl scripts:

$pep_seq = $&;

For reasons I am not sure of, if you use $& (or even worse: $` or $') they can slow down perl scripts.


My understanding is that if you match against a string that is (say) 8 Mbytes in length, then every single successful match against it will copy 8 Mbytes into temporary variables if there is any mention of $&, $' or $` anywhere in your program.

See longer article ....

http://www.wellho.net/mouth/1444_Using-English-can-slow-you-right-down-.html
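
If the whole matched text really is needed, Perl 5.10 and later offer the /p modifier with ${^MATCH}, which asks for the match to be saved for that one regexp only. A small sketch, assuming Perl 5.10 or newer; matched_text is a hypothetical helper name:

```perl
use strict;
use warnings;
use 5.010;

# matched_text is a hypothetical helper name for this sketch.
# With /p, the matched text is available in ${^MATCH} for this
# regexp only, without mentioning $& anywhere in the program.
sub matched_text {
    my ($line) = @_;
    return $line =~ /[ARNDCQEGHILKMFPSTWYVJU]+/p ? ${^MATCH} : undef;
}

my $m = matched_text("xxMERIKxx");
print "matched: $m\n";
```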




This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference. To jump to the archive index please follow this link.


© WELL HOUSE CONSULTANTS LTD., 2024: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho