Matches and mismatches in Perl

Posted by sirisha (sirisha), 23 January 2008

We are working on prion proteins. We extracted pattern signatures for mammals, aves, reptilia, pisces and amphibia.

We are supposed to write a program for a tool that can determine whether a sequence is a prion and which family it belongs to.
We were successful up to that point.

But now we have a small problem.
When we give an input sequence, the tool should match it against the pattern and show the matches and mismatches in the output.

I will give you two small examples. If they are not clear, please let me know
and I will try to explain further.

Example 1:

$a = "APPLE";        # let it be a pattern
chomp($b = <STDIN>); # input sequence, with the trailing newline removed

Suppose my input sequence ($b) is: MYAPPLES
Then the input ($b) matches $a only from the 3rd letter to the 7th letter (i.e. APPLE).
I want to get output like this:

The matching region is - -APPLE- (i.e. mismatches should be shown as hyphens)

If the pattern is "MYAPPLE" and the input is "APPLEMY"

Then the output should be:

- -APPLE- -

i.e. the first 2 hyphens represent gaps and the last 2 hyphens represent mismatches.

Example 2:
$a = "agaaaagavvgglggy"; # a pattern signature
chomp($b = <STDIN>);     # input sequence

Let the input sequence be "ttttttttttagaaaagavtttggyttttttt".

Here the input matches $a only at the stretches agaaaagav and ggy: ttttttttttagaaaagavtttggyttttttt

I want to know how to produce output that shows both the matches and the mismatches in the input (mismatches in the sequence should be shown as hyphens).
That means the output should be like this:
The given sequence matches with the pattern at - - - - - - - - - - agaaaagav- - -ggy - - - - - -

Thanks in advance,
siri

Posted by admin (Graham Ellis), 23 January 2008

Thanks for that fuller explanation (I'm guessing you've started a new thread that continues your "Help in Perl" question) but I still don't understand the detail of how you decide what a match is - you could have come up with MY as the matching sequence just as easily as APPLE in the first sequence, and in the second case with the gap in the middle, I don't know what the rules are for the gap. As such, I can't point (yet) at a best solution / algorithmic approach.
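
To make the ambiguity concrete: one candidate rule that would pick APPLE over MY, and would also allow a gap in the middle, is the longest common subsequence (LCS) of pattern and input. That rule is an assumption, not something stated in this thread, and the subroutine name lcs_mask is made up for illustration. A minimal sketch:

```perl
use strict;
use warnings;

# Replace every character of $seq that is not part of a longest common
# subsequence (LCS) of $pattern and $seq with a hyphen.
# ASSUMPTION: "match" means "belongs to an LCS"; the thread does not say so.
sub lcs_mask {
    my ($pattern, $seq) = @_;
    my @p = split //, $pattern;
    my @s = split //, $seq;
    my ($np, $ns) = (scalar @p, scalar @s);

    # $len[$i][$j] = LCS length of the first $i pattern characters
    # and the first $j sequence characters.
    my @len = map { [ (0) x ($ns + 1) ] } 0 .. $np;
    for my $i (1 .. $np) {
        for my $j (1 .. $ns) {
            if ($p[$i - 1] eq $s[$j - 1]) {
                $len[$i][$j] = $len[$i - 1][$j - 1] + 1;
            }
            else {
                my ($up, $left) = ($len[$i - 1][$j], $len[$i][$j - 1]);
                $len[$i][$j] = $up >= $left ? $up : $left;
            }
        }
    }

    # Walk back through the table, keeping the matched sequence characters.
    my @out = ('-') x $ns;
    my ($i, $j) = ($np, $ns);
    while ($i > 0 && $j > 0) {
        if ($p[$i - 1] eq $s[$j - 1]) {
            $out[$j - 1] = $s[$j - 1];
            $i--;
            $j--;
        }
        elsif ($len[$i - 1][$j] >= $len[$i][$j - 1]) {
            $i--;
        }
        else {
            $j--;
        }
    }
    return join '', @out;
}
```

Under that assumption, lcs_mask("APPLE", "MYAPPLES") gives --APPLE-. The sketch masks only the input sequence, so it does not print the extra leading gap hyphens asked for in the MYAPPLE / APPLEMY case.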

Did Kevin's BioPerl suggestion help? That's the way I would go, unless you're researching the science beyond what it can provide.

Posted by KevinAD (KevinAD), 23 January 2008

Looks like they have to program it, Graham:

Quote:

We are supposed to write a program for developing a tool

Posted by KevinAD (KevinAD), 23 January 2008

siri,

Please post the code you have written so far to try and write this tool. I am willing to help but only if I see some effort on your part to write code. I would think using the index() and substr() functions in a recursive loop will do what you want.
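
As a starting point, here is a minimal sketch of the index() and substr() idea, written as a plain loop rather than recursion. It assumes a match means the whole pattern occurring as one contiguous stretch of the input, so it would cover your first example but not the gapped match in the second one. The name exact_mask is hypothetical.

```perl
use strict;
use warnings;

# Return a copy of $seq in which every character outside an exact,
# contiguous occurrence of $pattern is replaced by a hyphen.
# ASSUMPTION: only whole-pattern occurrences count as matches.
sub exact_mask {
    my ($pattern, $seq) = @_;
    my $out = '-' x length($seq);
    return $out if $pattern eq '';    # nothing to search for

    my $pos = 0;
    # index() returns -1 once there are no further occurrences.
    while (($pos = index($seq, $pattern, $pos)) != -1) {
        # Copy the matched stretch into the hyphen mask.
        substr($out, $pos, length($pattern), $pattern);
        $pos++;    # step by one so overlapping occurrences are found too
    }
    return $out;
}

# Hypothetical usage: read one sequence from STDIN and print the mask.
# chomp(my $input = <STDIN>);
# print "The matching region is ", exact_mask("APPLE", $input), "\n";
```

If the input comes from <STDIN>, remember to chomp it first, otherwise the trailing newline becomes part of the sequence.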

Posted by KevinAD (KevinAD), 23 January 2008

I think he may have gotten the answer he is looking for on another forum/blog. I see this same question posted by this same person on some other forums.

This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference.