Question

Small Problem: Multiple Alignment To Consensus Sequence

12.1 years ago

MT ▴ 40

Hi Biostar,

I'm working with soft-clipped reads in Python, mainly using Biopython. I have gathered some small clusters of interesting sequences on which I want to do motif analysis, maybe BLAST them, etc.

My problem is: I have between 3 and 20 sequences in each cluster, and I want to reduce each cluster to a single consensus sequence. The sequences are highly similar, but occasionally a few corrupted sequences end up in a cluster. That means I cannot simply calculate the consensus, since a single mismatching sequence might introduce gaps or otherwise affect the consensus too much.
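
One way to limit the damage a single outlier can do is a threshold consensus. The following is a minimal sketch in plain Python, not a Biopython API; the function name `threshold_consensus`, the 0.6 default and the `N` ambiguity character are my own assumptions. A column contributes a base only if enough rows agree on it, and a column where the winning symbol is a gap is dropped, so a single sequence with an insertion cannot push a gap or a wrong base into the consensus.

```python
from collections import Counter


def threshold_consensus(aligned, threshold=0.6, ambiguous="N"):
    """Column-wise consensus of equal-length aligned sequences (hypothetical helper).

    A column yields its most common symbol only if that symbol makes up at
    least `threshold` of the rows; otherwise `ambiguous` is emitted. Columns
    whose winning symbol is a gap ('-') are skipped entirely.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned (equal length)")

    out = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) < threshold:
            out.append(ambiguous)  # no clear majority in this column
        elif symbol != "-":
            out.append(symbol)  # clear majority base
        # clear majority gap: drop the column
    return "".join(out)


rows = ["ACGT-ACGT", "ACGT-ACGT", "ACGT-ACGT", "ACGTTACCT"]
print(threshold_consensus(rows))  # ACGTACGT
```

With 3 to 20 sequences, a threshold around 0.6 still lets a 2-of-3 majority through; raising it makes the consensus more conservative at the cost of more `N`s.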

Is there a way to automatically (without manual intervention) discard badly matching sequences from a multiple alignment?
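
A possible automatic filter that needs no consensus at all: score every aligned row by its mean identity to all other rows, then flag rows that fall well below the cluster's typical score. This is a minimal sketch under my own assumptions (the names `pairwise_identity` and `flag_outliers`, the median/MAD cutoff and the `k` and `tolerance` defaults are all hypothetical choices, not an established tool). The cost is O(n²·L), which is negligible for 3 to 20 sequences.

```python
from statistics import median


def pairwise_identity(a, b):
    """Fraction of identical symbols between two aligned rows,
    skipping columns where both rows have a gap."""
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def flag_outliers(aligned, k=3.0, tolerance=0.05):
    """Return indices of aligned rows that agree unusually poorly with the rest.

    Each row is scored by its mean identity to every other row. A row is
    flagged if its score is below median - max(k * MAD, tolerance), where
    MAD is the median absolute deviation of the scores. The `tolerance`
    floor matters for near-identical clusters, where MAD is often 0.
    """
    n = len(aligned)
    if n < 3:
        return []  # with fewer than 3 rows there is no majority to compare against
    scores = []
    for i in range(n):
        ids = [pairwise_identity(aligned[i], aligned[j]) for j in range(n) if j != i]
        scores.append(sum(ids) / len(ids))
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    cutoff = med - max(k * mad, tolerance)
    return [i for i, s in enumerate(scores) if s < cutoff]


rows = ["ACGTACGT", "ACGTACGT", "ACGTACGT", "TTGAACGT"]
print(flag_outliers(rows))  # [3]
```

A robust cutoff like this adapts to each cluster, instead of relying on one fixed identity threshold for all clusters.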

My current implementation first runs a ClustalW multiple alignment and computes the consensus, then aligns each sequence pairwise to the consensus with EMBOSS needle and discards poorly matching sequences. Finally, the consensus is rebuilt. This seems rather clumsy, and it is terribly slow.
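
The slow part can probably be skipped: if the consensus is kept in alignment coordinates (gap columns included), every row of the multiple alignment can be compared to it column by column, so no second pairwise alignment is needed. Below is a minimal sketch of that drop-and-rebuild loop; the function names, the simple majority consensus and the `min_identity=0.9` default are my own assumptions, not part of ClustalW or Biopython.

```python
from collections import Counter


def majority_row(aligned):
    """Most common symbol in each column, gaps included, so the result
    stays in the same coordinates as the aligned rows."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def identity_to(row, reference):
    """Fraction of columns (excluding gap-gap columns) where row matches reference."""
    pairs = [(x, y) for x, y in zip(row, reference) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def refine_consensus(aligned, min_identity=0.9, max_rounds=10):
    """Iteratively drop rows that match the consensus poorly, then rebuild it.

    Returns (ungapped consensus, kept rows).
    """
    kept = list(aligned)
    for _ in range(max_rounds):
        reference = majority_row(kept)
        survivors = [r for r in kept if identity_to(r, reference) >= min_identity]
        if len(survivors) == len(kept) or not survivors:
            break  # stable, or the filter would discard everything
        kept = survivors
    return majority_row(kept).replace("-", ""), kept


consensus, kept = refine_consensus(["ACGT-ACGT"] * 3 + ["ACGTTTCCA"])
print(consensus, len(kept))  # ACGTACGT 3
```

To feed it the ClustalW output, something like `rows = [str(rec.seq) for rec in AlignIO.read("cluster.aln", "clustal")]` (after `from Bio import AlignIO`) should work; the file name is a placeholder. Columns that only a dropped row filled become all-gap among the kept rows, so they collapse to `-` and are stripped from the final consensus. Re-running ClustalW on the survivors is optional if you want a cleaner alignment.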

Any advice is greatly appreciated!

python biopython multiple-alignment consensus • 3.3k views


score 0 · Answer 1 · 2013-05-29

12.1 years ago

k.nirmalraman ★ 1.1k

How about this online tool where you can define the threshold?
