How to cluster existing multiple sequence alignments to identify homologous clusters
5 weeks ago
jbt38 • 0

I have a number of existing multiple sequence alignments (nucleotide) from closely related taxa (two sister clades), and I need to align these alignments to each other for analysis. Some of the alignments are homologous to one another and some are not. I think the best approach is to cluster them all together to identify the homologous groups. I know how to do this for single sequences, but not for entire alignments.

How about just pooling all the sequences and clustering with e.g. cd-hit or vsearch?
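
In case it helps, here is a minimal sketch of the pooling step, assuming the alignments are in aligned FASTA format. The function names and the `alignment|sequence` ID convention are made up for illustration; the prefix is only there so that each sequence can be traced back to its alignment after clustering. Gap characters are stripped because cd-hit and vsearch expect unaligned input.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def pool_alignments(alignments):
    """Merge several aligned FASTA texts into one unaligned FASTA string.

    `alignments` maps an alignment name (hypothetical labels chosen by
    the caller) to the FASTA text of that alignment.
    """
    records = []
    for name, text in alignments.items():
        for header, seq in parse_fasta(text):
            # Remove gap characters so the sequence is unaligned again.
            ungapped = seq.replace("-", "").replace(".", "")
            if not ungapped:
                continue  # skip rows that were all gaps
            # Keep only the first word of the header as the sequence ID.
            seq_id = (header.split() or ["unnamed"])[0]
            records.append(f">{name}|{seq_id}\n{ungapped}")
    return "\n".join(records) + "\n"
```

The returned string can be written to a single file and passed to whichever clustering tool you prefer.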


Hi, I've run cd-hit to identify clusters and aligned them, but some are made up of fragment sequences with full-length counterparts that need to be merged, and there are too many to go through manually. So I'm looking to identify the pairs of clusters that are likely to be homologous and align them.


Not sure I fully understand your question, but if you want to align two MSAs to each other (or even a sequence to an existing alignment), you should look for profile-to-profile alignment tools. Some that come to mind: t-coffee, muscle, clustal-omega, ...


Hi, thanks. Yes, I have mafft in mind for the alignment step, but before that I'd like to cluster homologous pairs of alignments, because some are made of fragmentary sequences with full-length counterparts.


I see. Perhaps you can first run a simple BLAST and then run blastclust or similar on the results to get a rough clustering?
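
As a rough illustration of the idea, the sketch below groups whole alignments by single-linkage clustering over BLAST hits. It is not what blastclust does internally. It assumes an all-vs-all BLAST run with tabular output (`-outfmt 6`, where the first four columns are query ID, subject ID, percent identity and alignment length), and it assumes sequence IDs of the form `alignment|sequence`, which is a hypothetical naming convention. The identity and length thresholds are arbitrary placeholders to tune for your data.

```python
def alignment_of(seq_id):
    """Return the alignment name from an 'alignment|sequence' ID (assumed format)."""
    return seq_id.split("|", 1)[0]


def cluster_alignments(blast_lines, min_identity=80.0, min_length=100):
    """Group alignments linked by at least one sufficiently strong BLAST hit.

    `blast_lines` is an iterable of tab-separated BLAST outfmt 6 lines.
    Returns a sorted list of sorted lists of alignment names.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path compression.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for line in blast_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete outfmt 6 record
        query, subject = alignment_of(fields[0]), alignment_of(fields[1])
        identity, length = float(fields[2]), int(fields[3])
        # Register both alignments so unlinked ones appear as singletons.
        find(query)
        find(subject)
        if identity >= min_identity and length >= min_length:
            union(query, subject)

    clusters = {}
    for name in list(parent):
        clusters.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in clusters.values())
```

Each returned group with more than one member is a candidate set of alignments to merge with a profile-to-profile aligner. Single linkage is deliberately permissive, so one spurious hit can chain unrelated alignments together; requiring several supporting hits per pair would be a stricter variant.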