Get summary of aligned sequences
4.5 years ago
Chirag Parsania ★ 2.0k

Hi,

I have aligned a set of ~3000 protein sequences. Next, I want to generate a phylogenetic tree, but first I want to check the quality of the alignment. Is there a quick way to get a summary of the aligned sequences? Also, can anyone suggest a tool for processing the alignment before the phylogenetic analysis? Currently, I am using trimAl to trim the alignment.

Thanks, Chirag.

sequence alignment fasta • 1.5k views

Do you mean a multiple sequence alignment of 3000 proteins? That sounds like a huge number.


Yes. I want to generate statistics from it, for example the distribution of gaps, the number of conserved columns, etc.
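For anyone wanting these numbers without installing anything, here is a minimal sketch in plain Python. It assumes the alignment is an aligned FASTA file, that gaps are written as `-` or `.`, and that a "conserved" column means one with no gaps and a single residue in every sequence. The function names `read_fasta` and `summarize_alignment` are made up for this example and do not come from any library.

```python
from collections import Counter


def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into a list of (header, sequence)."""
    records, name, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def summarize_alignment(records, gap_chars="-."):
    """Return basic per-column and per-sequence gap statistics for an alignment."""
    if not records:
        raise ValueError("no sequences found")
    seqs = [seq.upper() for _, seq in records]
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences are empty or differ in length; not an alignment?")

    n = len(seqs)
    gap_fraction_per_column = []
    conserved = 0
    for column in zip(*seqs):  # iterate over alignment columns
        counts = Counter(column)
        gaps = sum(counts[g] for g in gap_chars)
        gap_fraction_per_column.append(gaps / n)
        # Assumed definition: conserved = no gaps and one residue in all rows.
        if gaps == 0 and len(counts) == 1:
            conserved += 1

    gap_fraction_per_sequence = {
        name: sum(seq.count(g) for g in gap_chars) / length
        for name, seq in records
    }
    return {
        "n_sequences": n,
        "n_columns": length,
        "n_conserved_columns": conserved,
        "n_gap_free_columns": sum(1 for f in gap_fraction_per_column if f == 0),
        "gap_fraction_per_column": gap_fraction_per_column,
        "gap_fraction_per_sequence": gap_fraction_per_sequence,
    }
```

To use it on a file, something like `summarize_alignment(read_fasta(open("aligned.fasta")))` should work, where `aligned.fasta` is a placeholder name. The per-column gap fractions can then be binned or plotted to see the gap distribution, and a looser conservation threshold (say, the most common residue in at least 90% of rows) would be a small change to the column loop.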


That's a huge number. Unless the sequences are almost identical, I expect this alignment to be misleading. Consider dividing them into clusters before aligning.


Can you elaborate a little more on generating clusters? How can I do that, and once I have the clusters, how do I proceed with the downstream phylogeny?


I believe what Asaf meant is to cluster the protein sequences first, i.e. group similar sequences into clusters that meet a user-defined similarity threshold. This can be done with clustering tools such as CD-HIT; have a look at this.

After that, align each cluster separately.
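As a rough sketch of the step between clustering and aligning: the Python below reads a CD-HIT `.clstr` file and groups the sequences by cluster, so that each group can be written out and aligned on its own. It assumes the usual `.clstr` layout (a `>Cluster N` line followed by member lines containing `>seqid...`), and that the IDs in that file match the first word of each FASTA header. CD-HIT may truncate long IDs depending on its settings, so check this on your own data. `parse_clstr` and `group_by_cluster` are names made up for this example.

```python
import re
from collections import defaultdict

# Member lines look like: "0\t120aa, >seqA... *" (assumed .clstr layout).
_MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Map cluster number -> list of sequence IDs from CD-HIT .clstr text."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
        elif line and current is not None:
            match = _MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def group_by_cluster(sequences, clusters):
    """Group sequences by cluster.

    sequences: dict mapping sequence ID (first word of the FASTA header)
               to the unaligned sequence.
    clusters:  output of parse_clstr.
    Returns a dict mapping cluster number -> list of (ID, sequence).
    IDs that are missing from `sequences` are skipped.
    """
    grouped = {}
    for number, ids in clusters.items():
        members = [(i, sequences[i]) for i in ids if i in sequences]
        if members:
            grouped[number] = members
    return grouped
```

Each entry of the result can then be written to its own FASTA file and passed to whichever aligner you already use. Singleton clusters contain nothing to align, so you may want to set them aside or handle them separately.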


Usually by using biological knowledge. I don't know what you're trying to achieve, but if, for instance, one would like to generate a phylogenetic tree of 3,000 bacterial species based on one protein, the strategy I would suggest is to generate a tree for each phylum and then combine the trees. I think there should be a balance between brute-force methods (let's feed the algorithm everything) and a refined understanding of the problem.
