Question: Get summary of aligned sequences
4 months ago, Chirag Parsania (University of Macau) wrote:

Hi,

I have aligned a set of ~3000 protein sequences. Next, I want to generate a phylogenetic tree. Before that, I want to check the quality of the alignment. Is there a quick way to get a summary of the aligned sequences? Also, can anyone suggest a tool for processing the aligned sequences before I do the phylogenetic analysis? Currently, I am using trimAl to trim the alignment.

Thanks, Chirag.

Tags: alignment, sequence, fasta

Do you mean a multiple sequence alignment of 3000 proteins? That sounds like a huge number.

written 4 months ago by Asaf

Yes. I want to generate statistics from it, for example the distribution of gaps, the number of conserved columns, etc.

written 4 months ago by Chirag Parsania
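
For the statistics mentioned above (gap distribution and number of conserved columns), a minimal Python sketch could look like the following. It is not from the thread and makes several assumptions: the alignment is in aligned FASTA format, `-` and `.` are the gap characters, and a column counts as "conserved" only if it has no gaps and a single residue across all sequences. The function names `read_fasta` and `summarize_alignment` are made up for illustration.

```python
def read_fasta(lines):
    """Parse FASTA records from an iterable of lines into {name: sequence}.

    Assumption: names are unique; a repeated name overwrites the earlier record.
    """
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def summarize_alignment(records, gap_chars="-."):
    """Return basic per-sequence and per-column gap statistics."""
    seqs = list(records.values())
    if not seqs or not seqs[0]:
        raise ValueError("empty alignment")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences differ in length; is this really an alignment?")

    # Fraction of gap characters in each sequence.
    gap_per_seq = {
        name: sum(c in gap_chars for c in seq) / length
        for name, seq in records.items()
    }

    # Fraction of gaps in each column, plus a count of fully conserved columns
    # (hypothetical definition: no gaps and exactly one distinct residue).
    gap_per_col, conserved = [], 0
    for i in range(length):
        column = [s[i].upper() for s in seqs]
        gaps = sum(c in gap_chars for c in column)
        gap_per_col.append(gaps / len(seqs))
        if gaps == 0 and len(set(column)) == 1:
            conserved += 1

    return {
        "n_sequences": len(seqs),
        "length": length,
        "conserved_columns": conserved,
        "gap_fraction_per_sequence": gap_per_seq,
        "gap_fraction_per_column": gap_per_col,
    }
```

To use it on a file, something like `summarize_alignment(read_fasta(open("aligned.fasta")))` should work, where `aligned.fasta` is a placeholder name. For ~3000 sequences a plain loop like this is slow but usually tolerable; a stricter or looser definition of "conserved" (for example, a majority threshold) may suit the analysis better.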

That's a huge number. Unless the sequences are almost identical, I expect this alignment to be misleading. Consider dividing them into clusters before aligning.

written 4 months ago by Asaf

Can you elaborate a little more on generating clusters? I mean, how can I do that, and once I have the clusters, how do I proceed with the downstream phylogeny?

written 4 months ago by Chirag Parsania

I believe what Asaf meant is to cluster the protein sequences first, i.e. group similar protein sequences into clusters that meet a user-defined similarity threshold. This can be achieved using clustering tools such as CD-HIT; have a look at it.

After that, align each cluster separately.

modified 3 months ago • written 3 months ago by lakhujanivijay
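
As a rough illustration of the "cluster first, then align each cluster" idea, the sketch below reads a CD-HIT `.clstr` file and groups sequences by cluster so that each group can be written out and aligned on its own. It is not from the thread. The names `parse_clstr` and `group_sequences` are hypothetical, and the parsing assumes the usual `.clstr` layout (a `>Cluster N` header followed by member lines containing `>name...`). CD-HIT may truncate sequence names in this file, so names here are assumed to match the FASTA IDs exactly.

```python
import re

# Clustering itself would be run beforehand, e.g. (file names and the 0.9
# identity threshold are placeholders, not a recommendation):
#   cd-hit -i proteins.fasta -o proteins_nr.fasta -c 0.9 -n 5

# Member lines look like: "0\t120aa, >seqA... *"
MEMBER = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Parse CD-HIT .clstr lines into {cluster_label: [sequence names]}."""
    clusters, current = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
            clusters[current] = []
        elif current is not None:
            match = MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters


def group_sequences(records, clusters):
    """Split {name: sequence} into one dict per cluster.

    Names listed in a cluster but missing from `records` are skipped.
    """
    return {
        label: {name: records[name] for name in names if name in records}
        for label, names in clusters.items()
    }
```

Each per-cluster dictionary can then be written to its own FASTA file and passed to whichever aligner is already in use. Singleton clusters need no alignment at all.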

Usually by using biological knowledge. I don't know what you're trying to achieve, but if, for instance, you wanted to generate a phylogenetic tree of 3,000 bacterial species based on one protein, the strategy I would suggest is to build a tree for each phylum and then combine the trees. I think there should be a balance between brute-force methods (feeding the algorithm everything) and a refined understanding of the problem.

written 4 months ago by Asaf