Question: Is SNP Data Obtained From Mauve Progressive Alignment Useful For Phylogeny Of Bacterial Whole Genomes?
7.4 years ago, Nari (United States) wrote:

I have SNP data for multiple genomes, obtained from a Mauve progressive alignment.
How can I use this data to build a phylogenetic tree of those genomes?

Sample data (for 3 genomes):

SNP    sequence_1    sequence_2    sequence_3
ACA    4             4             4
AAG    9             9             9
CTC    10            10            10
TGT    12            12            12
GAG    15            15            15
TCT    18            18            18
... and so on, up to the last position.

(The numbers are positions in the whole genome.)

phylogeny snp • 4.1k views
7.4 years ago, aidan-budd wrote:

I have no experience with, or knowledge of, using SNP data in this format to estimate a phylogenetic tree.

My (strong!) preference when it comes to tree building is to use explicit, probabilistic models of the evolutionary process. For nucleotide sequences, these almost always focus on the process of base substitution.

There are lots and lots of software packages out there that use such models, and they take a multiple sequence alignment of your sequences as input, rather than a list of SNPs.

Since, as I say, I have no experience working with a list such as the one you describe (maybe this is common practice in some contexts?!), I would recommend that you instead get your hands on data that can be transformed into, e.g., a FASTA-format multiple sequence alignment file, which can be used (or adapted) as input to software such as PhyML, RAxML, MrBayes etc.

I notice, looking at the PLoS One progressiveMauve paper, that they say

"The alignment can also be used to extract variable sites for more traditional phylogenetic analyses. "

which suggests to me that you may be able to get something like this out of the aligner.
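As a rough illustration of that transformation, here is a minimal Python sketch that turns a table like the one in the question into a FASTA-format alignment of the variable sites. It rests on an assumption about the table that the thread does not confirm: that the i-th character of the SNP column is the base observed in the i-th genome. The function name `snp_table_to_fasta` and the `SAMPLE` variable are made up for this example.

```python
# Sample table from the question, whitespace-delimited.
SAMPLE = """\
SNP sequence_1 sequence_2 sequence_3
ACA 4 4 4
AAG 9 9 9
CTC 10 10 10
TGT 12 12 12
GAG 15 15 15
TCT 18 18 18
"""


def snp_table_to_fasta(table_text):
    """Build a FASTA alignment of variable sites from an SNP table.

    ASSUMPTION: character i of the first column is the base in genome i.
    """
    rows = [line.split() for line in table_text.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    names = header[1:]  # one name per genome column
    columns = {name: [] for name in names}

    for row in data:
        pattern = row[0]
        if len(pattern) != len(names):
            continue  # skip rows whose pattern does not match the genome count
        for name, base in zip(names, pattern):
            columns[name].append(base.upper())

    # One FASTA record per genome; all records have the same length.
    return "".join(
        f">{name}\n{''.join(bases)}\n" for name, bases in columns.items()
    )


if __name__ == "__main__":
    print(snp_table_to_fasta(SAMPLE), end="")
```

Note that an alignment built this way contains only variable sites. Standard substitution models expect invariant sites as well, so branch lengths estimated from it may be distorted unless the tree-building program corrects for that.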


Thank you for your response. I made a small mistake when asking: instead of "Phylogeny" I should have said "Clustering". I have just found a way to do the clustering in the Statistica package. However, your suggestion of building a phylogeny from a multiple alignment file (with a little formatting) in PhyML or RAxML is a better option.

Comment written 7.4 years ago by Nari

Thanks for the feedback, Nari - it's nice/motivating to have the feeling that my posts are indeed being read by the question posters :)

One further comment I'd make (on your comment) is that, meh, I wouldn't say that PhyML/RAxML are necessarily the best or better options. Rather (as always in bioinformatics - and, I guess, in life in general?!) it depends on why you are doing the analysis and on the specific outcomes you need from it. If you're interested in making inferences about evolutionary processes from your data, then I'd recommend using tool(s) that incorporate explicit evolutionary models - but there may well be other applications, or sets of questions you're interested in asking, where alternative approaches are more useful.
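For the clustering use case, one such alternative is a plain pairwise SNP-difference count, which involves no evolutionary model at all. The sketch below is only an illustration and again assumes that the i-th character of each SNP pattern is the base in the i-th genome. The function name `pairwise_snp_distances` is made up for this example.

```python
from itertools import combinations


def pairwise_snp_distances(patterns):
    """Count, for each pair of genomes, the SNP sites at which they differ.

    ASSUMPTION: character i of every pattern is the base in genome i.
    Returns a symmetric matrix as a list of lists.
    """
    n = len(patterns[0])
    dist = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i, j in combinations(range(n), 2):
            if pattern[i] != pattern[j]:
                dist[i][j] += 1
                dist[j][i] += 1
    return dist


# SNP patterns from the sample table in the question.
patterns = ["ACA", "AAG", "CTC", "TGT", "GAG", "TCT"]
for row in pairwise_snp_distances(patterns):
    print(row)
# prints:
# [0, 5, 1]
# [5, 0, 6]
# [1, 6, 0]
```

A matrix like this could be passed to any hierarchical clustering routine. It describes similarity only, so it should not be read as an estimate of evolutionary history.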

Comment written 7.4 years ago by aidan-budd