Question: Phylogenetic Tree Of Fragments Of The Same Protein (From A Metagenome)
Gimly_Gloin70 wrote:

OK, I have several hundred fragments of a protein of interest (699 sequences) that I would like to align and build a neighbor-joining tree from. In many cases these fragments do not align well to one another, because they cover different regions of the same or similar proteins.

However, whole protein sequences have been defined and submitted to NCBI and other databases, and there are also published trees for this protein. Is there a way to take the fragments from my metagenome and align them to the known sequences to determine where each fragment lies on the published tree? My only solution so far is to run each sequence (or cluster of sequences) against the predefined tree (built from the original whole protein sequences from the publication) to determine where each fragment would lie.

My sequences are unassembled reads (I can't assemble them; they are too diverse).

Average read length is 400bp

General protein length is around 350aa
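
As a rough back-of-the-envelope check on the two numbers above, here is a minimal Python sketch. The function name is made up for illustration, and it assumes a read translates in a single frame with no untranslated flanking sequence, so the result is an upper bound only.

```python
def max_protein_coverage(read_length_bp: int, protein_length_aa: int) -> float:
    """Upper bound on the fraction of the protein one translated read can span."""
    read_length_aa = read_length_bp // 3  # count complete codons only
    return min(1.0, read_length_aa / protein_length_aa)


read_aa = 400 // 3
coverage = max_protein_coverage(400, 350)
print(f"{read_aa} aa per read, at most {coverage:.0%} of the protein")
```

With a 400 bp read and a 350 aa protein, a single read spans at most about 133 residues, so two fragments from opposite ends of the protein may not overlap at all.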

Is there an easier way to do this?

How accurate would diversity statistics be on this protein? (I will not be adding the known protein sequences for that analysis.)

Thanks for any advice/help in advance.

metagenomics phylogenetics • 3.2k views
ADD COMMENTlink modified 7.1 years ago by Dan Gaston7.1k • written 7.2 years ago by Gimly_Gloin70

Ari110 wrote:

PAGAN could be helpful in the alignment part. Please see http://code.google.com/p/pagan-msa/wiki/PAGAN?tm=6 and contact the author if you have any questions. The program is actively developed and some recent features (e.g. translated and ORF alignment) are still undocumented.

You could try (1) "pileup alignment" (with one reference sequence) and (2) "unguided placement" (with a reference alignment and tree):

(1)

pagan --reads-pileup --ref-seqfile ref_peptide.fas --readsfile prot_frags.fas

(2)

pagan --ref-seqfile ref_alignment.fas --ref-treefile ref_tree.nh --readsfile \
prot_frags.fas --fast-placement --test-every-node

With the second option, PAGAN adds the new sequences "inside" the reference alignment at the phylogenetic positions that they match best. This is based on a greedy search, though, and should not be taken as a proper phylogenetic analysis.
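
Once fragments sit inside a reference alignment (from whichever tool), it can help to check how many alignment columns two fragments actually share before comparing them directly. The following is only a sketch in Python: the function names and the toy aligned strings are hypothetical, and it assumes gaps are written as `-` or `.`.

```python
def covered_columns(aligned_seq: str, gap_chars: str = "-.") -> set:
    """Return the indices of alignment columns where the sequence has a residue."""
    return {i for i, c in enumerate(aligned_seq) if c not in gap_chars}


def shared_columns(seq_a: str, seq_b: str) -> int:
    """Count the alignment columns in which both sequences have a residue."""
    return len(covered_columns(seq_a) & covered_columns(seq_b))


# Toy fragments placed in a 16-column reference alignment (made-up data).
frag_a = "MKTLV-----------"
frag_b = "---LVAGG--------"
frag_c = "----------QRSTWY"

print(shared_columns(frag_a, frag_b))  # fragments that partly overlap
print(shared_columns(frag_a, frag_c))  # fragments from different regions
```

Pairs that share no columns have no directly comparable sites, which is one reason a distance tree built from the fragments alone can be unreliable.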

ADD COMMENTlink written 7.2 years ago by Ari110
Miguel Pignatelli140 wrote:

I think you can use PaPaRa for this. Build a phylogenetic tree with the full-length proteins and align your queries to the tree(s).

Check the publication from Berger & Stamatakis: http://bioinformatics.oxfordjournals.org/content/27/15/2068.long

or their web page: http://www.exelixis-lab.org/

ADD COMMENTlink written 7.2 years ago by Miguel Pignatelli140

See my comment below for the newer EPA algorithm from the same group that extends PaPaRa into an ML framework.

ADD REPLYlink written 7.2 years ago by Dan Gaston7.1k
The Netherlands
ALchEmiXt1.9k wrote:

The trees displayed at NCBI are more or less based on pairwise BLAST (or do you mean some other trees?). If you have the sequences behind a certain tree, you should be able to reproduce that tree quite easily from pairwise sequence comparison (i.e. using BLAST or MUMmer).

If that is the case, you can add your own sequences and "see" which clade they end up in, all in one go. This assumes the tree is not disturbed too much by the additional sequences. These algorithms are quite fast and therefore leave lots of room to test settings and compare outcomes.

ADD COMMENTlink written 7.2 years ago by ALchEmiXt1.9k
1

I would avoid the NCBI reference trees whenever possible. They are essentially hierarchical distance trees and not necessarily representative of the true phylogeny, depending on the sequence in question and the taxa represented.

ADD REPLYlink written 7.2 years ago by Dan Gaston7.1k

No, it isn't an NCBI tree. What I meant was that the sequence data for the protein is from NCBI; the tree itself is a Maximum Likelihood tree generated by PHYML. Thanks for the suggestion though. I can already do this by clustering with USEARCH, which gives me a rough idea of where each fragment is but doesn't provide enough data for statistical analysis using OTUs in MOTHUR...

ADD REPLYlink written 7.2 years ago by Gimly_Gloin70
Canada
Dan Gaston7.1k wrote:

Don't do an NJ tree. NJ phylogenetic tree algorithms are prone to all sorts of biases and artefacts, like long-branch attraction, that could be particularly problematic for this sort of problem.

RAxML, a Maximum-Likelihood phylogenetics program (http://www.exelixis-lab.org/), includes the Evolutionary Placement Algorithm, or EPA (the paper is here: http://sco.h-its.org/exelixis/rrdr2009-3.php). You can use a reference phylogeny and aligned sequences to do short-read mapping of your metagenomic data to the tree in a full maximum-likelihood context. Including models of substitution, frequency estimates, etc. is very important, especially if you are dealing with a large number of taxa and large amounts of diversity.
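
For reference, a placement run of this kind might be assembled roughly as follows. This is a sketch in Python that only builds and prints the command line. It assumes the `raxmlHPC` binary and its `-f v` placement mode; the file names, run name, helper function and the `PROTGAMMAWAG` model are placeholders chosen for illustration, so check the manual for your RAxML version. The alignment is assumed to contain both the reference sequences and the fragments already aligned to them, while the tree contains the reference taxa only.

```python
import shlex


def build_epa_command(alignment, reference_tree, run_name,
                      model="PROTGAMMAWAG", binary="raxmlHPC"):
    """Build (but do not run) a RAxML command line for evolutionary placement."""
    return [
        binary,
        "-f", "v",             # algorithm selector: placement of query sequences
        "-s", alignment,       # reference sequences plus aligned fragments
        "-t", reference_tree,  # tree of the reference sequences only
        "-m", model,           # protein substitution model (placeholder choice)
        "-n", run_name,        # suffix used for the output files
    ]


# Hypothetical file names, for illustration only.
cmd = build_epa_command("ref_plus_frags.phy", "ref_tree.nh", "epa_frags")
print(shlex.join(cmd))
```

Pass the list to `subprocess.run(cmd, check=True)` to actually run it, once the inputs exist and the model has been chosen to suit the protein family.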

ADD COMMENTlink written 7.2 years ago by Dan Gaston7.1k

Thanks for the advice. After rereading the source of the phylogeny for my protein, it turns out they built a Maximum Likelihood tree using PHYML. Will have a look at RAxML!

ADD REPLYlink written 7.2 years ago by Gimly_Gloin70