Question: Phylogeny from incomplete orthogroups
shelkmike (Russian Federation) wrote, 4.0 years ago:

Hello, everyone.

I'm building a phylogenetic tree from the sequenced transcriptomes of 100 species. I've computed orthogroups with OrthoMCL and will build the tree with RAxML. In most articles I've seen, the authors build the tree only from orthogroups that contain exactly one gene from each species. Because my data are transcriptomes, the number of such orthogroups shrinks as the number of transcriptomes grows, for example because of misassemblies.

Instead, I want to take all orthogroups that contain exactly one gene from at least 50 species. If an orthogroup contains two genes from the same species, I'll drop both of them (because paralogs can hamper correct tree reconstruction). In the resulting concatenated alignment of orthogroups, which I'll give to RAxML, I simply fill with gaps (-------) the places where a species has no ortholog. RAxML can deal with such gaps: it just won't use information from those alignment columns for that species (https://goo.gl/GZ47bu).

So my method seems good, for example because it lets me use information from more genes for tree building. However, in all the articles I've seen, people build trees from complete orthogroups. Am I missing some drawback of this method?

I would be grateful for any help.

P.S. The lower limit of 50 species per orthogroup is arbitrary. I just don't want to include orthogroups that are too small, because they may originate from contamination.

P.P.S. In detail: I have only 30 orthogroups with exactly one gene assembled from each species, but 4000 orthogroups with exactly one gene assembled from at least 50 species.
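The selection rule described in the question (keep only species that contribute exactly one gene, then keep the orthogroup if enough species remain) can be sketched as follows. This is only an illustration: the function name `filter_orthogroups`, the input layout (a dict mapping an orthogroup ID to a list of `(species, gene_id)` pairs) and the toy data are assumptions, not OrthoMCL's actual output format.

```python
from collections import Counter


def filter_orthogroups(orthogroups, min_species=50):
    """Return {group_id: {species: gene_id}} for usable orthogroups.

    orthogroups: dict mapping a group ID to a list of (species, gene_id)
    pairs (hypothetical layout, not OrthoMCL's native format).
    A species with two or more genes in a group is dropped from that
    group entirely; the group is kept only if at least `min_species`
    single-copy species remain.
    """
    kept = {}
    for group_id, members in orthogroups.items():
        # Count how many genes each species contributes to this group.
        counts = Counter(species for species, _ in members)
        # Keep only species represented by exactly one gene.
        single_copy = {
            species: gene for species, gene in members if counts[species] == 1
        }
        if len(single_copy) >= min_species:
            kept[group_id] = single_copy
    return kept


if __name__ == "__main__":
    # Toy data with a lowered threshold so the effect is visible.
    toy = {
        "OG1": [("A", "a1"), ("B", "b1"), ("B", "b2"), ("C", "c1")],
        "OG2": [("A", "a2"), ("A", "a3"), ("B", "b3")],
    }
    print(filter_orthogroups(toy, min_species=2))
```

In the toy data, species B is removed from OG1 because it has two genes there, and OG2 is discarded because only one single-copy species remains.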

Brice Sarver (United States) wrote, 4.0 years ago:

"However, in all articles I've seen, people try to build trees from complete orthogroups"

Whether or not you use 'complete' orthogroups depends on the question you are trying to ask. In your case it appears to be just a matter of scale. Your tree will still be representative of the relationships among the individuals/species you include. If you use your approach and replace missing sequences with all gaps (or Ns), there will be no information about the correct placement of that lineage in the tree. I believe that RAxML will throw a warning or error if you have individuals with completely missing data, but this is something to look into.


Thank you for your response.

The aim of the work is simply to build a correct species tree. The 'complete' orthogroups alone do not provide enough information for this, because there are too few of them.

"If you use your approach and replace missing sequences with all gaps (or Ns), there will be no information about the correct placement of that lineage in the tree"

For tree building I use a concatenated alignment of the orthogroups. Since every species is represented in at least some orthogroups, no species will consist entirely of missing data, so there should be no error.
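The gap-padding step, and the claim that no species ends up entirely missing, can be checked before running the tree search. The sketch below is a minimal illustration under assumptions: each per-orthogroup alignment is represented as a dict mapping species to an aligned sequence, and the helper names `concatenate` and `missing_fraction` are hypothetical. Only `-` and `?` are counted as missing, since `N` is a valid residue in protein alignments.

```python
def concatenate(alignments, all_species):
    """Concatenate per-orthogroup alignments into one supermatrix.

    alignments: list of {species: aligned_sequence}; within one alignment
    all sequences must have the same length.
    A species absent from an alignment is padded with '-' over the whole
    length of that alignment.
    """
    parts = {species: [] for species in all_species}
    for alignment in alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share one length")
        length = lengths.pop()
        for species in all_species:
            parts[species].append(alignment.get(species, "-" * length))
    return {species: "".join(chunks) for species, chunks in parts.items()}


def missing_fraction(supermatrix):
    """Return {species: fraction of positions that are '-' or '?'}."""
    result = {}
    for species, seq in supermatrix.items():
        missing = sum(1 for char in seq if char in "-?")
        # An empty row is treated as fully missing.
        result[species] = missing / len(seq) if seq else 1.0
    return result


if __name__ == "__main__":
    toy = [{"A": "ACGT", "B": "AC-T"}, {"A": "GG", "C": "GA"}]
    matrix = concatenate(toy, ["A", "B", "C"])
    for species, fraction in missing_fraction(matrix).items():
        flag = "  <- no data at all" if fraction == 1.0 else ""
        print(species, matrix[species], round(fraction, 2), flag)
```

A species whose fraction is 1.0 has no data anywhere in the concatenated alignment, which is the situation the answer above warns about; species with high but incomplete missing fractions still contribute some information.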

Reply written 4.0 years ago by shelkmike