Question

Phylogeny from incomplete orthogroups

7.5 years ago

shelkmike ★ 1.2k

Hello, everyone.

I'm building a phylogenetic tree from sequenced transcriptomes of 100 species. I've computed orthogroups with OrthoMCL and will build the tree with RAxML. In most articles I've seen, the authors build the tree only from orthogroups that have exactly one gene from each species. Because I'm working with transcriptomes, the number of such orthogroups will decrease as the number of transcriptomes grows, for example due to misassemblies. Instead, I want to take all orthogroups that contain exactly one gene from at least 50 species. If a species has two genes in an orthogroup, I'll drop both of them (because paralogs can hamper correct tree reconstruction). In the resulting concatenated alignment of orthogroups, which I'll give to RAxML, I'll simply fill with gaps (-------) the positions where a species lacks an ortholog. RAxML can deal with such gaps: it just won't use the information from those columns for that species (https://goo.gl/GZ47bu). So my method looks good, for example because it lets me use information from more genes for tree building. However, in all articles I've seen, people try to build trees from complete orthogroups. Am I missing some drawback in the method?

I would be grateful for any help.

P.S. The lower limit of 50 species per orthogroup is arbitrary. I just don't want to take orthogroups that are too small, because they may originate from contamination.

P.P.S. In detail: I have only 30 orthogroups with exactly one gene assembled from each species, but 4000 orthogroups with one gene assembled from at least 50 species.
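
The selection and gap-filling steps described in the question can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual pipeline: the input layout (orthogroups as lists of `(species, gene_id)` pairs, per-orthogroup alignments as `{species: aligned_sequence}` dictionaries) and the function names `single_copy_members`, `select_orthogroups` and `concatenate` are hypothetical. It also assumes the 50-species threshold is applied after species with more than one gene have been dropped from the orthogroup.

```python
from collections import Counter


def single_copy_members(members):
    """Keep only species that contribute exactly one gene to an orthogroup.

    members: list of (species, gene_id) pairs (hypothetical input layout).
    Species with two or more genes are dropped entirely (possible paralogs).
    """
    counts = Counter(species for species, _ in members)
    return {species: gene for species, gene in members if counts[species] == 1}


def select_orthogroups(orthogroups, min_species=50):
    """Return orthogroups with a single-copy gene from at least min_species species."""
    kept = {}
    for name, members in orthogroups.items():
        singles = single_copy_members(members)
        if len(singles) >= min_species:
            kept[name] = singles
    return kept


def concatenate(alignments, all_species, missing="-"):
    """Concatenate per-orthogroup alignments, gap-filling absent species.

    alignments: list of {species: aligned_sequence}; within one alignment
    all sequences must have the same length.
    """
    parts = {species: [] for species in all_species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = lengths.pop()
        for species in all_species:
            # A species without an ortholog gets a run of gaps of the same width.
            parts[species].append(aln.get(species, missing * width))
    return {species: "".join(chunks) for species, chunks in parts.items()}
```

For example, `concatenate([{"A": "ACG", "B": "ACT"}, {"A": "GG", "C": "GT"}], ["A", "B", "C"])` pads species B and C with gaps where they have no sequence.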

RNA-Seq Phylogenetics OrthoMCL Orthogroups • 2.0k views

updated 5.6 years ago by Biostar 20 • written 7.5 years ago by shelkmike ★ 1.2k

score 1 · Answer 1 · 2016-11-11

7.5 years ago

Brice Sarver ★ 3.8k

"However, in all articles I've seen, people try to build trees from complete orthogroups"

Whether you use 'complete' orthogroups or not depends on the question you are trying to ask. It appears to just be a scale issue in your case. Your tree will still be representative of the relationships for the individual/species you include. If you use your approach and replace missing sequences with all gaps (or Ns), there will be no information about the correct placement of that lineage in the tree. I believe that RAxML will throw a warning/error if you have individuals with completely missing data, but this is something to look into.
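
Whether any taxon ends up with no data at all can be checked on the concatenated alignment before running the tree builder. The following is a minimal sketch, assuming the alignment is held as a `{species: sequence}` dictionary; the helper names `missing_fraction` and `fully_missing_taxa` are hypothetical, and the set of characters counted as missing is a parameter (add `N` for nucleotide data if Ns were used as filler, but not for protein data, where `N` is an amino acid).

```python
def missing_fraction(seq, missing_chars="-?"):
    """Fraction of positions in seq that carry no information."""
    if not seq:
        return 1.0
    return sum(ch in missing_chars for ch in seq) / len(seq)


def fully_missing_taxa(alignment, missing_chars="-?"):
    """Names of taxa whose whole concatenated sequence is missing data.

    alignment: {species: concatenated_sequence} (assumed layout).
    """
    return sorted(
        species
        for species, seq in alignment.items()
        if missing_fraction(seq, missing_chars) == 1.0
    )


if __name__ == "__main__":
    toy = {"sp1": "ACG--", "sp2": "-----", "sp3": "NNNNN"}
    print(fully_missing_taxa(toy))          # gaps only
    print(fully_missing_taxa(toy, "-?N"))   # gaps and Ns, nucleotide data
```

The same `missing_fraction` value can be reported per taxon to see which lineages are supported by only a small share of the columns.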

written 7.5 years ago by Brice Sarver ★ 3.8k

Thank you for your response.

The aim of the work is simply to build a correct tree of species. Information from only 'complete' orthogroups is insufficient for this, due to a low number of such orthogroups.

"If you use your approach and replace missing sequences with all gaps (or Ns), there will be no information about the correct placement of that lineage in the tree"

For tree building I use a concatenated alignment of orthogroups, so, since every species is represented in at least some orthogroups, there will be no error.

reply written 7.4 years ago by shelkmike ★ 1.2k