Question: How to extract single-copy genes from OrthoMCL output for a phylogenetic tree
2.6 years ago by
Mehmet410
Japan
Mehmet410 wrote:

Dear all:

I have completed the OrthoMCL step. Now I need to identify the single-copy genes and retrieve their sequences to build a phylogenetic tree.

Is there a bioinformatics tool or script for this?

sequence alignment genome • 1.3k views
modified 2.6 years ago by ALchEmiXt1.9k • written 2.6 years ago by Mehmet410

Why? Please specify exactly what you mean by single-copy genes and why you think you need to restrict your analysis to them. Do you mean genes without paralogs in any species, or only in a subset of species? Do you want to remove genes with paralogs completely, or only per taxon? If the latter, can't you simply filter your output by taxon? Genes with exactly one homolog per taxon are then, by definition, single-copy.

written 2.6 years ago by Michael Dondrup44k
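
The per-taxon filter described in the comment above can be sketched in Python. This is only a minimal sketch, not an OrthoMCL feature: it assumes each line of groups.txt is a group ID (with or without a trailing colon) followed by whitespace-separated members of the form TAXON|gene_id, as in the groups.txt lines posted in this thread. The function names `parse_groups` and `single_copy_groups` are made up for illustration.

```python
from collections import Counter


def parse_groups(lines):
    """Yield (group_id, [(taxon, gene_id), ...]) for each non-empty line."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        group_id = fields[0].rstrip(":")  # tolerate "group_id:" style IDs
        members = []
        for field in fields[1:]:
            taxon, sep, gene_id = field.partition("|")
            if sep:  # skip anything that is not TAXON|gene_id
                members.append((taxon, gene_id))
        yield group_id, members


def single_copy_groups(lines, taxa):
    """Return {group_id: {taxon: gene_id}} for groups that contain
    exactly one gene from every taxon in `taxa` and nothing else."""
    wanted = set(taxa)
    result = {}
    for group_id, members in parse_groups(lines):
        counts = Counter(taxon for taxon, _ in members)
        if set(counts) == wanted and all(n == 1 for n in counts.values()):
            result[group_id] = dict(members)
    return result


# Two lines taken from the groups.txt tail posted in this thread.
example = [
    "ORTHOMCL39110 SKOW|XP_006823915.1 SMED|mk4.005302.00",
    "ORTHOMCL39115 SMED|mk4.003454.01 TCAS|XP_008196576.1",
]
for group_id, genes in single_copy_groups(example, ["SKOW", "SMED"]).items():
    print(group_id, genes)
```

For a real run, `lines` would be the open groups.txt file and `taxa` the full list of taxon codes in the analysis. Requiring every taxon to be present is the strictest definition; relaxing the `set(counts) == wanted` test to a subset check would keep groups that are single-copy but missing from some taxa.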

I need to use single-copy genes to build a phylogenetic tree. I mean genes with no paralogs at all. I couldn't find a good explanation of how to do this. Some people have used custom scripts to get single-copy genes and their protein sequences for tree building. I tried those scripts but got many errors. Do you have a solution?

written 2.6 years ago by Mehmet410

Can you give me an example of the OrthoMCL output? I am still asking why you want to use only single-copy genes; a few paralogues won't do any harm to the phylogeny.

written 2.6 years ago by Michael Dondrup44k

This is the head of the output file "groups.txt":

ORTHOMCL1 RCUL|nRc.2.0.1.t01024-RA RCUL|nRc.2.0.1.t14099-RA RCUL|nRc.2.0.1.t24454-RA RCUL|nRc.2.0.1.t25442-RA RCUL|nRc.2.0.1.t29258-RA RCUL|nRc.2.0.1.t08268-RA RCUL|nRc.2.0.1.t28641-RA RCUL|nRc.2.0.1.t16389-RA RCUL|nRc.2.0.1.t16228-RA RCUL|nRc.2.0.1.t23443-RA RCUL|nRc.2.0.1.t43280-RA RCUL|nRc.2.0.1.t07646-RA RCUL|nRc.2.0.1.t40585-RA RCUL|nRc.2.0.1.t31140-RA RCUL|nRc.2.0.1.t23202-RA RCUL|nRc.2.0.1.t18343-RA RCUL|nRc.2.0.1.t10889-RA RCUL|nRc.2.0.1.t13

This is the tail of the groups.txt file:

ORTHOMCL39110 SKOW|XP_006823915.1 SMED|mk4.005302.00

ORTHOMCL39111 SKOW|XP_006824289.1 SMED|mk4.010033.03

ORTHOMCL39112 SKOW|XP_006824316.1 SMED|mk4.026828.00

ORTHOMCL39113 SKOW|XP_006824669.1 SMED|mk4.000573.09

ORTHOMCL39114 SKOW|XP_006825572.1 SMED|mk4.000954.05

ORTHOMCL39115 SMED|mk4.003454.01 TCAS|XP_008196576.1

ORTHOMCL39116 SMED|mk4.004076.04 TCAS|XP_008196044.1

ORTHOMCL39117 SMED|mk4.007341.02 TCAS|XP_974486.1

ORTHOMCL39118 SMED|mk4.014650.00 TCAS|XP_001810176.2

ORTHOMCL39119 SMED|mk4.044940.01 TCAS|XP_008190410.1

 

This is the head of the orthologus.txt file:

1298|c10008_g1_i1.2-511.F2      1299|c6900_g2_i1ppm.1820        0.563

1298|c10009_g1_i1.1-267.F1      1299|c17675_g1_i1ppm.9461       0.386

1298|c10033_g1_i1.392-81.F2     1299|c13365_g1_i1ppm.8946       0.47

1298|c10041_g1_i1.265-2.F1      1299|c12375_g1_i1ppm.7630       0.395

written 2.6 years ago by Mehmet410
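
For a group whose members each come from a different taxon, such as the two-member groups listed above, the sequences can be pulled out of the protein FASTA files with a few lines of Python. This is a rough sketch under one assumption: the first word of every FASTA header is the same TAXON|gene_id string that appears in groups.txt. The helper names `read_fasta` and `group_fasta` are hypothetical, and the sequences in the demo are invented placeholders.

```python
def read_fasta(lines):
    """Return {record_id: sequence}; record_id is the first word after '>'."""
    records = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def group_fasta(members, sequences, width=60):
    """Format one group's members as FASTA text, one record per taxon.

    Each record is labeled with the taxon code only, so that the same
    label recurs in every per-gene file. Raises KeyError if a member
    has no sequence in `sequences`.
    """
    out = []
    for member in members:
        seq = sequences[member]
        taxon = member.split("|", 1)[0]
        out.append(">" + taxon)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Placeholder sequences; the IDs are from the groups.txt tail in this thread.
proteins = read_fasta([
    ">SKOW|XP_006823915.1 placeholder description",
    "MKTLLVA",
    ">SMED|mk4.005302.00",
    "MKSLIVA",
])
print(group_fasta(["SKOW|XP_006823915.1", "SMED|mk4.005302.00"], proteins), end="")
```

Writing one such FASTA file per single-copy group gives the per-gene input that a multiple-sequence aligner expects; labeling records by taxon makes it straightforward to concatenate the alignments afterwards.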

I have seen papers in which people used only single-copy genes to build phylogenetic trees. By the way, thank you so much for your help.

written 2.6 years ago by Mehmet410

Sorry, I do not recognize all of these identifiers (TCAS is Tribolium, SMED is maybe the SmedGD database, but this is too much guessing). Maybe someone who understands the OrthoMCL output better can help you more. Doesn't the software also predict paralogs? If not, you will have to try to get the paralogs from, e.g., Ensembl BioMart using these identifiers.

written 2.6 years ago by Michael Dondrup44k
2.6 years ago by
ALchEmiXt1.9k
The Netherlands
ALchEmiXt1.9k wrote:
If you are not restricted to OrthoMCL, try proteinortho5, which makes it easy to identify genes with a single ortholog or with none.
written 2.6 years ago by ALchEmiXt1.9k
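
As a rough illustration of that kind of filtering, the sketch below picks single-copy rows out of a Proteinortho-style table. The layout is an assumption that should be checked against the actual output file: tab-separated columns, a header line starting with "#", three leading summary columns, then one column per input proteome, with "*" marking an absent gene and commas separating co-orthologs. The function name `single_copy_rows` is hypothetical.

```python
def single_copy_rows(lines):
    """Return one {species_column: gene_id} dict per row in which every
    species column holds exactly one gene (no '*' and no commas)."""
    species = None
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("#"):
            # Assumed header: three summary columns, then species columns.
            if species is None and len(fields) > 3:
                species = fields[3:]
            continue
        if species is None:
            raise ValueError("no header line found before the first data row")
        genes = fields[3:]
        if len(genes) != len(species):
            continue  # malformed row, skipped in this sketch
        if all(g != "*" and "," not in g for g in genes):
            rows.append(dict(zip(species, genes)))
    return rows


# Invented toy table in the assumed layout.
table = [
    "# Species\tGenes\tAlg.-Conn.\ta.faa\tb.faa",
    "2\t2\t1\tgeneA1\tgeneB1",
    "2\t3\t0.5\tgeneA2,geneA3\tgeneB2",
    "1\t1\t0\t*\tgeneB3",
]
for row in single_copy_rows(table):
    print(row)
```

Checking the cells directly, rather than relying on the summary columns, keeps the sketch independent of how those counts are defined.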