Single-copy genes for a phylogenetic tree after OrthoMCL
8.7 years ago
Mehmet ▴ 820

Dear all:

I have completed the OrthoMCL step. Now I need to identify the single-copy genes and their sequences to build a phylogenetic tree.

Is there any bioinformatics tool/script for this?

genome alignment sequence • 3.3k views

Why? Please specify exactly what you mean by single-copy genes and why you think you need to restrict your analysis to them. Do you mean genes without paralogs in any species, or only in a subset of species? Do you want to remove genes with paralogs completely, or only per taxon? If the latter, can't you simply filter your output by taxon? Genes with exactly one homolog per taxon are then defined as single-copy.
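The per-taxon filter described here can be sketched in a few lines of Python. This is only a minimal sketch, not an OrthoMCL feature: it assumes each line of groups.txt is a group ID (with or without a trailing colon) followed by whitespace-separated `TAXON|gene` members, as in the excerpt posted in this thread. The function names `parse_groups` and `single_copy_groups` and the optional `required_taxa` argument are made up for illustration.

```python
from collections import Counter


def parse_groups(lines):
    """Yield (group_id, members) for each non-empty line of a groups.txt file.

    Assumed format: group ID first, then whitespace-separated TAXON|gene IDs.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and groups without members
        group_id = fields[0].rstrip(":")  # some versions write "ID:"
        yield group_id, fields[1:]


def single_copy_groups(lines, required_taxa=None):
    """Return {group_id: members} for groups with at most one gene per taxon.

    If required_taxa is given, a group is kept only when exactly those taxa
    are present (i.e. one gene from every taxon of interest).
    """
    kept = {}
    for group_id, members in parse_groups(lines):
        # Count how many genes each taxon contributes to this group.
        counts = Counter(member.split("|", 1)[0] for member in members)
        if any(n > 1 for n in counts.values()):
            continue  # at least one taxon has a paralog in this group
        if required_taxa is not None and set(counts) != set(required_taxa):
            continue  # a required taxon is missing (or an extra one is present)
        kept[group_id] = members
    return kept


# Lines copied from the groups.txt excerpt in this thread (first one shortened).
example = [
    "ORTHOMCL1 RCUL|nRc.2.0.1.t01024-RA RCUL|nRc.2.0.1.t14099-RA",
    "ORTHOMCL39110 SKOW|XP_006823915.1 SMED|mk4.005302.00",
    "ORTHOMCL39115 SMED|mk4.003454.01 TCAS|XP_008196576.1",
]

if __name__ == "__main__":
    for gid, members in single_copy_groups(example).items():
        print(gid, *members)
```

For a real run, pass an open file handle instead of the `example` list, e.g. `single_copy_groups(open("groups.txt"))`. Passing the full set of taxon codes as `required_taxa` restricts the result to groups with exactly one gene in every taxon, which is the stricter definition usually meant for phylogenomics.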


I need to use single-copy genes to build the phylogenetic tree. I mean genes with no paralogs at all. I couldn't find a good explanation of how to do this. Some people have used custom scripts to get the single-copy genes and their protein sequences for tree building. I tried those scripts but got many errors. Do you have a solution?
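For the protein-sequence part, one possible approach is sketched here. It assumes the single-copy groups are already available as lists of `TAXON|gene` IDs, and that the protein FASTA headers start with exactly those IDs (as in the files given to OrthoMCL as input). The helper names `read_fasta` and `group_fasta` are hypothetical, and the sequences in the example are toy placeholders, not real data.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_id: sequence}.

    The sequence ID is the first whitespace-delimited word after '>'.
    """
    chunks = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def group_fasta(members, seqs):
    """Return FASTA text for one ortholog group.

    Each record is labelled with the taxon code only, so that the same
    labels appear in every group and alignments can later be concatenated.
    Raises KeyError if a member ID is missing from seqs.
    """
    records = []
    for member in members:
        taxon = member.split("|", 1)[0]
        records.append(f">{taxon}\n{seqs[member]}")
    return "\n".join(records) + "\n"


# Toy input: the IDs come from this thread, the sequences are placeholders.
toy_fasta = [
    ">SKOW|XP_006823915.1 placeholder description",
    "MKT",
    "AAL",
    ">SMED|mk4.005302.00",
    "MRV",
]

if __name__ == "__main__":
    seqs = read_fasta(toy_fasta)
    print(group_fasta(["SKOW|XP_006823915.1", "SMED|mk4.005302.00"], seqs), end="")
```

Writing one such FASTA file per group gives the input for a multiple-sequence aligner; labelling records by taxon only works because each taxon occurs at most once in a single-copy group.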


Can you give me an example of the OrthoMCL output? I am still asking why you want to use only single-copy genes; a few paralogs won't do any harm to the phylogeny.

This is the head of the output file groups.txt:

ORTHOMCL1 RCUL|nRc.2.0.1.t01024-RA RCUL|nRc.2.0.1.t14099-RA RCUL|nRc.2.0.1.t24454-RA RCUL|nRc.2.0.1.t25442-RA RCUL|nRc.2.0.1.t29258-RA RCUL|nRc.2.0.1.t
08268-RA RCUL|nRc.2.0.1.t28641-RA RCUL|nRc.2.0.1.t16389-RA RCUL|nRc.2.0.1.t16228-RA RCUL|nRc.2.0.1.t23443-RA RCUL|nRc.2.0.1.t43280-RA RCUL|nRc.2.0.1.t0
7646-RA RCUL|nRc.2.0.1.t40585-RA RCUL|nRc.2.0.1.t31140-RA RCUL|nRc.2.0.1.t23202-RA RCUL|nRc.2.0.1.t18343-RA RCUL|nRc.2.0.1.t10889-RA RCUL|nRc.2.0.1.t13

This is the tail of the groups.txt file:

ORTHOMCL39110 SKOW|XP_006823915.1 SMED|mk4.005302.00
ORTHOMCL39111 SKOW|XP_006824289.1 SMED|mk4.010033.03
ORTHOMCL39112 SKOW|XP_006824316.1 SMED|mk4.026828.00
ORTHOMCL39113 SKOW|XP_006824669.1 SMED|mk4.000573.09
ORTHOMCL39114 SKOW|XP_006825572.1 SMED|mk4.000954.05
ORTHOMCL39115 SMED|mk4.003454.01 TCAS|XP_008196576.1
ORTHOMCL39116 SMED|mk4.004076.04 TCAS|XP_008196044.1
ORTHOMCL39117 SMED|mk4.007341.02 TCAS|XP_974486.1
ORTHOMCL39118 SMED|mk4.014650.00 TCAS|XP_001810176.2
ORTHOMCL39119 SMED|mk4.044940.01 TCAS|XP_008190410.1

This is the head of the orthologus.txt file:

1298|c10008_g1_i1.2-511.F2      1299|c6900_g2_i1ppm.1820        0.563
1298|c10009_g1_i1.1-267.F1      1299|c17675_g1_i1ppm.9461       0.386
1298|c10033_g1_i1.392-81.F2     1299|c13365_g1_i1ppm.8946       0.47
1298|c10041_g1_i1.265-2.F1      1299|c12375_g1_i1ppm.7630       0.395

I have seen papers in which people used only single-copy genes to build a phylogenetic tree. By the way, thank you so much for your help.


Sorry, I do not recognize all of these identifiers (TCAS is Tribolium, SMED may be from the SmedGD database, but that is too much guessing). Maybe someone who understands the OrthoMCL output better can help you more. Doesn't the software also predict paralogs? If not, you will have to try to get the paralogs from, e.g., Ensembl BioMart using these identifiers.

8.7 years ago
ALchEmiXt ★ 1.9k
If you are not restricted to OrthoMCL, try proteinortho5, which makes it easy to identify genes with a single ortholog or with none.