Question: Best method for orthology prediction of more than two protein datasets
2.2 years ago by
ariannapbartlett0 wrote:

Hello all,

I'm looking for suggestions on the currently accepted methodology for identifying orthologous proteins across multiple datasets. We are working with non-model eukaryotes. Our datasets are protein sequences predicted with TransDecoder, and we have done our best to eliminate redundant sequences. I am somewhat familiar with HaMStR, OrthoFinder, OrthoDB, etc., but am not confident about which method would be best. Our goal is to rule out paralogous genes and construct a phylogenetic tree. We then want to explore certain genes of interest that are shared between the different species. Any links to good reviews would also be appreciated.

Best,

A.B.

rna-seq genome • 534 views

Hi,

Can you describe what you did to eliminate redundant sequences? How did you obtain your proteins: from genomes or from transcriptomes? If you have both genome-derived and transcriptome-derived proteins, I suggest first identifying orthologs in the genome-derived protein sets, then using those orthologs to search the transcriptome-derived proteins. If you analyze genome-derived and transcriptome-derived proteins together, you may not recover enough orthologous proteins (more than 50) if you have data from more than 10 species.

In addition to the tools you mentioned, you can use OMA, but it requires a lot of storage space and takes longer to run than the other tools.
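
To check how many usable orthologs you end up with, one option is to count the orthogroups that contain exactly one gene in every species. Below is a minimal sketch, assuming a tab-separated table with one orthogroup per row, one column per species, and comma-separated gene IDs in each cell (roughly the shape of an OrthoFinder orthogroup table). The function name, species names, and gene IDs are made up for illustration.

```python
import csv
import io


def single_copy_orthogroups(table_text):
    """Return IDs of orthogroups with exactly one gene in every species column.

    Assumes a tab-separated table: first column is the orthogroup ID,
    remaining columns hold comma-separated gene IDs, one column per species.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter="\t")
    header = next(reader)
    n_species = len(header) - 1
    selected = []
    for row in reader:
        if not row:
            continue
        og_id, cells = row[0], row[1:]
        # Pad rows whose trailing empty cells were dropped.
        cells = cells + [""] * (n_species - len(cells))
        counts = [len([g for g in cell.split(",") if g.strip()]) for cell in cells]
        if all(count == 1 for count in counts):
            selected.append(og_id)
    return selected


# Hypothetical example data, not real output from any tool.
EXAMPLE = (
    "Orthogroup\tspA\tspB\tspC\n"
    "OG0000001\ta1\tb1\tc1\n"
    "OG0000002\ta2, a3\tb2\tc2\n"
    "OG0000003\ta4\t\tc3\n"
    "OG0000004\ta5\tb3\tc4\n"
)

print(single_copy_orthogroups(EXAMPLE))
```

Here OG0000002 is dropped because spA has two copies, and OG0000003 is dropped because spB has none. If the resulting list is too short for a species tree, that is a sign the datasets may need to be handled in two steps as described above.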

written 2.2 years ago by Mehmet
2.2 years ago by
EMBL Heidelberg, Germany
Jean-Karim Heriche23k wrote:

Check out the TreeFam papers. The project isn't active anymore, but the pipeline is now part of Ensembl Compara and the code is still available. You could either rerun the pipeline with your sequences or use the Ensembl Compara HMMs to identify families for your proteins and add them to the corresponding trees.
By definition, you can only identify paralogues if you build a phylogenetic tree.
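
To illustrate why the tree matters, here is a minimal sketch of the species-overlap heuristic for labeling gene-tree nodes. This is a simplification and not the reconciliation method used by TreeFam or Ensembl Compara. It assumes a rooted binary gene tree written as nested tuples and a hypothetical "species|gene" leaf naming convention. A node whose two subtrees share at least one species is treated as a duplication (pairs split there are paralogs); otherwise it is treated as a speciation (pairs split there are orthologs).

```python
from itertools import product


def species_of(leaf):
    # Hypothetical naming convention: "species|gene".
    return leaf.split("|")[0]


def classify_pairs(node, paralogs, orthologs):
    """Walk a rooted binary gene tree and classify every gene pair.

    node is either a leaf name (str) or a 2-tuple of subtrees.
    Returns (leaves, species) found under node; fills the two sets in place.
    """
    if isinstance(node, str):
        return [node], {species_of(node)}
    left, right = node
    left_leaves, left_species = classify_pairs(left, paralogs, orthologs)
    right_leaves, right_species = classify_pairs(right, paralogs, orthologs)
    # Species overlap between the two children suggests a duplication node.
    target = paralogs if left_species & right_species else orthologs
    for a, b in product(left_leaves, right_leaves):
        target.add(tuple(sorted((a, b))))
    return left_leaves + right_leaves, left_species | right_species


# Toy gene tree: a duplication in the human/mouse ancestor, with fly as outgroup.
tree = ((("human|A1", "mouse|A1"), ("human|A2", "mouse|A2")), "fly|A")

paralogs, orthologs = set(), set()
classify_pairs(tree, paralogs, orthologs)

print("paralogs:", sorted(paralogs))
print("orthologs:", sorted(orthologs))
```

In this toy tree, human|A1 and mouse|A1 come out as orthologs, while human|A1 and mouse|A2 come out as paralogs even though they are from different species, which a similarity search alone would not tell you.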
