Question: Finding The Orthologs In The Genome
7.3 years ago by
Viktor30
Viktor30 wrote:

Hi

I want to know how much the choice of sequence type (cDNA, CDS, or protein) used to identify orthologs with BLAST affects the resulting phylogenetic tree. Which type of sequence is the best starting point for ortholog identification?

orthologues • 4.0k views
ADD COMMENTlink modified 7.3 years ago by 2184687-1231-83-4.9k • written 7.3 years ago by Viktor30

Orthology and paralogy are evolutionary concepts (defined by Fitch in 1970). Orthologous genes are homologous sequences that started to diverge through a speciation event. Similarly, paralogs originate from duplication events. BLAST finds sequence similarities, and based on these you can make a statement about whether two sequences are homologous or not. But by itself it tells you nothing about orthology/paralogy.
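
To make the distinction concrete, here is a minimal sketch of how a gene tree (rather than a similarity score) can be used to label gene pairs. It uses the simple "species overlap" heuristic, not a full reconciliation method: a node is called a duplication if its child subtrees share a species, otherwise a speciation. The nested-tuple tree format and the `<species>_<gene>` naming convention are assumptions made for this example.

```python
from itertools import combinations


def species_of(gene):
    # Assumed naming convention for this sketch: "<species>_<gene id>".
    return gene.split("_")[0]


def leaves(tree):
    # A tree is either a leaf (a string) or a tuple of child subtrees.
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def classify_pairs(tree, result=None):
    """Label each gene pair by the event inferred at its last common ancestor.

    Species-overlap heuristic: if two child subtrees of a node share any
    species, the node is treated as a duplication (pairs are paralogs);
    otherwise it is treated as a speciation (pairs are orthologs).
    """
    if result is None:
        result = {}
    if isinstance(tree, str):
        return result
    child_leaves = [leaves(child) for child in tree]
    for left, right in combinations(child_leaves, 2):
        overlap = {species_of(g) for g in left} & {species_of(g) for g in right}
        label = "paralogs" if overlap else "orthologs"
        for a in left:
            for b in right:
                result[frozenset((a, b))] = label
    for child in tree:
        classify_pairs(child, result)
    return result


# Toy gene tree: a duplication before the human/chimp split produced copies A and B.
gene_tree = (("human_A", "chimp_A"), ("human_B", "chimp_B"))
pairs = classify_pairs(gene_tree)
```

In this toy tree, `human_A`/`chimp_A` come out as orthologs, while `human_A`/`chimp_B` come out as paralogs even though BLAST would report all four sequences as similar to each other.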

ADD REPLYlink written 7.3 years ago by Leszek4.0k

Thanks for your reply. Suppose I have the cDNA sets of two genomes (human and chimp). I make a BLAST database from one of them, set an E-value cutoff of, say, 1e-10, and run the search. I then discard the genes for which two or more chimp cDNAs come up as very similar hits (their E-values are very close), and I also apply a length criterion. Doesn't that remove the paralogs?

ADD REPLYlink written 7.3 years ago by Viktor30

The strategy you are describing is similar in concept to defining inparalogs and using that definition to filter out one-to-many and many-to-many orthologs from your pair of species.
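
The filtering discussed in this exchange could be sketched roughly as follows. This is only an illustration under stated assumptions: the input is BLAST tabular output (`-outfmt 6`, default 12 columns), "very similar hits" is approximated by comparing bit scores instead of E-values (E-values of strong hits are often reported as 0.0 and cannot be compared), and the thresholds `min_aln_len` and `close_ratio` are hypothetical values chosen for the example.

```python
import csv
import io
from collections import defaultdict


def unambiguous_hits(blast_tsv, max_evalue=1e-10, min_aln_len=100, close_ratio=0.9):
    """Return {query: subject} for queries that have one clearly best hit.

    blast_tsv: text in BLAST tabular format (-outfmt 6, default columns).
    A query is dropped when a second subject scores within close_ratio of
    the best bit score, i.e. when near-equal candidates suggest paralogs.
    """
    best = defaultdict(dict)  # query -> {subject: best bit score seen}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        aln_len, evalue, bits = int(row[3]), float(row[10]), float(row[11])
        # Apply the E-value cutoff and the length criterion.
        if evalue > max_evalue or aln_len < min_aln_len:
            continue
        if bits > best[query].get(subject, 0.0):
            best[query][subject] = bits

    kept = {}
    for query, subjects in best.items():
        ranked = sorted(subjects.items(), key=lambda item: item[1], reverse=True)
        if len(ranked) > 1 and ranked[1][1] >= close_ratio * ranked[0][1]:
            continue  # two or more near-equal hits: ambiguous, discard
        kept[query] = ranked[0][0]
    return kept


# Made-up example rows (not real data).
example = "\n".join([
    "h1\tc1\t99.0\t900\t9\t0\t1\t900\t1\t900\t0.0\t1600",
    "h1\tc9\t70.0\t300\t90\t0\t1\t300\t1\t300\t1e-40\t200",
    "h2\tc2\t98.0\t800\t16\t0\t1\t800\t1\t800\t0.0\t1400",
    "h2\tc3\t97.5\t800\t20\t0\t1\t800\t1\t800\t0.0\t1380",
    "h3\tc4\t85.0\t60\t9\t0\t1\t60\t1\t60\t1e-12\t90",
])
one_to_one_candidates = unambiguous_hits(example)
```

Here `h2` is discarded because it has two near-equal hits, and `h3` fails the length criterion. Note that this only removes visibly ambiguous cases; it does not establish orthology for what remains.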

ADD REPLYlink written 7.3 years ago by Ahdf-Lell-Kocks1.6k

Thanks again. Do you think that if I use protein sequences in place of cDNA I can reach a better-resolved phylogenetic tree, or will the tree be more or less the same if my species are not very diverged? The databases definitely provide a great resource for the analysis, but with this strategy, how close do you get to the correct phylogenetic tree? Regards

ADD REPLYlink written 7.3 years ago by Viktor30

@victor: It should work for a pair of species, but not always. Have a look here for a comparison of methods: http://www.biomedcentral.com/content/pdf/gb-2008-9-10-235.pdf (Fig. 2). Keep in mind that the evolution of many families is complex (including many duplications and losses), so assigning orthology without phylogenetic reconstruction often leads to wrong assignments. As for which is better for phylogenetic reconstruction, DNA or protein, look here: http://biostar.stackexchange.com/questions/3739/protein-phylogenetic-analysis (the same link is in my post below).

ADD REPLYlink written 7.3 years ago by Leszek4.0k

Answering "viktor Mar 2 at 7:32": resolution depends on distance. Within a species, we detect SNPs by basically aligning genomes. For different species at short genetic distances, genomic alignments at the 5'UTR, cDNA, and even intronic level better reflect the phylogeny of the species. At longer distances, cDNA overlaps with protein similarity, and at even longer distances protein similarity and conserved domains are better at resolving phylogenies than anything else. So it's a continuum from genome alignments to gene/cDNA alignments to protein alignments to conserved domains. HIH
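
A small sketch of why nucleotide alignments carry more signal at short distances: two coding sequences can differ at several synonymous sites and still translate to identical proteins. The sequences below are invented for the example, the standard genetic code is assumed, and the uncorrected p-distance (fraction of differing sites) stands in for a proper substitution model.

```python
from itertools import product

# Standard genetic code, built in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    # Translate complete codons only; a trailing partial codon is ignored.
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def p_distance(a, b):
    # Uncorrected distance: fraction of aligned positions that differ.
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Two made-up, already aligned coding sequences differing only at third codon positions.
cds_1 = "ATGGCTCTGAAAGGT"
cds_2 = "ATGGCCCTAAAGGGA"

nucleotide_distance = p_distance(cds_1, cds_2)
protein_distance = p_distance(translate(cds_1), translate(cds_2))
```

For these two sequences the nucleotide distance is 4/15 while the protein distance is 0, so a protein alignment would show no differences to build a tree from. At larger distances the situation reverses as synonymous sites saturate.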

ADD REPLYlink written 7.3 years ago by Ahdf-Lell-Kocks1.6k
7.3 years ago by
2184687-1231-83-4.9k wrote:

If you work with protein sequences you can detect more distant orthology relationships.

In my experience, using HMMER's jackhmmer tool to search for homology of a query protein against a set of target proteins is the approach that gives the most distant relations.

BLAST+ is a very good option in terms of speed/sensitivity if your proteomes are not extremely distant. OrthoMCL is a good option for simplicity of use.

If the two proteomes or sets of cDNAs are close, like human-chimp, it is important to be able to separate one-to-one from one-to-many and many-to-many orthologues, and for that using gene trees usually helps. A recently published method that is especially important for the analysis of gene trees of closely related species is DLCoal:
http://www.ncbi.nlm.nih.gov/pubmed/22271778
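
Once you have a list of ortholog pairs between two species (from gene trees or any other method), separating the relationship types is mostly bookkeeping. The sketch below groups pairs into connected components and labels each by its shape. The gene names are invented, and this is not what DLCoal does; it only illustrates the one-to-one / one-to-many / many-to-many distinction.

```python
from collections import defaultdict


def classify_relationships(pairs):
    """Group (gene_in_species_A, gene_in_species_B) pairs and label each group."""
    a_to_b, b_to_a = defaultdict(set), defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)

    groups, seen = [], set()
    for start in a_to_b:
        if start in seen:
            continue
        # Walk the bipartite graph to collect everything linked to `start`.
        group_a, group_b, stack = set(), set(), [start]
        while stack:
            a = stack.pop()
            if a in group_a:
                continue
            group_a.add(a)
            for b in a_to_b[a]:
                if b not in group_b:
                    group_b.add(b)
                    stack.extend(b_to_a[b] - group_a)
        seen |= group_a

        if len(group_a) == 1 and len(group_b) == 1:
            kind = "one-to-one"
        elif len(group_a) == 1 or len(group_b) == 1:
            kind = "one-to-many"
        else:
            kind = "many-to-many"
        groups.append((tuple(sorted(group_a)), tuple(sorted(group_b)), kind))
    return sorted(groups)


# Invented ortholog pairs between human (h*) and chimp (c*) genes.
ortholog_pairs = [
    ("hA", "cA"),
    ("hB", "cB1"), ("hB", "cB2"),
    ("hC1", "cC1"), ("hC1", "cC2"), ("hC2", "cC1"), ("hC2", "cC2"),
]
groups = classify_relationships(ortholog_pairs)
```

For tree building across species, the one-to-one groups are usually the ones you would keep.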

ADD COMMENTlink modified 7.3 years ago • written 7.3 years ago by 2184687-1231-83-4.9k

Yes, this is good for homologous proteins, but the question pertains to orthologs, which presumably are not so distantly related.

ADD REPLYlink written 7.3 years ago by Larry_Parnell16k

I have added a few comments now that the user has given more details.

ADD REPLYlink written 7.3 years ago by 2184687-1231-83-4.9k

thanks for your reply ... it was very useful for me ... especially the DLCoal

ADD REPLYlink written 7.3 years ago by Viktor30
7.3 years ago by
Leszek4.0k
IIMCB, Poland
Leszek4.0k wrote:

Have a look at:
protein or dna for phylogenetic analyses
finding-protein-homology
the-best-method-to-find-orthologous-genes-of-a-species

ADD COMMENTlink written 7.3 years ago by Leszek4.0k
7.3 years ago by
Larry_Parnell16k
Boston, MA USA
Larry_Parnell16k wrote:

If you're interested in protein-coding genes and their orthologs, then use a protein sequence as your query in sequence similarity searches.

ADD COMMENTlink written 7.3 years ago by Larry_Parnell16k
7.3 years ago by
Lhl730
United States
Lhl730 wrote:

If you have protein sequences, please try OrthoMCL.

ADD COMMENTlink written 7.3 years ago by Lhl730

This seems like a good tool. How well does orthoMCL work if the genome under study is not publicly available or complete?

ADD REPLYlink written 7.3 years ago by Larry_Parnell16k

I think this depends on how good your data is (e.g. how good and correct are the protein sequences predicted from your newly sequenced coding sequences?). Kind regards

ADD REPLYlink written 7.3 years ago by Lhl730