Finding The Orthologs In The Genome
12.2 years ago
Viktor ▴ 50

Hi

I want to know how much the choice of sequence affects ortholog identification, and in turn the phylogenetic tree: should I use cDNA, CDS, or protein sequences to find orthologs with BLAST? Which sequences are best to start ortholog identification with?

orthologues • 7.0k views

Orthology and paralogy are evolutionary concepts (defined by Fitch in 1970). Orthologous genes are homologous sequences that started to diverge through a speciation event; paralogs, likewise, originate from duplication events. BLAST finds sequence similarities, and based on these you can state whether two sequences are homologous or not. But it tells you nothing about orthology versus paralogy.
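To make the distinction concrete: under Fitch's definitions, whether two genes are orthologs or paralogs depends on the event at their last common ancestor in the gene tree, not on how similar they are. Below is a minimal sketch on a hypothetical toy tree (the gene names and the `Node` class are made up for illustration) in which a duplication preceded the human/chimp split. `human_geneA` and `chimp_geneB` may look very similar by BLAST, yet their last common ancestor is the duplication, so they are paralogs.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A gene-tree node; internal nodes carry the event that split their children."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"; None for leaves (genes)
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False, compare=False)


def link(node: Node) -> Node:
    """Set parent pointers for the whole subtree and return its root."""
    for child in node.children:
        child.parent = node
        link(child)
    return node


def ancestors(node: Node) -> list[Node]:
    """The node itself followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a: Node, b: Node) -> Node:
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("genes are not in the same tree")


def relationship(a: Node, b: Node) -> str:
    """Fitch: orthologs if the LCA is a speciation, paralogs if it is a duplication."""
    return "orthologs" if last_common_ancestor(a, b).event == "speciation" else "paralogs"


# Hypothetical toy tree: a duplication (genes A and B) happened before the human/chimp split.
human_a, chimp_a = Node("human_geneA"), Node("chimp_geneA")
human_b, chimp_b = Node("human_geneB"), Node("chimp_geneB")
tree = link(Node("dup", "duplication", [
    Node("spec_A", "speciation", [human_a, chimp_a]),
    Node("spec_B", "speciation", [human_b, chimp_b]),
]))

for x, y in [(human_a, chimp_a), (human_a, human_b), (human_a, chimp_b)]:
    print(x.name, y.name, relationship(x, y))
```

In real data the speciation/duplication labels are not given; they have to be inferred, e.g. by reconciling the gene tree with the species tree, which is exactly the step that BLAST similarity alone cannot provide.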


Thanks for your reply. Suppose I have the cDNA sets of two genomes (human and chimp). I make a BLAST database from them, set an e-value cutoff of, say, 1e-10, and run the BLAST. I then do not select chimp genes when a query gets two or more chimp cDNAs as hits that are very similar (their e-values are very close), and I also apply a length criterion. Wouldn't that remove the paralogs?
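The filtering described in this question could be sketched roughly as follows, assuming BLAST tabular output (`-outfmt 6`, whose default 12 columns end with e-value and bit score). `MIN_ALN_LEN` and `AMBIGUITY_RATIO` are hypothetical values, not recommendations. The sketch compares bit scores rather than e-values, because very similar hits frequently all report an e-value of 0.

```python
import csv
from collections import defaultdict

EVALUE_CUTOFF = 1e-10   # the cutoff from the question
MIN_ALN_LEN = 100       # hypothetical length criterion (aligned positions)
AMBIGUITY_RATIO = 0.95  # hypothetical: a 2nd hit within 95% of the best bit score counts as "very similar"


def read_hits(lines):
    """Parse BLAST -outfmt 6 lines into {query: [(bitscore, subject), ...]}, keeping only
    hits that pass the e-value and alignment-length cutoffs."""
    hits = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        aln_len, evalue, bits = int(row[3]), float(row[10]), float(row[11])
        if evalue <= EVALUE_CUTOFF and aln_len >= MIN_ALN_LEN:
            hits[query].append((bits, subject))
    return hits


def unambiguous_best_hits(hits):
    """Return {query: best subject}, dropping queries whose 2nd-best subject is nearly as good."""
    best = {}
    for query, query_hits in hits.items():
        per_subject = {}  # BLAST may report several HSPs per subject; keep the top score
        for bits, subject in query_hits:
            per_subject[subject] = max(bits, per_subject.get(subject, 0.0))
        ranked = sorted(per_subject.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) > 1 and ranked[1][1] >= AMBIGUITY_RATIO * ranked[0][1]:
            continue  # two or more near-equal hits: possibly paralogs, so skip this query
        best[query] = ranked[0][0]
    return best
```

This only removes the ambiguous cases. A query with a single clean best hit can still be paired with a paralog if the true ortholog was lost or is missing from the assembly.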


The strategy you are describing is similar in concept to defining inparalogs and using that definition to filter out one-to-many and many-to-many orthologs from your pair of species.
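One common way to formalise this, in the spirit of InParanoid, starts from reciprocal best hits and then treats genes from the same species that score better against the seed gene than the seed does against its cross-species partner as inparalogs. The sketch below is a simplified, hypothetical version: scores are plain dicts of bit scores keyed by `(query, subject)`, ties are broken by input order, and only pairs without inparalogs on either side are kept as one-to-one orthologs. The real tool adds further criteria.

```python
def best_hits(scores):
    """scores: {(query, subject): bitscore} for one search direction -> {query: best subject}."""
    best = {}
    for (query, subject), bits in scores.items():
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)  # ties keep the first subject seen
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return [(a, b) for a, b in ab.items() if ba.get(b) == a]


def inparalogs(seed, ortholog_score, self_scores):
    """Same-species genes scoring higher against `seed` than `seed` does against its ortholog."""
    return sorted(g for (s, g), bits in self_scores.items()
                  if s == seed and g != seed and bits > ortholog_score)


def one_to_one_orthologs(a_vs_b, b_vs_a, a_vs_a, b_vs_b):
    """Keep reciprocal best hits that have no inparalogs in either species."""
    keep = []
    for a, b in reciprocal_best_hits(a_vs_b, b_vs_a):
        if not inparalogs(a, a_vs_b[(a, b)], a_vs_a) and not inparalogs(b, b_vs_a[(b, a)], b_vs_b):
            keep.append((a, b))
    return keep
```

For example, if two human genes are recent duplicates that score higher against each other than either scores against their shared chimp best hit, the pair is reported as an inparalog group rather than as a one-to-one ortholog.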


Thanks again. Do you think that if I use protein sequences instead of cDNA, I can get a better-resolved phylogenetic tree, or will the tree be more or less the same if my species are not very diverged? The databases certainly provide a great resource for the analysis, but if you take this strategy, how close are you to the correct phylogenetic tree? Regards


@Viktor: It should work for a pair of species, but not always. Have a look here (Fig. 2) for a comparison of methods.

Also keep in mind that the evolution of many gene families is complex (including many duplications and losses), so assigning orthology without phylogenetic reconstruction often leads to wrong assignments. On whether DNA or protein is better for phylogenetic reconstruction, look here: http://biostar.stackexchange.com/questions/3739/protein-phylogenetic-analysis (the same link is in my post below)


Answering Viktor's comment (Mar 2 at 7:32): resolution depends on distance. Within a species, we detect SNPs by essentially aligning genomes. For different species at short genetic distances, genomic alignments of 5' UTRs, cDNAs, and even introns better reflect the species phylogeny. At longer distances, cDNA similarity overlaps with protein similarity, and at even longer distances, protein similarity and conserved domains resolve phylogenies better than anything else. So it's a continuum from genome alignments to gene/cDNA alignments to protein alignments to conserved domains. HIH
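One reason DNA-level signal fades first at long distances can be seen with the simplest equal-rates correction, d = -b ln(1 - p/b) with b = (n - 1)/n for an alphabet of n states (Jukes-Cantor for n = 4 and its 20-state analogue for amino acids). With four states, unrelated sequences already match at about a quarter of sites, so the observed p-distance saturates near 0.75 and the correction becomes unstable, whereas the amino-acid ceiling is 0.95. This is a sketch under that simplified model only; it ignores codon structure, rate variation among sites, and empirical substitution matrices.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns")
    return sum(a != b for a, b in pairs) / len(pairs)


def corrected_distance(p, n_states):
    """Equal-rates (Jukes-Cantor-type) distance; inf once p reaches the saturation ceiling."""
    b = (n_states - 1) / n_states
    if p >= b:
        return math.inf
    return -b * math.log(1 - p / b)


# Same observed p, different alphabets: the 4-state correction grows much faster.
for p in (0.1, 0.3, 0.5, 0.7):
    print(f"p={p:.1f}  DNA d={corrected_distance(p, 4):.3f}  protein d={corrected_distance(p, 20):.3f}")
```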

12.2 years ago

If you work with protein sequences, you can detect more distant orthology relationships.

In my experience, using HMMER's jackhmmer tool to search for homology of a query protein against a set of target proteins is the approach that gives the most distant relations.
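A minimal way to drive jackhmmer from Python is sketched below. It assumes HMMER 3 is installed and on `PATH` and uses only the standard `-N` (maximum iterations), `-E` (reporting E-value) and `--tblout` options; the file names and default thresholds are hypothetical. The per-target table is whitespace-delimited with `#` comment lines, and its first columns are target name, target accession, query name, query accession, full-sequence E-value and full-sequence bit score.

```python
import shutil
import subprocess


def jackhmmer_cmd(query_fa, target_fa, tblout, iterations=5, evalue=1e-5):
    """Command line for an iterative search of query_fa against target_fa."""
    return ["jackhmmer", "-N", str(iterations), "-E", str(evalue),
            "--tblout", tblout, query_fa, target_fa]


def parse_tblout(lines):
    """Return (query, target, full-sequence E-value, full-sequence bit score) tuples."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split()  # the trailing description may contain spaces; only f[0..5] are used
        hits.append((f[2], f[0], float(f[4]), float(f[5])))
    return hits


def run_jackhmmer(query_fa, target_fa, tblout="jackhmmer.tbl"):
    if shutil.which("jackhmmer") is None:
        raise RuntimeError("jackhmmer (HMMER 3) not found on PATH")
    # The human-readable alignment report goes to stdout; discard it and keep the table.
    subprocess.run(jackhmmer_cmd(query_fa, target_fa, tblout), check=True,
                   stdout=subprocess.DEVNULL)
    with open(tblout) as fh:
        return parse_tblout(fh)
```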

BLAST+ is a very good option in terms of speed/sensitivity if your proteomes are not extremely distant. OrthoMCL is a good option for its simplicity of use.
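For two proteomes, a BLAST+ search in both directions could be assembled as below. The commands use standard BLAST+ options (`makeblastdb -in -dbtype prot -out`, `blastp -query -db -evalue -outfmt 6 -num_threads -out`); the file naming scheme, the 1e-10 default cutoff and the thread count are assumptions to adapt. The tabular output of both directions is the usual starting point for reciprocal-best-hit filtering.

```python
import subprocess
from pathlib import Path


def blast_commands(proteome_a, proteome_b, evalue=1e-10, threads=4):
    """Build makeblastdb + blastp commands for A-vs-B and B-vs-A (not executed here)."""
    cmds = []
    for query, subject in ((proteome_a, proteome_b), (proteome_b, proteome_a)):
        db = str(Path(subject).with_suffix(""))  # hypothetical: DB prefix = FASTA name without extension
        out = f"{Path(query).stem}_vs_{Path(subject).stem}.tsv"
        cmds.append(["makeblastdb", "-in", subject, "-dbtype", "prot", "-out", db])
        cmds.append(["blastp", "-query", query, "-db", db, "-evalue", str(evalue),
                     "-outfmt", "6", "-num_threads", str(threads), "-out", out])
    return cmds


def run_all(cmds):
    """Run each command in order, stopping on the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Usage would be along the lines of `run_all(blast_commands("human.fa", "chimp.fa"))`, which writes `human_vs_chimp.tsv` and `chimp_vs_human.tsv`.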

If the two proteomes or sets of cDNAs are close, like human and chimp, it is important to be able to separate one-to-one from one-to-many and many-to-many orthologues, and for that, using gene trees usually helps. A recently published method that is especially important for the analysis of gene trees for closely related species is DLCoal:
http://www.ncbi.nlm.nih.gov/pubmed/22271778
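Once ortholog pairs between two species are available (from any method), separating one-to-one from one-to-many and many-to-many relationships can be approximated by grouping the pairs into connected components and counting genes per species. This is only a simple graph-based sketch with hypothetical gene names; it is not DLCoal, which reconciles gene trees with the species tree while modelling duplication, loss and incomplete lineage sorting, and it cannot correct wrong pairs in its input.

```python
from collections import defaultdict


def orthology_groups(pairs):
    """Group (gene_in_A, gene_in_B) ortholog pairs into connected components."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[("A", a)].add(("B", b))  # tag genes by species so names cannot collide
        graph[("B", b)].add(("A", a))
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append((sorted(g for sp, g in component if sp == "A"),
                       sorted(g for sp, g in component if sp == "B")))
    return groups


def classify(a_genes, b_genes):
    """Label a group by how many genes each species contributes."""
    if len(a_genes) == 1 and len(b_genes) == 1:
        return "one-to-one"
    if len(a_genes) == 1 or len(b_genes) == 1:
        return "one-to-many"
    return "many-to-many"
```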


Yes, this is good for homologous proteins, but the question pertains to orthologs, which presumably are not so distantly related.


I have added a few comments now that the user has given more details.


Thanks for your reply, it was very useful for me, especially DLCoal.

12.2 years ago

If you're interested in protein-coding genes and their orthologs, then use a protein sequence as your query in sequence similarity searches.
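If the starting point is a set of cDNAs or CDSs, they first have to be translated. A small sketch using the standard genetic code follows; it assumes each input is an in-frame CDS starting at its first base, maps codons it cannot read (e.g. containing N) to `X`, and by default stops at the first stop codon. For anything beyond a quick check, an established library such as Biopython's `Seq.translate` is the safer choice.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cds, to_stop=True):
    """Translate an in-frame CDS; trailing bases that do not form a full codon are ignored."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # unreadable codons become X
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```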

12.2 years ago
Lhl ▴ 760

If you have protein sequences, please try OrthoMCL.


This seems like a good tool. How well does orthoMCL work if the genome under study is not publicly available or complete?


I think this depends on how good your data is, i.e. how good and correct your protein sequences are (I mean the protein sequences predicted from your newly sequenced coding sequences). Kind regards
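As a rough illustration of what "how good your data is" can mean in practice, predicted proteins can be screened for common signs of bad gene models before clustering. The checks and thresholds below (`min_len`, `max_x_frac`) are hypothetical choices for illustration, not requirements of OrthoMCL.

```python
def protein_qc(seq, min_len=50, max_x_frac=0.05):
    """Return a list of warnings for one predicted protein (empty list = no issues found)."""
    issues = []
    core = seq[:-1] if seq.endswith("*") else seq  # a single terminal stop is expected
    if not core.startswith("M"):
        issues.append("no initial Met (possibly truncated 5' end)")
    if "*" in core:
        issues.append("internal stop codon (frameshift or mis-prediction?)")
    if len(core) < min_len:
        issues.append(f"shorter than {min_len} residues")
    if core and core.count("X") / len(core) > max_x_frac:
        issues.append("too many ambiguous residues (X)")
    return issues
```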
