Question

If there are two or more B genes with same highest bit score in RBH method

0

Entering edit mode

5.7 years ago

madapada • 0

Hi. I want to analyze the coding sequence (CDS) data of closely related vertebrates in NCBI database. Most of the species are in the same order. So I've been looking for the proper method to obtain ortholog gene sets from CDS data of these species and now I try RBH method (Reciprocal best hit) by BLAST.

1st BLAST: query-A species, database-B species 2nd BLAST: query-B species, database-A species

Then what I want to ask is that if there are two or more B genes with same highest bit score in the result of the first BLAST using the query of A, should I choose one of these B genes? For example,

species A B

gene1 a1 b1

gene2 a2 b2

gene3 a3 b3

gene4 a4 b4

and if the result of the first BLAST is...

query: a1 =BLAST=> hit: b1 (with highest bit score and identity)

query: a2 =BLAST=> hit: b1 (with highest bit score and identity)

query: a3 =BLAST=> hit: b3 (with highest bit score and identity)

query: a4 =BLAST=> hit: b2, b4 (with same and highest bit score and identity on b2 and b4, mutually)

If I choose one of these B genes, is there any proper index (like identity or e-value...)to compare the genes and choose the best one? Or, if it's ok to use all of them as the next query for the second BLAST, it seems easier for me. But then, can we know the N to N relationships between the gene of A and B without regarding the case like <query: a4="BLAST=>" hit:="" b2,="" b4=""> or <a1, a2="" and="" b1="" are="" on="" "many="" to="" one"="" relationship="">? (see below to see my thought)

a1, a2 and b1 are on "many to one" relationship (-> not ortholog).
a3 and b3 are on "one to one" ortholog relationship (-> ortholog).
a4 and b2, b3 are on "one to many" ortholog relationship (-> not ortholog).

In my regards, it's impossible but then why RBH method is regarded as the good method even though it can't say the real N to N relationship between genes which is very important to find ortholog gene sets?

Sorry for the long question. I'm waiting for your help! Thanks!

sequence genome • 1.6k views

ADD COMMENT • link updated 5.7 years ago by lieven.sterck 15k • written 5.7 years ago by madapada • 0

score 2 · Accepted Answer · 2018-08-10

2

Entering edit mode

5.7 years ago

lieven.sterck 15k

You explain correctly the procedure of the RBH approach and immediately also nicely its main limitations. Indeed in the case of N to N simple RBH will not resolve that. Moreover, i would not consider the RBH approach as the 'good method', I rather consider it an OK start approach that might provide some circumstantial evidence. There are much more sophisticated (reliable) approaches to look for orthology

You should however also provide (or at least take into account) the result of the reverse blast result (you only gave the result of one of your blasts) . Also keep in mind that by definition blast scores are not symmetrical , meaning that the a1 - b1 can have different score compared to b1 - a1 .

There are also a number of tools that have this kind of approach implemented, Inparanoid being the typical one I guess. Added bonus of this one is that is will also report (or try to) the one-many or even many-many relations (== thus also tries to denote the in-paralogs rather than only the orthologs).

last point: it's not strictly correct to label the one-may or many-one as not being orthologs, they can very well be correct orthologs of each other (but that you're probably not looking for those I can understand). Orthology/paralogy etc delineations are notoriously difficult and much research/debate has gone (and will go) into this topic.

ADD COMMENT • link 5.7 years ago by lieven.sterck 15k

0

Entering edit mode

Hi lieven.sterck. Thanks for your kind explanation! It helps me a lot, especially for understanding the limitation of RBH method and the definition of one-to-one orthologs. But there are still some questions remained. Could you help me some more, please?

I couldn't provide my reverse blast results because I've not performed the second blast. In order to do the reverse one, I should decide the next query genes. But It's very confusing for me how to treat the genes with same bit score after the first blast. What I confused is that if there are two (or more) B genes with same, highest bit scores (example==> query: a1 gene, hit1: b1 gene with bit score 750, hit2: b2 gene with bit score 750), I'm not sure if I can use both of them as the query for the second blast. Is it OK or should I remove one of the two genes by using any other methods?

I can't use Inparanoid or other ortholog defining programs because some of my species are not included in their databases. So I think I should keep using RBH.

Thanks again for your help!

ADD REPLY • link 5.7 years ago by madapada • 0

0

Entering edit mode

Well, for the RBH approach I would advise to simply blast the complete set of A proteins against B and also B against A (no subset, just the complete fasta files). In a second step you simply parse the results of both blasts and select the proteins for which a1 - b1 and b1 - a1 . it's not more convenient it's also more correct as by using subsets (or filtering before blasting) you might bias the results.

You can nearly any orthology program also with "new" species, as certainly it is possible for inparanoid (give it a try)

ADD REPLY • link 5.7 years ago by lieven.sterck 15k