Hello, I have a question about finding reciprocal best hits (RBH) with NCBI protein BLAST.
If I understand correctly, when you BLAST a gene of species X and find genes of other species above your threshold, those genes are homologs. To ascertain orthology, you take each of those homologs, BLAST it back, and check whether its best hit is the original query gene of species X.
But what if BLASTing the gene of species X returns 5 genes from a single species Y? When you then BLAST each of these 5 genes, the best hits are always the other 4 genes of species Y, and the gene of species X only appears after them.
Are those 5 genes of species Y then neither orthologs nor paralogs? Are they just plain homologs, even if they had the highest identities? And what if the reciprocal search gives a slightly better hit to another gene in another species? Are these genes then only homologs, or still orthologs?
Thank you
Maybe RBH should not be discarded immediately. It depends on the requirements, roughly:
Graph-based methods (e.g. RBH) when:
- You have a large dataset with many species and genes, and you care about speed
- You don't need full phylogenetic trees including gene losses, and are fine with just in-paralogs

Tree-based methods (NJ, ML, ...) when:
- You are sure about the accuracy of the alignment and the model
- You have a small dataset, perhaps a single tree
- You are interested in very accurate predictions including all evolutionary events
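To make the RBH check from the question concrete, here is a minimal sketch in Python. It assumes you have already parsed your BLAST results into (query, subject, bitscore) tuples and that each search was restricted to the other species' proteome, so hits within the same species do not compete. The function names and the toy scores are made up for illustration; they are not part of BLAST or any library.

```python
def best_hits(hits):
    """Return {query: subject}, keeping the highest-bitscore subject per query.

    `hits` is an iterable of (query, subject, bitscore) tuples
    (a hypothetical, already-parsed form of the BLAST output).
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(x_vs_y, y_vs_x):
    """Return sorted (x_gene, y_gene) pairs that are each other's best hit."""
    forward = best_hits(x_vs_y)   # best Y gene for each X gene
    backward = best_hits(y_vs_x)  # best X gene for each Y gene
    return sorted((x, y) for x, y in forward.items() if backward.get(y) == x)


# Toy data (invented scores): one gene in X, three similar genes in Y.
x_vs_y = [("geneX", "Y1", 310.0), ("geneX", "Y2", 295.0), ("geneX", "Y3", 280.0)]
y_vs_x = [("Y1", "geneX", 305.0), ("Y2", "geneX", 290.0), ("Y3", "geneX", 275.0)]

pairs = reciprocal_best_hits(x_vs_y, y_vs_x)
print(pairs)  # [('geneX', 'Y1')]
```

In this toy case Y2 and Y3 also have geneX as their best hit in species X, but only Y1 is reciprocal. Plain RBH reports that single pair and says nothing about Y2 and Y3, which is the in-paralog situation the question describes; deciding how those extra copies relate to geneX is where clustering or tree-based methods come in.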