Using blast to find homologs between two species
12 months ago
Chris ▴ 10

Hi all,

I was wondering if there is a standardised pipeline or tool for comparing all the genes or genomes of two organisms to generate a list of homologs between the two. Something similar to OrthoFinder but for homologs.

I am currently trying BLAST, but I feel my parameters aren't stringent enough, as each gene has multiple matches. Is there a standard set of BLAST parameters that detects homologs rather than just similar regions of a gene?

Many thanks,

blast orthofinder homology • 682 views

Evolutionarily, how close or distant are these organisms? Have you looked at the NCBI HomoloGene section, in case it has a precomputed result (if your organisms are included)?

12 months ago
Mensur Dlakic ★ 27k

"I am currently trying Blast, but I feel like my parameters aren't stringent enough as each gene has multiple matches."

I am not sure you have the right approach to this problem. Proteins are domain-based, especially if you are working with eukaryotes.

Any two proteins that share the same domain will have a significant match to each other. Let's say that we have two proteins with domains A-B-C-D and A-X-Y-Z. These are different proteins, yet they have a region of similarity. If domain A is big enough, say >200 residues, a BLAST match between these two proteins will persist even if you lower the E-value threshold to 1e-40. I think that is what you refer to as multiple matches. No single BLAST parameter such as E-value, regardless of how you set it, will give you a single hit per protein.
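To make the A-B-C-D versus A-X-Y-Z point concrete, here is a minimal sketch of filtering BLAST hits by alignment coverage rather than by E-value alone. It assumes tabular output produced with `-outfmt "6 qseqid sseqid pident qstart qend sstart send evalue qlen slen"`; the protein names, the 0.8 coverage cutoff and the 1e-5 E-value cutoff are illustrative assumptions, not recommended values. Each line is treated as a single alignment, so pairs reported as several separate HSPs would need to be merged first.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    qseqid: str
    sseqid: str
    pident: float
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    qlen: int
    slen: int


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST lines in the column order assumed above."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]),
                  int(f[5]), int(f[6]), float(f[7]), int(f[8]), int(f[9]))


def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence spanned by the alignment."""
    return (abs(end - start) + 1) / length


def full_length_hits(hits: Iterable[Hit], min_cov: float = 0.8,
                     max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits that are significant AND cover most of both proteins."""
    return [
        h for h in hits
        if h.evalue <= max_evalue
        and coverage(h.qstart, h.qend, h.qlen) >= min_cov
        and coverage(h.sstart, h.send, h.slen) >= min_cov
    ]


# Hypothetical hits: the first shares only a 250-residue domain A,
# the second aligns over nearly the full length of both proteins.
example = [
    "protABCD\tprotAXYZ\t85.0\t1\t250\t1\t250\t1e-60\t1000\t1000",
    "protABCD\tprotABCD_sp2\t70.0\t5\t990\t3\t985\t1e-200\t1000\t995",
]
kept = full_length_hits(parse_hits(example))
print([h.sseqid for h in kept])
```

The domain-only hit has a very low E-value but covers just a quarter of each protein, so it is dropped, while the near full-length hit survives. A coverage filter like this reduces domain-sharing matches but still will not guarantee a single hit per protein.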

I suggest you try a program such as MMseqs2, which is meant for clustering proteins. It will take into account protein similarities, lengths and domain compositions when creating clusters. This still may not be perfect and will require manual cleanup here and there, but it should give you a much better starting point than BLAST comparisons.

I suggest you start here and try different clustering parameters.
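As a rough sketch of what such a clustering run could look like, the snippet below builds an `mmseqs easy-cluster` command and reads the resulting two-column cluster table (representative, member). Several things here are assumptions for illustration only: both proteomes are concatenated into one FASTA file, sequence IDs carry a hypothetical species prefix (`sp1|`, `sp2|`), and the thresholds (`--min-seq-id 0.3`, `-c 0.8`) are placeholder starting values to vary, not recommendations. The command is only printed, not executed.

```python
import shlex
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def easy_cluster_cmd(fasta: str, prefix: str, tmp_dir: str,
                     min_seq_id: float = 0.3, min_cov: float = 0.8) -> List[str]:
    """Build the argument list for `mmseqs easy-cluster`.

    --cov-mode 0 requires the coverage threshold on both query and target.
    """
    return ["mmseqs", "easy-cluster", fasta, prefix, tmp_dir,
            "--min-seq-id", str(min_seq_id),
            "-c", str(min_cov),
            "--cov-mode", "0"]


def read_clusters(tsv_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group members by representative from a `<prefix>_cluster.tsv` file."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue
        rep, member = line.split("\t")[:2]
        clusters[rep].append(member)
    return dict(clusters)


def shared_clusters(clusters: Dict[str, List[str]],
                    prefixes: Sequence[str] = ("sp1|", "sp2|")) -> Dict[str, List[str]]:
    """Keep clusters with at least one member from every species prefix."""
    return {
        rep: members for rep, members in clusters.items()
        if all(any(m.startswith(p) for m in members) for p in prefixes)
    }


# File names are hypothetical. To actually run it (requires MMseqs2 installed):
#   import subprocess; subprocess.run(cmd, check=True)
cmd = easy_cluster_cmd("both_proteomes.fasta", "clusterRes", "tmp")
print(shlex.join(cmd))

# Hypothetical cluster table content, for illustration only.
tsv = [
    "sp1|geneA\tsp1|geneA",
    "sp1|geneA\tsp2|geneA_like",
    "sp2|geneQ\tsp2|geneQ",
]
candidates = shared_clusters(read_clusters(tsv))
print(candidates)
```

Clusters that contain proteins from both species are candidate homolog groups; single-species clusters are proteins with no detectable counterpart at the chosen thresholds. Loosening or tightening the identity and coverage values changes how aggressively domain-sharing proteins are merged, which is where the manual cleanup mentioned above comes in.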
