Detecting unique genes or non-homologous genes in a species
9.2 years ago
Ritvik ▴ 30

Hi,

What's the best method to find unique genes or non-homologous genes in a species when comparing two or more species?

I am using BLAST, but I am unsure what criteria to use for deciding that a gene has no significant hit. Are 70% query coverage, 30% identity and an e-value of 0.01 good enough? Query coverage is also a problem: when the query is shorter than the hit, we can get 100% query coverage. Should this be accepted, or should coverage be computed against the longer of the two sequences, given that the query and the hit sometimes encode functionally different proteins?
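To make the coverage question concrete, here is a minimal sketch that computes coverage on both the query and the hit and applies the thresholds mentioned above to the smaller of the two. It assumes tab-separated BLAST output with the columns qseqid, sseqid, pident, qlen, slen, qstart, qend, sstart, send, evalue (a custom column order chosen for this example), and it looks at a single alignment only. The names `Hit`, `parse_hit`, `coverages` and `is_significant` are made up for illustration.

```python
from typing import NamedTuple, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    pident: float
    qlen: int
    slen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated line in the column order assumed above."""
    f = line.rstrip("\n").split("\t")
    return Hit(f[0], f[1], float(f[2]), int(f[3]), int(f[4]),
               int(f[5]), int(f[6]), int(f[7]), int(f[8]), float(f[9]))


def coverages(hit: Hit) -> Tuple[float, float]:
    """Return (query coverage, subject coverage) in percent for one alignment."""
    qcov = 100.0 * (abs(hit.qend - hit.qstart) + 1) / hit.qlen
    scov = 100.0 * (abs(hit.send - hit.sstart) + 1) / hit.slen
    return qcov, scov


def is_significant(hit: Hit, min_cov: float = 70.0,
                   min_ident: float = 30.0, max_evalue: float = 0.01) -> bool:
    """Require the coverage threshold on BOTH sequences, not only the query."""
    qcov, scov = coverages(hit)
    return (min(qcov, scov) >= min_cov
            and hit.pident >= min_ident
            and hit.evalue <= max_evalue)


# A short query fully covered by a much longer hit: 100% query coverage,
# but only 25% of the hit is aligned, so it fails the two-sided check.
short_in_long = parse_hit("geneA\tgeneB\t45.0\t100\t400\t1\t100\t51\t150\t1e-20")
similar_length = parse_hit("geneA\tgeneC\t45.0\t100\t110\t1\t100\t1\t100\t1e-20")
```

Taking the minimum of the two coverages is equivalent to measuring coverage against the longer sequence when the aligned regions are about the same length. Whether that is the right criterion for a given data set is a judgment call.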

blast homology • 4.7k views
9.2 years ago
sentausa ▴ 650

I agree with dago that you're basically searching for the non-orthologs. To do that for (several) bacteria, I used OrthoMCL. It gives you the orthologous genes/proteins among your species, and the unique genes are the ones that OrthoMCL does not cluster. I used the downloadable software, though it seems we can now use it through their web interface.
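As a rough sketch of the "not clustered" step: assuming the clustering result is a text file with one group per line in the form `group_id: taxonA|gene1 taxonB|gene2 ...` (an assumption about the output layout, so check your own file), the unclustered genes are simply the set difference between all gene IDs and the IDs that appear in any group. The function names are hypothetical.

```python
def clustered_genes(group_lines):
    """Collect every gene ID that appears in any group line."""
    clustered = set()
    for line in group_lines:
        line = line.strip()
        if not line:
            continue
        # Everything after the first colon is the whitespace-separated member list.
        _, _, members = line.partition(":")
        clustered.update(members.split())
    return clustered


def unclustered(all_genes, group_lines):
    """Genes of interest that are absent from every group."""
    return sorted(set(all_genes) - clustered_genes(group_lines))


groups = [
    "grp1: A|a1 B|b1",
    "grp2: A|a2 A|a3 B|b2",
]
genes_in_a = ["A|a1", "A|a2", "A|a3", "A|a4"]
unique_to_a = unclustered(genes_in_a, groups)
```

Groups that contain genes from only one species are not handled here. Whether those count as unique is a separate decision.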


I didn't know that there was a web interface! This is quite interesting! Plus one for sharing!


What do you think about get_homologues? It integrates OMCL, BDBH and COG to create consensus core and pan genomes.


Yeah, I just saw the OrthoMCL website when I tried to find the link to include in my answer, and I was also surprised to see the web interface. I've never heard of get_homologues before, but it sounds interesting too!

9.2 years ago

BLAST is not a good tool for finding distant homologs, as you've just found out. If the species are covered by it, you could use a database of phylogenetic trees, e.g. TreeFam, and treat genes that are not in a tree with a gene from the other species as unique. If your species of interest are not covered by TreeFam but are metazoans, then you could scan their genes against the TreeFam HMMs with the treefam_scan.pl script.


Thanks for the answer, but I am working on bacterial species and trying to find genes that are present in a pathogenic species and absent, by homology, from a non-pathogenic species.

9.2 years ago
dago ★ 2.8k

If you want to use BLAST, you first have to define the orthologs. You could apply the reciprocal best hits approach, using the 50-50 rule as the criterion (this is commonly used for bacteria):

  • BLAST species A against species B
  • BLAST species B against species A
  • Select the reciprocal best hits, meaning that if gene X has Y as its best match, then Y must also have X as its best match.

Otherwise, you could find the orthologs using COG, OMCL and other algorithms. I personally use the program get_homologues.

Once you know the orthologs, meaning the genes that are shared, you can look for the ones that are unique.
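The reciprocal best hits steps above can be sketched as follows. This is a minimal illustration, not any particular tool's implementation: it assumes the BLAST results have already been parsed and filtered into `(query, subject, bitscore)` tuples, picks the best hit by highest bit score, and uses made-up gene IDs and function names.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


def genes_without_rbh(genes_a, a_vs_b, b_vs_a):
    """Genes of species A with no reciprocal best hit in species B."""
    paired = {a for a, _ in reciprocal_best_hits(a_vs_b, b_vs_a)}
    return sorted(set(genes_a) - paired)


# Toy data: a2's best hit is b1, but b1's best hit is a1, so a2 is unpaired.
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 50), ("a2", "b1", 120), ("a3", "b3", 90)]
b_vs_a = [("b1", "a1", 198), ("b2", "a2", 60), ("b3", "a3", 88)]
candidates = genes_without_rbh(["a1", "a2", "a3", "a4"], a_vs_b, b_vs_a)
```

Genes with no hit at all (here `a4`) and genes whose best hit does not reciprocate (here `a2`) both end up in the candidate list, so it is worth looking at the two cases separately.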


Thanks for the answer. So this is what I should do: in my case, I am checking whether any of 100 genes in species A are absent from species B. I would BLAST these 100 genes against the genome of species B, then take the top hits and BLAST them against the whole genome of species A (or against those 100 genes only?) and identify the reciprocal best hits. So if any of these 100 genes in species A doesn't have a reciprocal best hit, should that gene be considered unique to species A?

Also, should I eliminate those ortholog pairs in which the genes of species A and B encode functionally different proteins, if by chance I find any?


I would BLAST them against the whole genome. If the best hits are among your 100 genes, then you can say with a certain degree of confidence that these genes are orthologous. If there is no reciprocal best hit, you can speculate that they are unique. I suggest that you always specify the details of your analysis, because different methods can give slightly different results.
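A small toy example of why the reverse search should use the whole genome: if the reverse database is restricted to the 100 genes, a hit can look reciprocal only because its true best match was excluded. The gene IDs, scores and the `top_hit` helper below are invented for illustration.

```python
def top_hit(hits, query, allowed=None):
    """Highest-scoring subject for `query`, optionally limited to `allowed` subjects."""
    candidates = [
        (score, subject)
        for q, subject, score in hits
        if q == query and (allowed is None or subject in allowed)
    ]
    return max(candidates)[1] if candidates else None


subset_of_a = {"a1", "a2"}  # stands in for the 100 genes of interest

# Forward search: a1's best hit in species B is b1.
a_vs_b = [("a1", "b1", 150)]
# Reverse search against the whole genome of A: b1 matches a9 better than a1.
b_vs_a = [("b1", "a9", 300), ("b1", "a1", 150)]

forward = top_hit(a_vs_b, "a1")
reverse_whole_genome = top_hit(b_vs_a, forward)
reverse_subset_only = top_hit(b_vs_a, forward, allowed=subset_of_a)

# Against the whole genome, b1 points to a9, so a1 has no reciprocal best hit.
# Against the subset only, b1 appears to point back to a1.
```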


OK, thanks. Just one last question: is it logically correct to eliminate those ortholog pairs in which the genes of species A and B encode functionally different proteins, if by chance I find any?
