Choosing the right BLAST result when using multiple databases
5.4 years ago
Dracaena ▴ 50

Hi guys,

I am a bit stuck choosing the right BLAST result. I have a protein sequence from a non-model organism I am interested in (Primula veris, a plant). I have successfully run BLAST for this protein against two databases: A. thaliana (one of the best-reviewed and most reliable genomes) and Actinidia chinensis, the closest relative of my study species with a fully annotated genome. Here are the results:

A. thaliana BLAST: protein ID NP_191888.2, identity 46.36%, E-value 4e-118, bit score 353 (2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein)

A. chinensis BLAST: protein ID PSR94821.1, identity 58.60%, E-value 1e-154, bit score 448 (Serine--tRNA ligase)

Both seem to be good results (very low E-values). Identity is higher for A. chinensis for many of the protein sequences I have tested, which is expected given the smaller genetic distance. How do I select the right protein from these results?

Cheers

blast proteinblast • 1.5k views

Can you add some information about how long your query protein sequence is? What does the alignment look like in terms of how much of that sequence is covered by the hit? At face value those two proteins appear unrelated (unless they share a common domain, which is where your hit is). I would conservatively say that the hit from the genome closest to the one you are working with is probably more reliable, IF everything has been done right.
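To make the coverage question concrete, here is a minimal sketch of how query coverage could be computed from the aligned query intervals of a hit. It assumes you have the query start/end of each HSP and the query length (for example from a tabular BLAST+ report that includes the `qstart`, `qend` and `qlen` fields); the function name and the example numbers are made up for illustration and are not from the search in the question.

```python
def query_coverage(hsps, query_len):
    """Percent of the query covered by the union of HSP intervals.

    hsps: iterable of (qstart, qend) pairs, 1-based and inclusive.
    query_len: length of the query sequence in residues.
    """
    covered = 0
    current_end = 0  # last query position already counted
    # Normalise each interval so start <= end, then walk them in order.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsps):
        start = max(start, current_end + 1)  # skip residues already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return 100.0 * covered / query_len


# Hypothetical hit with two overlapping HSPs on a 300-residue query.
print(query_coverage([(1, 100), (50, 150)], 300))
```

A hit with high identity but low coverage often points to a shared domain rather than a full-length homolog, which is why the number is worth checking before comparing the two results.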

What happens if you just run DELTA-BLAST with your protein at NCBI? What family of proteins do you get in the result?


Thanks for the fast reply, I appreciate it! I used command-line BLAST, so the values above come from that search. I have now also run the sequence through BLAST in the browser at NCBI. The results for A. thaliana turn out the same. The results for A. chinensis, however, are totally different, with no good hits... Strange, since I downloaded the FASTA files from NCBI, built the databases for both in the same way and used the same BLAST parameters. Thanks again for replying.


Do keep in mind (and this is important!) that E-values are NOT transferable/comparable between searches against different databases, because the E-value depends on the size of the database searched. Moreover, the implementation of the NCBI online BLAST is a little different from that of the standalone tools.
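A rough way to see the database-size effect is the textbook relation between bit score and E-value, E = m * n * 2^(-S'), where m is the query length, n the total database length and S' the bit score. The sketch below uses that simplified formula only; real BLAST additionally applies effective-length and composition corrections, so it will not reproduce reported values exactly. The query length, database sizes and bit score are hypothetical placeholders.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 350          # hypothetical query length in residues
SMALL_DB = 10_000_000    # hypothetical total residues in database A
LARGE_DB = 40_000_000    # hypothetical total residues in database B
BIT_SCORE = 100.0        # the same alignment score in both searches

for name, db_len in (("small", SMALL_DB), ("large", LARGE_DB)):
    print(name, f"{expected_hits(BIT_SCORE, QUERY_LEN, db_len):.3g}")
```

The identical alignment gets a four-fold larger E-value in the four-fold larger database, so when comparing hits across searches the bit score is the safer number to look at.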

I'm also keen to know what your definition of 'best hit' is. Best hit to do what with? Or do you want the most similar one?

Why don't you create a DB with both of these datasets in it, then run a single BLAST and pick the top hit?
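Picking the top hit from a single combined search could look roughly like the sketch below. It assumes a tab-separated report with the columns `qseqid sseqid pident evalue bitscore` (a custom column selection, not the default tabular layout) and keeps the highest bit score per query. The query ID `query1` is invented, and the two rows simply reuse the numbers from the separate searches in the question as placeholder input; a real combined search would report its own E-values.

```python
import csv
import io

# Assumed column order of the tabular report (hypothetical custom selection).
COLUMNS = ["qseqid", "sseqid", "pident", "evalue", "bitscore"]

# Placeholder rows built from the values quoted in the question.
EXAMPLE = (
    "query1\tNP_191888.2\t46.36\t4e-118\t353\n"
    "query1\tPSR94821.1\t58.60\t1e-154\t448\n"
)


def best_hits(handle):
    """Return {query ID: row with the highest bit score for that query}."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=COLUMNS, delimiter="\t"):
        row["bitscore"] = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or row["bitscore"] > best[query]["bitscore"]:
            best[query] = row
    return best


top = best_hits(io.StringIO(EXAMPLE))
print(top["query1"]["sseqid"], top["query1"]["bitscore"])
```

Ranking by bit score rather than E-value keeps the choice independent of database size, though "top hit" still only means "most similar", not necessarily "ortholog".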


It would still be useful to get answers for the questions I had asked in my comment above.

Did you run blastp and then DELTA-BLAST at the NCBI site?


What is the "right" protein for you? If you're trying to find orthologs, then you should probably build a phylogenetic tree.
