Blast scores...two annotations for the same piece of sequence
0
1
8.0 years ago
friasoler ▴ 30

Hello everybody!!!

I have a DNA sequence that matches two different proteins depending on whether I look at the scores or at the sequence identity in BLAST. Which criterion should I trust more? I have designed primers from this sequence to measure the expression of this gene, which is why it is so critical for me to know exactly which gene the sequence matches. Here is the sequence:

GCCGCAGCCCCGCTGCAGACGCGCCGCGTCCCCGCCGGAGAAGGAGCGAGGCCGTTCCCTGCGCATCCTGCAGCAGCATGACTCTTCAGGCTGACTTTGATGGTGCTGCAGAAGATGTAAAAAAaTTAAAAaCaAGACCAACTGATGAAGAACTGAAGGAACTATATGGATTCTACAAACAGGCTACTGTTGGAGATATTAATATTGAATGTCCAGGAATGCTAGATTTGAAAGGCAAAGCCAAATGGGAGGCATGGAACCTGAAAAAAGGTTTATCAAAGGAGGATGCCATGAATGCCTATATCTCTAAAGCAAGAGCAATGGTAGAAAAATATGGAATCTAGAATATTCAAAATAATTCCCACTAATAATTAACTACTCTTCAGTAGCTGATGAACTAACTTGAGAAAAAcGCAGTACTAACTCCTTTTTGTGTAGTCTGACACTAATATCTTTTAAGCATCAGCTGTTTGACTTTAAAGGGTATTTACATATATAATCGATTTTTAGCTTGTATATTAATCTAAATAAATTTGAACTGAATAAATTAAGCTTTATTAAGAATTGTGGATTTTtGTGGGTATTAAATTATATTTAGCATTTTGACAGAAGAAGACAAACAGAAAAGCTCTAACAGTTAAATAACATAGACATGATTTTTTGCAAGCAAGGTTATGGAATAAAGTGAAGAGTTTGTGCATAAGGAAGAGAAGAAGGAAAAGATGAAACCTTTTTtAAGACCCAAAGCCAATGTTTGaTTTTTAAAAAaaTCAGGAAAaCTTCCCCTTATAAAGGATTACAGAGGAGGACCAGAACAACTTTTAGGCATAACTGCATGCAATGTAGAGAAaGAAGTGACTTATTATAAATTGCTGTGGACTAACCTACACATTCTGCCATTAAAaTTGaGGgAAaTaCTCAtAGACTGGCaTTTTcTATGCATGTTGtGATATGTTTTATCAAGAAacTTTCATTAGATGGTTTCAGcAGATAAAAGTGATCTCCAGGAAGgTCATAAAAGGAAACATCtCCaTTTGTtAGTtCTtGCcAaCCTAAAAAaGATATTtGAAGTGTCAGAGAAaC


Roberto

alignment • 1.9k views
2

I guess that depends on what the score and identity values are. If they are in a gray zone, this is a tricky question, but if not, then:

In Score we trust.

2

Bitscore > Evalue > Identity

High identity means nothing by itself because it can be for a very short alignment covering just a tiny proportion of the query sequence. Keep that in mind: BLAST stands for Basic Local Alignment Search Tool, so what it reports are local alignments.
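
A minimal sketch of that ordering in Python, in case it helps. The `Hit` fields and the two example hits are made up for illustration (they are not BLAST output or a BLAST API); the point is only that a short, perfectly identical hit loses to a long, slightly less identical one when bit score is the first sort key.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit summary. Field names are illustrative, not a BLAST API."""
    subject: str
    bitscore: float
    evalue: float
    pident: float       # percent identity over the aligned region only
    query_cover: float  # percent of the query covered by the alignment


def rank_hits(hits):
    """Sort by bit score (high first), then E-value (low first), then identity (high first)."""
    return sorted(hits, key=lambda h: (-h.bitscore, h.evalue, -h.pident))


# Hypothetical values, chosen only to illustrate the point.
hits = [
    Hit("short_but_identical", bitscore=60.0, evalue=1e-5, pident=100.0, query_cover=3.0),
    Hit("long_and_similar", bitscore=900.0, evalue=0.0, pident=97.0, query_cover=50.0),
]

best = rank_hits(hits)[0]
print(best.subject)
```

Sorting on identity alone would put the 3%-coverage hit on top, which is exactly the trap described above.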

1

As you said: "High identity means nothing by itself because it can be for a very short alignment covering just a tiny proportion of the query sequence". That is why, in the gray zone (E-value between 10^-4 and 10^-2), the two values can give different results and thus make interpretation difficult. In such cases I would always trust the score over the identity. So I don't quite get your reply to my post.

P.S. Thumbs up.

1

Well, it was really meant as a reply to the OP. Also, I generally wouldn't even consider hits with such high E-values; I mean, 1 in 100 or even 1 in 10,000 isn't very good if your database has millions of sequences.
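
For context on how the E-value relates to database size, here is the usual Karlin-Altschul relation between bit score and E-value, as far as I understand it. The symbols are my own notation: m is the query length, n is the total length of the database, and S' is the bit score. BLAST actually uses effective lengths, so treat this as a sketch rather than the exact calculation:

```latex
E = m \, n \, 2^{-S'}
```

In other words, the E-value is already scaled by the size of the search space: the same bit score gives a larger E-value against a larger database.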

0

I have these two extreme alternatives for the alignment:

PREDICTED: Ficedula albicollis acyl-CoA-binding protein-like (LOC101820061), mRNA
Max score   Total score   Query cover   E value   Ident
887         887           46%           0.0       98%

ref|XM_005040820.1| PREDICTED: Ficedula albicollis S-acyl fatty acid synthase thioesterase, medium chain-like (LOC101815966), mRNA
Max score   Total score   Query cover   E value   Ident
239         239           12%           8e-59     99%


If I follow your criteria, should I choose the acyl-CoA-binding protein-like hit?

Thanks
roberto

2

My guess is that these are multi-domain proteins: with the second hit you're getting a nice match to a single domain, whereas with the first you're getting a match that covers multiple domains.
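
To make the coverage side of this concrete, here is a rough Python sketch of how a query cover percentage can be computed as the union of the aligned query intervals (HSPs) for one hit. The function name, the query length of 1000 and the HSP coordinates are all hypothetical; I picked the coordinates only so that the results come out at 12% and 46%, and they are not taken from the actual alignments.

```python
def query_coverage(hsps, query_length):
    """Percent of query positions covered by at least one HSP.

    hsps: iterable of (start, end) query coordinates, 1-based and inclusive.
    Overlapping HSPs are counted once (union of intervals).
    """
    covered = 0
    current_end = 0  # last query position already counted
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hsps):
        start = max(start, current_end + 1)  # skip positions already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return 100.0 * covered / query_length


# Hypothetical coordinates on a hypothetical 1000 bp query.
one_domain = [(400, 519)]                 # a single short region
multi_domain = [(100, 330), (300, 559)]   # two overlapping regions

print(query_coverage(one_domain, 1000))
print(query_coverage(multi_domain, 1000))
```

A hit can have 99% identity inside a short region like the first one and still leave most of the query unexplained.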

1

I second that. So if you are looking for a gene and not a domain, the first hit should be your choice.

0

Thank you all very much :-)
Roberto