9.8 years ago
friasoler
Hello everybody!!!
I have a DNA sequence that matches two different proteins in BLAST, depending on whether I rank the hits by score or by sequence identity. Which criterion should I trust more? I have designed primers from this sequence to measure the expression of this gene, which is why it is critical for me to know exactly which gene the sequence matches. Here is the sequence:
GCCGCAGCCCCGCTGCAGACGCGCCGCGTCCCCGCCGGAGAAGGAGCGAGGCCGTTCCCTGCGCATCCTGCAGCAGCATGACTCTTCAGGCTGACTTTGATGGTGCTGCAGAAGATGTAAAAAAaTTAAAAaCaAGACCAACTGATGAAGAACTGAAGGAACTATATGGATTCTACAAACAGGCTACTGTTGGAGATATTAATATTGAATGTCCAGGAATGCTAGATTTGAAAGGCAAAGCCAAATGGGAGGCATGGAACCTGAAAAAAGGTTTATCAAAGGAGGATGCCATGAATGCCTATATCTCTAAAGCAAGAGCAATGGTAGAAAAATATGGAATCTAGAATATTCAAAATAATTCCCACTAATAATTAACTACTCTTCAGTAGCTGATGAACTAACTTGAGAAAAAcGCAGTACTAACTCCTTTTTGTGTAGTCTGACACTAATATCTTTTAAGCATCAGCTGTTTGACTTTAAAGGGTATTTACATATATAATCGATTTTTAGCTTGTATATTAATCTAAATAAATTTGAACTGAATAAATTAAGCTTTATTAAGAATTGTGGATTTTtGTGGGTATTAAATTATATTTAGCATTTTGACAGAAGAAGACAAACAGAAAAGCTCTAACAGTTAAATAACATAGACATGATTTTTTGCAAGCAAGGTTATGGAATAAAGTGAAGAGTTTGTGCATAAGGAAGAGAAGAAGGAAAAGATGAAACCTTTTTtAAGACCCAAAGCCAATGTTTGaTTTTTAAAAAaaTCAGGAAAaCTTCCCCTTATAAAGGATTACAGAGGAGGACCAGAACAACTTTTAGGCATAACTGCATGCAATGTAGAGAAaGAAGTGACTTATTATAAATTGCTGTGGACTAACCTACACATTCTGCCATTAAAaTTGaGGgAAaTaCTCAtAGACTGGCaTTTTcTATGCATGTTGtGATATGTTTTATCAAGAAacTTTCATTAGATGGTTTCAGcAGATAAAAGTGATCTCCAGGAAGgTCATAAAAGGAAACATCtCCaTTTGTtAGTtCTtGCcAaCCTAAAAAaGATATTtGAAGTGTCAGAGAAaC
Thanks in advance
Roberto
I guess that depends on what the score and identity values are. If they fall in a gray zone, this is a tricky question, but if not, then:
In Score we trust.
Bitscore > Evalue > Identity
High identity means nothing by itself, because it can come from a very short alignment covering just a tiny proportion of the query sequence. Keep in mind that BLAST stands for Basic Local Alignment Search Tool.
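To make the ordering above concrete, here is a minimal sketch in Python. It assumes the hits have already been parsed from BLAST tabular output into simple records; the `Hit` class, the hit names, and all numbers are hypothetical and only illustrate how a short, high-identity hit can lose to a long, high-scoring one.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    subject: str      # hypothetical subject name
    identity: float   # percent identity over the aligned region
    qstart: int       # first aligned query position (1-based)
    qend: int         # last aligned query position (1-based)
    evalue: float
    bitscore: float


def query_coverage(hit: Hit, query_len: int) -> float:
    """Fraction of the query covered by this single alignment."""
    return (abs(hit.qend - hit.qstart) + 1) / query_len


def rank_hits(hits: list[Hit]) -> list[Hit]:
    """Sort by bitscore (high first), then E-value (low first), then identity."""
    return sorted(hits, key=lambda h: (-h.bitscore, h.evalue, -h.identity))


# Made-up example values, not taken from the thread.
hits = [
    Hit("hypothetical_short_hit", 99.0, 100, 159, 1e-20, 110.0),
    Hit("hypothetical_long_hit", 85.0, 1, 900, 0.0, 800.0),
]
query_len = 1200

best = rank_hits(hits)[0]
for h in rank_hits(hits):
    print(h.subject, h.bitscore, h.evalue, h.identity,
          round(query_coverage(h, query_len), 2))
```

Here the 99% identity hit covers only a small slice of the query, so it ranks below the longer alignment with the higher bitscore.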
As you said: "High identity means nothing by itself because it can be for a very short alignment covering just a tiny proportion of the query sequence". That is why in the gray zone (E-values of roughly 10^-2 to 10^-4) the two values can give different results and thus pose a challenge when interpreting them. In such cases I would always trust the score over identity. So I don't quite get your reply to my post.
ps
Thumbs up
Well, it was really meant as a reply to the OP. Also, generally I wouldn't even consider hits with such a high E-value; I mean, 1 in 100 or even 1 in 10,000 isn't very good if your db has millions of sequences.
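As a rough illustration of how the E-value relates to the bitscore and to the size of the search space, here is a small sketch using the standard approximation E = m * n * 2^(-S), where S is the bitscore, m the query length and n the total database length. The function name and the numbers are made up for the example, and real BLAST uses effective (edge-corrected) lengths, so treat this as an approximation only.

```python
def expected_hits(bitscore: float, query_len: int, db_len: int) -> float:
    """Approximate E-value: chance hits expected at this bitscore or better.

    Uses E = m * n * 2**(-S) with raw lengths; BLAST itself uses
    effective lengths, so reported values will differ somewhat.
    """
    return query_len * db_len * 2.0 ** (-bitscore)


# Hypothetical numbers: same bitscore, databases of increasing size.
for db_len in (10**6, 10**8, 10**10):
    print(db_len, expected_hits(40.0, 1000, db_len))
```

The same bitscore gives a proportionally larger E-value as the database grows, and each extra bit of score halves it.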
Thanks for your answers .-)
I have these two extreme alternatives for the alignment:
If I follow your criteria I have to choose: acyl-CoA-binding protein-like?
Thanks
roberto
My guess is that these are multi-domain proteins: in the case of the second hit you're getting a nice hit to one domain, whereas in the first case you're getting a hit that covers multiple domains.
I second that. So if you are looking for a gene and not a domain, the first hit should be your choice.
Thank you all very much .-)
Roberto