Are The Query Strand And Hit Strand From Blastx The Same Strand?
12.2 years ago

I'm writing code to parse the XML output from a BLASTX search on NCBI's servers, and from this to fetch from NCBI the nucleotide sequence whose translated protein BLASTX reported as a hit. This is working: I track back from the protein ID to the original nucleotide record ID. But my original query sequence may be from the other strand, in which case BLASTX would have had to reverse-complement it to find the match. I want to know whether that happened, so that I can automatically reverse-complement one of the two sequences. BLASTX knew whether the hit was on the reverse complement when it found it, but I can't see any hint of this in the XML, nor can I see how to get that information from other queries. Yes, I could reverse-complement the hit sequence myself, try the three reading frames, translate to amino acids, and see whether it matches my original sequence better, but that is what BLASTX has just done, and I would rather get the information from BLASTX and/or the NCBI databases. Can I?

That is, how can I determine whether BLASTX had to reverse-complement my query sequence when it found a hit?

Thanks! David

blast xml strand • 5.5k views
12.2 years ago

There is a query frame tag in the XML file that tells you what frame the hit is in. For example, here is a line from one of my BLAST XML outputs:

<Hsp_query-frame>3</Hsp_query-frame>

Alternatively, you can look at the query start/end and subject start/end coordinates. If your query aligns to the reverse of the subject, then the subject end coordinate will be smaller than the start coordinate. For example, if SeqA aligns to the reverse of SeqB, you might see positions 50-100 of SeqA aligning to positions 200-150 of SeqB.
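
As a rough illustration of reading the query frame tag, here is a minimal Python sketch using only the standard library. The XML fragment is a hypothetical, heavily trimmed stand-in for real BLASTX output (the hit IDs and coordinates are made up), and `query_frames` is an illustrative helper name, not part of any BLAST tooling.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed BLASTX XML fragment, for illustration only.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>example_hit_1</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_query-from>3</Hsp_query-from>
              <Hsp_query-to>302</Hsp_query-to>
              <Hsp_query-frame>3</Hsp_query-frame>
            </Hsp>
          </Hit_hsps>
        </Hit>
        <Hit>
          <Hit_id>example_hit_2</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_query-from>10</Hsp_query-from>
              <Hsp_query-to>309</Hsp_query-to>
              <Hsp_query-frame>-2</Hsp_query-frame>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def query_frames(xml_text):
    """Return a list of (hit_id, query_frame) pairs, one per HSP."""
    root = ET.fromstring(xml_text)
    results = []
    for hit in root.iter("Hit"):
        hit_id = hit.findtext("Hit_id")
        for hsp in hit.iter("Hsp"):
            # The frame is stored as text, e.g. "3" or "-2".
            frame = int(hsp.findtext("Hsp_query-frame"))
            results.append((hit_id, frame))
    return results


for hit_id, frame in query_frames(SAMPLE_XML):
    print(hit_id, frame)
```

For a results file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`. A hit can have several HSPs, and they need not all share the same frame, so the sketch reports one frame per HSP rather than one per hit.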


Thanks! The query start/end one makes sense to me, but how does query-frame tell me the direction?


Alas, the query_from/query_to and hit_from/hit_to scheme doesn't work.

I have an example where BLASTX returns query_from less than query_to AND hit_from less than hit_to, and yet the query sequence is definitely the reverse complement with respect to both the protein sequence in the response and the DNA sequence you get by following the eLink from that protein to its nucleotide record.

Any other thoughts on how to solve this?


If Hsp_query-frame is positive (plus strand), then the query sequence was matched as supplied. If Hsp_query-frame is negative, then the reverse complement (minus strand) of the query sequence was matched.
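
The sign rule above could be applied along these lines. This is only a sketch: `strand_from_frame`, `reverse_complement`, and `orient_query` are hypothetical helper names, and the complement table covers only A, C, G, and T (other characters, such as N, are left unchanged).

```python
# Complement table for unambiguous bases only; anything else passes through.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def strand_from_frame(frame):
    """Map an Hsp_query-frame value to 'plus' or 'minus'."""
    if frame == 0:
        raise ValueError("a translated query should have a non-zero frame")
    return "plus" if frame > 0 else "minus"


def orient_query(query_seq, frame):
    """Return the query in the orientation that was matched:
    unchanged for a positive frame, reverse-complemented for a negative one."""
    if strand_from_frame(frame) == "plus":
        return query_seq
    return reverse_complement(query_seq)


print(orient_query("ATGC", 3))
print(orient_query("ATGC", -2))
```

Whether to reverse-complement the query or the retrieved nucleotide sequence is a matter of convenience; the sketch flips the query so that it reads in the same direction as the hit's coding sequence.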
