Aligned Ranges In Blast
12.8 years ago
Random ▴ 160

I have several sequences which I'm blasting against the nt database, and I wanted to know how to retrieve for each hit the intervals matched.

As far as I'm aware, the standard output only gives this interval both in the query and reference sequence for the largest continuous match.

What often happens is that my query sequence has more than one continuous match, and I would like to know the positions of those as well.

For example, take this as my query sequence, with results for some organisms from the nt DB that matched at some positions:

M=match
-=Nothing aligns here


---MMMMMMM-------MMMMMMMMMMMM

In this case I will only get the interval positions for the second match, but since the first is also large I want its positions too.

I'm running these on my local computer with the standalone blast.

How can I make this happen?

The reason I want to do this is to see how much of my query sequences is covered by some organisms.

blast ncbi web • 4.5k views

Can you show us one of your problematic BLAST outputs?


It's not really problematic. But I'll attach a BLAST image to clarify my question

http://imgur.com/J7Vgc

So as you can see, my query sequence is matched by the same organism at two large ranges. They are separated by non-matching base pairs, probably due to bad contig assembly that inserted sequence not belonging to this organism between the two. BLAST will only give me the coordinates for the first one because it is the largest, and I also want the coordinates for the second.

How can I retrieve them in standalone BLAST?


Yes, please paste the XML result for just one query sequence. All HSPs should be listed there with the corresponding sequence matches.

12.8 years ago

The aligned sequences in the results part are ordered by e-value, which means you usually just have to scroll further down (or increase the maximum number of alignments shown).

Try CTRL+F and enter your query sequence name.


Indeed it is. I was requesting only one hit per match, unaware that it outputs all the alignment coordinates in the following lines. Thanks

12.8 years ago

You can get the intervals for hits if you supply the -m 8 option to BLAST (this is the legacy blastall flag; in BLAST+ the equivalent is -outfmt 6). This makes BLAST write the hits to a tab-separated file with 12 columns, as described here: http://www.pangloss.com/wiki/Blast

Then run:

cut -f 2,9,10 my_blast_output.tab > results_intervals.tab

and you have the subject ID plus the intervals (start, stop positions) on the subjects. Columns 7 and 8 hold the corresponding start and stop positions on the query.
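If you would rather keep every interval grouped per query/subject pair instead of cutting columns, something like the following might work. It is only a sketch: it assumes the standard 12-column tabular layout (query ID, subject ID, % identity, length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), and the function name `collect_intervals` and the example rows are made up for illustration.

```python
from collections import defaultdict


def collect_intervals(lines):
    """Group HSP coordinates by (query, subject).

    Assumes each line is one HSP in the standard 12-column tabular format.
    Returns {(query, subject): [(q_start, q_end, s_start, s_end), ...]}.
    """
    hits = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and comment lines (commented tabular output).
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a complete tabular record
        query, subject = cols[0], cols[1]
        # Columns 7-10 (1-based) are query start/end and subject start/end.
        q_start, q_end, s_start, s_end = (int(c) for c in cols[6:10])
        hits[(query, subject)].append((q_start, q_end, s_start, s_end))
    return dict(hits)


# Hypothetical rows: two HSPs against subjA, one against subjB.
example = [
    "# made-up example data",
    "contig1\tsubjA\t98.00\t700\t14\t0\t301\t1000\t5001\t5700\t0.0\t1200",
    "contig1\tsubjA\t97.50\t1101\t27\t0\t1500\t2600\t7200\t8300\t0.0\t1900",
    "contig1\tsubjB\t91.00\t400\t36\t0\t10\t409\t77\t476\t1e-150\t540",
]

for pair, intervals in collect_intervals(example).items():
    print(pair, intervals)
```

With a real file you would pass the open file handle instead of the list, e.g. `collect_intervals(open("my_blast_output.tab"))`.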


I know. What I hadn't realized is that for each hit to a reference, several lines with the different intervals are produced when my query sequence matches more than one place in the same organism. Since I was only requesting the first line for each sequence, I was missing a lot of intervals for each hit.
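Once all the lines for one query/subject pair are kept, the coverage question from the original post comes down to merging the query intervals and summing their lengths. A rough sketch, assuming 1-based inclusive coordinates as in the tabular output; `covered_length` and the coordinates are hypothetical, not anything BLAST provides:

```python
def covered_length(intervals):
    """Count query positions covered by at least one interval.

    Intervals are (start, end) pairs, 1-based and inclusive. Overlapping or
    adjacent intervals are merged first so no position is counted twice.
    """
    merged = []
    # Normalise each pair so start <= end, then sweep in sorted order.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return sum(end - start + 1 for start, end in merged)


# Made-up query intervals for one organism; the first two overlap.
query_intervals = [(301, 1000), (900, 1100), (1500, 2600)]
query_length = 3000  # hypothetical length of the query sequence

covered = covered_length(query_intervals)
print(f"{covered} of {query_length} bases covered ({covered / query_length:.1%})")
```

Merging before summing matters because HSPs to the same organism can overlap on the query, and simply adding their lengths would overestimate coverage.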
