Force blast to align the whole query read
1
3
Entering edit mode
5.1 years ago
godeludanu ▴ 30

I have looked through the documentation for BLAST command line tools and can't figure this out.

I have a set of long reads and I would like to find the accuracy of the reads. I am aligning the reads against the reference genome using BLAST.

However, very often BLAST will shorten a query read to get a better alignment, how can I force BLAST to only align the whole read? In the cases where the whole read does not align just create mismatches.

blast • 6.0k views
3
Entering edit mode

BLAST : blast basic local alignment search tool ;-)

0
Entering edit mode
5.1 years ago

In blastn you have this parameter :

-qcov_hsp_perc <Real, 0..100>


Which is the percent query coverage per hsp. Set it to 100 to return only hits with full coverage on query.

PS : I'm not sure that blast is the best tool for read mapping on a genome...

EDIT : note that this parameter only filters the results and doesn't change the way BLAST seach for alignments, meaning that many possible hits will not be found. See discussion below for details.

0
Entering edit mode

This will only return hits that have that query coverage, not force blast to align the whole query.

0
Entering edit mode

Edited. I understand the conceptual difference but the end result is still the same right ?

0
Entering edit mode

No, this is a hit reporting filter. OP wants to force BLAST to perform a global alignment, setting this to 100 would only cause alignments with less than 100% coverage to not be reported in the results. So OP would simply stop seeing hits for the reads mentioned in their post.

From the BLAST website:

...command line option qcov_hsp_perc that removes alignments below the specified query coverage.


In theory, playing with the gap and mismatch penalties should allow what OP describes, but I don't think this is optimal.

OP should be using something like BWA.

0
Entering edit mode

I certainly agree with the BWA/bowtie suggestion, but I'm not sure to understand your point concerning BLAST. With -qcov_hsp_perc set to 100, if there is a hit with 100% coverage, it will be reported even if its not the absolute best hit. Conversely, if no hit is reported for the read, then that means that there is no alignment with 100% coverage leading to a significant e-value (correct me if I'm wrong)...

1
Entering edit mode

Yes, exactly. Using this option BLAST will only report hits with 100% coverage having sufficient expect values. Assuming they exist, which they won't.

BLAST is a local alignment tool, it is only designed to find local alignments. BLAST will only concatenate HSPs if the score of the new segment is significantly (or just greater, depending on the specific version) greater than the score of the current segment. Gaps and mismatches have penalties associated with them. So, if a new segment requires too many gaps and/or mismatches, the score of that new segment will be less than or not significantly larger than the current (shorter) best hit. The shorter hit has a higher score and represents the local optimum and is what is reported.

Setting qcov_hsp_perc doesn't change how BLAST searches for things, so if OP isn't finding 100% coverage hits already, setting this won't change anything. Outside of a few specific situations, I doubt there's a case where BLAST would extend to cover 100% of the query and score it lower than a shorter hit.

Basically, OP will see the same results as before, with hits having less than 100% coverage being omitted from the results.

0
Entering edit mode

Thanks for these precisions. I understand your point now.