Question: Force blast to align the whole query read
3
gravatar for godeludanu
3.1 years ago by
godeludanu30
godeludanu30 wrote:

I have looked through the documentation for BLAST command line tools and can't figure this out.

I have a set of long reads and I would like to find the accuracy of the reads. I am aligning the reads against the reference genome using BLAST.

However, very often BLAST will shorten a query read to get a better alignment, how can I force BLAST to only align the whole read? In the cases where the whole read does not align just create mismatches.

Thanks for your help.

blast • 3.8k views
ADD COMMENTlink modified 3.1 years ago by Carlo Yague4.4k • written 3.1 years ago by godeludanu30
3

BLAST : blast basic local alignment search tool ;-)

ADD REPLYlink written 3.1 years ago by Pierre Lindenbaum120k
0
gravatar for Carlo Yague
3.1 years ago by
Carlo Yague4.4k
Belgium
Carlo Yague4.4k wrote:

In blastn you have this parameter :

-qcov_hsp_perc <Real, 0..100>

Which is the percent query coverage per hsp. Set it to 100 to return only hits with full coverage on query.

PS : I'm not sure that blast is the best tool for read mapping on a genome...

EDIT : note that this parameter only filters the results and doesn't change the way BLAST seach for alignments, meaning that many possible hits will not be found. See discussion below for details.

ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Carlo Yague4.4k

This will only return hits that have that query coverage, not force blast to align the whole query.

ADD REPLYlink written 3.1 years ago by pld4.8k

Edited. I understand the conceptual difference but the end result is still the same right ?

ADD REPLYlink written 3.1 years ago by Carlo Yague4.4k

No, this is a hit reporting filter. OP wants to force BLAST to perform a global alignment, setting this to 100 would only cause alignments with less than 100% coverage to not be reported in the results. So OP would simply stop seeing hits for the reads mentioned in their post.

From the BLAST website:

...command line option qcov_hsp_perc that removes alignments below the specified query coverage.

In theory, playing with the gap and mismatch penalties should allow what OP describes, but I don't think this is optimal.

OP should be using something like BWA.

ADD REPLYlink written 3.1 years ago by pld4.8k

I certainly agree with the BWA/bowtie suggestion, but I'm not sure to understand your point concerning BLAST. With -qcov_hsp_perc set to 100, if there is a hit with 100% coverage, it will be reported even if its not the absolute best hit. Conversely, if no hit is reported for the read, then that means that there is no alignment with 100% coverage leading to a significant e-value (correct me if I'm wrong)...

ADD REPLYlink written 3.1 years ago by Carlo Yague4.4k
1

Yes, exactly. Using this option BLAST will only report hits with 100% coverage having sufficient expect values. Assuming they exist, which they won't.

BLAST is a local alignment tool, it is only designed to find local alignments. BLAST will only concatenate HSPs if the score of the new segment is significantly (or just greater, depending on the specific version) greater than the score of the current segment. Gaps and mismatches have penalties associated with them. So, if a new segment requires too many gaps and/or mismatches, the score of that new segment will be less than or not significantly larger than the current (shorter) best hit. The shorter hit has a higher score and represents the local optimum and is what is reported.

Setting qcov_hsp_perc doesn't change how BLAST searches for things, so if OP isn't finding 100% coverage hits already, setting this won't change anything. Outside of a few specific situations, I doubt there's a case where BLAST would extend to cover 100% of the query and score it lower than a shorter hit.

Basically, OP will see the same results as before, with hits having less than 100% coverage being omitted from the results.

ADD REPLYlink written 3.1 years ago by pld4.8k

Thanks for these precisions. I understand your point now.

ADD REPLYlink written 3.1 years ago by Carlo Yague4.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1503 users visited in the last hour