Question: blastn report apparently does not show all alignments; is blastn re-adjusting parameters, and which option controls that?
2.6 years ago, prishly (10) wrote:

I ran a local BLAST search of a series of short DNA queries, all in one FASTA file, against subsets of NGS reads from the same set: 10 reads, then 100, 1000 and finally 10k reads. Each subset included all the smaller ones. However, the number of hits in the BLAST report seems to plateau rather than increase more or less proportionally. I also noticed that the reads with hits for a given query don't accumulate as the number of reads searched increases: the reads with hits found against 1000 reads are generally different from those found against 10k reads, instead of the former always being included in the latter. The max_hsps option has no effect, and max_alignments doesn't change much either. In contrast, BLAT seems to behave the 'correct' way.

Is BLAST adjusting some parameters to increase speed as the size of the DNA database increases? If I understand it correctly, there is a comp_based_stats option for protein alignment (blastp) to control just that, but not for blastn, which I'm using. How can I make blastn report all the alignments that fit the criteria, with no adjustments for DNA database size? Must be something simple I'm missing here...

dna blastn blast alignment ngs • 1.1k views

Were you using the -task blastn-short option?

Reply written 2.6 years ago by GenoMax (93k)

No, the parameters were left at their blastn default values.

Reply written 2.6 years ago by prishly (10)
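To make the comparison between default blastn and blastn-short reproducible across the read subsets, the command line can be spelled out explicitly instead of relying on defaults. The following is a minimal sketch, not something taken from the thread: the file and database names are hypothetical, the option values are illustrative, and the optional `-dbsize` setting rests on the assumption that E-value scaling with database size contributes to the changing hit lists (BLAST E-values grow with the effective search space, so borderline hits can drop below the threshold as the database grows). The flags themselves (`-task`, `-evalue`, `-max_target_seqs`, `-word_size`, `-dbsize`, `-outfmt`) are standard BLAST+ options.

```python
import shlex


def build_blastn_command(query_fasta, db_name, out_path,
                         task="blastn-short", evalue=10.0,
                         max_target_seqs=100000, word_size=None,
                         dbsize=None):
    """Return a blastn argument list; nothing is executed here.

    All values are illustrative assumptions, not recommendations
    from the thread.
    """
    cmd = [
        "blastn",
        "-task", task,                 # e.g. "blastn-short" for ~20-base queries
        "-query", query_fasta,
        "-db", db_name,
        "-outfmt", "6",                # tabular output, one line per HSP
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        "-out", out_path,
    ]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    if dbsize is not None:
        # Fix the effective database length so E-values are computed
        # against the same search space for every subset.
        cmd += ["-dbsize", str(dbsize)]
    return cmd


# Hypothetical names: queries.fasta, reads_10k, hits_10k.tsv
cmd = build_blastn_command("queries.fasta", "reads_10k", "hits_10k.tsv",
                           dbsize=25_000_000)
print(shlex.join(cmd))
```

Running the printed command for each subset with the same `-dbsize` value would show whether the hit lists become nested once the search space is held constant; the thread itself does not establish that this is the cause.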

For 100, 1000, 10k and approx. 230k (entire set) reads I got the following numbers of hits:

  • blastn default: 364, 1762, 12806, 13068
  • blastn-short: 3011, 4682, 13104, 14553
  • blat: 212, 1074, 21138, 301037

Every read should yield at least one hit, and most of them more than one. It's the same set of chimeric reads as in my previous questions. At first I thought there were errors in my Perl parser that counts alignment hits, but the sheer report file size and number of lines confirm that the figures are correct. I used very relaxed parameters for BLAT because the queries were short (around 20 bases).

Reply modified 2.6 years ago • written 2.6 years ago by prishly (10)
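The observation that hits against 1000 reads are not always contained in the hits against 10k reads can be checked directly from the reports. The sketch below assumes the searches were rerun with tabular output (`-outfmt 6`), where the first two tab-separated columns are the query ID and the subject (read) ID; the function names and the toy data are made up for illustration and are not the poster's Perl parser.

```python
from collections import defaultdict


def hits_by_query(tabular_lines):
    """Map each query ID to the set of subject IDs it hit.

    Expects BLAST tabular lines: query ID in column 1,
    subject ID in column 2, tab-separated.
    """
    hits = defaultdict(set)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits[fields[0]].add(fields[1])
    return dict(hits)


def lost_hits(small_hits, large_hits):
    """Per query, subject IDs hit in the smaller search but missing from the larger one."""
    lost = {}
    for query, subjects in small_hits.items():
        missing = subjects - large_hits.get(query, set())
        if missing:
            lost[query] = missing
    return lost


# Toy data standing in for two report files (hypothetical IDs).
small = hits_by_query(["q1\tread1\t100.0", "q1\tread2\t95.0", "q2\tread3\t100.0"])
large = hits_by_query(["q1\tread1\t100.0", "q2\tread3\t100.0", "q2\tread9\t90.0"])
print(lost_hits(small, large))
```

An empty result would mean the smaller hit set is fully contained in the larger one; any reads listed are ones that disappeared when the database grew.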

Let me ask this: are you trying to see how much redundancy there is in these sequences? There are options other than BLAST for that.

Reply written 2.6 years ago by GenoMax (93k)

No, it started out as an attempt to trim primers from our amplicon library reads (and to split the reads if they are chimeric after adapter ligation, which most of them seem to be). (Finding all possible alignments of two sequences)

Reply written 2.6 years ago by prishly (10)

Trimming primers may be best done with a scan/trim program. A program from BBMap can rapidly find reads that match arbitrary sequences. Guide here. It could also be used to identify chimeric reads quickly.

Reply written 2.6 years ago by GenoMax (93k)
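To illustrate what scanning reads for a primer involves, here is a minimal sketch that finds every position where a primer matches a read with a bounded number of substitutions. It is a plain sliding-window comparison written for this example, not the method used by BBMap or any other tool; it checks the forward strand only, ignores indels, and the sequences and the `max_mismatches` default are made up.

```python
def find_primer_sites(read, primer, max_mismatches=2):
    """Return start positions where `primer` matches `read`
    with at most `max_mismatches` substitutions (no indels)."""
    sites = []
    k = len(primer)
    for start in range(len(read) - k + 1):
        mismatches = 0
        for a, b in zip(read[start:start + k], primer):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, try the next window
        if mismatches <= max_mismatches:
            sites.append(start)
    return sites


def looks_chimeric(read, primer, max_mismatches=2):
    """Flag a read as a chimera candidate if the primer occurs more than once."""
    return len(find_primer_sites(read, primer, max_mismatches)) > 1


# Made-up read containing the primer twice.
read = "AAACGTTTGGGACGTTTCC"
print(find_primer_sites(read, "ACGTTT", max_mismatches=0))
print(looks_chimeric(read, "ACGTTT", max_mismatches=0))
```

For a whole read set a dedicated trimming tool will be much faster; the sketch is only meant to show that, unlike a heuristic search, an exhaustive scan reports every site that meets the mismatch criterion regardless of how many reads are searched.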