Question

blastn report apparently not showing all of the alignments: is it re-adjusting parameters, and which option controls that?

7.3 years ago

prishly ▴ 10

I ran a local blast of a series of short DNA queries (all in one fasta file) against subsets of NGS reads from the same set: 10 reads, then 100, 1000 and finally 10k reads. Each subset included all the smaller ones. But the number of hits in the blast report seems to reach a plateau rather than increase more or less proportionally. I also noticed that the reads with hits for a given query don't accumulate as the number of reads blasted increases: the reads with hits found when blasting against 1000 reads are generally different from those found when blasting against 10k reads, instead of the former always being included in the latter.

The max_hsps option doesn't have any effect, and max_alignments doesn't change much either. In contrast, blat seems to behave the 'correct' way.

Is blast adjusting some parameters to increase speed as the size of the DNA database increases? If I understand it correctly, there's a comp_based_stats option for protein alignment (blastp) to control just that, but not for blastn, which I'm using. How can I make blastn report all the alignments that fit the criteria, with no adjustments for DNA db size? Must be something simple I'm missing here...

blast alignment ngs dna blastn • 2.8k views

ADD COMMENT • link 7.3 years ago by prishly ▴ 10

Were you using the -task blastn-short option?
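For reference, here is a minimal sketch of how such a run could be assembled from Python. The helper `build_blastn_cmd` and the file and database names are hypothetical. `-task`, `-evalue`, `-max_target_seqs`, `-max_hsps` and `-outfmt` are standard BLAST+ blastn options, but the values shown are illustrative only, and whether raising any of them removes the plateau described in the question is an assumption to test, not something established in this thread.

```python
import shlex

def build_blastn_cmd(query, db, out, task="blastn-short",
                     evalue=10.0, max_target_seqs=100000, max_hsps=None):
    """Return a blastn argument list (hypothetical helper, illustrative values)."""
    cmd = [
        "blastn",
        "-task", task,                 # blastn-short is intended for short queries
        "-query", query,
        "-db", db,                     # assumes a database built with makeblastdb
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        "-outfmt", "6",                # tabular output, easy to parse
        "-out", out,
    ]
    if max_hsps is not None:
        cmd += ["-max_hsps", str(max_hsps)]
    return cmd

# Hypothetical file names, not taken from the original post.
cmd = build_blastn_cmd("primers.fasta", "reads_10k", "hits_10k.tsv")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once blastn is on the PATH; building it separately makes it easy to repeat the same search across the 100, 1000 and 10k read subsets with identical settings.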

ADD REPLY • link 7.3 years ago by GenoMax 152k

No, the parameters were left at the blastn default values.

ADD REPLY • link 7.3 years ago by prishly ▴ 10

For 100, 1000, 10k and approx. 230k reads (the entire set), I got the following numbers of hits:

reads             100    1000     10k   ~230k
blastn default    364    1762   12806   13068
blastn short     3011    4682   13104   14553
blat              212    1074   21138  301037

Every read should yield at least one hit, and most of them more than one. It's the same set of chimeric reads as in my previous questions. At first I thought there were errors in my perl parser that counts alignment hits, but the sheer size of the report files and their number of lines confirm that the figures are correct. I used very relaxed parameters for blat because the queries were short (around 20 bases).
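As a cross-check on the parser, the same comparison can be sketched in Python. This assumes the reports are written in BLAST tabular format (-outfmt 6), which is not what the runs above necessarily used, and it counts distinct reads per query rather than individual alignments. The function names and the example data are made up for illustration.

```python
from collections import defaultdict

def hits_by_query(tabular_text):
    """Map each query ID to the set of subject (read) IDs it hit.

    Assumes BLAST tabular output (-outfmt 6): tab-separated,
    query ID in column 1, subject ID in column 2.
    """
    hits = defaultdict(set)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits[fields[0]].add(fields[1])
    return hits

def missing_from_larger(small, large):
    """Per query, reads hit in the smaller run but absent from the larger run."""
    missing = {}
    for query, reads in small.items():
        lost = reads - large.get(query, set())
        if lost:
            missing[query] = sorted(lost)
    return missing

# Made-up example data, not results from these runs.
run_1k = "primerA\tread1\t100.0\nprimerA\tread2\t95.0\n"
run_10k = "primerA\tread2\t95.0\nprimerA\tread7\t100.0\n"

small = hits_by_query(run_1k)
large = hits_by_query(run_10k)
print(sum(len(reads) for reads in small.values()))  # distinct query/read pairs in the smaller run
print(missing_from_larger(small, large))
```

If the larger run truly contained every hit from the smaller one, `missing_from_larger` would return an empty dict; a non-empty result lists exactly which reads dropped out for which query.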

ADD REPLY • link 7.3 years ago by prishly ▴ 10

Let me ask this: are you trying to see how much redundancy there is in these sequences? There are other options than blast for that.

ADD REPLY • link 7.3 years ago by GenoMax 152k

No, it started out as an attempt to trim primers from our amplicon library reads (and to split the reads if they are chimeric after adapter ligation, which most of them seem to be). See my earlier question: Finding all possible alignments of two sequences.

ADD REPLY • link 7.3 years ago by prishly ▴ 10

Trimming primers may be best done with a scan/trim program. bbduk.sh from BBMap can rapidly find reads that match arbitrary sequences. Guide here. It could be used to identify chimeric reads quickly.
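To illustrate the idea only, here is a naive Python sketch that scans a read for a primer, allowing a few substitutions, and flags the read when the primer also turns up away from the 5' end. This is a plain Hamming-distance scan, not how bbduk.sh works (bbduk.sh is k-mer based), it ignores indels and reverse complements, and the function names, the primer and the read are all made up.

```python
def find_primer(read, primer, max_mismatches=1):
    """Return start positions where primer matches read with at most
    max_mismatches substitutions (no indels, forward strand only)."""
    positions = []
    for start in range(len(read) - len(primer) + 1):
        window = read[start:start + len(primer)]
        mismatches = sum(a != b for a, b in zip(window, primer))
        if mismatches <= max_mismatches:
            positions.append(start)
    return positions

def looks_chimeric(read, primer, max_mismatches=1):
    """Hypothetical heuristic: the primer occurs somewhere other than the read start."""
    return any(pos > 0 for pos in find_primer(read, primer, max_mismatches))

# Made-up sequences for illustration.
primer = "GATTACAGCT"
clean_read = primer + "TTTTGGGGCC"
joined_read = primer + "TTTTGGGGCC" + "GATTACTGCT" + "GGAATT"  # second copy has one mismatch

print(find_primer(clean_read, primer), looks_chimeric(clean_read, primer))
print(looks_chimeric(joined_read, primer))
```

The internal match positions would also give candidate split points for a chimeric read, though a dedicated tool is likely to be much faster on a full read set.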

ADD REPLY • link 7.3 years ago by GenoMax 152k