Nucleotide BLAST
2.8 years ago


Fairly short and sweet question - how many contiguous Ns (= any nucleotide) can you have in a nucleotide BLAST query before you start getting crappy outputs? They're kind of unavoidable for my use case.

Any opinions on the impact of Ns that are more spread out through the sequence (rather than being contiguous) will also be welcome :D


BLAST • 1.4k views
2.8 years ago

You should reverse your question: how many meaningful nucleotides do you need to retain to still get meaningful hits? The Ns don't really matter for blastn; it's the actual sequence that counts.

The theoretical minimum number of non-N bases is the word size set in the BLAST parameters. For the blastn task this is 11 by default (note that the blastn executable runs the megablast task unless told otherwise, and that uses a word size of 28), so you need at least one uninterrupted stretch of 11 unambiguous nucleotides in your query, otherwise BLAST will not find any hits (looking for 'words' of the word size is step one of the whole BLAST process). Come to think of it, I seem to remember reading that NCBI itself recommends using queries at least twice the word size to get any (meaningful) hits at all.
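To make the word-size argument concrete, here is a minimal sketch that checks whether a query contains at least one N-free stretch long enough to seed a hit. It only models the query side of the seeding step and treats everything other than A/C/G/T as ambiguous; the function names are made up for illustration, and real BLAST does more (masking, database lookup, and so on).

```python
import re


def longest_unambiguous_run(query: str) -> int:
    """Return the length of the longest stretch of A/C/G/T in the query."""
    runs = re.findall(r"[ACGT]+", query.upper())
    return max((len(run) for run in runs), default=0)


def can_seed(query: str, word_size: int = 11) -> bool:
    """True if at least one N-free stretch is as long as the word size.

    word_size=11 is the default for the blastn task; pass 28 for megablast.
    """
    return longest_unambiguous_run(query) >= word_size


if __name__ == "__main__":
    # Hypothetical query: 10 clean bases, 20 Ns, then 15 clean bases.
    query = "ACGTACGTAC" + "N" * 20 + "ACGTACGTACGTACG"
    print(longest_unambiguous_run(query))
    print(can_seed(query, word_size=11))
    print(can_seed(query, word_size=28))
```

If `can_seed` is false for the word size you plan to use, BLAST has nothing to start from on that query, whatever the rest of the settings are.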

The same goes for your contiguous vs spread-out Ns: having an N at every 10th nucleotide is just as bad as stretches of 10 nucleotides separated by stretches of Ns, because in both cases no run of clean sequence reaches the word size.
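A rough way to see why both layouts are equally bad is to count how many N-free windows of word-size length a query offers. The sketch below does that for three made-up queries; `count_seed_words` is a hypothetical helper, and it is a simplified picture of seeding, not what BLAST runs internally.

```python
def count_seed_words(query: str, word_size: int = 11) -> int:
    """Count windows of length word_size that contain only A/C/G/T."""
    count = 0
    run = 0  # length of the current N-free stretch
    for base in query.upper():
        run = run + 1 if base in "ACGT" else 0
        # Every position that ends an N-free stretch of at least
        # word_size bases closes exactly one valid window.
        if run >= word_size:
            count += 1
    return count


if __name__ == "__main__":
    block = "ACGTACGTAC"                   # 10 clean bases
    spread = ("ACGTACGTA" + "N") * 10      # an N at every 10th position
    clustered = (block + "N" * 10) * 5     # 10 clean bases, then 10 Ns
    clean = block * 10                     # 100 clean bases, no Ns

    for name, seq in [("spread", spread), ("clustered", clustered), ("clean", clean)]:
        print(name, count_seed_words(seq))
```

With the word size at 11, the spread-out and clustered queries both offer zero usable words, while the clean query of the same length offers 90.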

Once you get past that first step (i.e. into the HSP extension/alignment), it becomes a bit more tricky to predict the impact of the Ns.


Perfect, also thanks for the algorithm context - made everything much clearer in general!

