I am completely mystified by this very basic asymmetry in blastn results. I have two FASTA files, which share a 15-mer. However, that common 15-mer only results in a hit if I have one file as subject and the other as query; if I reverse the files, no hit:
    $ blastn -task blastn-short -outfmt 6 -ungapped -strand plus -perc_identity 100 -word_size 15 \
        -query Phvul.007G125800.5.0kb.upstream.fasta \
        -subject Phvul.009G200800.5.0kb.upstream.fasta
    Phvul.007G125800 Phvul.009G200800 100.00 15 0 0 3698 3712 1965 1979 0.020 30.2
    $ blastn -task blastn-short -outfmt 6 -ungapped -strand plus -perc_identity 100 -word_size 15 \
        -query Phvul.009G200800.5.0kb.upstream.fasta \
        -subject Phvul.007G125800.5.0kb.upstream.fasta
    $
This is highly reproducible and has some consistency. I've got five 5000-nt sequences, all of which share this same 15-mer. When I use two of them as query sequences, I get the 15-mer hit in all cases. When I use the other three as query sequences, I never get the 15-mer hit. If I blast just the 15-mer by itself against the sequences, I get the hit on all sequences.
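For anyone who wants to confirm the shared 15-mer without going through BLAST at all, here is a minimal sketch of a plain-text check. The function name `has_kmer` is hypothetical (it is not part of BLAST+), and the k-mer itself is not shown in this post, so it has to be supplied by hand in uppercase. The check only looks at the plus strand, matching `-strand plus` above, and it ignores soft-masking by uppercasing the sequence first.

```shell
# Hypothetical helper (not part of BLAST+): report whether a FASTA file
# contains a given k-mer on the plus strand.
# Usage: has_kmer FILE KMER   (KMER must be given in uppercase)
has_kmer() {
    # Drop header lines, join wrapped sequence lines into one string,
    # uppercase it, then search for the k-mer as a fixed string.
    if grep -v '^>' "$1" | tr -d '\n\r' | tr '[:lower:]' '[:upper:]' | grep -q -F -- "$2"; then
        echo "present"
    else
        echo "absent"
    fi
}
```

It could be run over all five files with something like `for f in *.5.0kb.upstream.fasta; do printf '%s\t%s\n' "$f" "$(has_kmer "$f" "$KMER")"; done`, where `KMER` is an assumed variable holding the 15-mer. Because the sequence lines are joined before searching, a k-mer that straddles a line wrap in the FASTA file is still found.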
Any ideas what's going on? By the way, the behavior is the same for -word_size values all the way down to 8. I find this very disconcerting, since I thought that blastn would be symmetric with respect to query and subject, at least when they're the same size.