I encountered a strange issue running tblastn. I use the same set of query sequences in two scenarios:
1. Running against a DB of genomic contigs
2. Running against the same contigs, after they have been scaffolded into pseudomolecules
The second run is about 100x slower!
I should note that the pseudomolecules are pretty large: this is a plant genome with chromosomes each over 600 Mbp.
Does this even make sense? Why would BLAST be slower when the sequences in the DB are longer? And is there any way I can improve performance?
I'd have just switched to BLAT or DIAMOND, but this BLAST run is invoked by BUSCO, so I don't really have a choice. The command run by BUSCO looks like:
tblastn -evalue 0.001 -num_threads 40 -query ancestral.fasta -db scaffolds.fasta -out tblastn.tsv -outfmt 7
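
In case it helps to quantify how different the two databases are, here is a minimal Python sketch that summarizes the sequence lengths in a FASTA file. It is only an illustration: the function names `fasta_lengths` and `summarize` are made up for this example, and `contigs.fasta` in the usage note is a hypothetical name for the contig-level file (only `scaffolds.fasta` appears in the command above).

```python
import sys
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_id: length} for an iterable of FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            parts = line[1:].split()
            current = parts[0] if parts else f"seq{len(lengths) + 1}"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def summarize(lengths: Dict[str, int]) -> Dict[str, int]:
    """Return sequence count, total bases, longest sequence and N50."""
    values = sorted(lengths.values(), reverse=True)
    total = sum(values)
    n50 = 0
    running = 0
    for value in values:
        running += value
        # N50: length of the sequence at which half the total is reached.
        if running * 2 >= total:
            n50 = value
            break
    return {
        "count": len(values),
        "total": total,
        "longest": values[0] if values else 0,
        "n50": n50,
    }


if __name__ == "__main__":
    # File paths are taken from the command line; none are assumed here.
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(fasta_lengths(handle)))
```

Saved as, say, `fasta_stats.py`, it could be run as `python fasta_stats.py contigs.fasta scaffolds.fasta` to compare the number of sequences and their lengths between the two databases.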