I've been using CLCBio to BLAST assembled contigs, but it's really slow. I decided to set up a local BLAST database and run my contigs against that instead, but I'm getting different results even though I'm using the same parameters. The CLCBio parameters and the local command are below:
Query genetic code: 1 Standard
Limit by entrez query: All organisms
Filter low complexity
Expect: 10
Word Size: 3
Matrix: BLOSUM62
Gap cost: Existence 11, Extension 1
Max number of hit sequences: 3
blastx -db nr \
    -query ../results/contigs/CLC-contigs.fa \
    -evalue 10 \
    -matrix 'BLOSUM62' \
    -word_size 3 \
    -gapopen 11 \
    -gapextend 1 \
    -max_target_seqs 3 \
    -outfmt "10 std stitle" \
    -out ../results/blast/blast-005.csv \
    -num_threads 4
Since CLCBio and blast+ are using the same parameters, the same query sequences, and the same database, I should get the same results, right? But I'm getting 226 hits in CLCBio that aren't in the local BLAST output. Two of the species among those missing hits are extremely important and are known to be present in the query sequences.
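For anyone who wants to reproduce the comparison, here is a rough sketch of how the missing hits could be listed. It assumes both result sets are comma-separated with the query ID in column 1 and the subject ID in column 2. That holds for the `-outfmt "10 std stitle"` output above, but it is only an assumption for the CLCBio export, and the function name `missing_hits` is made up for illustration.

```shell
# Print the query,subject pairs found in the first CSV but not in the second.
# Assumption: column 1 = query ID, column 2 = subject ID in BOTH files.
# Usage: missing_hits <csv-with-extra-hits> <csv-to-check-against>
missing_hits() {
    awk -F, '
        # Build the comparison key from the first two columns of every row.
        { key = $1 "," $2 }
        # First file given to awk (the reference set): remember its keys.
        FILENAME == ARGV[1] { seen[key] = 1; next }
        # Second file: print each key that is absent from the reference, once.
        !(key in seen) && !printed[key]++ { print key }
    ' "$2" "$1"
}
```

With the CLCBio table exported to a file such as `clc-hits.csv` (a placeholder name), `missing_hits clc-hits.csv ../results/blast/blast-005.csv` would list the pairs that only CLCBio reports. Subject IDs have to be written the same way in both files for the comparison to mean anything.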