6.4 years ago
gb
Does anyone have experience with the -seqidlist parameter in BLAST? If I select a subset of sequences from the nt database and build a database from those extracted sequences, it takes about 5 seconds to BLAST 50 reads. If I instead extract the accessions of that subset, write them to a file, and use -seqidlist to BLAST against the full nt database, it takes much longer (I stopped it after about half an hour).
The file contains 2,346,860 accessions.
The command I used:
sudo ncbi-blast-2.7.1+/bin/blastn -query 50_otus.fasta -db nt -outfmt 6 -out test -num_threads 10 -seqidlist accession_list
The accession_list file contains one accession per line.
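The accession file is plain text with one accession per line. If such a list is assembled from several sources it may contain blank lines or repeated entries, so a minimal Python sketch for tidying it before passing it to -seqidlist could look like the following. The helper name `clean_accession_list` and the file paths are hypothetical; it only strips whitespace and drops blanks and duplicates, and it does not check whether the accessions exist in nt or whether their version suffixes match.

```python
from pathlib import Path


def clean_accession_list(src: str, dst: str) -> int:
    """Write a cleaned copy of a one-accession-per-line file (hypothetical helper).

    Strips surrounding whitespace, skips blank lines and drops repeated
    accessions, keeping the order of first occurrence.
    Returns the number of accessions written to dst.
    """
    seen = set()
    cleaned = []
    for line in Path(src).read_text().splitlines():
        acc = line.strip()
        if acc and acc not in seen:
            seen.add(acc)
            cleaned.append(acc)
    # One accession per line, with a trailing newline when the list is non-empty.
    Path(dst).write_text(("\n".join(cleaned) + "\n") if cleaned else "")
    return len(cleaned)
```

For example, `clean_accession_list("accession_list", "accession_list.clean")` returns how many accessions were kept, which is a quick sanity check against the expected count.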
Isn't that logical? When you use -seqidlist, the program has to go through the entire nt database as opposed to a smaller subset.
I expected a smaller difference in time. The new -taxidlist option in 2.8.0 filters the database and is really fast, so I expected similar results.
I see that you are using the release version, so the faster filtering may only be available in v2.8 (which is still in beta).
BTW: Using sudo for user applications is not good practice. It should not be needed to run installed programs.
I used -taxidlist with 2.8, and -seqidlist with both 2.7 and 2.8. But I see now that my seqidlist was much smaller (37,497 IDs). If this speed is expected, I will continue to index sub-selections.
To come back to this: even with only 10 accessions it takes a long time (half an hour). I only tested this with 2.7; maybe it has to do with the new database version, since 2.8 uses v5 databases and 2.7 uses v4.
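The "index a sub-selection" approach can be scripted. The sketch below is one possible way to do it, not necessarily the poster's exact procedure: it uses `blastdbcmd -entry_batch` to extract the listed sequences from nt, `makeblastdb` to build a small nucleotide database from them, and then runs `blastn` against that database. It assumes the local nt copy supports retrieval by accession and that the BLAST+ binaries are on PATH (run as an ordinary user, without sudo). The helper names `subset_db_commands` and `run_all` and the output names `nt_subset` / `nt_subset.fasta` are hypothetical.

```python
import subprocess


def subset_db_commands(accession_file, query, db="nt", subset_name="nt_subset",
                       threads=10, out="test"):
    """Return the command lines for searching a pre-built subset database."""
    fasta = f"{subset_name}.fasta"
    return [
        # 1. Extract the selected sequences (FASTA) from the full database.
        ["blastdbcmd", "-db", db, "-entry_batch", accession_file, "-out", fasta],
        # 2. Build a small nucleotide BLAST database from them, keeping the IDs.
        ["makeblastdb", "-in", fasta, "-dbtype", "nucl", "-parse_seqids",
         "-out", subset_name],
        # 3. Search the small database instead of nt with -seqidlist.
        ["blastn", "-query", query, "-db", subset_name, "-outfmt", "6",
         "-out", out, "-num_threads", str(threads)],
    ]


def run_all(commands):
    """Run each command in order; raise CalledProcessError if one fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Usage: `run_all(subset_db_commands("accession_list", "50_otus.fasta"))`. Building the command lists separately from running them makes it easy to print and inspect the commands before anything is executed. Steps 1 and 2 only need to be repeated when the sub-selection changes.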
If you use a list, BLAST has no way of knowing whether it will find those IDs (one, all, or any combination thereof) until it has looked through the entire database. So this should always take longer than searching a smaller, pre-built subset.
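The argument above can be illustrated with a toy model. This is not how BLAST is implemented internally, only a simplified Python sketch of the reasoning: when an ID list is applied during a scan, the scan can stop early only once every requested ID has been seen, so a single ID that is absent from the database forces a pass over every record. A pre-built subset, by contrast, contains only the records of interest. The function name and the in-memory list of (id, sequence) pairs are hypothetical, and IDs are assumed to be unique.

```python
def scan_with_id_filter(database, wanted_ids):
    """Toy model: scan (id, sequence) records, keeping those in wanted_ids.

    Returns (hits, records_visited). The scan stops as soon as every
    requested ID has been found; otherwise it visits the whole database.
    """
    wanted = set(wanted_ids)
    remaining = set(wanted)
    if not remaining:
        return [], 0
    hits, visited = [], 0
    for seq_id, _seq in database:
        visited += 1
        if seq_id in wanted:
            hits.append(seq_id)
            remaining.discard(seq_id)
            if not remaining:  # all requested IDs seen: nothing left to find
                break
    return hits, visited
```

With 1,000 records, requesting two IDs near the start visits only a handful of records, but requesting one present ID plus one missing ID visits all 1,000; scanning a two-record subset would visit two.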
Of course it does take longer. I just did not expect it to take so much longer, since I had such good running times with -taxidlist in 2.8. But it is good to know that this is expected (which was my question). Thanks!