Question: blast -seqidlist is slow
Asked 19 months ago by gb:

Does anyone have experience with the -seqidlist parameter in BLAST? If I select a set of sequences from the nt database and build a new database from those extracted sequences, blasting 50 reads against it takes about 5 seconds. If I instead extract the accessions of that sub-selection, write them to a file, and pass that file with -seqidlist while blasting against the full nt database, it takes much longer (I stopped it after about half an hour).

The file contains 2346860 accessions.

The command I used:

sudo ncbi-blast-2.7.1+/bin/blastn -query 50_otus.fasta -db nt -outfmt 6 -out test -num_threads 10 -seqidlist accession_list

The accession_list file contains one accession per line.
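The question describes extracting the accessions of a sub-selection and writing them one per line. A minimal sketch of that step, assuming the sub-selection is available as a FASTA file whose header lines start with the accession (for example ">AB123456.1 description"); the function name fasta_to_accessions and the file names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read FASTA on stdin and print one accession per line,
# i.e. the plain-text "one accession on each line" layout used with -seqidlist.
# Assumption: the accession is the first whitespace-delimited token of each
# header line (">AB123456.1 description" -> "AB123456.1").
fasta_to_accessions() {
  grep '^>' \
    | sed -e 's/^>//' -e 's/[[:space:]].*$//' \
    | sort -u   # drop duplicate accessions
}

# Usage (file names are illustrative):
#   fasta_to_accessions < subset.fasta > accession_list
#   wc -l < accession_list   # number of unique accessions in the list
```

The list is used as a set of IDs to keep, so sorting here only serves to remove duplicates.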

Tags: blast

Isn't that logical? When you use -seqidlist, the program has to go through the entire nt database rather than a smaller subset.

(reply by genomax, 19 months ago)

I expected a smaller difference in time. The new -taxidlist option in 2.8.0 filters the database and is really fast, so I expected similar results.

(reply by gb, 19 months ago)

I see that you are using the release version (2.7.1), so that speed may only be available in v2.8 (which is still in beta).
BTW: Using sudo for user applications is not good practice; you should not need it to run installed programs.

(reply by genomax, 19 months ago)

I used -taxidlist with 2.8, and -seqidlist with both 2.7 and 2.8. But I see now that my seqidlist was much smaller (37497 IDs). If this speed is expected, I will continue to index sub-selections.

(reply by gb, 19 months ago)
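Indexing a sub-selection, the route the thread settles on, can be sketched with standard BLAST+ tools: blastdbcmd -entry_batch extracts the listed entries from nt as FASTA, and makeblastdb indexes them as a new nucleotide database. This assumes BLAST+ is on PATH and that nt can be looked up by accession with blastdbcmd (the preformatted NCBI databases normally can); build_subset_db and nt_subset are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: build a small BLAST database from an accession list instead of
# filtering nt at search time with -seqidlist.
# Assumptions: BLAST+ tools are on PATH; the source database supports
# lookup by accession; all names below are hypothetical.
build_subset_db() {
  local source_db="$1"   # e.g. nt
  local id_list="$2"     # one accession per line
  local out_db="$3"      # name for the new, smaller database

  # Extract the selected sequences from the big database as FASTA.
  blastdbcmd -db "$source_db" -entry_batch "$id_list" -out "${out_db}.fasta" || return 1

  # Index them as a nucleotide database, keeping the sequence IDs.
  makeblastdb -in "${out_db}.fasta" -dbtype nucl -parse_seqids -out "$out_db" || return 1
}

# Usage: build once, then search the subset instead of nt (no sudo needed):
#   build_subset_db nt accession_list nt_subset
#   blastn -query 50_otus.fasta -db nt_subset -outfmt 6 -out test -num_threads 10
```

The extraction and indexing cost is paid once; afterwards each search only reads the smaller database, which is the setup behind the roughly 5-second run for 50 reads reported in the question.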

To come back to this: even with only 10 accessions it takes a long time (about half an hour). I only tested this with 2.7; maybe it has to do with the new database version, since 2.8 uses the v5 database format and 2.7 uses v4.

(reply by gb, 19 months ago)

If you use a list, BLAST has no way of knowing whether it will find those IDs (one, all, or any combination thereof) until it has looked through the entire database. So this should always take longer than searching a smaller subset database.

(reply by genomax, 19 months ago)

Of course it does take longer. I just did not expect it to take so much longer, since I had such good running times with -taxidlist in 2.8. But it is good to know that this is expected (which was my question). Thanks!

(reply by gb, 19 months ago)