creating a subset of a BLAST database
21 months ago
ashenflower ▴ 30

Hello everybody,

I should say up front that I am not a computer scientist, so I hope this is not a dumb question.

I have a BLAST database on my server containing all eukaryotic sequences, but it is 250 GB in size and requires too much memory to query. Since I am only interested in searching against some specific sequences, after looking around on the internet I tried to create a subset of the database from an accession list (about 2000 entries) using blastdb_aliastool as follows:

blastdb_aliastool -db euk_genomes -seqidlist my_accessions.txt -dbtype nucl -out subset_db

Apparently it worked, but when I tried to query it, my process still crashed for lack of memory.

Now, my question is: since this is only an alias of the database, does BLAST still load the entire original database into memory before "selecting" the specified accessions for the search? If so, is there a way to avoid that? Or should I just retrieve the sequences I am interested in and build a new database from them?

blastdb_aliastool blastn blastdbcmd blast • 767 views

It is possible that you don't have enough memory to use the subset database you made (assuming all worked well there). Did you use the subset in your subsequent search? Even if you were to extract the sequences and remake the database, memory may still remain a challenge.


I would say it is very unlikely that memory is insufficient for the subset itself: this was a trial, so I only used the accessions belonging to 4 different genomes, and I am working on a machine with 128 GB of RAM. But I am not sure how to check the size of my subset, since blastdb_aliastool only produces a .nal file. That is also why I guessed that the entire database might still need to be loaded in order to "extract" the subset. And yes, I did use "subset_db" in my search, and I was not expecting a memory error.
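One way to see what the alias actually covers might be something like the sketch below. It assumes the alias is named subset_db as in the command above, and the helper name inspect_alias_db is made up for illustration. The .nal file is plain text, so it can simply be printed. I have not verified how `-info` reports an alias that is filtered by a seqid list, so treat the numbers as a rough check rather than a definitive answer.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: report what an alias database appears to contain.
# Usage: inspect_alias_db <db_name>
inspect_alias_db() {
    db="$1"

    # The .nal alias file is plain text; print it if it is in the current directory.
    if [ -f "${db}.nal" ]; then
        cat "${db}.nal"
    fi

    # Summary of the database as BLAST sees it (sequence count, total length).
    blastdbcmd -db "$db" -info

    # Count the accessions BLAST can iterate over in this database.
    blastdbcmd -db "$db" -entry all -outfmt "%a" | wc -l
}

# Example call (database name taken from the post above):
# inspect_alias_db subset_db
```

If the count is close to the length of my_accessions.txt, the alias is at least selecting the expected entries, whatever happens with memory during the search.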


I have not personally used the aliastool in that way. If it only made a single file (without producing a set of subset files) then you should extract the sequences you need and then create a new database. That should be foolproof.
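The extract-and-rebuild route could look roughly like this. It is a minimal sketch, assuming the original database is named euk_genomes and was built with -parse_seqids (so accessions can be looked up), and that my_accessions.txt holds one accession per line. The helper name build_subset_db and the output prefix subset_standalone are made up for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: extract the listed accessions from a source database
# and build a physically separate nucleotide database from them.
# Usage: build_subset_db <source_db> <accession_list> <out_prefix>
build_subset_db() {
    src_db="$1"
    acc_list="$2"
    out_prefix="$3"

    # Pull the listed sequences out of the big database as FASTA.
    blastdbcmd -db "$src_db" -entry_batch "$acc_list" -out "${out_prefix}.fasta"

    # Build a new standalone database from that FASTA file.
    makeblastdb -in "${out_prefix}.fasta" -dbtype nucl -parse_seqids -out "$out_prefix"
}

# Example call (names are assumptions):
# build_subset_db euk_genomes my_accessions.txt subset_standalone
```

The resulting database only contains the extracted sequences, so a search against it (for example with `blastn -db subset_standalone`) should not need to touch the 250 GB original at all.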
