Remove species from nr
10.0 years ago
biobio ▴ 50

Hi,

I'm using a local installation of BLAST with the nr database. I'm mostly looking for viral species. The viral RefSeq database misses some hits compared to a full nr search, but the nr search takes a long time. I'd like to reduce that time by removing species I don't care about from the database. For example, I'd like to remove all eukaryotic sequences from nr. Is this possible? How would I go about doing that?

Thanks!

nr blast blast-plus • 2.3k views
10.0 years ago
dssouzadan ▴ 30

Look up the taxonomy ID of the organism you are studying and search for that ID at NCBI.
Then choose the type of sequence you need, nucleotide or protein, and browse all sequences from organisms under that taxonomy ID. Then download the complete GI list.

Use the GI list file with the blastdbcmd tool to retrieve all matching sequences from nr in FASTA format. With this FASTA file, you can build a new database filtered by taxonomy.

I have only done this once and don't remember every detail, but these are the main steps.
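
The last two steps could look roughly like the sketch below. It assumes BLAST+ is on your PATH, that a local protein database named nr is available, and that the GI list has one GI per line. The function name and the file names (viruses.gi, nr_viruses) are made up for illustration, so adjust them to your setup.

```shell
#!/usr/bin/env bash
# Sketch only: build a taxonomy-filtered protein database from nr.
# build_subset_db and all file names are hypothetical.

build_subset_db() {
    local gi_list="$1"      # GI list downloaded from NCBI, one GI per line
    local out_prefix="$2"   # name for the new FASTA file and database

    # Extract the listed sequences from nr in FASTA format.
    blastdbcmd -db nr -dbtype prot -entry_batch "$gi_list" \
        -out "${out_prefix}.fasta" || return 1

    # Build a new protein database from the extracted sequences.
    makeblastdb -in "${out_prefix}.fasta" -dbtype prot -out "$out_prefix"
}

# Example call (not run here):
#   build_subset_db viruses.gi nr_viruses
```

You would then point your search at the new database with -db nr_viruses instead of -db nr.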


Thanks! I'm actually trying to do the opposite of this. I'm interested in a lot of species so I want to remove sequences from the database that I know I'm not looking for so that I can hopefully speed up my search.


You can use the same database and pass the following option to your BLAST command:

-negative_gilist <String>
   Restrict search of database to everything except the listed GIs
    * Incompatible with:  gilist, seqidlist, remote, subject, subject_loc
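
A minimal sketch of a search that uses this option is shown here. It assumes protein queries and blastp (swap in blastx for nucleotide queries), a BLAST+ version that still accepts GI lists, and a file of GIs to exclude with one GI per line. The function name and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: search nr while skipping every sequence whose GI is listed.
# search_excluding and all file names are hypothetical.

search_excluding() {
    local query="$1"          # query sequences in FASTA format
    local exclude_list="$2"   # GIs to exclude, one per line
    local out="$3"            # results file

    # -outfmt 6 writes tabular output; drop it for the default report.
    blastp -db nr -query "$query" \
        -negative_gilist "$exclude_list" \
        -outfmt 6 -out "$out"
}

# Example call (not run here):
#   search_excluding queries.fasta eukaryota.gi hits.tsv
```

Newer BLAST+ releases have moved away from GIs toward accessions and taxonomy IDs, so check the output of blastp -help for the filtering options your version supports.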