Restricting the database by taxa in local BLAST searches
4.2 years ago
dimitrischat ▴ 210

Hello, I am trying to run a blastx search locally, but only against plants. How do I restrict the search to plants? With -taxids? But how do I find all the taxids for plants (or for whatever group someone needs to search)? Thanks


Which version of BLAST are you using? (That is, can you already use the new v5 format of BLAST databases?)


The version is 2.8.1+ (no idea about the v5 format).


You could download taxonomic information here: ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.zip

To get the taxids, you can filter the rankedlineage.dmp file in that archive.
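A minimal sketch of collecting every taxid below a given node straight from the dump, assuming a POSIX shell with awk. It uses nodes.dmp (also in the archive) instead of rankedlineage.dmp, because nodes.dmp stores each taxid next to its parent taxid in the layout `tax_id<TAB>|<TAB>parent<TAB>|<TAB>rank...`. The function name `descendant_taxids` is hypothetical: it climbs from every node towards the root and prints the node if the climb passes through the target (the target itself is included).

```shell
# descendant_taxids TARGET_TAXID NODES_DMP
# Hypothetical helper: prints TARGET_TAXID and every taxid below it,
# one per line, using the child -> parent links in nodes.dmp.
descendant_taxids() {
  awk -v target="$1" '
    # Fields in nodes.dmp are separated by "<TAB>|<TAB>":
    # $1 = tax_id, $2 = parent tax_id, $3 = rank, ...
    BEGIN { FS = "\t[|]\t" }
    { parent[$1] = $2 }
    END {
      for (id in parent) {
        node = id
        # Climb towards the root; stop at the target, at the root
        # (taxid 1 is its own parent) or at a node with no known parent.
        while ((node in parent) && node != target && parent[node] != node)
          node = parent[node]
        if (node == target) print id
      }
    }' "$2"
}
```

For example, after unzipping the archive, `descendant_taxids 33090 nodes.dmp > plants.ids` would write the Viridiplantae subtree, one taxid per line (in no particular order). The whole file is held in memory and walked node by node, so expect it to take a while on the full dump.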

EDIT: Changed the link

4.2 years ago
GenoMax 141k

I got the following answer from NCBI on how to do this with the new v5 BLAST databases. You may need to upgrade to the latest BLAST+ if v2.8.1 is not compatible with v5 databases.

The BLAST+ package includes a script, get_species_taxids.sh, that extracts the taxids under a taxid of interest (the example below uses bacteria, taxid 2; substitute the taxid you are interested in).

$ sh get_species_taxids.sh -t 2 > bacterial.ids

This dumps all the taxids under taxid 2 into the file bacterial.ids. You can then run BLAST and limit the search to those IDs:

$ blastn -db nt_v5 -query test.txt -out out.put -taxidlist bacterial.ids
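Since the original question is about blastx, here is a hedged sketch of the same idea for a protein search, wrapped in a small shell function. The name `run_blastx_taxa` and the tabular `-outfmt 6` output are my choices, not part of the NCBI answer, and the sanity check assumes the -taxidlist file holds one numeric taxid per line.

```shell
# run_blastx_taxa QUERY_FASTA DB TAXIDLIST OUT_TSV
# Hypothetical wrapper: blastx restricted to the taxids listed in TAXIDLIST.
run_blastx_taxa() {
  # Fail early on an empty list or on lines that are not plain numbers
  # (e.g. Windows line endings or a taxon name) instead of starting a long search.
  if [ ! -s "$3" ] || grep -qv '^[0-9][0-9]*$' "$3"; then
    echo "run_blastx_taxa: '$3' is empty or has non-numeric lines" >&2
    return 1
  fi
  # DB is the database basename (e.g. ../ncbi_blast_database/nr), not a folder.
  blastx -query "$1" -db "$2" -taxidlist "$3" -outfmt 6 -out "$4"
}
```

Usage might look like `run_blastx_taxa query.fasta ../ncbi_blast_database/nr plants.ids plant_hits.tsv`, where plants.ids comes from get_species_taxids.sh -t 33090.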

Thanks a lot for your input. One question: the top-level taxa available are Archaea, Bacteria, Eukaryota, Viruses, Other and Unclassified. How can I specify plants?


Viridiplantae (green plants, taxid 33090) is the top of the hierarchy. Narrow down further as needed.


So I do this?

sh get_species_taxids.sh -t 33090

Yes. If you want to narrow it down further, Embryophyta (taxid 3193) are the land plants.


Thank you for your input! Much appreciated.


Is -db database_name the folder I downloaded from the NCBI database (which I downloaded with update_blastdb.pl --decompress nr [*])?


-db should point to the basename (nr) of the database you are going to search against.


I used

-db '../ncbi_blast_database/nr'

and

-taxidlist '.../ncbi-blast-2.10.0+/bin/plant.ids'

Hopefully this works.


I have nr00 up to nr38, about 200 GB in total. I need all of them, right? That is, when I use -db ../ncbi_blast_database/nr, does it search all of them? And with -taxidlist, does it search only these taxa across all the nr files?


Correct. You need all the nr files. With -taxidlist, BLAST should report only hits from the taxa you are interested in.
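To check this on your own results, one option is to add the staxids column to the tabular output (for example `-outfmt "6 std staxids"`, which puts the subject taxids, separated by ';', in the last column) and compare it with the list passed to -taxidlist. A minimal sketch, assuming that column layout; `report_offtarget_hits` is a hypothetical helper that prints every hit row with none of its taxids in the list.

```shell
# report_offtarget_hits TAXIDLIST HITS_TSV
# Hypothetical check: prints hit rows whose last column (staxids, ';'-separated)
# contains no taxid from TAXIDLIST. Empty output means every hit is on target.
report_offtarget_hits() {
  awk '
    BEGIN { FS = "\t" }
    # First file: collect the allowed taxids, one per line.
    FILENAME == ARGV[1] { if ($1 != "") allowed[$1] = 1; next }
    # Second file: stay quiet if any of the subject taxids is allowed.
    {
      n = split($NF, taxa, ";")
      hit = 0
      for (i = 1; i <= n; i++)
        if (taxa[i] in allowed) hit = 1
      if (!hit) print
    }' "$1" "$2"
}
```

For example, `report_offtarget_hits plants.ids plant_hits.tsv | wc -l` should print 0 if the restriction behaved as expected.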


If I use uniprot_sprot.fasta as the database, can I use -taxidlist plants.ids (created with sh get_species_taxids.sh -t 33090 > plants.ids), or does that not make sense? And if I can't, how can I do this with the uniprot_sprot database?


Yes, that should work. You mean with the swissprot DB, correct? That is all NCBI has.
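If instead you built the database yourself from UniProt's uniprot_sprot.fasta with a plain makeblastdb call, it may carry no taxid information for -taxidlist to match. One workaround, sketched here under that assumption, is to filter the FASTA before building the database: UniProtKB headers carry the organism taxid as an `OX=` field (e.g. `OX=3702` for Arabidopsis thaliana). The helper name `filter_fasta_by_ox` is hypothetical.

```shell
# filter_fasta_by_ox TAXIDLIST UNIPROT_FASTA
# Hypothetical helper: keeps only records whose header has an OX=<taxid>
# listed in TAXIDLIST (one taxid per line), e.g. plants.ids.
filter_fasta_by_ox() {
  awk '
    # First file: collect the allowed taxids.
    FILENAME == ARGV[1] { if ($1 != "") allowed[$1] = 1; next }
    # Header line: decide whether this record (header + sequence) is kept.
    /^>/ {
      ox = ""
      if (match($0, /OX=[0-9]+/))
        ox = substr($0, RSTART + 3, RLENGTH - 3)
      keep = (ox in allowed)
    }
    # Print the header and its sequence lines while keep is true.
    keep' "$1" "$2"
}
```

For example, `filter_fasta_by_ox plants.ids uniprot_sprot.fasta > sprot_plants.fasta`, then `makeblastdb -in sprot_plants.fasta -dbtype prot -out sprot_plants`, and search it with `blastx -db sprot_plants`; no -taxidlist is needed because the database contains only plant entries.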
