Question: NCBI Blast locally: filter by accession number and NOT by GI number
tlorin (Switzerland) wrote, 2.6 years ago:

I have downloaded the NCBI nt database using the update_blastdb.pl Perl script, but I want to BLAST a query file against specific species only, not against the whole nt database. I know that when running BLAST locally it is possible to subset the nt/nr database using a list of GI identifiers, as explained here.

However, NCBI is phasing out GIs, and we should use accession.version identifiers instead. I have downloaded those for my species; part of the file mygi.txt is shown below.

When I run

blastdb_aliastool -gilist mygi.txt -db nt -out sthg.out -title sometitle

I obviously get

BLAST Database error: Specified file is not a valid GI/TI list.

since I am not providing a GI list.

I cannot find any command-line option in the manual for filtering the nt database by accession number. Any idea how I can achieve that? I bet this option will have to be added by the BLAST team at some point :)


mygi.txt below

AF324813.1
AF324814.1
AF324815.1
AF324816.1
AF324817.1
AF324818.1
AF324819.1
AF324820.1
AF324821.1
AF324822.1
AF324823.1
AF324824.1
AF370451.1
AY198341.1
AY198342.1

tlorin commented, 2.6 years ago:

An alternative (and dirtier ;) ) possibility could be using this, then using makeblastdb and blast on this newly created database.

genomax (United States) wrote, 2.6 years ago:

This solution adds a step, but until NCBI updates blastdb_aliastool to accept accession numbers, this may be the only way.

You can use blastdbcmd from the BLAST+ package to retrieve the sequences from the nt database as a FASTA file, then run makeblastdb to build the BLAST indexes for that subset of sequences. my_acc.txt is the file with the accession numbers (one per line).

blastdbcmd -db /path_to/nt -entry_batch my_acc.txt -out my_seq.fa
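
The two steps described above could be wrapped together roughly as follows. This is only a sketch: build_subset_db, its argument names, and the my_subset prefix are made up for illustration, and it assumes blastdbcmd and makeblastdb are on the PATH and that the sequences are nucleotide (hence -dbtype nucl).

```shell
# Hypothetical wrapper (not part of BLAST+): extract the listed accessions
# from a source database and index them as a small nucleotide database.
build_subset_db() {
    local acc_list=$1    # one accession.version per line, e.g. my_acc.txt
    local source_db=$2   # source BLAST database, e.g. /path_to/nt
    local prefix=$3      # name for the subset (illustrative), e.g. my_subset

    # Step 1: pull the listed sequences out of the source database as FASTA.
    blastdbcmd -db "$source_db" -entry_batch "$acc_list" -out "$prefix.fa" || return 1

    # Step 2: build the BLAST indexes for the extracted sequences.
    makeblastdb -in "$prefix.fa" -dbtype nucl -out "$prefix" -title "$prefix"
}
```

After build_subset_db my_acc.txt /path_to/nt my_subset, the subset can be searched with something like blastn -query my_query.fa -db my_subset -out hits.tsv -outfmt 6, where my_query.fa and hits.tsv are placeholder file names.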