Question: Extract subset of Nr database
2.2 years ago, sbchua.1990 wrote:

I have downloaded the nr database from NCBI (February 2017) and formatted it with makeblastdb as my own local database. I want to extract a subset of the nr database.

I have tried the method below:

  1. Download a GI list from NCBI.
  2. Use blastdb_aliastool to extract the subset.

This method worked well for an older database (2013), but it gives the error "BLAST Database error: GI list specified but no ISAM file found for GI" with my recently downloaded database. My understanding is that NCBI no longer supports GIs.

I have also tried downloading the target sequences as a FASTA file directly from NCBI, as shown in http://www.ionsource.com/tutorial/db/tips_for_creating_species_specif.htm, but the download fails every time before it completes.

Any other suggestions? I need the subset of the nr database for txid5204[ORGN].

blast gene • 1.9k views

Have you tried blastdbcmd?

http://nebc.nerc.ac.uk/bioinformatics/docs/blastdbcmd.html
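As a rough sketch of what that could look like, assuming BLAST+ is on the PATH, the local database was built with `-parse_seqids` so entries can be looked up by identifier, and you already have a text file with one accession per line. The helper names and file names here are hypothetical; `-db`, `-entry_batch`, and `-out` are standard blastdbcmd options.

```python
import subprocess


def blastdbcmd_args(db, id_file, out_fasta):
    """Build a blastdbcmd command line for batch retrieval by identifier.

    db        -- name/path of the local BLAST database (e.g. "nr")
    id_file   -- text file with one sequence identifier per line
    out_fasta -- FASTA file that blastdbcmd should write
    """
    return [
        "blastdbcmd",
        "-db", db,
        "-entry_batch", id_file,
        "-out", out_fasta,
    ]


def extract_subset(db, id_file, out_fasta):
    """Run blastdbcmd; raises CalledProcessError if the tool reports failure."""
    subprocess.run(blastdbcmd_args(db, id_file, out_fasta), check=True)
```

Something like `extract_subset("nr", "ids.txt", "subset.fasta")` would then write the matching sequences to `subset.fasta`, which could be passed to makeblastdb to build the smaller database.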

written 2.2 years ago by Jake Warner
2.2 years ago, shenwei356 (China) wrote:

Try taxonkit. See its tutorial: "Extract all protein sequences of specific taxons from the NCBI nr database".


Thanks for the reply. Regarding the tutorial at http://bioinf.shenwei.me/taxonkit/tutorial/: in step 2, is 'prot.accession2taxid.gz' just sample data for the tutorial? If so, what is the equivalent of 'prot.accession2taxid.gz' for my data? I am a bit confused by that.

written 2.2 years ago by sbchua.1990

Of course not! It's from NCBI: ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz
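To make the role of that file concrete, here is a minimal sketch of the lookup it enables. It assumes the usual tab-separated layout of the NCBI file (a header line, then accession, accession.version, taxid, gi) and that you already have the set of taxids you care about, for example txid5204 plus its descendant taxids as listed by taxonkit. The function name is hypothetical, and this is not the method used in the tutorial, which works with command-line tools.

```python
import gzip


def accessions_for_taxids(path, taxids):
    """Yield accession.version for every row whose taxid is in `taxids`.

    path   -- gzip-compressed accession2taxid file
    taxids -- iterable of taxonomy IDs (ints or strings)
    """
    wanted = {str(t) for t in taxids}
    with gzip.open(path, "rt") as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # Column 2 is accession.version, column 3 is the taxid.
            if len(fields) >= 3 and fields[2] in wanted:
                yield fields[1]
```

Writing the yielded accessions to a file, one per line, gives an identifier list that can be used to pull the matching sequences out of nr. Note that passing only 5204 would match sequences assigned directly to that taxid and miss its descendants, so the taxid set should include the whole subtree.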

written 2.2 years ago by shenwei356

Thanks for your help. Good work on both 'taxonkit' and 'seqkit'. Both helped me a lot.

written 2.2 years ago by sbchua.1990

Glad it helps. Could you upvote this answer and accept it?

modified 2.2 years ago • written 2.2 years ago by shenwei356