Question: Extract subset of Nr database
sbchua.1990 wrote (23 months ago):

I have downloaded the nr database from NCBI (February 2017) and formatted it with makeblastdb as my own local database. I want to extract a subset of the nr database.

I have tried the following method:

  1. Download a GI list from NCBI.
  2. Use blastdb_aliastool to extract the subset.

This method worked well for an older database (2013), but with my recently downloaded database it fails with the error "BLAST Database error: GI list specified but no ISAM file found for GI". My understanding is that NCBI no longer supports GI numbers.

I have also tried downloading the target sequences as a FASTA file directly from NCBI, as shown in http://www.ionsource.com/tutorial/db/tips_for_creating_species_specif.htm, but the download fails every time before it completes.

Any other suggestions? I need the subset of the nr database for txid5204[ORGN].

blast gene • 1.6k views

Have you tried blastdbcmd?

http://nebc.nerc.ac.uk/bioinformatics/docs/blastdbcmd.html

Reply written 23 months ago by Jake Warner
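
As a rough illustration of the blastdbcmd suggestion, the sketch below builds and runs a batch extraction from Python. The database name `nr` and the file names `acc.txt` and `subset.fa` are placeholders rather than anything from the thread, and looking entries up by identifier generally requires the database to have been built with `-parse_seqids`.

```python
import subprocess


def blastdbcmd_batch_args(db, id_file, out_fasta):
    """Build a blastdbcmd command that dumps the listed entries (FASTA by default)."""
    return [
        "blastdbcmd",
        "-db", db,                # BLAST database name or path (placeholder)
        "-entry_batch", id_file,  # text file with one sequence identifier per line
        "-out", out_fasta,        # output file for the extracted sequences
    ]


def extract_entries(db, id_file, out_fasta):
    """Run blastdbcmd; raises CalledProcessError if it exits with an error."""
    subprocess.run(blastdbcmd_batch_args(db, id_file, out_fasta), check=True)
```

With BLAST+ installed, `extract_entries("nr", "acc.txt", "subset.fa")` would write the sequences listed in `acc.txt` to `subset.fa`.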
shenwei356 wrote (23 months ago):

Try taxonkit: "Extract all protein sequences of specific taxons from the NCBI nr database".


Thanks for the reply. Regarding the tutorial at http://bioinf.shenwei.me/taxonkit/tutorial/: in step 2, is 'prot.accession2taxid.gz' just sample data for the tutorial? If so, what is the equivalent of 'prot.accession2taxid.gz' for my data? I am a bit confused by that.

Reply written 23 months ago by sbchua.1990

Of course not! It's from NCBI: ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.gz

Reply written 23 months ago by shenwei356
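
To make the role of prot.accession2taxid.gz concrete, here is a minimal Python sketch of the filtering step. It is not the tutorial's own commands. It assumes the file is tab-separated with the header columns accession, accession.version, taxid and gi, and that the set of wanted taxids (5204 plus all of its descendant taxids, for example as listed by taxonkit) is already available as strings. The function names are made up for illustration.

```python
import csv
import gzip


def accessions_for_taxids(lines, wanted_taxids):
    """Yield accession.version values whose taxid is in wanted_taxids.

    `lines` is any iterable of text lines in the assumed layout: a header
    row, then tab-separated accession, accession.version, taxid, gi.
    `wanted_taxids` is a set of taxids given as strings.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        if row["taxid"] in wanted_taxids:
            yield row["accession.version"]


def write_accession_list(accession2taxid_gz, wanted_taxids, out_path):
    """Stream the gzipped mapping file and write one matching accession per line."""
    count = 0
    with gzip.open(accession2taxid_gz, "rt") as src, open(out_path, "w") as dst:
        for acc in accessions_for_taxids(src, wanted_taxids):
            dst.write(acc + "\n")
            count += 1
    return count  # number of accessions written
```

The resulting accession list can then be used to pull the matching sequences out of the local nr database. Streaming the file line by line avoids loading the whole mapping table into memory.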

Thanks for your help. Good work on both 'taxonkit' and 'seqkit'. Both helped me a lot.

Reply written 23 months ago by sbchua.1990

Glad it helps. Could you upvote this answer and/or accept it?

Reply written 23 months ago (modified 23 months ago) by shenwei356