Question: Fundamental BLAST problem
3.5 years ago, kevinbennett3000 wrote:

I ran a local blastp against the nr database from NCBI and got 100,000 hits. I picked out the ones I want to keep in Excel, and I have a text file of all of their header/description lines. How do I use that to get the actual sequences from NCBI? This may be a Batch Entrez thing, or it may be the exact opposite. Either way, I figured this is a very common problem, but I couldn't find a concrete solution.

blastp blast retrieve seqs • 895 views
3.5 years ago, genomax (United States) wrote:

Take the identifiers you are interested in and use them to query the nr database with blastdbcmd, a tool included in the BLAST+ package.

Put your identifiers in a file, one per line (use accession numbers), and pass that file to blastdbcmd with the -entry_batch option. The -outfmt '%f' option writes the sequences in FASTA format.
Your command would look something like:

    blastdbcmd -db /path_to/nr -entry_batch Acc_ID_file -outfmt '%f' -out sequence_file
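
If the text file holds full header/description lines rather than bare accessions, the ID file can be built from it first. The sketch below is only an illustration and rests on assumptions not stated in the thread: that each saved header begins with the accession (optionally preceded by ">"), and that the file names headers.txt and Acc_ID_file are placeholders. The helper name extract_accessions is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: every header line starts with the accession, e.g.
#   >WP_000000001.1 hypothetical protein [Escherichia coli]
# (the accession above is made up for illustration).

extract_accessions() {
    # Strip a leading ">", keep the first whitespace-delimited token,
    # skip blank lines, and drop duplicate accessions.
    sed 's/^>//' "$1" | awk 'NF { print $1 }' | sort -u
}

# Usage (uncomment and adjust the placeholder paths):
# extract_accessions headers.txt > Acc_ID_file
# blastdbcmd -db /path_to/nr -entry_batch Acc_ID_file -outfmt '%f' -out sequence_file
```

Headers in the older "gi|...|ref|...|" style would need a different extraction step, so check a few lines of the output file before running blastdbcmd on it.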


Perfect! Thank you very much. I was actually looking at this before but I wasn't entirely sure.

(reply by kevinbennett3000, 3.5 years ago)

This might be a REALLY stupid question, but do I need to use the unformatted FASTA version of the nr database?

EDIT: I tested it and found that I can just use the formatted database I was already using for blastp. Thanks again; your command example worked perfectly.

(reply by kevinbennett3000, 3.5 years ago)