Question: How to run makeblastdb with taxon IDs
2.1 years ago, gb (1.2k) wrote:

Dear all,

I extracted a subset of sequences from the nt database, and I want to make a BLAST database from those sequences that includes their taxon IDs.

If I execute the following command:

./ncbi-blast-2.6.0+/bin/makeblastdb -in ntselection.fa -dbtype nucl -taxid_map nucl_gb.accession2taxid -parse_seqids

I get no errors and the database works with blast, but when I use the output option -outfmt '6 qseqid sseqid stitle sgi sacc pident length qlen evalue bitscore staxids' there are no taxon IDs in the output.

What is the correct command, and which input files does it need? If I use gi_taxid_nucl.dmp I also get no errors, but no BLAST database is made.

blast makeblastdb • 1.4k views
modified 17 months ago by Biostar • written 2.1 years ago by gb (1.2k)

If you use the search, you'll find that this has been asked before, and at least back then there was no direct solution. However, it's very easy to add that information to your output after running BLAST. Just use join and sort, e.g.:

join -t $'\t' -1 1 -2 1 -o 1.1,1.2,1.3,...,2.2 \
    <(sort -t $'\t' -k1,1 blastoutput) \
    <(sort -t $'\t' -k1,1 nucl_gb.accession2taxid)
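
To make the join idea concrete, here is a small sketch on toy data. Everything in it is made up for illustration: the file names, the three-column BLAST table (qseqid, sacc, pident) and the accessions. It assumes the map has the four tab-separated columns of the accession2taxid files (accession, accession.version, taxid, gi) with one header line. It is written for plain sh, so it uses temporary files where the command above uses bash process substitution.

```shell
#!/bin/sh
set -eu

TAB=$(printf '\t')
workdir=$(mktemp -d)

# Toy BLAST table (hypothetical): qseqid, sacc, pident
printf 'q1\tAB000002\t99.1\nq2\tAB000001\t97.5\n' > "$workdir/blast.tsv"

# Toy map in the assumed accession2taxid layout, including the header line
printf 'accession\taccession.version\ttaxid\tgi\nAB000001\tAB000001.1\t9606\t111\nAB000002\tAB000002.1\t10090\t222\n' > "$workdir/map.tsv"

# join needs both inputs sorted on the join field, in the same locale
LC_ALL=C sort -t "$TAB" -k2,2 "$workdir/blast.tsv" > "$workdir/blast.sorted"
tail -n +2 "$workdir/map.tsv" | LC_ALL=C sort -t "$TAB" -k1,1 > "$workdir/map.sorted"

# Join BLAST column 2 (sacc) to map column 1 (accession);
# keep the three BLAST columns and append the taxid (map column 3)
LC_ALL=C join -t "$TAB" -1 2 -2 1 -o 1.1,1.2,1.3,2.3 \
    "$workdir/blast.sorted" "$workdir/map.sorted" > "$workdir/blast_with_taxid.tsv"

cat "$workdir/blast_with_taxid.tsv"
```

With the -outfmt string from the question, sacc would be column 5 rather than column 2, so the -1 and -o arguments would need adjusting. Note also that the rows come out ordered by accession, not in the original BLAST order.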
written 2.1 years ago by 5heikki (8.6k)

It sounded as if upgrading to 2.6.0 was the solution. To use your approach I would have to adjust someone else's pipeline, which I was trying to avoid. According to the BLAST documentation it should be possible.

Anyway, thanks for the answer.

written 2.1 years ago by gb (1.2k)

Have you tried using the BLAST taxdb provided on the blastdb website? Ref: https://www.ncbi.nlm.nih.gov/books/NBK279680/

written 2.1 years ago by Sej Modha (4.5k)

The taxdb is for retrieving the scientific name from the taxid. I need the taxid itself in my BLAST output. At the moment the last column shows only 'N/A', but I know that taxids are available because I can find them in the file nucl_gb.accession2taxid (ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/).

written 2.1 years ago by gb (1.2k)

I agree. Taxonomy IDs exist in the pre-formatted BLAST databases, but for a custom database it might be trickier to incorporate them. Please check your BLAST version; see the post "How to make a custom blast db with taxon IDs from a taxid_map file".

modified 2.1 years ago • written 2.1 years ago by Sej Modha (4.5k)

I am aware of that post. I am already using BLAST 2.6.0 and it is still not working. If I use a taxon map file in the same format as in that post with BLAST 2.6.0, I get no errors, but no database is made either. I think I will try BLAST 2.3.0 with one of the solutions.

written 2.1 years ago by gb (1.2k)

I have some weird results now. For testing purposes, I made the BLAST database from a single sequence.

If I use an accession_taxonid file consisting of one line (the accession and taxon id of that single sequence) it works!

But if I use the same makeblastdb command with the complete accession_taxonid file, no database is made. Maybe the accession_taxonid file must have the same length and order as the input sequences.
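
One way to test that guess is to trim the full map down to only the sequences present in the FASTA file. The sketch below is an illustration under stated assumptions, not a confirmed fix. It assumes that each FASTA header starts with the accession.version (for example ">AB000002.1 some title"), that nucl_gb.accession2taxid has one header line and four tab-separated columns (accession, accession.version, taxid, gi), and that makeblastdb wants a two-column map of sequence ID and taxid. The function name make_taxid_map and the output file name taxid_map.txt are hypothetical.

```shell
#!/bin/sh
set -eu

# make_taxid_map FASTA ACCESSION2TAXID
# Prints "accession.version taxid" for every FASTA record found in the map.
# Rows come out in map order, not FASTA order.
make_taxid_map() {
    fasta=$1
    acc2taxid=$2
    awk -F '\t' '
        # First file (FASTA): remember the first word of each header line.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                split(substr($0, 2), words, " ")
                wanted[words[1]] = 1
            }
            next
        }
        # Second file (map): skip the header line, print only wanted rows.
        FNR > 1 && ($2 in wanted) { print $2 " " $3 }
    ' "$fasta" "$acc2taxid"
}

# Possible usage (file names taken from the question, taxid_map.txt assumed):
#   make_taxid_map ntselection.fa nucl_gb.accession2taxid > taxid_map.txt
#   makeblastdb -in ntselection.fa -dbtype nucl -parse_seqids -taxid_map taxid_map.txt
```

If the headers in ntselection.fa use another ID style, the lookup key in the awk script would have to change to match.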

written 2.1 years ago by gb (1.2k)