Question: Standalone BLAST annotation with multiple databases
5.2 years ago by
Biogeek400 wrote:

Dear users,

Apologies for asking a broad question that may be obvious to some, but I want to annotate a new transcriptome against several databases (inc. NCBI, Swissprot, Uniprot, custom databases) using my standalone BlastX set-up on a cluster.

Whilst I am comfortable blasting against one database using BlastX, how can I go about annotating an assembly against multiple databases and collating all the information and the best hits? Can this be run in one process, or do I need specialised scripts to combine and sort all the collated information?

The answer may be obvious to many, but I am still learning the command line and scripting.

Thanks for the help!






5.2 years ago by
Bergen, Norway
Michael Dondrup48k wrote:

Regarding "inc. NCBI, Swissprot, Uniprot, custom databases": as far as I know, Swissprot, Uniprot, etc. are all contained in the NCBI NR database, which you can obtain using update_blastdb. Swissprot, for example, is a subset of NR and is provided as an alias, so you would still need to download NR to blast against Swissprot. If you need more options for generating compound databases, see: How To Blast A Sequence Against Multiple Databases

I recommend the following options:

  1. blastdb_aliastool: use this if you want to join several databases and use the resulting compound database multiple times.
  2. BLAST's -db option with multiple databases: use this when there is a large number of possible combinations and little re-use of any one compound database, e.g. when users may select some BLAST databases from a large number of custom databases on a web server. Using the -db parameter in this way seems to be an undocumented feature.
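As a rough illustration of the two options above, here is a minimal Python sketch that only builds the two command lines and prints them, without running anything. The database names (`swissprot`, `my_custom_db`), the alias name `combined_prot`, the file names, and the e-value cutoff are hypothetical placeholders, and the sketch assumes all databases are protein databases that have already been formatted.

```python
import shlex


def alias_command(databases, out_name, title, dbtype="prot"):
    """Build a blastdb_aliastool call joining existing databases under one alias."""
    return [
        "blastdb_aliastool",
        "-dblist", " ".join(databases),  # space-separated list passed as one argument
        "-dbtype", dbtype,
        "-out", out_name,
        "-title", title,
    ]


def blastx_command(query, databases, out_file, evalue="1e-5"):
    """Build a blastx call; several databases go into -db as one space-separated string."""
    return [
        "blastx",
        "-query", query,
        "-db", " ".join(databases),
        "-evalue", evalue,   # hypothetical cutoff, adjust to your needs
        "-outfmt", "6",      # tabular output
        "-out", out_file,
    ]


if __name__ == "__main__":
    dbs = ["swissprot", "my_custom_db"]  # hypothetical database names

    # Option 1: create a reusable alias once, then search against the alias.
    print(shlex.join(alias_command(dbs, "combined_prot", "Swissprot plus custom")))
    print(shlex.join(blastx_command("transcripts.fasta", ["combined_prot"], "hits_alias.tsv")))

    # Option 2: name the databases directly in -db for a one-off combination.
    print(shlex.join(blastx_command("transcripts.fasta", dbs, "hits_multi.tsv")))
```

The printed lines can be pasted into a shell or a cluster job script. Either way BLAST treats the databases as one search space, so hits from all of them end up ranked together in a single output file.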

Thanks Michael

So I can combine all the databases using the blastdb_aliastool program from BLAST+ on the Linux command line, then start the BlastX run from there. Assuming I do it this way, the best hit will come from the combined dataset, correct? Just trying to understand the process in my head.
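To make the "best hit" idea concrete, here is a small Python sketch of how the top hit per query could be picked from tabular BLAST output. It assumes the search was run with `-outfmt 6` and the default column order (query ID first, subject ID second, e-value eleventh, bit score twelfth). The function name `best_hits` and the two sample lines are invented for illustration.

```python
import csv


def best_hits(lines):
    """Return {query_id: (subject_id, evalue, bitscore)} keeping the highest bit score."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


if __name__ == "__main__":
    # Two made-up hits for the same query, e.g. from different source databases.
    sample = [
        "contig1\tsubjectA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t180\n",
        "contig1\tsubjectB\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-60\t210\n",
    ]
    for query, (subject, evalue, bitscore) in best_hits(sample).items():
        print(query, subject, evalue, bitscore)
```

Ranking by bit score rather than e-value is a deliberate choice in this sketch: the e-value depends on the size of the database searched, so it is harder to compare between searches run separately against databases of different sizes.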

What is the difference between blastdb_aliastool and the -db option?

I only have 4 databases max, so I guess aliastool would be best for me?

Apologies for such simple questions.

written 5.2 years ago by Biogeek400

Hello Michael Dondrup, how big is the nr database? I'm trying to blast against the SwissProt database only, but from your answer above it seems I would also have to download the nr database.

written 2.2 years ago by nazza20080