Question

Standalone BLAST annotation with multiple databases

0

8.3 years ago

Biogeek ▴ 470

Dear users,

Apologies for asking a question that is broad and, to some, probably obvious. I want to annotate a new transcriptome against several databases (including NCBI, Swiss-Prot, UniProt and custom databases) using my standalone BLASTX set-up on a cluster.

I am comfortable running BLASTX against one database, but how do I go about annotating an assembly against multiple databases, collating all the information and picking the best-suited hits? Can this be run in one process, or do I need specialised scripts to combine and sort the collated results?

The answer may be obvious to many, but I am still learning the command line and scripting.

Thanks for the help!

annotation blast • 2.4k views

updated 20 months ago by Ram 43k • written 8.3 years ago by Biogeek ▴ 470

Ram · Answer 1 · 2016-01-04

1

8.3 years ago

Michael 54k

Regarding "inc. NCBI, Swissprot, Uniprot, custom databases": as far as I know, Swiss-Prot, UniProt, etc. are all contained in the NCBI nr database, which you can obtain using update_blastdb.pl. Swiss-Prot, for example, is a subset of nr and is provided as an alias, so you would still need to download nr to BLAST against Swiss-Prot. If you need more options for generating compound databases, see: How To Blast A Sequence Against Multiple Databases

I recommend the following options:

blastdb_aliastool: use this if you want to join several databases and reuse the resulting compound database multiple times.
BLAST's -db option with multiple databases: use this when there are many possible combinations and little reuse of any one compound database, e.g. when users can select a few BLAST databases from a large number of custom databases on a web server. Using the -db parameter in this way seems to be an undocumented feature.
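The two options above can be sketched as command lines. This is only a minimal illustration written in Python that builds and prints the commands without running them. The database names, the alias name, the file names and the e-value cutoff are all placeholders I made up, and the sketch assumes every database is a protein database that BLAST+ can already find (for example via BLASTDB).

```python
import shlex

def alias_command(databases, alias_name, title):
    """Build a blastdb_aliastool call that joins protein databases under one alias."""
    return [
        "blastdb_aliastool",
        "-dblist", " ".join(databases),  # space-separated list passed as one argument
        "-dbtype", "prot",
        "-out", alias_name,
        "-title", title,
    ]

def blastx_command(query, db, out, threads=4):
    """Build a blastx call; db is one name or a list of names to search together."""
    db_arg = db if isinstance(db, str) else " ".join(db)
    return [
        "blastx",
        "-query", query,
        "-db", db_arg,
        "-out", out,
        "-outfmt", "6",      # tabular output, one hit per line
        "-evalue", "1e-5",   # placeholder cutoff, adjust to your needs
        "-num_threads", str(threads),
    ]

# Hypothetical database and file names, for illustration only.
databases = ["swissprot", "custom_db1", "custom_db2"]

# Option 1: create a reusable alias once, then search it like a single database.
alias_cmd = alias_command(databases, "combined_prot", "Combined protein databases")
search_alias_cmd = blastx_command("transcripts.fasta", "combined_prot", "hits_alias.tsv")

# Option 2: no alias, hand all database names to -db in one quoted argument.
search_multi_cmd = blastx_command("transcripts.fasta", databases, "hits_multi.tsv")

for cmd in (alias_cmd, search_alias_cmd, search_multi_cmd):
    print(shlex.join(cmd))
```

To actually run one of these, each list can be passed to subprocess.run(cmd, check=True). Either way the hits for a query come from all listed databases and are ranked together in one output.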

updated 4.3 years ago by Ram 43k • written 8.3 years ago by Michael 54k
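On the original question of collating results: if the databases are searched separately instead of being combined, the tabular outputs can be merged afterwards with a small script. The sketch below assumes the default 12-column -outfmt 6 layout and keeps the hit with the highest bit score per query. It ranks by bit score rather than e-value because e-values depend on the size of the database searched, so they are not directly comparable across separate searches. The example rows are made up.

```python
import csv

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

def best_hits(lines):
    """Return {query id: best hit} from tab-separated BLAST lines, ranked by bit score."""
    best = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, comment lines and anything with too few columns.
        if len(row) < len(FIELDS) or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        current = best.get(hit["qseqid"])
        if current is None or float(hit["bitscore"]) > float(current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best

# Made-up rows standing in for the concatenated output of two separate searches.
example = [
    "tx1\thitA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-50\t180",
    "tx1\thitB\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-60\t210",
    "tx2\thitC\t70.0\t90\t27\t0\t1\t270\t5\t94\t1e-20\t95.5",
]

best = best_hits(example)
for query, hit in best.items():
    print(query, hit["sseqid"], hit["bitscore"])
```

With real data, the lines of several result files can be chained together (for example with itertools.chain over open file handles) and passed to best_hits in one call.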

0

Thanks Michael

So I can combine all the databases using the blastdb_aliastool program of BLAST+ on the Linux command line, then start the BLASTX run from there. Assuming I do it this way, the best hit will come from the combined dataset, correct? Just trying to understand the process in my head.

What is the difference between aliastool and the -db function?

I only have 4 databases max, so I guess aliastool would be best for me?

Apologies for such simple questions.

updated 4.3 years ago by Ram 43k • written 8.3 years ago by Biogeek ▴ 470

0

Hello Michael Dondrup, how big is the nr database? I'm trying to BLAST against the Swiss-Prot database only, but from your answer above it seems I would also have to download the nr database.

written 5.4 years ago by nazza2008 • 0