Question: Blasting against all assemblies on NCBI without downloading the genomes
8 days ago, genomes_and_MGEs wrote:

Hey guys,

I would like to blastx all bacterial assemblies available on NCBI against a database of several proteins without downloading the genomes. The goal is to save disk space: I'm working on a supercomputer with no cloud storage available, so I'm relying entirely on local disk. The only output I want to keep locally is the list of genomes that contain hits to the proteins in my local database. Do you have a solution? Thanks!

Tags: assembly, genome • 85 views

All bacterial assemblies take less than 600 GB of space. Surely a supercomputer has that much to spare. All bacterial proteomes would be around 100 GB, if even that.

Reply written 8 days ago by 5heikki (8.3k)
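To check the reply's estimate against your own quota, you can compare the free space on the target filesystem with the ~600 GB (assemblies) and ~100 GB (proteomes) figures given there. A minimal Python sketch; the figures are the commenter's rough estimates, not measured sizes, and the 20% headroom for BLAST database files and results is an arbitrary assumption.

```python
import shutil

# Rough sizes from the reply above (commenter's estimates, not measured values).
ESTIMATED_GB = {"all bacterial assemblies": 600, "all bacterial proteomes": 100}


def fits(free_bytes: int, needed_gb: float, headroom: float = 1.2) -> bool:
    """Return True if free_bytes covers needed_gb plus a safety margin.

    headroom=1.2 reserves 20% extra for makeblastdb output and result files
    (an assumption; adjust for your workflow). GB here means 10**9 bytes.
    """
    return free_bytes >= needed_gb * headroom * 10**9


def report(path: str) -> dict:
    """Check each estimate against the free space on the filesystem holding path."""
    free = shutil.disk_usage(path).free
    return {name: fits(free, gb) for name, gb in ESTIMATED_GB.items()}


if __name__ == "__main__":
    # "." is the current directory; point this at your scratch/work directory.
    print(report("."))
```

Running it in your work directory prints one True/False per estimate, which is usually enough to decide between the local and the remote route.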

Use the -remote option of command-line BLAST+?

Reply written 8 days ago by jrj.healey (11k)
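With -remote, BLAST+ submits the search to NCBI's servers, so only your query file and the result file are stored on your disk. A minimal sketch that builds such a command line from Python; it assumes BLAST+ is installed and on PATH. The file names are hypothetical, and the database name `nt` is only a placeholder: check which NCBI-hosted database actually covers the assemblies you need.

```python
import shlex


def remote_blast_cmd(program: str, query: str, db: str, out: str,
                     evalue: float = 1e-5) -> list[str]:
    """Build a BLAST+ command that runs on NCBI's servers via -remote.

    Only the query FASTA and the tabular result file live locally.
    The e-value cutoff is an arbitrary default, not a recommendation.
    """
    return [
        program,
        "-query", query,       # local FASTA file with your sequences
        "-db", db,             # name of an NCBI-hosted database
        "-remote",             # run the search at NCBI instead of locally
        "-outfmt", "6",        # tabular output, one line per hit (HSP)
        "-evalue", str(evalue),
        "-out", out,           # local result file
    ]


if __name__ == "__main__":
    # Hypothetical file names; "nt" is a placeholder database name.
    cmd = remote_blast_cmd("tblastn", "proteins.faa", "nt", "hits.tsv")
    print(shlex.join(cmd))
```

To actually submit the search, pass the list to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell-quoting problems with file names.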

Will try to have a look, thanks!

Reply written 8 days ago by genomes_and_MGEs

You may want to run tblastn with your proteins as queries (rather than the other way around) if you use the -remote option, since you don't want to download the genomes. I'm not sure how much search time NCBI allows per query, but searching all bacterial assemblies/genomes may run up against the limit. Start with a single protein and an Entrez query restricting BLAST to one genus before expanding the search.
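A minimal sketch of that "one protein, one genus first" strategy: take the first record from a protein FASTA, write it to its own file, and build a tblastn -remote command with -entrez_query. In BLAST+, -entrez_query is only accepted together with -remote. The genus, file names and the `nt` database are hypothetical placeholders.

```python
import shlex
from pathlib import Path


def first_fasta_record(text: str) -> str:
    """Return the first record (header plus sequence lines) of a FASTA string."""
    record: list[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if record:          # a second header ends the first record
                break
            record.append(line)
        elif record and line.strip():
            record.append(line)  # sequence line of the first record
    if not record:
        raise ValueError("no FASTA record found")
    return "\n".join(record) + "\n"


def genus_tblastn_cmd(query: str, genus: str, out: str,
                      db: str = "nt") -> list[str]:
    """Build a remote tblastn command restricted to one genus via Entrez."""
    return [
        "tblastn",
        "-query", query,
        "-db", db,                              # placeholder NCBI database
        "-remote",                              # required for -entrez_query
        "-entrez_query", f"{genus}[Organism]",  # e.g. Escherichia[Organism]
        "-outfmt", "6",
        "-out", out,
    ]


if __name__ == "__main__":
    src = Path("proteins.faa")  # hypothetical input file
    if src.exists():
        single = Path("protein1.faa")
        single.write_text(first_fasta_record(src.read_text()))
        cmd = genus_tblastn_cmd(str(single), "Escherichia", "protein1_escherichia.tsv")
        print(shlex.join(cmd))
    else:
        print(f"{src} not found; adjust the path")
```

Once a single protein and genus return in reasonable time, you can widen the Entrez query or loop over the remaining proteins.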

As @5heikki says, as long as you have space available on the supercomputer, this search is best done locally after downloading the genomes/proteomes.

Reply written 8 days ago by genomax (64k), modified 8 days ago
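For the local route, a common pattern is to build a nucleotide BLAST database once with makeblastdb, run tblastn against it, and then reduce the tabular output to the set of hit genomes, which is the only output the question asks for. A minimal sketch under several assumptions: the downloaded assemblies are concatenated into one FASTA, and the e-value cutoff and thread count are arbitrary. In -outfmt 6 the second column is the subject sequence ID (a contig or chromosome accession), not the assembly accession. So the sketch takes a hypothetical contig-to-assembly mapping, which you would build yourself, for example from the per-assembly file names.

```python
import shlex


def local_commands(genomes_fasta: str, db_name: str, proteins: str, out: str,
                   threads: int = 16) -> list[list[str]]:
    """Build the two BLAST+ steps: makeblastdb once, then tblastn against it."""
    makedb = ["makeblastdb", "-in", genomes_fasta, "-dbtype", "nucl",
              "-out", db_name]
    search = ["tblastn", "-query", proteins, "-db", db_name,
              "-outfmt", "6", "-evalue", "1e-5",
              "-num_threads", str(threads), "-out", out]
    return [makedb, search]


def hit_genomes(tsv_lines, contig_to_assembly: dict[str, str]) -> set[str]:
    """Collapse tabular (-outfmt 6) hits to the set of assemblies they fall in.

    Contigs missing from the mapping are reported by their own ID.
    """
    genomes: set[str] = set()
    for line in tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        subject = line.split("\t")[1]  # column 2 = subject sequence ID
        genomes.add(contig_to_assembly.get(subject, subject))
    return genomes


if __name__ == "__main__":
    # Hypothetical names: all_bacteria.fna = downloaded assemblies in one FASTA,
    # proteins.faa = your query proteins.
    for cmd in local_commands("all_bacteria.fna", "bacteria_db",
                              "proteins.faa", "hits.tsv"):
        print(shlex.join(cmd))
```

After tblastn finishes, `hit_genomes(open("hits.tsv"), mapping)` returns the assemblies to keep, and the rest of the downloaded data can be deleted if space is tight.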


Powered by Biostar version 2.3.0