Question: Blasting against all assemblies on NCBI without downloading the genomes
8 days ago
genomes_and_MGEs0 wrote:

Hey guys,

I would like to run blastx with all bacterial assemblies available on NCBI against a database of several proteins, without downloading the genomes (to save disk space: I'm working on a supercomputer with no cloud storage available, so I'm relying entirely on local disk space). The only output I would like to keep locally is the genomes with hits to the proteins in my local database. Do you have a solution? Thanks!

All bacterial assemblies take less than 600GB of space. Surely a supercomputer has that much to spare. All bacterial proteomes would be around 100GB, if even that.

Use the -remote option of command-line BLAST?

Will try to have a look, thanks!

You may want to do tblastn with your proteins as the query (rather than the other way around) if you use the -remote option, since you don't want to download the genomes. I'm not sure how much time NCBI allows per query, but searching all bacterial assemblies/genomes at once may run up against the limit. Start with a single protein and an "Entrez query" restricting the search to one genus before expanding it.
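
To make the suggestion above concrete, here is a minimal sketch that only builds the command line for a remote tblastn run restricted to one genus. The file names, the genus, the `nt` database, and the E-value cutoff are placeholders I picked for illustration, not values from this thread; check which remote database actually covers the assemblies you care about before running anything.

```python
import shlex


def build_remote_tblastn_cmd(query_fasta, genus, db="nt",
                             evalue="1e-10", out_path="hits.tsv"):
    """Return the argument list for a remote tblastn search limited to one genus.

    All defaults are illustrative assumptions, not recommendations.
    """
    return [
        "tblastn",
        "-query", query_fasta,                    # protein FASTA, ideally one protein to start
        "-db", db,                                # database hosted at NCBI (assumed: nt)
        "-remote",                                # run the search on NCBI's servers
        "-entrez_query", f"{genus}[Organism]",    # restrict subjects to one genus
        "-evalue", evalue,
        "-outfmt", "6",                           # tabular output
        "-out", out_path,
    ]


if __name__ == "__main__":
    cmd = build_remote_tblastn_cmd("one_protein.faa", "Escherichia")
    # Print the command instead of running it, so you can inspect it first.
    print(shlex.join(cmd))
```

Once the printed command looks right, you can paste it into a shell or pass the list to `subprocess.run`. Only the tabular hit file ends up on your disk, and the second column of that table holds the subject sequence identifiers.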

As @5heikki says below, as long as you have space available on the supercomputer this search would be best done locally by downloading the genomes/proteomes.
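
If you do go the local route, the search boils down to two BLAST+ steps: build a nucleotide database from the downloaded genomes, then run tblastn with your proteins against it. A rough sketch that just assembles both commands is below; the file names, the database name, and the E-value cutoff are made up for the example, and how you download and concatenate the genomes is left out entirely.

```python
import shlex


def build_local_search_cmds(genomes_fasta, proteins_fasta,
                            db_name="bacteria_db", out_path="local_hits.tsv"):
    """Return (makeblastdb command, tblastn command) as argument lists.

    File and database names are hypothetical placeholders.
    """
    make_db = [
        "makeblastdb",
        "-in", genomes_fasta,     # concatenated genome FASTA (assumed to exist)
        "-dbtype", "nucl",        # genomes are nucleotide sequences
        "-out", db_name,
    ]
    search = [
        "tblastn",
        "-query", proteins_fasta,  # your protein set
        "-db", db_name,
        "-evalue", "1e-10",        # illustrative cutoff only
        "-outfmt", "6",            # tabular output
        "-out", out_path,
    ]
    return make_db, search


if __name__ == "__main__":
    for cmd in build_local_search_cmds("all_genomes.fna", "proteins.faa"):
        print(shlex.join(cmd))
```

The genomes with hits can then be read off the subject ID column of the tabular output, and the FASTA files can be deleted afterwards if space is tight.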

genomax wrote:
