Question: Blasting against all assemblies on NCBI without downloading the genomes
8 days ago, genomes_and_MGEs wrote:

Hey guys,

I would like to blastx all bacterial assemblies available on NCBI against a database of several proteins without downloading the genomes. This is to save disk space: I'm working on a supercomputer with no cloud available, so I'm relying entirely on the machine's physical storage. The only output I would like to keep on disk is the set of genomes with hits to the proteins in my local database. Do you have a solution? Thanks!

Tags: assembly, genome

All bacterial assemblies take less than 600 GB of space. Surely a supercomputer has that much space to spare. All bacterial proteomes would be about 100 GB, if even that.

written 8 days ago by 5heikki

Use the -remote option of command-line BLAST?

written 8 days ago by jrj.healey

Will try to have a look, thanks!

written 8 days ago by genomes_and_MGEs

You may want to do tblastn with your proteins (rather than the other way around) if you use the -remote option (since you don't want to download the genomes). Not sure how much time NCBI allows per query, but searching all bacterial assemblies/genomes may run up against the limit. Start with a single protein and an "Entrez query" restricting BLAST to a genus before expanding the search.
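
A minimal sketch of that first test, written as a small Python helper that only assembles the command line. The flags used (-remote, -entrez_query, -outfmt, -evalue) are standard BLAST+ options; the database name "nt", the file names, the e-value cutoff and the genus are assumptions chosen for illustration, not something stated in this thread.

```python
import shlex


def build_remote_tblastn(query_fasta, genus, db="nt", evalue="1e-5", out="hits.tsv"):
    """Return the argument list for a remote tblastn restricted to one genus.

    db, evalue and out are illustrative defaults (assumptions), not
    recommendations from NCBI or from this thread.
    """
    return [
        "tblastn",
        "-query", query_fasta,                    # protein FASTA, ideally one protein to start
        "-db", db,                                # NCBI-hosted nucleotide database (assumed)
        "-remote",                                # run the search on NCBI's servers
        "-entrez_query", f"{genus}[Organism]",    # restrict the search to a single genus
        "-evalue", evalue,
        "-outfmt", "6",                           # tab-separated hit table
        "-out", out,
    ]


cmd = build_remote_tblastn("one_protein.faa", "Escherichia")
# Print a shell-safe version of the command for inspection before running it.
print(shlex.join(cmd))
```

With BLAST+ installed and network access available, the list could be passed to subprocess.run(cmd, check=True). Only the hit table is written locally, so no genome sequence has to be stored.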

As @5heikki says, as long as you have space available on the supercomputer, this search would be best done locally by downloading the genomes/proteomes.
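
Whether the search runs remotely or locally, the only thing that needs to stay on disk is the list of subject sequences with hits. A rough sketch of pulling that list out of a tabular result (-outfmt 6 with its default twelve columns) is shown here; the e-value cutoff and the example rows are made up for illustration.

```python
import csv
import io


def hit_subjects(tabular, max_evalue=1e-5):
    """Collect unique subject IDs from BLAST tabular output (-outfmt 6 defaults).

    Column 2 is the subject sequence ID and column 11 is the e-value.
    max_evalue is an arbitrary illustrative cutoff.
    """
    subjects = set()
    for row in csv.reader(tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        sseqid, evalue = row[1], float(row[10])
        if evalue <= max_evalue:
            subjects.add(sseqid)
    return sorted(subjects)


# Placeholder rows, not real results: query, subject, then the remaining
# ten default columns ending in e-value and bit score.
example = io.StringIO(
    "protA\tcontig_A\t98.5\t200\t3\t0\t1\t200\t1500\t2099\t1e-120\t410\n"
    "protB\tcontig_A\t91.0\t180\t16\t0\t1\t180\t5000\t5539\t2e-80\t290\n"
    "protA\tcontig_B\t30.0\t50\t35\t0\t10\t59\t100\t249\t0.5\t25\n"
)
subjects = hit_subjects(example)
print(subjects)
```

Note that the subject IDs are sequence (contig or chromosome) identifiers; mapping them back to assembly accessions would be a separate step that is not shown here.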

modified 8 days ago • written 8 days ago by genomax