Question: Possible to get NCBI assembly and read set of several genomes?
10 months ago
beginner_problem10 wrote:


I am trying to get a set of assemblies from one species, let's say Bacillus cereus, for which the corresponding read sets are also available.

Is it possible to get that direct connection at NCBI?

So far I have searched the "Assembly" database for B. cereus, but when I pick an assembly there is no link to the read set from which it was created. Does someone know how to do the trick?


modified 9 months ago • written 10 months ago

This question has been answered multiple times in the past. See the answers and the links I posted in this thread: more elegant way to bulk download genomes from the NCBI

In short, the NCBI genome download tool mentioned in @jrj.healey's answer should do the trick.

To get the read data, you will need to look through the BioSample accessions associated with the assemblies. Use the sra-explorer tool from Phil Ewels for that.

modified 10 months ago • written 10 months ago

Thank you for the answer, but which @jrj.healey answer do you mean? I did not see anyone with that name.

And which accessions do you mean? I tried using the BioSample IDs and the assembly accessions, but that did not work out.

written 9 months ago

That user changed his screen name to @Joe, so that is the answer to look for.

The second answer in the thread I linked above can be used as an example. If you run that search at NCBI, you will see some assemblies for Lactobacillus. Sort by the date the RefSeq assembly was released (the option is at the top of the page; newer assemblies are more likely to have NGS data) and you will see this first result. Clicking on the associated BioSample gives you the SRA accession.

You can probably use EntrezDirect to get some of this information. I may look it up later today.
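A rough sketch of how the same assembly → BioSample → SRA walk could be scripted. It does not use the EntrezDirect command-line tools themselves; it calls the NCBI E-utilities web interface directly with the Python standard library. The helper names (`eutils_url`, `fetch_json`, `search_ids`, `linked_ids`, `assembly_to_sra`) are made up for illustration, and the JSON field names (`esearchresult`, `idlist`, `linksets`, `linksetdbs`, `dbto`, `links`) are what I understand the JSON replies to contain, so check them against a real response before relying on this.

```python
import json
import urllib.parse
import urllib.request

# Base address of the NCBI E-utilities web interface.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def eutils_url(tool, **params):
    """Build an E-utilities URL for a tool such as 'esearch' or 'elink'."""
    params.setdefault("retmode", "json")
    return f"{EUTILS}/{tool}.fcgi?" + urllib.parse.urlencode(params, doseq=True)


def fetch_json(url):
    """Download a URL and decode the JSON reply (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def search_ids(result):
    """Pull the list of Entrez UIDs out of an esearch JSON reply."""
    return result.get("esearchresult", {}).get("idlist", [])


def linked_ids(result, target_db):
    """Pull the UIDs that point into target_db out of an elink JSON reply."""
    ids = []
    for linkset in result.get("linksets", []):
        for linksetdb in linkset.get("linksetdbs", []):
            if linksetdb.get("dbto") == target_db:
                ids.extend(str(uid) for uid in linksetdb.get("links", []))
    return ids


def assembly_to_sra(assembly_uid):
    """Follow assembly -> biosample -> sra links for one assembly UID."""
    reply = fetch_json(
        eutils_url("elink", dbfrom="assembly", db="biosample", id=assembly_uid)
    )
    biosample_uids = linked_ids(reply, "biosample")
    if not biosample_uids:
        return []  # no BioSample linked, so no reads can be found this way
    reply = fetch_json(
        eutils_url("elink", dbfrom="biosample", db="sra", id=",".join(biosample_uids))
    )
    return linked_ids(reply, "sra")
```

To use it you would first search the assembly database, for example `search_ids(fetch_json(eutils_url("esearch", db="assembly", term="Bacillus cereus[Organism]", retmax=20)))`, and then call `assembly_to_sra` on each UID, pausing between requests to stay within NCBI's rate limits. Assemblies that return an empty list have no linked reads. Note that the values returned are Entrez UIDs, not run accessions, so a further lookup in the sra database is still needed to turn them into accessions that you can download.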

modified 9 months ago • written 9 months ago
