Question: How to download genome assemblies from NCBI with a list of GCA identifiers?
23 months ago by
O.rka210 wrote:

I went to, typed in my organism, and now I want to download all of the assemblies that pop up. If I click [Download Assemblies], it only downloads 1/22 of them, and it has been saying "calculating size..." for about 30 minutes now. I tried using but not all of the records were downloaded.


Are you sure you set the right filters on ngd, such as assembly level? It should download anything that's present in the assembly summary file.

written 23 months ago by Joe17k

Apparently the organism I wanted had all of its records in GenBank and not RefSeq.

written 23 months ago by O.rka210

Yep, that is what I expected.

RefSeq is a subset of the total data in GenBank that has been manually curated to a high degree. These are the "reference sequences".

written 23 months ago by Joe17k

I'm not sure if you can rely on this all the time, but IIRC the accessions starting with "GCA_" are from GenBank. Accessions from RefSeq tend to start with "GCF_".

written 22 months ago by kblin10
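
The prefix check described above can be sketched as a small shell function. This is only an illustration: the function name assembly_source and the accession passed to it are made up for the example, and it assumes the GCA_/GCF_ convention holds for the accession in question.

```shell
# Classify an assembly accession by its prefix.
# Assumption: GCA_ means GenBank and GCF_ means RefSeq, as described above.
assembly_source() {
    case "$1" in
        GCA_*) echo "GenBank" ;;
        GCF_*) echo "RefSeq" ;;
        *)     echo "unknown" ;;
    esac
}

# Hypothetical accession, for illustration only
assembly_source "GCA_000000000.1"
```

Anything that matches neither prefix is reported as "unknown" rather than guessed at.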
23 months ago by
United States
vkkodali2.1k wrote:

You can try the following:

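esearch -db assembly -query 'Bos taurus[organism] AND latest[filter]' \
    | esummary \
    | xtract -pattern DocumentSummary -element FtpPath_GenBank \
    | while read -r line ; do
        # Skip any record that has no GenBank FTP path
        [ -z "$line" ] && continue
        # The last path component (GCA_...) plus a suffix is the FASTA file name
        fname=$(echo "$line" | grep -o 'GCA_.*' | sed 's/$/_genomic.fna.gz/') ;
        wget "$line/$fname" ;
    done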

Here, I first fetch the FTP path for each GenBank assembly using the EDirect tools, and then use standard Linux commands to download the genomic FASTA file.
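
To make the file-name step concrete, here is a minimal sketch that builds the download URL from a single FTP path without touching the network. It assumes the path ends in a directory named after the assembly (GCA_accession_name) and that the FASTA file inside is that name plus _genomic.fna.gz. The function name fasta_url and the example path are hypothetical.

```shell
# Build the genomic FASTA URL from an assembly FTP directory path.
# Assumption: the file is named <last path component>_genomic.fna.gz.
fasta_url() {
    dir="${1%/}"        # drop a trailing slash, if any
    base="${dir##*/}"   # last path component, e.g. GCA_..._AsmName
    printf '%s/%s_genomic.fna.gz\n' "$dir" "$base"
}

# Hypothetical path, for illustration only
fasta_url "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/000/000/GCA_000000000.1_ExampleAsm"
```

This uses shell parameter expansion where the pipeline uses grep and sed. Under the same assumption about the path layout, both should give the same file name.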


Hi! Great solution!

Do you think it would be possible to extract the list of downloaded GCAs, i.e. to a GCA_list.txt file?

written 7 months ago by agata10
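
One possible way to get such a list, sketched on the assumption that every FTP path contains the accession in the form GCA_digits.version: pull the accession out of each path with grep and redirect the result to a file. The function name extract_gca and the two example paths are hypothetical. In practice the input would be the xtract output from the answer above.

```shell
# Read FTP paths on stdin and print one GCA accession per line.
# Assumption: accessions look like GCA_<digits>.<version>.
extract_gca() {
    grep -oE 'GCA_[0-9]+\.[0-9]+'
}

# Hypothetical paths, for illustration only
printf '%s\n' \
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/000/001/GCA_000000001.1_AsmA" \
    "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/000/002/GCA_000000002.3_AsmB" \
    | extract_gca > GCA_list.txt
```

Note that this records the accessions whose paths were seen, not the ones whose download actually succeeded. To list only successful downloads, the accession could be appended to the file after wget returns without error.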