Uncommon wget behavior with ncbi genomes
16 months ago
flogin ▴ 270

Hey guys, I'm trying to download all Eimeria genomes available on NCBI. As usual, I wrote this line:

wget -r --accept-regex ".*_genomic.fna.gz" "ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/protozoa/Eimeria*" -P .

But no files were returned. I also tried -A instead of --accept-regex, with the same result.

I'm using GNU Wget 1.17.1, and I have downloaded several genomes with this same line before.

This is an example of an FTP directory containing the file that I want:

https://ftp.ncbi.nlm.nih.gov/genomes/refseq/protozoa/Eimeria_necatrix/latest_assembly_versions/GCF_000499385.1_ENH001/

The file in this case is: GCF_000499385.1_ENH001_genomic.fna.gz

Can anyone help?

ncbi wget genomes • 454 views
16 months ago
vkkodali ★ 2.6k

You can use Entrez Direct for this as shown below:

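esearch -db assembly -query 'Eimeria necatrix[organism]' \
| esummary \
| xtract -pattern DocumentSummary -element FtpPath_RefSeq \
| while read -r url ; do
    # skip records that come back without a RefSeq FTP path
    [ -n "$url" ] || continue
    # append <assembly_dir>/<assembly_dir>_genomic.fna.gz to the FTP path
    path=$(echo "$url" | perl -pe 's/(GC[FA]_\d+.*)/$1\/$1_genomic.fna.gz/')
    wget -q --show-progress "$path" -P genome_data
done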
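
The perl substitution in the loop only repeats the last component of the assembly directory and appends _genomic.fna.gz to it. In case it helps to see that step on its own, here is a minimal sketch in plain shell with no network access. The function name genomic_fasta_url is made up for illustration and is not part of Entrez Direct or wget; the URL is the directory quoted in the question.

```shell
# Hypothetical helper: build the *_genomic.fna.gz URL from an assembly
# directory URL, using only POSIX parameter expansion.
genomic_fasta_url() {
    dir=${1%/}          # drop one trailing slash, if present
    base=${dir##*/}     # last path component, e.g. GCF_000499385.1_ENH001
    printf '%s/%s_genomic.fna.gz\n' "$dir" "$base"
}

# Example with the directory from the question:
genomic_fasta_url "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/protozoa/Eimeria_necatrix/latest_assembly_versions/GCF_000499385.1_ENH001/"
# prints https://ftp.ncbi.nlm.nih.gov/genomes/refseq/protozoa/Eimeria_necatrix/latest_assembly_versions/GCF_000499385.1_ENH001/GCF_000499385.1_ENH001_genomic.fna.gz
```

Printing the URLs this way before handing them to wget should make it easy to check that every path looks right before anything is downloaded.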

Alternatively, you can go to the NCBI Assembly portal, search for Eimeria necatrix[organism] and use the blue 'Download Assemblies' button to download the files of your choice.
