How to download a large number of protein.faa.gz files from NCBI's FTP site in one go
2.0 years ago
beginner123 ▴ 30

Does anyone know whether it is possible to get protein.faa.gz files from the NCBI FTP site using efetch (https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi)?

The file can be downloaded from the 【protein】 link shown in the image, but I need to do this for over 100 sequences.


NCBI entrez FTP efetch • 1.3k views
2.0 years ago
vkkodali_ncbi ★ 3.7k

To download a large number of protein FASTA files at the genome scale, you should use NCBI Datasets. In the web interface you can search for a taxonomic group of interest, choose the scope of genomes, pick the specific files you want, and download them all in one go. There is also a command-line tool and an API that you can use in your own scripts.
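As a rough illustration of the command-line route, here is a minimal Python sketch that assembles a bulk download call for the `datasets` tool. It assumes a recent `datasets` release that accepts `--inputfile`, `--include protein` and `--filename` (check `datasets download genome accession --help` on your install); the file names `accessions.txt` and `proteins.zip` and the helper `build_datasets_command` are made up for this example.

```python
import shlex


def build_datasets_command(accession_file, out_zip="proteins.zip"):
    """Return the argv list for a bulk protein FASTA download.

    accession_file: text file with one assembly accession (GCA_/GCF_) per line.
    out_zip: name of the zip archive the tool should write (hypothetical default).
    """
    return [
        "datasets", "download", "genome", "accession",
        "--inputfile", str(accession_file),  # read accessions from a file
        "--include", "protein",              # request only the protein FASTA
        "--filename", str(out_zip),          # output archive name
    ]


# Print the command so it can be inspected or pasted into a shell.
cmd = build_datasets_command("accessions.txt")
print(shlex.join(cmd))

# To actually run it (requires the datasets CLI on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if a file name contains spaces.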

2.0 years ago
GenoMax 143k

Prior answers of interest:
How to extract Refseq of downloaded files from NCBI
How to download genome assemblies from NCBI with a list of GCA identifiers? (change the .fna file to .faa)
NCBI datasets bulk protein fasta download (if you used datasets then the solution here may be needed to rename files)
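On the renaming point: this is only a sketch, not the solution from the linked thread. It assumes the unzipped Datasets package has the layout `ncbi_dataset/data/<accession>/protein.faa`, so every file shares the same name and needs the accession added. The helper name `collect_protein_fastas` and the output naming scheme are made up for illustration.

```python
import shutil
from pathlib import Path


def collect_protein_fastas(data_dir, out_dir):
    """Copy each <accession>/protein.faa to out_dir/<accession>_protein.faa.

    data_dir: the 'ncbi_dataset/data' directory of an unzipped package
              (layout is an assumption; adjust the glob if yours differs).
    Returns the list of files written, sorted by accession.
    """
    data_dir, out_dir = Path(data_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for faa in sorted(data_dir.glob("*/protein.faa")):
        accession = faa.parent.name  # the directory name is the accession
        target = out_dir / f"{accession}_protein.faa"
        shutil.copyfile(faa, target)  # copy, so the original package stays intact
        copied.append(target)
    return copied
```

For example, `collect_protein_fastas("ncbi_dataset/data", "proteins")` would leave one uniquely named FASTA per assembly in `proteins/`.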


Thank you, vkkodali_ncbi and GenoMax! I successfully downloaded the files via NCBI Datasets. Next, I need to convert the GCA_-style accessions to NC_ accessions in order to create a reference list of RefSeq (NC_) entries. I have a tab-separated list of GCA_ accessions downloaded from NCBI Datasets.
