Hi, I was trying to download SARS-CoV-2 sequence data from NCBI using this link: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049 When I click the empty box, I can only get about 200 sequences at a time. Is there a way to batch download all the genome sequences with a single click? I thought I did this earlier, but I do not quite recall. Many thanks.
You can use NCBI Datasets for this. There is a dedicated Coronavirus Datasets page, and if you prefer, a command-line tool is also available. For example, you can use the command-line tool to download SARS-CoV-2 genomes as shown below:
datasets download virus genome taxon sars-cov-2 --complete-only --filename virus.zip
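If you take the command-line route, a quick sanity check is to count how many sequences actually ended up in the zip. Below is a minimal Python sketch that uses only the standard library. It assumes the genomes are stored in `ncbi_dataset/data/genomic.fna` inside the archive, which is the usual layout of NCBI Datasets packages; verify this with `unzip -l virus.zip` first. The helper name `count_fasta_records` is hypothetical and not part of any NCBI tool.

```python
import zipfile

# Assumed path of the nucleotide FASTA inside the NCBI Datasets virus package.
# Check the real member name with `unzip -l virus.zip` if this raises KeyError.
FASTA_MEMBER = "ncbi_dataset/data/genomic.fna"


def count_fasta_records(zip_path: str, member: str = FASTA_MEMBER) -> int:
    """Count FASTA records (header lines starting with '>') in a zipped FASTA file."""
    count = 0
    with zipfile.ZipFile(zip_path) as zf:
        # Stream the member line by line so a large (~1 GB) FASTA
        # is never loaded into memory all at once.
        with zf.open(member) as handle:
            for line in handle:
                if line.startswith(b">"):
                    count += 1
    return count
```

For example, `count_fasta_records("virus.zip")` returns the number of genomes in the download, which you can compare against the total shown on the NCBI web page.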
"I click the empty box, I can only get like 200 sequences, each time."
Try this. Do not click any boxes. Click the Download button at the top. In step 2, Download All Records should be selected automatically. This downloads ALL sequences. As of today, that number stands at 43,676 genomes (~1.2 GB file).