How do I use entrez fetch queries on the command line to download an entire RefSeq genome?
A prior answer may work better with eukaryotic genomes: "Retrieve genome in fasta format from ncbi"
$ esearch -db genome -query "Saccharomyces cerevisiae [ORGN]" | elink -target assembly | esummary | xtract -pattern FtpPath_RefSeq -element FtpPath_RefSeq
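The pipeline above only prints the RefSeq FTP directory for each assembly; it does not download anything. A minimal sketch of the next step follows, assuming NCBI's usual layout, in which the genomic FASTA sits inside that directory and is named after the directory itself with a `_genomic.fna.gz` suffix. The function name `refseq_fasta_url` is made up for illustration and is not part of Entrez Direct.

```shell
# Hypothetical helper: turn an FtpPath_RefSeq value into the URL of the
# genomic FASTA file. Assumes the file is called <directory name>_genomic.fna.gz.
refseq_fasta_url() {
    ftp_path="${1%/}"                      # drop a trailing slash, if any
    case "$ftp_path" in
        ftp://*) ftp_path="https://${ftp_path#ftp://}" ;;   # same host, over HTTPS
    esac
    name="${ftp_path##*/}"                 # last path component = assembly name
    printf '%s/%s_genomic.fna.gz\n' "$ftp_path" "$name"
}

# Usage (needs EDirect, curl and network access):
# esearch -db genome -query "Saccharomyces cerevisiae [ORGN]" | elink -target assembly | esummary |
#   xtract -pattern FtpPath_RefSeq -element FtpPath_RefSeq |
#   while read -r p; do curl -O "$(refseq_fasta_url "$p")"; done
```

If the query matches several assemblies, the loop downloads one file per assembly, so it may be worth inspecting the printed paths first.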
You'll have to know the accession number(s) for the sequences in the genome. The sequences are most easily accessible from the nucleotide database.
For example, the "Aeromonas hydrophila" genome sequence has accession CP007518.2. To retrieve it in GenBank format, type the following into the terminal:
efetch -db nucleotide -id CP007518.2 -mode text -format gb
Replace CP007518.2 with the accession of the sequence that you want.
The format can also be something else, e.g. fasta instead of gb (GenBank).
What if it's a large eukaryotic genome?
It's the same: get the accessions for the sequences in the genome and pass them, comma-delimited, to -id, e.g. -id "CP007518.2,CP007518.1".
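For a genome with many sequences, typing the comma-delimited list by hand gets tedious. Here is a small sketch for building it from a file with one accession per line; the function name `join_accessions` and the file name `accessions.txt` are made up for illustration.

```shell
# Hypothetical helper: join accessions (one per line in a file) into a single
# comma-delimited string suitable for efetch -id.
join_accessions() {
    # drop blank lines, then join the remaining lines with commas
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Usage (needs EDirect and network access; accessions.txt is a hypothetical file):
# efetch -db nucleotide -id "$(join_accessions accessions.txt)" -format fasta > genome.fasta
```

For very long lists the command line itself can become a limit, in which case fetching in smaller batches is the safer route.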
There are other options, but they involve more Entrez Direct tools than just efetch. For more complex stuff, read: https://www.ncbi.nlm.nih.gov/books/NBK179288/.