How to download genome via Entrez
4.5 years ago
DNAlias ▴ 40

How do I use entrez fetch queries on the command line to download an entire RefSeq genome?

Thank you

entrez fetch ncbi • 4.2k views

A prior answer may work better for eukaryotic genomes: "Retrieve genome in fasta format from ncbi"

For example:

$ esearch -db genome -query "Saccharomyces cerevisiae [ORGN]"|elink -target assembly|esummary|xtract -pattern FtpPath_RefSeq -element FtpPath_RefSeq
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64
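
The command above only prints the assembly directory, not the sequence file itself. A minimal sketch of the next step, assuming the usual NCBI layout in which the directory contains a file named after the directory plus `_genomic.fna.gz`; the helper name `genomic_fasta_url` is made up for illustration and is not part of Entrez Direct:

```shell
# Build the URL of the genomic FASTA inside an NCBI assembly directory.
# Assumption: the directory holds a file named <directory name>_genomic.fna.gz.
genomic_fasta_url() {
    ftp_path="${1%/}"        # drop a trailing slash, if any
    name="${ftp_path##*/}"   # last path component, e.g. GCF_000146045.2_R64
    printf '%s/%s_genomic.fna.gz\n' "$ftp_path" "$name"
}

# Example (needs network access and curl):
# curl -O "$(genomic_fasta_url ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64)"
```

If the file is not where the sketch expects it, list the directory first and pick the file by hand.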
4.5 years ago

You'll have to know the accession number(s) for the sequences in the genome. The sequences are most easily accessible from the nucleotide database.

For example, the "Aeromonas hydrophila" genome sequence has accession CP007518.2. To retrieve it in GenBank format, type the following into the terminal:

efetch -db nucleotide -id CP007518.2 -mode text -format gb

Replace CP007518.2 with the accession of the sequence that you want. The format can also be changed, e.g. fasta instead of gb (GenBank).


What if it's a large eukaryotic genome?


It's the same: get the accessions for the sequences in the genome and pass them, comma-delimited, to -id, e.g. -id "CP007518.2,CP007518.1".
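
For a genome with many sequences, typing the list by hand gets tedious. A rough sketch of building the comma-delimited value from a file, assuming a plain text file with one accession per line; the file name `accessions.txt` and the helper `join_accessions` are hypothetical:

```shell
# Join accessions (one per line) into the comma-delimited form passed to -id.
join_accessions() {
    # Skip blank lines, then merge the remaining lines with commas.
    grep -v '^[[:space:]]*$' "$1" | paste -s -d, -
}

# Example (needs network access and Entrez Direct installed):
# efetch -db nucleotide -id "$(join_accessions accessions.txt)" -format fasta > genome.fasta
```

For very long lists it may be safer to fetch in smaller batches rather than in one call.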

There are other options, but they involve more Entrez Direct tools than just efetch. For more complex tasks, read: https://www.ncbi.nlm.nih.gov/books/NBK179288/.
