Question

How to retrieve all FASTA sequences for an Assembly/BioProject ID using Entrez Programming Utilities

7.9 years ago

fengzys ▴ 50

I am trying to download all viral and bacterial genomes (in the Genome database) using the Entrez utilities. First, I used ESearch to retrieve the UIDs of all viral genomes, and then used ELink to map them to nuccore GI numbers. Some of these GI numbers correspond to the master record of a WGS project, so their FASTA sequences cannot be obtained directly with EFetch. By parsing the GenBank output for these GIs I can get the contig accession numbers, but this is very tedious. Is there a way to get all the sequences belonging to an Assembly or BioProject? (ELink can translate a Genome UID to an Assembly or BioProject ID.) Thanks for your time.
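A minimal sketch of the ESearch → ELink → EFetch chain described above, using only the Python standard library. It builds ELink and EFetch URLs for the E-utilities endpoint and pulls the linked UIDs out of ELink's XML reply. The function names are hypothetical, and nothing here is fetched automatically: to run it for real, pass a URL to `fetch_text` and keep to NCBI's request-rate limits. `dbfrom` can be `genome`, `assembly` or `bioproject` as in the question; whether the returned nuccore links include every WGS contig, rather than only the master record, is exactly the problem raised here.

```python
"""Sketch: build E-utilities URLs and parse ELink XML (hypothetical helpers)."""
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(ids, dbfrom="genome", db="nuccore"):
    # ELink maps UIDs in `dbfrom` to linked UIDs in `db`
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(str(i) for i in ids)}
    return EUTILS + "elink.fcgi?" + urlencode(params)


def efetch_fasta_url(ids, db="nuccore"):
    # rettype=fasta + retmode=text asks EFetch for plain FASTA
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def parse_elink(xml_text, db="nuccore"):
    # Collect <Link><Id> values from every <LinkSetDb> whose <DbTo> is `db`
    root = ET.fromstring(xml_text)
    linked = []
    for lsdb in root.iter("LinkSetDb"):
        if lsdb.findtext("DbTo") == db:
            linked.extend(e.text for e in lsdb.findall("Link/Id"))
    # Drop duplicates (several link names can point at the same UID), keep order
    return list(dict.fromkeys(linked))


def fetch_text(url, timeout=30):
    # Plain HTTP GET; not called in the sketch itself
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Typical use would be `parse_elink(fetch_text(elink_url(genome_ids)))` followed by `fetch_text(efetch_fasta_url(nuccore_ids))`, batching IDs for large sets.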

Utilities NCBI efetch

score 0 · Answer 1 · 2016-05-17

7.9 years ago

wpwupingwp ▴ 120

Try Batch Entrez, which accepts a file of IDs: http://www.ncbi.nlm.nih.gov/sites/batchentrez

score 0 · Answer 2 · 2016-05-17

7.9 years ago

fengzys ▴ 50

Thanks. However, I found that Batch Entrez cannot handle these master records either, e.g. genome IDs 44851, 44849, 44848; assembly IDs 739291, 738621, 194461; or GI numbers 1027168540, 1027763052, 738803256.
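For the tedious step of going from a WGS master record to its contigs: in GenBank format, a master record carries a `WGS` line giving the first and last contig accession of the project as a range. A minimal sketch, assuming the record text has already been fetched with EFetch (`rettype=gb`) and that both ends of the range share one letter prefix and digit width. The helper names are hypothetical, and the chunk size of 200 is an arbitrary choice, not an NCBI limit.

```python
"""Sketch: expand the contig range on a WGS master record's WGS line."""
import re


def wgs_ranges(genbank_text):
    # Lines that start with "WGS" then spaces/tabs; "WGS_SCAFLD" lines do not match
    return re.findall(r"^WGS[ \t]+(\S+)[ \t]*$", genbank_text, flags=re.MULTILINE)


def expand_wgs_range(rng):
    first, _, last = rng.partition("-")
    if not last:  # a single accession rather than a range
        return [first]
    m1 = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m2 = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not (m1 and m2) or m1.group(1) != m2.group(1) or len(m1.group(2)) != len(m2.group(2)):
        raise ValueError(f"unexpected WGS range: {rng}")
    prefix, width = m1.group(1), len(m1.group(2))
    start, end = int(m1.group(2)), int(m2.group(2))
    # Zero-pad back to the original width so leading zeros are preserved
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


def chunks(items, size=200):
    # Yield successive slices so each EFetch request carries a bounded ID list
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Each chunk of expanded accessions could then be passed to EFetch with `db=nuccore&rettype=fasta` instead of copying accessions out of the GenBank output by hand.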