How to retrieve all FASTA sequences for an Assembly/BioProject ID using Entrez Programming Utilities
7.9 years ago
fengzys ▴ 50

I am trying to download all the viral and bacterial genomes in the Genome database using the Entrez utilities. First, ESearch was used to retrieve all the viral Genome UIDs, which were then translated to nuccore GI numbers by ELink. Some GI numbers correspond to the parent (master) record of a WGS project, so the FASTA sequence cannot be obtained by EFetch directly. By parsing the GenBank (gb) output for these GIs I can get the accession numbers, but this is very tedious. Is there a way to get all the sequences that belong to an Assembly or BioProject? (ELink can translate a Genome UID to an Assembly or BioProject ID.) Thanks for your time.
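For readers following along, the ESearch, ELink and EFetch chain described above can be sketched roughly as below. This is only an illustration of how the three requests might be built, not the exact commands used in the question: the helper names (`esearch_url`, `elink_url`, `efetch_fasta_url`, `fetch`) are made up for this sketch, the query term `Viruses[Organism]` is an assumed example, and the `genome` database name is taken from the question and may not behave the same way today.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities endpoints.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=100):
    """Build an ESearch request that returns UIDs matching `term` in `db`."""
    return f"{EUTILS}/esearch.fcgi?" + urlencode(
        {"db": db, "term": term, "retmax": retmax}
    )


def elink_url(dbfrom, db, uids):
    """Build an ELink request mapping UIDs from `dbfrom` to linked UIDs in `db`.

    Each UID is sent as its own id= parameter so the response keeps one
    link set per input UID instead of merging them.
    """
    params = [("dbfrom", dbfrom), ("db", db)] + [("id", u) for u in uids]
    return f"{EUTILS}/elink.fcgi?" + urlencode(params)


def efetch_fasta_url(uids):
    """Build an EFetch request for FASTA text from nuccore."""
    return f"{EUTILS}/efetch.fcgi?" + urlencode(
        {
            "db": "nuccore",
            "id": ",".join(str(u) for u in uids),
            "rettype": "fasta",
            "retmode": "text",
        }
    )


def fetch(url):
    """Perform the HTTP request (needs network access; not called here)."""
    with urlopen(url) as resp:
        return resp.read().decode()


if __name__ == "__main__":
    # Only print the URLs; pass them to fetch() to actually query NCBI.
    print(esearch_url("genome", "Viruses[Organism]"))
    print(elink_url("genome", "nuccore", [44851, 44849, 44848]))
    print(efetch_fasta_url([1027168540]))
```

As noted in the question, EFetch on a WGS master record does not return the contig sequences, so the last URL is only useful for ordinary nuccore records.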

Utilities NCBI efetch • 3.6k views
7.9 years ago
fengzys ▴ 50

Thanks. However, I found that Batch Entrez cannot deal with these master records either, e.g. genome IDs 44851, 44849, 44848; assembly IDs 739291, 738621, 194461; or GIs 1027168540, 1027763052, 738803256.
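The "parse the gb output" workaround mentioned in the question could look something like the sketch below. It assumes the master record's GenBank flat file carries a `WGS` line giving a single first-last accession range, and that both ends share the same letter prefix. The function name `expand_wgs_range` and the accessions in the usage example are hypothetical, chosen only for illustration.

```python
import re


def expand_wgs_range(gb_text):
    """List the contig accessions named on the WGS line of a master record.

    Returns an empty list when the record has no WGS line. Only a single
    'FIRST-LAST' range (or a single accession) is handled in this sketch.
    """
    m = re.search(r"^WGS\s+(\S+)", gb_text, flags=re.MULTILINE)
    if m is None:
        return []

    first, _, last = m.group(1).partition("-")
    last = last or first  # a lone accession is a range of length one

    # Split each accession into its letter prefix and numeric part.
    start = re.fullmatch(r"([A-Za-z_]+)(\d+)", first)
    end = re.fullmatch(r"([A-Za-z_]+)(\d+)", last)
    if not start or not end or start.group(1) != end.group(1):
        raise ValueError(f"unexpected WGS range: {m.group(1)}")

    prefix = start.group(1)
    width = len(start.group(2))  # keep the zero padding of the original
    lo, hi = int(start.group(2)), int(end.group(2))
    return [f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1)]


if __name__ == "__main__":
    # Hypothetical master record fragment, not a real GenBank entry.
    record = (
        "LOCUS       ABCD01000000\n"
        "WGS         ABCD01000001-ABCD01000003\n"
        "//\n"
    )
    print(expand_wgs_range(record))
```

The resulting accessions could then be passed to EFetch in batches, although this still means one extra request per master record to read its flat file.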
