'ORIGIN' flag missing in GenBank files
7.0 years ago
Sej Modha 5.3k

Dear All,

I noticed that a lot of GenBank files (e.g. bacterial genomes and the human chromosomes NC_000007.14 and NC_000002.12) do not contain the 'ORIGIN' flag, i.e. the section that holds the sequence. This is the case both on the NCBI webpage and in the eutils version of the record in GenBank format.

Just wondering whether something has changed, or whether NCBI has decided to remove sequences from GenBank files?

genbank eutils ncbi • 1.3k views
7.0 years ago
Joseph Hughes ★ 3.0k

How about trying something like this:

esearch -db assembly -query "Homo sapiens[ORGN] AND latest[SB]" | efetch -format docsum | xtract -pattern DocumentSummary -element AssemblyAccession SpeciesTaxid SpeciesName FtpPath_RefSeq | sed 's/,.*//' | sort -k 3,3 | tee downloaded_genomes.tsv | cut -f 4 | sed -e 's/$/\/*genomic.gbff.gz/' | wget -i /dev/stdin

It is pretty ugly, and I know it used to be much easier.
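
If it helps, here is a rough way to check whether a GenBank file actually carries the sequence. It is only a sketch: `sequence_section` is a helper name I made up, and it assumes that a record without an 'ORIGIN' line may instead have a 'CONTIG' line pointing at its component sequences, which is my understanding of how large records are shown.

```shell
# Hypothetical helper (not part of EDirect): reads GenBank flat-file text on
# stdin and prints which sequence-bearing keyword it finds.
#   ORIGIN -> the sequence itself is present
#   CONTIG -> assumption: only a pointer to component records is present
#   none   -> neither keyword was found
sequence_section() {
    awk '
        /^ORIGIN/ { found = "ORIGIN" }                    # ORIGIN always wins
        /^CONTIG/ && found == "" { found = "CONTIG" }     # remember CONTIG only if no ORIGIN yet
        END { print (found == "" ? "none" : found) }
    '
}
```

For one of the downloaded files, something like `gzip -dc some_genomic.gbff.gz | sequence_section` should do (the file name is a placeholder). A .gbff file can hold several records, so this prints ORIGIN as soon as any one of them has the sequence.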
