Question

[NCBI Entrez] Retrieving Complete Genome Information from NCBI Genome

11.1 years ago

ls.duchemin • 0

Hello All,

I'm trying to build the correct Entrez query to get information on complete eukaryotic genomes from the NCBI Genome database. The genome browser (http://www.ncbi.nlm.nih.gov/genome/browse/) displays 185 entries when filtered to complete eukaryotic genomes.

I've tried these queries:

eukaryota[organism] AND complete[status]  (count = 319)
eukaryota[organism] AND complete[status] AND "genome sequencing"[Project Type]  (count = 300)
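To compare counts like these outside the web interface, one option is NCBI's E-utilities ESearch endpoint, which returns the hit count for an Entrez query. Below is a minimal R sketch, assuming the `genome` database name and the public esearch.fcgi URL. The helper names `esearch_url` and `esearch_count` are hypothetical, and the count is extracted with a regular expression rather than a full XML parser:

```r
# Hypothetical helper: build an ESearch URL for an Entrez query.
# reserved = TRUE percent-encodes spaces, brackets and quotes in the term.
esearch_url <- function(term, db = "genome") {
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
         "?db=", db,
         "&term=", utils::URLencode(term, reserved = TRUE))
}

# Hypothetical helper: the first <Count> element in an ESearch XML reply is
# the total hit count (later <Count> tags belong to the per-term translation).
esearch_count <- function(xml) {
  hit <- regmatches(xml, regexpr("<Count>[0-9]+</Count>", xml))
  as.integer(gsub("[^0-9]", "", hit))
}

# The two queries from the question.
queries <- c(
  'eukaryota[organism] AND complete[status]',
  'eukaryota[organism] AND complete[status] AND "genome sequencing"[Project Type]'
)
urls <- vapply(queries, esearch_url, character(1))
```

With network access, something like `vapply(urls, function(u) esearch_count(paste(readLines(u, warn = FALSE), collapse = "")), integer(1))` would fetch each count. NCBI asks clients to keep their request rate low, so avoid tight loops over many queries.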

Any ideas on either the best query for this, or which query corresponds to what the browser displays?

Thanks a lot !

ncbi entrez genome browser

updated 11.1 years ago by User ▴ 70 • written 11.1 years ago by ls.duchemin • 0

Hello!

What kind of information do you want exactly?

Just the number of complete genomes?

11.1 years ago by Leandro Lima ▴ 970

No, I was trying to reproduce the genome browser's output for complete eukaryotic genomes using Entrez, which is why I started by comparing the number of complete genomes: to check whether my queries were correct. What I actually want is information such as assembly ID, taxon ID, number of loci, % GC, etc. for all complete eukaryotic genomes, using BioPerl and Entrez. The problem is that if the Entrez results differ from the genome browser's, which one should I trust? And is there a query that gives the same output?

11.1 years ago by ls.duchemin • 0

score 2 · Answer 1 · 2013-03-11

I'm not convinced that the data on that page can be retrieved via Entrez.

If you follow the link to the FTP site and download the file eukaryotes.txt, you'll see a field named Status. This is where the value of 185 comes from; I opened the file in R:

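# tab-separated; comment.char = "" and quote = "" make '#' and quote characters read literally
euk <- read.table("eukaryotes.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, comment.char = "", quote = "")
table(euk$Status)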

#         Chromosomes              No data Scaffolds or contigs 
#                 185                 1609                  722 
#       SRA or Traces 
#                 455
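If the browser's 185 entries are the rows whose Status is "Chromosomes", the per-genome details asked about above can be read from the same file by subsetting on that column. Here is a sketch on a tiny inline stand-in for eukaryotes.txt. The column names `#Organism/Name`, `TaxID` and `GC%` and the rows are illustrative assumptions, so check them against the header of the real file:

```r
# Illustrative stand-in for eukaryotes.txt (column names are assumptions).
txt <- paste(
  "#Organism/Name\tTaxID\tGC%\tStatus",
  "Species A\t1\t40.1\tChromosomes",
  "Species B\t2\t38.5\tScaffolds or contigs",
  "Species C\t3\t45.0\tChromosomes",
  sep = "\n")

# check.names = FALSE keeps headers as written (otherwise "GC%" becomes "GC.").
euk <- read.table(text = txt, header = TRUE, sep = "\t",
                  stringsAsFactors = FALSE, comment.char = "", quote = "",
                  check.names = FALSE)

# Keep only the rows counted as "Chromosomes" and a few descriptive columns.
complete <- euk[euk$Status == "Chromosomes", c("#Organism/Name", "TaxID", "GC%")]
```

On the real file, `nrow(complete)` should then match the Chromosomes count from `table(euk$Status)`.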

However, if you experiment with the Advanced query builder at the NCBI website, you'll find that:

- database Genome has a field Status, but "chromosomes" is not a valid value for it
- databases BioProject and Assembly do not have a Status field
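The field check in the list above can also be made outside the query builder with the E-utilities EInfo endpoint, which describes the searchable fields of a database. Below is a minimal R sketch, assuming the einfo.fcgi URL and an XML reply with `<Field>` blocks containing a `<FullName>` element. The helpers `einfo_url` and `field_full_names` are hypothetical, and parsing is regex-based on a single collapsed string:

```r
# Hypothetical helper: EInfo URL describing one Entrez database.
einfo_url <- function(db) {
  paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=", db)
}

# Hypothetical helper: return the <FullName> of every <Field> block.
# Assumes the whole reply was collapsed into one string beforehand.
field_full_names <- function(xml) {
  fields <- regmatches(xml, gregexpr("<Field>.*?</Field>", xml, perl = TRUE))[[1]]
  sub(".*?<FullName>([^<]*)</FullName>.*", "\\1", fields, perl = TRUE)
}

# The three databases discussed above.
dbs <- c("genome", "bioproject", "assembly")
urls <- vapply(dbs, einfo_url, character(1))
```

After fetching a reply (e.g. `paste(readLines(urls[["genome"]], warn = FALSE), collapse = "")`), `"Status" %in% field_full_names(xml)` tells whether the field exists. EInfo lists fields, not their allowed values, so whether "chromosomes" is valid still has to be tested with a query.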

So it may be that there is no direct relation to the Entrez databases. Or I may be wrong and it's just very difficult to formulate the query :)

score 0 · Answer 2 · 2013-03-12

11.1 years ago

User ▴ 70

The post was deleted.

3.8 years ago by User ▴ 70

But that gives 32 results; we're looking for 185. And you did not specify eukaryota.

11.1 years ago by Neilfws 49k

Then try eukaryota[organism] AND complete[status] AND "has chromosome"[properties]?