[NCBI Entrez] Retrieving Complete Genome Information From NCBI Genome
11.1 years ago

Hello All,

I'm trying to build the correct Entrez query to retrieve information for complete eukaryotic genomes from the NCBI Genome database. The genome browser (http://www.ncbi.nlm.nih.gov/genome/browse/) displays 185 entries when searching for complete eukaryotic genomes.

I've tried these:

  • eukaryota[organism] AND complete[status] ; count = 319
  • eukaryota[organism] AND complete[status] AND "genome sequencing"[Project Type] ; count = 300
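For reference, counts like these can also be requested programmatically through the E-utilities esearch endpoint. The following is only a minimal Python sketch: it builds the request URLs without sending them, and it assumes that db=genome is the database behind the counts above. The helper name esearch_count_url is made up for illustration.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_count_url(term, db="genome"):
    """Return an esearch URL asking only for the hit count of `term`.

    Hypothetical helper; db="genome" is an assumption about which
    Entrez database the queries in this post were run against.
    """
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH + "?" + urlencode(params)

# The two queries tried above, copied verbatim.
queries = [
    'eukaryota[organism] AND complete[status]',
    'eukaryota[organism] AND complete[status] AND "genome sequencing"[Project Type]',
]
urls = [esearch_count_url(q) for q in queries]
for url in urls:
    print(url)
```

Fetching each URL (for example with urllib.request.urlopen) should return a small XML document containing a Count element, which can then be compared with the number shown in the browser.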

Any ideas on the best query for what I want, or on which query corresponds to what is displayed in the browser?

Thanks a lot!

ncbi entrez genome browser • 3.5k views

Hello!

What kind of information do you want exactly?

Just the number of complete genomes?


No, I was trying to reproduce the genome browser output for complete eukaryotic genomes using Entrez. That's why I started comparing the numbers of complete genomes, to see if my queries were correct. What I actually want is information such as assembly ID, taxon ID, number of loci, % GC, etc. for all complete eukaryotic genomes, using BioPerl and Entrez. The problem is: if what I get through Entrez queries differs from the genome browser's information, which one do I choose? And is there a query that would give the same output?

11.1 years ago
Neilfws 49k

I'm not convinced that the data on that page can be retrieved via Entrez.

If you follow the link to the FTP site and download the file eukaryotes.txt, you'll see a field named Status. This is where the value of 185 comes from - I opened this file in R:

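# Read the tab-delimited file; comment.char = "" stops "#" from being treated as a comment marker
euk <- read.table("eukaryotes.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, comment.char = "", quote = "")
# Tally the genomes by the value in the Status column
table(euk$Status)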

#         Chromosomes              No data Scaffolds or contigs 
#                 185                 1609                  722 
#       SRA or Traces 
#                 455
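The same tally can be sketched in Python for anyone not using R. This is a minimal example under stated assumptions: the file is tab-delimited with a header row containing a Status column, and the three sample rows below are made up to keep the snippet self-contained (they are not real NCBI records). The function name status_counts is hypothetical.

```python
import csv
import io
from collections import Counter

def status_counts(handle):
    """Count the values of the 'Status' column in a tab-delimited file.

    Assumes the first line is a header row that includes 'Status'.
    QUOTE_NONE mirrors quote = "" in the R call above.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return Counter(row["Status"] for row in reader)

# Made-up stand-in for eukaryotes.txt, used only for illustration.
sample = (
    "#Organism/Name\tStatus\n"
    "Organism A\tChromosomes\n"
    "Organism B\tNo data\n"
    "Organism C\tChromosomes\n"
)
counts = status_counts(io.StringIO(sample))
print(counts)
```

For the real file, the handle would come from something like open("eukaryotes.txt", newline="", encoding="utf-8") instead of the in-memory sample.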

However, if you experiment with the Advanced query builder at the NCBI website, you'll find that:

  • database Genome has field Status, but "chromosomes" is not a valid value
  • databases Bioproject and Assembly do not have field Status

So it may be that there is no direct relation to the Entrez databases. Or I may be wrong and it's just very difficult to formulate the query :)
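The field check done by hand in the Advanced query builder could probably also be scripted, since the E-utilities einfo endpoint returns the searchable fields of a database as XML. Below is a rough Python sketch that builds the einfo URL and parses a Field list. The sample XML and its field names are invented for illustration, and the functions einfo_url and field_names are hypothetical helpers; real einfo responses carry more elements per field.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities einfo endpoint.
EINFO = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi"

def einfo_url(db):
    """Return the einfo URL describing one Entrez database."""
    return EINFO + "?" + urlencode({"db": db})

def field_names(einfo_xml):
    """Map each field's short Name to its FullName from einfo XML text."""
    root = ET.fromstring(einfo_xml)
    return {f.findtext("Name"): f.findtext("FullName") for f in root.iter("Field")}

# Invented sample response; field names here are assumptions, not real output.
sample_xml = """<eInfoResult><DbInfo><DbName>example</DbName><FieldList>
<Field><Name>ORGN</Name><FullName>Organism</FullName></Field>
<Field><Name>STAT</Name><FullName>Status</FullName></Field>
</FieldList></DbInfo></eInfoResult>"""

fields = field_names(sample_xml)
# True if any field's full name is exactly "Status".
has_status = any(full == "Status" for full in fields.values())
print(einfo_url("genome"), fields, has_status)
```

Running field_names on the responses for genome, bioproject and assembly would show which of them expose a Status field. It would not list the valid values for that field, so it cannot by itself explain where "Chromosomes" comes from.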


That's right, but it feels weird that NCBI doesn't use the content of its databases to generate this file... I started using that file, since it already contains most of the information I need. I'm just not very comfortable working with it without knowing how it's generated and whether or not it corresponds to the content of the NCBI databases.

Edit: Interestingly, "eukaryota"[organism] gives me about 2100 lines, while the eukaryotes section in the genome browser has more like 2900…

11.1 years ago
User ▴ 70

The post was deleted.


But that gives 32 results; we're looking for 185. And you did not specify eukaryota.


Then try eukaryota[organism] AND complete[status] AND "has chromosome"[properties]?


That gives 128. Is no one trying these before posting? :)
