I am trying to find the total number of genomes available in the genome database on NCBI using E-utilities. I see just from using the website that there are 39,625 when browsing by organism. I'd just like to pull this number using E-utilities. I've used the following code in Mac Terminal, but it returned only 6314 results:
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=genome&term=overview&rettype=count"
Any ideas how I can edit that code to return the full list of genome IDs?
Thanks - I found that number too using:
curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=genome"
Any idea why these don't match? Sorry if these are very basic questions, I'm extremely new to this and just starting grad school for Biomedical Informatics.
Genomes are in various stages of completion, and as a result they may be listed in different sections. The total number shown may include sequences, maps, chromosomes, assemblies, and annotations. If you are only interested in complete genomes, then you could parse the genome reports files as shown here: A: most sequenced genomes
Okay, I was thinking it was due to any that might be incomplete. Appreciate the help, but I'm not familiar with that language you're using. Is it Perl? Sorry, I'm hopeless! Really just trying to understand how to work within terminal using the specific e-utilities.
It is a command-line tool made available by NCBI called unix utilities. You can read more about it here. You can also get similar functionality from NCBI eutils, a web interface.
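If you want to stay with plain curl in the terminal, something like the following might help. It is only a rough sketch: the function names (esearch_url, extract_count, extract_ids) are made up for illustration, and it assumes the esearch XML keeps the total in the first <Count> element and wraps each ID in an <Id> element. Note that esearch returns only 20 IDs per request by default, so retmax (and retstart for paging) has to be set to get a longer list.

```shell
#!/bin/sh
# Base URL of the E-utilities endpoints (same host as in the question).
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper: build an esearch URL.
# $1 = database, $2 = URL-encoded query term, $3 = retmax (IDs per request)
esearch_url() {
    printf '%s/esearch.fcgi?db=%s&term=%s&retmax=%s\n' "$EUTILS" "$1" "$2" "$3"
}

# Hypothetical helper: read esearch XML on stdin and print the first
# <Count> value, which is assumed to be the total number of hits.
extract_count() {
    grep -o '<Count>[0-9]*</Count>' | head -n 1 | sed 's/<[^>]*>//g'
}

# Hypothetical helper: read esearch XML on stdin and print one ID per line.
extract_ids() {
    grep -o '<Id>[0-9]*</Id>' | sed 's/<[^>]*>//g'
}

# Example usage (needs network access, so it is left commented out):
#   curl -s "$(esearch_url genome overview 0)"   | extract_count
#   curl -s "$(esearch_url genome overview 100)" | extract_ids
```

To count every record rather than only those matching "overview", one option is to pass all%5Bfilter%5D (the URL-encoded form of all[filter]) as the term. That this matches every record in db=genome is an assumption worth checking against the einfo count, and either number may still differ from what the website shows when browsing by organism.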