Counting Genomes from NCBI and GOLD
2.9 years ago
Morgan S. ▴ 80


I am trying to find the best and easiest way to get counts of SAGs, MAGs, and isolate genomes for each bacterial phylum from either NCBI or GOLD. I wish these databases made the numbers as easily accessible as IMG does.

I first used NCBI's E-utilities to search for assemblies "derived from metagenome" or "derived from single cell", but this returned only 1,500 SAGs in total for Bacteria, which seems too low given that GOLD reports almost 5,000. So I am not sure what the most accurate way to search for these numbers on NCBI is.
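For reference, a count query of this kind can be sketched roughly as below. This is only an illustration, not necessarily the exact query used: the search term string (including the `[Organism]` restriction and the quoted phrase) is an assumption about how such a search might be phrased, and the helper names `build_count_url`, `parse_count`, and `fetch_count` are hypothetical. The sketch relies only on the ESearch endpoint with `rettype=count`, which returns the number of hits in a `Count` element.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term, db="assembly"):
    """Build an ESearch URL that asks only for the number of hits."""
    params = {"db": db, "term": term, "rettype": "count"}
    return ESEARCH + "?" + urllib.parse.urlencode(params)


def parse_count(xml_text):
    """Extract the hit count from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count"))


def fetch_count(term, db="assembly"):
    """Run the query against NCBI (needs network access)."""
    with urllib.request.urlopen(build_count_url(term, db)) as response:
        return parse_count(response.read().decode("utf-8"))


# Assumed phrasing of the search term; the real filter syntax may differ.
sag_term = 'Bacteria[Organism] AND "derived from single cell"'
sag_url = build_count_url(sag_term)
```

Calling `fetch_count(sag_term)` would return the total that NCBI reports for that term, so the result is only as good as the term itself; a different way of flagging single-cell assemblies could explain a lower count than GOLD's.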

Then, in GOLD, I used the search function to count the genomes within the Single Cell - Screened and Single Cell - Unscreened project types, but I noticed that some genomes appear in both categories.
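If the identifiers from both categories can be exported, the overlap could be handled with a simple set union rather than by adding the two totals. The sketch below assumes each category is available as a plain list of identifiers; the function name `count_unique` and the example IDs are hypothetical and do not come from GOLD.

```python
def count_unique(screened_ids, unscreened_ids):
    """Summarise two ID lists without double-counting shared entries."""
    screened = set(screened_ids)
    unscreened = set(unscreened_ids)
    return {
        "screened": len(screened),
        "unscreened": len(unscreened),
        "in_both": len(screened & unscreened),       # IDs listed in both categories
        "unique_total": len(screened | unscreened),  # each ID counted once
    }


# Made-up identifiers, for illustration only.
summary = count_unique(
    ["ID0001", "ID0002", "ID0003"],
    ["ID0003", "ID0004"],
)
```

Here `summary["unique_total"]` counts each genome once, while `summary["in_both"]` shows how many entries would have been double-counted by summing the two categories.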

Is there a better way to get these numbers? Or does it really require a lot of manual curation? I'm almost tempted to just use IMG's numbers (even though they are lower) because they make it so easy to get the totals.


genome single cell ncbi gold metagenome • 498 views

