I am trying to find the best and easiest way to get counts of single-amplified genomes (SAGs), metagenome-assembled genomes (MAGs), and isolate genomes for each bacterial phylum from either NCBI or GOLD. I wish these databases made the numbers as easily accessible as IMG does.
I first used NCBI's E-utilities to search for assemblies "derived from metagenome" or "derived from single cell", but this reported only 1,500 SAGs in total for Bacteria, which seems too low considering that GOLD reports almost 5,000. So I am not sure what the most accurate way to search for these numbers on NCBI is.
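For reference, the kind of E-utilities count query I mean can be sketched roughly as below. This is only a minimal sketch: the ESearch endpoint and its `db`, `term`, `retmax`, and `retmode` parameters are standard, but the exact query strings (the `[filter]` and `[Organism]` field tags) are my assumption about how these assembly properties are indexed, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_count_url(term, db="assembly"):
    """Build an ESearch URL that returns the hit count and no record IDs."""
    params = {"db": db, "term": term, "retmax": 0, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_count(response_text):
    """Extract the total hit count from an ESearch JSON response."""
    return int(json.loads(response_text)["esearchresult"]["count"])


def fetch_count(term):
    """Run the query against NCBI (needs network access) and return the count."""
    with urlopen(build_count_url(term)) as resp:
        return parse_count(resp.read().decode("utf-8"))


# ASSUMPTION: these query strings are illustrative; the field tags may need
# adjusting to match how the Assembly database actually indexes them.
QUERIES = {
    "SAGs": '"Bacteria"[Organism] AND "derived from single cell"[filter]',
    "MAGs": '"Bacteria"[Organism] AND "derived from metagenome"[filter]',
}

for label, term in QUERIES.items():
    # Print the URLs only; call fetch_count(term) to actually query NCBI.
    print(label, build_count_url(term))
```

Calling `fetch_count(term)` for each entry would give the totals. Per-phylum counts would need the phylum name in the `[Organism]` part of the term instead of "Bacteria".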
Then, in GOLD, I used the search function to count the genomes under the "Single Cell - Screened" and "Single Cell - Unscreened" project types, but I noticed that some genomes appear in both categories.
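One way to handle the overlap would be to export both result sets and count each record only once per identifier. The sketch below assumes each exported row can be read as a dictionary with an ID column and a phylum column. The key names `project_id` and `phylum` and the example IDs are hypothetical, not GOLD's actual export headers, and this only catches rows that share the same ID, not the same genome registered under different IDs.

```python
from collections import Counter


def count_unique_by_phylum(*record_sets, id_key="project_id", phylum_key="phylum"):
    """Count records per phylum, counting each ID only once across all sets."""
    seen = set()
    counts = Counter()
    for records in record_sets:
        for rec in records:
            rec_id = rec[id_key]
            if rec_id in seen:
                continue  # already counted in an earlier category
            seen.add(rec_id)
            counts[rec[phylum_key]] += 1
    return counts


# Made-up example rows; real data would come from the two GOLD search exports.
screened = [
    {"project_id": "Gp1", "phylum": "Proteobacteria"},
    {"project_id": "Gp2", "phylum": "Firmicutes"},
]
unscreened = [
    {"project_id": "Gp2", "phylum": "Firmicutes"},  # duplicate of a screened row
    {"project_id": "Gp3", "phylum": "Proteobacteria"},
]

totals = count_unique_by_phylum(screened, unscreened)
for phylum, n in sorted(totals.items()):
    print(phylum, n)
```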
Is there a better way to get these numbers? Or does it really require a lot of manual curation? I'm almost tempted to just use IMG's numbers (even though they are lower) because they make it so easy to get the totals.