Difference between the original Green Genes db and Green Genes Second Genome?
8.3 years ago
diltsjeri ▴ 470


We've been looking at Green Genes as a potential 16S database. Initially we downloaded the file current_GREENGENES_gg16S_unaligned.fasta.gz from here (2011), but then we realized there was a second, more recent Green Genes site and downloaded gg_13_5.fasta.gz (2013).

What are the differences in criteria between the two databases?

For instance, we ran a women's health sample against the first db and found a vast amount of Gardnerella vaginalis, but when we ran it against the second, no G. vaginalis was found?! After looking into this further, we realized that the second db wasn't classifying any of the Gardnerella it found down to species. So why was GG comfortable declaring G. vaginalis down to the species level in their original database but not in their second?

Why is the second significantly larger? What is going on?!!! We can't find documentation on this.

8.3 years ago
5heikki 11k

GreenGenes has not been updated in a very long time and I doubt anyone is even looking after it anymore. For a high-quality, up-to-date 16S reference db, use either SILVA (if in a non-profit setting) or RDP.
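
Whichever database you end up with, it may help to check directly how deep its taxonomy goes for the genus you care about. Below is a minimal Python sketch, not anything official. It assumes the taxonomy comes as tab-separated lines of sequence ID and lineage, with rank prefixes such as `g__` and `s__` separated by semicolons; that layout is an assumption you should verify against your own files. The function names and the three example lines are made up for illustration.

```python
from collections import Counter

# Assumed rank prefixes; verify against the taxonomy file you actually have.
PREFIX_TO_RANK = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species",
}
RANKS = list(PREFIX_TO_RANK.values())


def parse_lineage(lineage):
    """Split 'k__Bacteria; p__...; s__' into a {rank: name} dict."""
    names = {}
    for field in lineage.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if sep and prefix in PREFIX_TO_RANK:
            names[PREFIX_TO_RANK[prefix]] = name.strip()
    return names


def deepest_rank(names):
    """Return the deepest rank that has a non-empty name, or None."""
    deepest = None
    for rank in RANKS:
        if names.get(rank):
            deepest = rank
    return deepest


def count_depths(lines, genus):
    """Count, for records of one genus, how deep each record is classified."""
    counts = Counter()
    for line in lines:
        _seq_id, _, lineage = line.rstrip("\n").partition("\t")
        names = parse_lineage(lineage)
        if names.get("genus") == genus:
            counts[deepest_rank(names)] += 1
    return counts


# Hypothetical records, invented for this example (not real database entries).
BASE = ("k__Bacteria; p__Actinobacteria; c__Actinobacteria; "
        "o__Bifidobacteriales; f__Bifidobacteriaceae; ")
example_lines = [
    "1001\t" + BASE + "g__Gardnerella; s__",
    "1002\t" + BASE + "g__Gardnerella; s__vaginalis",
    "1003\t" + BASE + "g__Bifidobacterium; s__",
]

depths = count_depths(example_lines, "Gardnerella")
print(dict(depths))
```

To run it on a real file, pass an open file handle instead of `example_lines` (use `gzip.open(path, "rt")` if the file is compressed). If nearly all records of the genus stop at "genus", the classifier has no species label to assign, which would explain the result described in the question.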


