Metagenomics best reference DB
1
3
3.7 years ago
lotus28

I cannot decide which reference database to use for mapping my gut microbiome 16S data. Greengenes seems to be the best choice, but it was last updated in 2013, and metagenomics has moved far ahead since then. Won't I miss a large part of the biodiversity if I use it?

Also, this article worries me: SILVA and Greengenes have a rather small overlap in taxonomy. It makes me want to build my own reference DB, but that would be a serious standalone project.

What do you use to map gut 16S? Is there really nothing better than these two arguably accurate ref DBs?

metagenomics 16s silva greengenes
1
1
It's not very good; it only has 20,792 sequences. I guess that's an improvement over a few years ago, when it had ~10k sequences.

2
3.7 years ago
5heikki

Non-profit -> SILVA

For-profit -> RDP

Greengenes is dead; don't bother with it. Overall, I think SILVA might be more accurate and more up to date than RDP. However, SILVA has a costly license for non-academic settings.

1
For what it's worth: SILVA plans to change the license, but they're a bit behind schedule (expected date: fall 2018).

0
That's great news. I am not sure that it counts as behind schedule, though: the announcement is from April 2018 and says that from fall 2018 onwards SILVA will be free for all purposes. So I believe that SILVA is now free for commercial use.

0
Well, legally, I guess the first statement is more binding: "With the next full database release [...] the SILVA datasets will become free [...] for commercial/non-academic users." The term "expected in fall" is very vague.

The current release, 132, was published in December 2017, so I fear we non-academics have to be a bit more patient.

0
But RDP does not report taxa names below genus, does it? That is what I'd like the most from a ref DB, as I intend to use 16S as a "crutch" for my WGS study.

1
It even assigns serovars, although that's a bit silly because, generally speaking, you cannot tell serovars apart from 16S (I parsed this map file from the sequence headers):

grep Salmonella /home/RefData/RDP.map | head
S000000641  Salmonella enterica subsp. null; ATCC 13311
S000003829  Salmonella sp.
S000004396  Salmonella enterica subsp. null; E10, NCTC 8391; pHRC4, pHRC5
S000006115  Salmonella enterica (T); ATCC 13314
S000007382  Salmonella enterica subsp. enterica serovar Give
S000007835  Salmonella sp.; ATCC 9712
S000009313  Salmonella enterica subsp. null; Ty2, ATCC 19430; type strain; pHRC1, pHRC2, pHRC3
S000012024  Salmonella enterica subsp. enterica serovar Sofia
S000012025  Salmonella enterica subsp. enterica serovar Shomron
S000014314  Salmonella enterica subsp. null
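
A minimal sketch of how a map file like the one above could be parsed out of FASTA headers. It assumes headers of the form `>ID description`, optionally followed by a tab and a `Lineage=...` field; that layout, and the function name `parse_headers`, are assumptions for illustration, so check them against your own copy of the RDP FASTA file.

```python
import io


def parse_headers(fasta_handle):
    """Yield (seq_id, description) for every FASTA header line in the handle."""
    for line in fasta_handle:
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].rstrip("\n")
        # Assumption: the lineage, if present, follows a tab as "Lineage=...".
        header = header.split("\tLineage=", 1)[0]
        parts = header.split(None, 1)  # ID, then the rest of the header
        if not parts:
            continue  # empty header, nothing to record
        seq_id = parts[0]
        description = parts[1].strip() if len(parts) > 1 else ""
        yield seq_id, description


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("your_rdp_file.fa") in practice.
    example = io.StringIO(
        ">S000007382 Salmonella enterica subsp. enterica serovar Give\tLineage=Root\n"
        "ACGT\n"
        ">S000003829 Salmonella sp.\n"
        "ACGT\n"
    )
    for seq_id, description in parse_headers(example):
        print(seq_id, description, sep="\t")
```

Writing the output to a file gives a two-column table that can be searched with `grep`, as in the listing above.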

0
That's interesting. Are you aware of any QIIME 2-compatible classifiers for RDP?

1
Afraid not. For 16S taxonomy assignments, I generally use alignmentTools from RDP. When I got started with 16S, I also used QIIME or QIIME 2, whichever was the main version back then. However, I quickly moved away from it to standalone tools because it felt too much like a black box. For example, getting representative sequences is very easy with CD-HIT-EST. Then you just assign taxonomy to those sequences with whatever tool you prefer, e.g. the alignmentTools I mentioned.
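
A rough sketch of the representative-sequence step, wrapped in Python so that the command line is explicit. The 97% identity threshold, the file names and the helper names `build_cdhit_command` and `run_cdhit` are assumptions for illustration, not settings given in this thread; `-i`, `-o`, `-c`, `-n` and `-T` are standard CD-HIT-EST options, and a word length of 10 is only appropriate for identity thresholds of 0.95 and above.

```python
import subprocess


def build_cdhit_command(in_fasta, out_fasta, identity=0.97, threads=1):
    """Return the cd-hit-est argument list for clustering at the given identity."""
    if not 0.95 <= identity <= 1.0:
        # Word length 10 is only valid for thresholds in the 0.95-1.0 range.
        raise ValueError("this sketch only supports identity between 0.95 and 1.0")
    return [
        "cd-hit-est",
        "-i", in_fasta,        # input FASTA with all 16S sequences
        "-o", out_fasta,       # output FASTA with one representative per cluster
        "-c", str(identity),   # sequence identity threshold
        "-n", "10",            # word length matching the threshold range
        "-T", str(threads),    # number of threads
    ]


def run_cdhit(in_fasta, out_fasta, **kwargs):
    """Run cd-hit-est; requires the binary to be installed and on PATH."""
    subprocess.run(build_cdhit_command(in_fasta, out_fasta, **kwargs), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_cdhit() on real data.
    print(" ".join(build_cdhit_command("seqs.fasta", "rep_seqs.fasta")))
```

The representatives in the output FASTA can then be passed to whichever classifier you use for taxonomy assignment.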