RDP, Greengenes and ARB-SILVA: are there any advantages of one over the other?
13.1 years ago
Daniel ★ 4.0k

For aligning 16S sequences against a reference database, I know these are the three major players and each is often recommended when people ask about microbiomic databases.

My question is: why are there three when they all contain the same sequence data? (ARB-SILVA does also cover eukaryotes as well as prokaryotes, and both small and large subunits.)

Or are they not all the same and one reference set is better than another? How are you supposed to decide which is most appropriate for your work?

(to be clear, I'm not talking about their alignment programs, just the reference databases)

13.1 years ago
Neilfws 49k

This is a common issue: the development of what look like very similar resources by different groups, leading to the problem of which to use. It happens for a variety of reasons (often "historical") and there is no easy answer to your question.

One approach is to read the documentation and the associated publications for each resource, to see if the developers explain why they created it and if they compare their work to existing resources. Here are publications describing RDP, SILVA and Greengenes.

At the websites you could start with background (SILVA), objectives (Greengenes) and history (RDP).

Another approach is to search for third party comparisons. This PLoS ONE article looks useful in that regard. It briefly compares the 3 tools and seems to conclude that Greengenes provides the best combination of speed and quality.

Ultimately, your choice depends on precisely what you want to do. Do you need eukaryotic sequences? Do you need the most "comprehensive" database; i.e. the most sequences? How much error are you willing to tolerate? Sometimes it comes down to: which tool makes retrieval of the data you want easiest? Needless to say, it's important to consult with as many experts in the field as possible.

Another possibility: don't make a choice. Try to fetch the subset of data you need from all 3 sources. That way, you can do a comparative study, which can be valuable information in itself.
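
As a starting point for that kind of comparison, here is a minimal sketch that counts how many identical sequences two reference sets share. It assumes each database has been exported as plain FASTA; the toy records below are made up for illustration and are not real database entries. The `normalise` step is an assumption about what you may need to do before comparing: SILVA exports use the RNA alphabet (U rather than T), and aligned exports contain gap characters.

```python
from itertools import combinations


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def normalise(seq):
    """Upper-case, map RNA U to DNA T, and drop alignment gap characters."""
    return seq.upper().replace("U", "T").replace("-", "").replace(".", "")


def sequence_set(lines):
    """Collect the distinct normalised sequences found in FASTA lines."""
    return {normalise(seq) for _, seq in read_fasta(lines)}


def pairwise_overlap(sets_by_name):
    """Return {(a, b): number of identical sequences shared by a and b}."""
    return {
        (a, b): len(sets_by_name[a] & sets_by_name[b])
        for a, b in combinations(sorted(sets_by_name), 2)
    }


# Toy records for illustration only (hypothetical IDs and sequences).
toy = {
    "greengenes": [">gg1", "ACGTACGT", ">gg2", "TTTTCCCC"],
    "rdp": [">r1", "acgtacgt", ">r2", "GGGGAAAA"],
    "silva": [">s1", "ACGUACGU", ">s2", "GGGG--AAAA"],
}
sets_by_name = {name: sequence_set(lines) for name, lines in toy.items()}
overlap = pairwise_overlap(sets_by_name)
print(overlap)
```

With real files you would pass an open file handle to `sequence_set` instead of a list. Exact string identity is a strict criterion: the same organism's 16S can be trimmed differently in each database, so this will undercount the true overlap and is best read as a lower bound.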


Cheers. I think the 'try each one out' approach is going to be the best bet. Was hoping there was going to be some unspoken truth that Biostar could bestow upon me, but I guess actual work will have to suffice! The PLoS paper is interesting (I'd explored it before posting a question), but it focuses on the alignment tools, whereas I only want to pull out the sequences and stick them in QIIME. Will give it a whirl, thanks!

13.1 years ago

I think there are two differences between these databases: sequences (not the same) and taxonomic classification (again, not the same). In other words you might get different results in terms of coverage and assignment (your sequence might be assigned at a different level and/or to a different branch) when using all three.

Choosing one database over another is mainly a practical issue - if you want your results to be comparable to certain publications, you need to use the same databases.
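
To see the "different level and/or different branch" effect concretely, one rough way is to take the lineage each database assigns to the same read and find the deepest rank at which they still agree. The sketch below assumes lineages are semicolon-separated strings, optionally with Greengenes-style prefixes such as `k__` and `p__`, and that positions map onto the same seven ranks in every database. That last assumption does not always hold for real lineages, and the example lineages are illustrative only.

```python
import re

RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage):
    """Split a lineage string into lower-cased names, one per rank.

    Strips 'k__'-style rank prefixes and stops at the first empty level,
    so an unassigned tail such as 'g__; s__' is treated as missing.
    """
    names = []
    for part in lineage.split(";"):
        name = re.sub(r"^[a-z]__", "", part.strip())
        if not name:
            break
        names.append(name.lower())
    return names


def deepest_shared_rank(lineage_a, lineage_b):
    """Return the deepest rank at which both lineages agree, or None."""
    shared = None
    pairs = zip(RANKS, parse_lineage(lineage_a), parse_lineage(lineage_b))
    for rank, a, b in pairs:
        if a != b:
            break
        shared = rank
    return shared


# Illustrative lineages for one hypothetical read.
prefixed = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
            "f__Lactobacillaceae; g__Lactobacillus; s__")
plain = ("Bacteria;Firmicutes;Bacilli;Lactobacillales;"
         "Lactobacillaceae;Lactobacillus;Lactobacillus reuteri")
other_branch = "Bacteria;Firmicutes;Clostridia;Clostridiales"

print(deepest_shared_rank(prefixed, plain))         # same branch, one stops at genus
print(deepest_shared_rank(prefixed, other_branch))  # branches diverge below phylum
```

Tallying this value over all reads gives a quick picture of where two databases start to disagree, which is the kind of difference that makes results hard to compare across publications.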

8.2 years ago
5heikki 11k

In my opinion (this applies to prokaryotic 16S):

If you work in an academic setting, SILVA is the best reference database.

If you work in a non-academic setting (and your boss doesn't want to pay for SILVA), RDP is the best reference database.

Greengenes has not been updated in forever. Is the project even alive anymore?
