Question: RDP, Greengenes and ARB-SILVA: are there any advantages of one over the other?
Asked 9.5 years ago by Daniel (Cardiff University):

For aligning 16S sequences against a reference database, I know these are the three major players, and each is often recommended when people ask about microbiome databases.

My question is: why are there three when they all contain the same sequence data? (Although ARB-SILVA does cover eukaryotes as well as prokaryotes, and both small and large subunits.)

Or are they not all the same and one reference set is better than another? How are you supposed to decide which is most appropriate for your work?

(to be clear, I'm not talking about their alignment programs, just the reference databases)

Answer written 9.5 years ago by Neilfws (Sydney, Australia):

This is a common issue: the development of what look like very similar resources by different groups, leading to the problem of which to use. It happens for a variety of reasons (often "historical") and there is no easy answer to your question.

One approach is to read the documentation and the associated publications for each resource, to see if the developers explain why they created it and if they compare their work to existing resources. Here are publications describing RDP, SILVA and Greengenes.

At the websites you could start with background (SILVA), objectives (Greengenes) and history (RDP).

Another approach is to search for third party comparisons. This PLoS ONE article looks useful in that regard. It briefly compares the 3 tools and seems to conclude that Greengenes provides the best combination of speed and quality.

Ultimately, your choice depends on precisely what you want to do. Do you need eukaryotic sequences? Do you need the most "comprehensive" database; i.e. the most sequences? How much error are you willing to tolerate? Sometimes it comes down to: which tool makes retrieval of the data you want easiest? Needless to say, it's important to consult with as many experts in the field as possible.

Another possibility: don't make a choice. Try to fetch the subset of data you need from all 3 sources. That way, you can do a comparative study, which can be valuable information in itself.


Cheers. I think the 'try each one out' approach is going to be the best bet. I was hoping there was going to be some unspoken truth that Biostar could bestow upon me, but I guess actual work will have to suffice! The PLoS paper is interesting (I'd explored it before posting the question), but it focuses on the alignment tools, whereas I only want to pull out the sequences and stick them in QIIME. Will give it a whirl, thanks!

(Reply written 9.5 years ago by Daniel)
Answer written 9.5 years ago by Pawel Szczesny (Poland):

I think there are two differences between these databases: sequences (not the same) and taxonomic classification (again, not the same). In other words you might get different results in terms of coverage and assignment (your sequence might be assigned at a different level and/or to a different branch) when using all three.

Choosing one database over another is mainly a practical issue - if you want your results to be comparable to certain publications, you need to use the same databases.
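To make the assignment differences described above concrete, here is a minimal Python sketch that takes the lineage each database assigns to one query sequence and reports the deepest rank at which all of them still agree. Everything in it is an assumption for illustration: the function names (`parse_lineage`, `deepest_agreement`), the semicolon-delimited lineage format, the Greengenes-style `k__`/`p__` rank prefixes, and the example lineages, which are made up and not taken from any of the three databases.

```python
# Ranks in the order they are assumed to appear in a lineage string.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def parse_lineage(lineage):
    """Split a semicolon-delimited lineage into a list of taxon names.

    Greengenes-style prefixes such as 'k__' or 'g__' are stripped, and
    parsing stops at the first empty rank (e.g. a bare 's__').
    """
    names = []
    for part in lineage.split(";"):
        name = part.strip()
        if len(name) >= 3 and name[1:3] == "__":
            name = name[3:]
        if not name:
            break
        names.append(name)
    return names


def deepest_agreement(lineages):
    """Return (rank, name) for the deepest rank on which all lineages agree.

    `lineages` maps a database label to its lineage string.
    Returns None if the lineages already disagree at the first rank.
    """
    parsed = [parse_lineage(text) for text in lineages.values()]
    agreed = None
    # zip(*parsed) walks the ranks in parallel and stops at the shortest lineage.
    for rank, names in zip(RANKS, zip(*parsed)):
        if len({n.lower() for n in names}) != 1:
            break
        agreed = (rank, names[0])
    return agreed


# Hypothetical assignments for a single query sequence (illustrative only).
example = {
    "silva": "Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Lactobacillus",
    "greengenes": "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
                  "f__Lactobacillaceae; g__Lactobacillus; s__",
    "rdp": "Bacteria;Firmicutes;Bacilli;Lactobacillales;Lactobacillaceae;Pediococcus",
}

print(deepest_agreement(example))
```

Running this over every sequence in a sample would give a rough picture of how often, and at which rank, the three taxonomies diverge for your data. Real exports may use other lineage formats, so the parsing step would likely need adapting.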

Answer written 4.7 years ago by 5heikki (Finland):

In my opinion (this applies to prokaryotic 16S):

If you work in an academic setting, SILVA is the best reference database.

If you work in a non-academic setting (and your boss doesn't want to pay for SILVA), RDP is the best reference database.

Greengenes has not been updated in forever. Is the project even alive anymore?
