Difference in relative abundance between MicrobiomeDB and R/phyloseq

benkosta, 4 weeks ago

Hello,

I'm currently facing a bit of a puzzle in my analysis of microbial communities and was hoping to draw on the collective wisdom of this forum for insights.

In my recent work, I've been using the phyloseq package in R to analyze 16S rRNA gene sequencing data. Part of my analysis involves calculating the relative abundances of different microbial taxa within my samples. To check my results, I compared my calculated relative abundances with those reported in various MicrobiomeDB datasets for similar microbial communities.

Surprisingly, I noticed substantial differences between the relative abundances I calculated using phyloseq and those shown in MicrobiomeDB. These discrepancies are puzzling and potentially important for the interpretation of my results.

Before delving deeper into troubleshooting, I wanted to ask whether anyone here has run into similar issues or has insight into what might cause such differences. Specifically:

Are there known methodological differences between how phyloseq and MicrobiomeDB calculate relative abundances that could account for these discrepancies?
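One methodological difference worth checking is the order of operations. phyloseq's `merge_samples()` sums the counts of all samples in a group, so normalizing afterwards weights each sample by its sequencing depth, whereas a per-sample summary (such as a box plot of relative abundances) treats every sample equally. How MicrobiomeDB computes its values is not confirmed here; this is a minimal base-R sketch with made-up counts (the matrix, sample names, and genus names are all hypothetical) showing how far the two approaches can diverge.

```r
# Hypothetical toy data: counts for 3 genera (rows) in 3 samples of one group (columns).
# Sample S1 is sequenced 100x more deeply than S2 and S3.
counts <- matrix(
  c(9000, 10, 10,
     500, 60, 50,
     500, 30, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GenusA", "GenusB", "GenusC"), c("S1", "S2", "S3"))
)

# Approach 1: sum counts across samples, then normalize
# (what merge_samples() followed by a compositional transform amounts to).
pooled <- rowSums(counts) / sum(counts)

# Approach 2: normalize each sample first, then average the proportions.
per_sample <- sweep(counts, 2, colSums(counts), "/")
averaged <- rowMeans(per_sample)

round(rbind(pooled, averaged), 3)
```

With these toy numbers GenusA makes up about 0.88 of the pooled composition, because the deeply sequenced sample S1 dominates the sum, while averaging per-sample proportions gives GenusB the highest mean (about 0.38).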

Here is my R code:

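```r
library(phyloseq)
library(microbiome)

# Agglomerate taxa at genus level; genera below the detection/prevalence
# thresholds are lumped into a single "Other" group
pseq2 <- aggregate_rare(ps, level = "Genus", detection = 0.0001, prevalence = 50/100)

# Merge samples by the "group" variable (counts are summed within each group)
pseq2 <- merge_samples(pseq2, "group")

# Calculate relative abundance (microbiome::transform, not base::transform)
pseq2 <- microbiome::transform(pseq2, "compositional")

# Top N taxa (head() avoids NA names if there are fewer than N taxa)
N <- 20
top <- head(names(sort(taxa_sums(pseq2), decreasing = TRUE)), N)

# Subset object to top N taxa
pseq2.top <- prune_taxa(top, pseq2)
otumerged2 <- otu_table(pseq2.top)
```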
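`aggregate_rare()` lumps genera that fail the detection/prevalence filter into an "Other" group, and that group can then be ranked among the top N like a real genus, pushing out a genus that would otherwise be shown. The sketch below is a simplified base-R re-implementation of that idea, not the microbiome package's code: the toy counts are hypothetical, the thresholds are applied to per-sample relative abundances, and the exact comparison rules inside `aggregate_rare()` may differ. If the thresholds are compared against raw counts rather than proportions, `detection = 0.0001` amounts to "any nonzero count"; the package documentation should say which applies.

```r
# Hypothetical toy data: genus-level counts, genera in rows, samples in columns
counts <- matrix(
  c(500, 400, 600, 450,
    300,   0,   0,   0,
      0, 200,   0,   0,
     20,  30,  25,  40,
    180, 370, 375, 510),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("Genus", LETTERS[1:5]), paste0("S", 1:4))
)

detection  <- 0.0001  # minimum relative abundance to count as "present"
prevalence <- 0.5     # minimum fraction of samples in which a genus must be present

# Per-sample relative abundances and the fraction of samples above `detection`
rel  <- sweep(counts, 2, colSums(counts), "/")
prev <- rowMeans(rel > detection)
rare <- names(prev)[prev < prevalence]

# Lump rare genera into a single "Other" row
lumped <- rbind(counts[!rownames(counts) %in% rare, , drop = FALSE],
                Other = colSums(counts[rare, , drop = FALSE]))

# Top N by summed abundance: "Other" takes a slot and GenusD drops out
N <- 3
top <- head(names(sort(rowSums(lumped), decreasing = TRUE)), N)
top
```

If "Other" should not occupy one of the N slots, it can be dropped before ranking, e.g. with `setdiff(names(...), "Other")`.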

Here are the results in R:

[Image: relative abundance results from phyloseq]

and here are the MicrobiomeDB results:

[Image: MicrobiomeDB relative abundance box plot]

For example, Blautia, Bacteroides, and Faecalibacterium do not appear in the MicrobiomeDB plot.
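Which genera appear can also depend on how the "top" taxa are chosen. Ranking by pooled abundance, as the phyloseq code in the question effectively does, favors a genus that is very abundant in a few samples, while a box plot summarizes the per-sample distribution, where such a genus can have a median near zero. Which criterion MicrobiomeDB uses is an assumption here; the toy data below (hypothetical genera and counts) only show that the two rankings can disagree.

```r
# Hypothetical toy data: 4 genera (rows) x 5 samples (columns), 1000 reads per sample
counts <- matrix(
  c(  0,   0,   0, 900, 950,   # GenusA: dominant, but only in two samples
    300, 250, 280,  20,  10,
    200, 260, 220,  30,  20,
    500, 490, 500,  50,  20),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Genus", LETTERS[1:4]), paste0("S", 1:5))
)

rel <- sweep(counts, 2, colSums(counts), "/")  # per-sample relative abundance
medians <- apply(rel, 1, median)               # what a box plot's center line shows

N <- 2
top_pooled <- head(names(sort(rowSums(counts), decreasing = TRUE)), N)
top_median <- head(names(sort(medians, decreasing = TRUE)), N)

top_pooled
top_median
```

GenusA ranks first by pooled counts but has a per-sample median of 0, so it disappears from the top 2 when genera are ranked by median.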

Thank you in advance for your help, and I am looking forward to your responses!

Tags: phyloseq

There are two Lachnospiraceae genera in the boxplot, which indicates that MicrobiomeDB is not calculating the relative abundances in the same way you are (i.e., aggregating taxon abundances to the Genus level).
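The point about two Lachnospiraceae entries can be illustrated by how features without a genus assignment are handled. The sketch below uses a hypothetical four-feature table: strategy 1 drops features with no genus, similar in spirit to `phyloseq::tax_glom()`, whose `NArm` argument defaults to `TRUE`; strategy 2 keeps them under a label built from the family name. That labeling convention is an assumption for illustration, not MicrobiomeDB's documented behavior.

```r
# Hypothetical feature (ASV) table: taxonomy plus total counts per feature
features <- data.frame(
  Family = c("Lachnospiraceae", "Lachnospiraceae", "Lachnospiraceae", "Bacteroidaceae"),
  Genus  = c("Blautia", NA, "Lachnospiraceae NK4A136 group", "Bacteroides"),
  count  = c(120, 300, 80, 500),
  stringsAsFactors = FALSE
)

# Strategy 1: drop features without a genus, then sum counts by genus
assigned <- features[!is.na(features$Genus), ]
by_genus_drop <- tapply(assigned$count, assigned$Genus, sum)

# Strategy 2: keep them, labeled by the family name
label <- ifelse(is.na(features$Genus),
                paste("Unclassified", features$Family),
                features$Genus)
by_genus_keep <- tapply(features$count, label, sum)

# Relative abundances differ because the denominators differ
round(by_genus_drop / sum(by_genus_drop), 3)
round(by_genus_keep / sum(by_genus_keep), 3)
```

Dropping the unassigned feature shrinks the denominator to 700 reads, so Blautia's share rises from 0.12 to about 0.17, and strategy 2 produces two Lachnospiraceae-labeled rows.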

However, there are many other important reasons why taxon relative abundances from your study could differ from those reported in other studies, e.g., differences in study design, batch effects, sequencing depth, the 16S hypervariable region(s) sequenced, reagent contamination, and more.
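Sequencing depth matters most for rare taxa: at low depth a genus can be missed entirely and then fail a detection or prevalence filter. A minimal sketch, assuming a hypothetical sample with one rare genus (10 of 20,000 reads): it subsamples reads without replacement, rarefaction-style, and computes the exact probability of seeing the rare genus at all with `dhyper()`.

```r
set.seed(42)

# Hypothetical sample: one dominant genus and one rare genus (0.05% of reads)
counts <- c(GenusA = 19990, GenusRare = 10)

# Draw `depth` reads without replacement and tabulate them per genus
subsample <- function(counts, depth) {
  reads <- rep(names(counts), times = counts)
  table(factor(sample(reads, depth), levels = names(counts)))
}

shallow <- subsample(counts, 500)
shallow

# Exact probability that at least one GenusRare read is drawn at depth 500
p_detect <- 1 - dhyper(0, m = counts[["GenusRare"]], n = counts[["GenusA"]], k = 500)
round(p_detect, 3)
```

At a depth of 500 reads the rare genus is detected in only about 22% of subsamples, so whether it appears at all depends partly on how deeply each sample was sequenced.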

Take a look at some review papers on these topics, if you have not already: understanding them will be much more important for interpreting your results than any methodological difference in how two software packages calculate relative abundances.

Chris Dean, 4 weeks ago