Question: Should we rarefy our amplicon sequencing data?
songzewei wrote, 4 months ago (United States):

Regarding Figure 1 of "Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible": https://journals.plos.org/ploscompbiol/article/comment?id=10.1371/annotation/043bcfb2-1583-41a8-9497-807232f001f4

Am I the only one who thinks Fig 1 actually shows the opposite conclusion?

The test is significant because of the larger sampling effect in sample B. After adjusting for that sampling effect (i.e., rarefying), we no longer get the false positive.

In other words, if random subsampling is inadmissible, what if I sequenced sample B twice? One time I got 50, 50, and the other time I got 5000, 5000. How should I interpret the totally different test outcomes if random subsampling is not applied?

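The two-runs scenario can be made concrete with a quick significance test at the two depths. This is an illustrative Python sketch with made-up counts: sample A is assumed to have 62 reads of some taxon out of 100, and sample B has the same 50% proportion in both runs.

```python
from scipy.stats import chi2_contingency

# Hypothetical counts for one taxon vs. the rest of the reads.
sample_a = [62, 38]            # 100 reads total
sample_b_shallow = [50, 50]    # first run of sample B: 100 reads
sample_b_deep = [5000, 5000]   # second run: 10,000 reads, same proportions

_, p_shallow, _, _ = chi2_contingency([sample_a, sample_b_shallow])
_, p_deep, _, _ = chi2_contingency([sample_a, sample_b_deep])

print(f"shallow run: p = {p_shallow:.3f}")  # not significant at 0.05
print(f"deep run:    p = {p_deep:.3f}")     # significant at 0.05
```

The taxon proportions in sample B are identical in both runs; only the depth changed, yet one comparison is significant and the other is not.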

It is true that with more reads, we have greater statistical power.

But how should we deal with the uneven statistical power among samples sequenced to different depths?

A comparison between two deeply sequenced samples will have greater statistical power than one between two shallow samples. Is a conclusion based on uneven depths justified if we cannot fix the "false negative" by sequencing again?

(comment by songzewei, 3 months ago)
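The depth-dependence of power can be checked with a small simulation. This is a Python sketch under assumed conditions: one taxon with true proportions 60% vs. 50% in the two samples, tested with a chi-square test at each depth.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)

def detection_power(depth, p_a=0.60, p_b=0.50, reps=1000, alpha=0.05):
    """Fraction of simulated sample pairs in which a chi-square test on
    one taxon (true proportions p_a vs. p_b) reaches significance when
    both samples are sequenced to `depth` reads."""
    hits = 0
    for _ in range(reps):
        a = rng.binomial(depth, p_a)
        b = rng.binomial(depth, p_b)
        _, p, _, _ = chi2_contingency([[a, depth - a], [b, depth - b]])
        if p < alpha:
            hits += 1
    return hits / reps

power_shallow = detection_power(100)
power_deep = detection_power(5000)
print(power_shallow, power_deep)  # the deep comparison has far more power
```

The same true effect that is almost always detected at 5,000 reads is usually missed at 100 reads, which is exactly the uneven-power problem raised above.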

But how should we deal with the uneven statistical power among samples sequenced to different depths?

The authors argue one could use edgeR or DESeq2 (which account for differences in library sizes) to analyse microbiome data:

Fortunately, we have demonstrated that strongly-performing alternative
methods for normalization and inference are already available. In
particular, an analysis that models counts with the Negative Binomial
– as implemented in DESeq2 [13] or in edgeR [41] with RLE
normalization – was able to accurately and specifically detect
differential abundance over the full range of effect sizes, replicate
numbers, and library sizes that we simulated (Figure 6).

Is a conclusion based on uneven depths justified if we cannot fix the "false negative" by sequencing again?

Of course one can sequence again to balance all library sizes to an appropriate sequencing depth, but this costs time and money. Using more powerful analysis methods is cheaper and faster.

(reply by h.mon, 3 months ago)
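For intuition, the RLE (median-of-ratios) normalization the quoted passage refers to can be sketched in a few lines. This is an illustrative Python re-implementation of the idea, not the actual DESeq2 or edgeR code:

```python
import numpy as np

def rle_size_factors(counts):
    """Median-of-ratios size factors, one per sample (column).

    counts: taxa x samples matrix of raw read counts. Dividing each
    column by its size factor puts samples of different library sizes
    on a comparable scale without discarding reads, unlike rarefying.
    """
    counts = np.asarray(counts, dtype=float)
    # Geometric means are only defined for taxa seen in every sample.
    everywhere = counts[(counts > 0).all(axis=1)]
    log_geo_mean = np.log(everywhere).mean(axis=1, keepdims=True)
    log_ratios = np.log(everywhere) - log_geo_mean
    return np.exp(np.median(log_ratios, axis=0))

# A sample sequenced 10x deeper gets a 10x larger size factor:
counts = np.array([[10, 100],
                   [20, 200],
                   [30, 300]])
print(rle_size_factors(counts))
```

Because the depth difference is absorbed into the size factors rather than into discarded reads, all of the data still contributes to the downstream test.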
h.mon answered, 3 months ago (Brazil):

What do you mean by "larger sampling effect"?

If one has larger samples, statistical tests have more power. Hence, in the Figure 1 example, when testing the rarefied counts there is no difference, but when testing with the original counts, there is a statistically significant difference - it is showing a false negative when using rarefied data.

You interpret the "totally different test outcomes if random subsampling is not applied" by considering the statistical power associated with the sample sizes: there is no paradox in a test yielding a positive result with larger sample sizes and a negative result with smaller ones.

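For reference, rarefying itself is nothing more than random subsampling without replacement down to a common depth. A minimal Python sketch:

```python
import numpy as np

def rarefy(counts, depth, seed=None):
    """Subsample a vector of per-taxon read counts down to `depth`
    total reads, drawing reads without replacement."""
    counts = np.asarray(counts)
    rng = np.random.default_rng(seed)
    # Expand counts into one taxon label per read, then draw `depth` reads.
    reads = np.repeat(np.arange(counts.size), counts)
    kept = rng.choice(reads, size=depth, replace=False)
    return np.bincount(kept, minlength=counts.size)

# Rarefying the deep run of sample B (10,000 reads) back to 100 reads
# discards 99% of the data, which is where the lost power comes from.
print(rarefy([5000, 5000], depth=100, seed=0))
```

This makes the trade-off in the thread explicit: rarefying equalizes depth (and hence power) across samples, but only by throwing information away.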