Question: choose between normalization techniques for OTU counts
13 months ago, a1788 wrote:


After noticing some spurious results in my data set, I would like to ask this community for help with choosing the right normalization approach for my data.

I have two groups, patients and healthy controls, in which microbiota OTUs were measured from biopsies. Quality filtering was performed with the SDM software, using the default parameters adapted to the 454 sequencing platform, within the LotuS pipeline. High-quality and mid-quality sequences were mapped to count the occurrence of OTUs in each sample, and clustering was done with UPARSE. The OTU sequences were then taxonomically assigned using the Greengenes database [34] (3.8, August 2013) and the RDP II database [35] (release version 11).

Now I want to correlate this data with host mRNA expression, preferably using Spearman's rank correlation.

The default procedure in my lab is to normalize for sequencing depth by calculating ratios, but I don't think ratios are the ideal way to test my hypothesis, so I'm looking into more useful alternatives. I also have quite a number of columns that are either all zero or have very low variance, so simply calculating ratios might amplify noise disproportionately.

Of all the options out there, I think DESeq2, TMM, cumulative sum scaling, or simply subsampling by number of reads (multiplying all of the entries by (#reads in smallest sample)/(#reads in this sample)) would be best.
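To make the last option concrete, here is a minimal sketch of the depth scaling described above, using only the Python standard library. The function name `scale_to_min_depth` and the samples-as-rows layout are assumptions made for illustration. Note that this is deterministic proportional scaling, which yields non-integer values; it is not the same as rarefying by random subsampling of reads.

```python
def scale_to_min_depth(counts):
    """Scale every sample to the read depth of the smallest sample.

    counts: list of samples, each a list of OTU counts in the same OTU order
    (hypothetical layout, chosen for this sketch).
    """
    depths = [sum(sample) for sample in counts]
    if min(depths) <= 0:
        raise ValueError("every sample needs at least one read")
    target = min(depths)
    # Multiply each entry by (#reads in smallest sample) / (#reads in this sample).
    return [
        [c * target / depth for c in sample]
        for sample, depth in zip(counts, depths)
    ]


if __name__ == "__main__":
    toy = [[10, 30, 60], [5, 5, 0]]
    for row in scale_to_min_depth(toy):
        print(row)
```

After scaling, every sample sums to the depth of the smallest one, so the result carries the same information as plain ratios multiplied by a constant.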

The thing is that we have a very low number of observations (around 30 per group), given the difficulty of obtaining these samples, so I'm a bit hesitant about DESeq2.

Any input regarding this question would be highly appreciated. Thanks in advance!

13 months ago, Carambakaracho wrote:

Hi a1788,

I absolutely agree with your opinion on traditional ratio scaling (or rarefaction, your last suggestion). Personally, I scale each sample to the maximum library size and then apply a Box-Cox transformation. Note that I don't consider this the best approach; I suggest reading the paper by Paul McMurdie and Susan Holmes for a great overview. DESeq2 or TMM is certainly better than fraction or rarefaction scaling.
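As a rough illustration of scaling to the maximum library size followed by a Box-Cox transformation, here is a minimal standard-library Python sketch. The pseudocount of 1 (needed because Box-Cox is only defined for positive values) and the fixed lambda of 0.5 are assumptions made for this example, not values taken from this thread; in practice lambda is usually estimated from the data.

```python
import math


def scale_to_max_depth(counts):
    """Scale every sample (a list of OTU counts) up to the largest library size."""
    depths = [sum(sample) for sample in counts]
    if min(depths) <= 0:
        raise ValueError("every sample needs at least one read")
    target = max(depths)
    return [
        [c * target / depth for c in sample]
        for sample, depth in zip(counts, depths)
    ]


def box_cox(x, lam):
    """Box-Cox transform of a single positive value: (x**lam - 1) / lam, or log(x) if lam == 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def scale_and_transform(counts, lam=0.5, pseudocount=1.0):
    """Hypothetical helper: scale to max depth, add a pseudocount, then Box-Cox."""
    scaled = scale_to_max_depth(counts)
    return [[box_cox(v + pseudocount, lam) for v in sample] for sample in scaled]
```

With a pseudocount of 1, zero counts map to exactly 0 after the transform for any lambda, which keeps absent OTUs at a common baseline.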


4 weeks ago, erwan.scaon (Nantes, France) wrote:


You should have a look at GMPR (geometric mean of pairwise ratios), a normalization method designed for zero-inflated count data.
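For orientation, the core idea of GMPR as I understand it can be sketched in plain Python: for each pair of samples, take the median of the count ratios over OTUs that are non-zero in both, then take the geometric mean of those medians across the other samples as the size factor. This is a simplified sketch, not the reference implementation; the function name, the `min_shared` parameter, and the choice to skip the self-comparison are assumptions.

```python
import math
from statistics import median


def gmpr_size_factors(counts, min_shared=1):
    """Sketch of GMPR-style size factors.

    counts: list of samples, each a list of OTU counts in the same OTU order.
    min_shared: minimum number of OTUs non-zero in both samples for a pair
    to contribute (hypothetical parameter for this sketch).
    """
    n = len(counts)
    factors = []
    for i in range(n):
        log_medians = []
        for j in range(n):
            if i == j:
                continue  # assumption: the self-comparison is skipped
            # Ratios only over OTUs observed in both samples, so zeros are ignored.
            ratios = [a / b for a, b in zip(counts[i], counts[j]) if a > 0 and b > 0]
            if len(ratios) >= min_shared:
                log_medians.append(math.log(median(ratios)))
        if log_medians:
            # Geometric mean of the pairwise median ratios.
            factors.append(math.exp(sum(log_medians) / len(log_medians)))
        else:
            factors.append(float("nan"))  # no usable partner sample
    return factors
```

Dividing each sample's counts by its size factor then gives the normalized table. Because each pairwise ratio only uses OTUs present in both samples, the many zeros in OTU tables do not break the calculation.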
