Question: output TMM normalized counts with edgeR
guillaume.rbt wrote, 15 months ago:

Hi all,

Sorry, I know that this question has been asked several times, but unfortunately I haven't been able to find the right answer, or I didn't understand it.

I'm trying to get TMM normalized counts with edgeR.

I understand that I have to compute normalization factors:

dgList <- calcNormFactors(dgList, method="TMM")

which gives me a normalization factor for each sample:


   group lib.size norm.factors
S1     1 21087314    0.9654794
S2     1 16542810    1.1589117
S3     1 18875473    0.8763291
S4     1 15865414    1.0864038
S5     1 19179795    1.0488230
S6     1 15063992    1.0707007

But at this step I don't know what to do to get a matrix of normalized TMM counts.

I know that I can get CPM normalized counts thanks to :


But CPM and TMM are not the same, right?

Thanks in advance for any of your input on this topic.

lieven.sterck (VIB, Ghent, Belgium) wrote, 15 months ago:

No, CPM and TMM are not exactly the same indeed.

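Perhaps try this snippet of code:

# estimateCommonDisp() also stores pseudo.counts (counts adjusted to a common library size)
dgList <- estimateCommonDisp(dgList)
dgList <- estimateTagwiseDisp(dgList)
# scale each sample's column of pseudo-counts by its normalization factor
norm_counts.table <- t(t(dgList$pseudo.counts)*(dgList$samples$norm.factors))
# write the result as a tab-separated table
write.table(norm_counts.table, file="./normalizedCounts.txt", sep="\t", quote=FALSE)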

Thanks for your help.

Could you explain to me what the "pseudo.counts" are?

Reply written 15 months ago by guillaume.rbt

There's no need to calculate the TMM values yourself, the cpm function should do it for you given a DGEList with the lib.size and norm.factors columns present (which you get after running calcNormFactors).

Reply written 15 months ago by James Ashmore

James Ashmore (UK/Edinburgh/MRC Centre for Regenerative Medicine) wrote, 15 months ago:

If you run the cpm function on a DGEList object which contains TMM normalisation factors then you will get TMM normalised counts. Here is a snippet of the source code for the cpm function:

cpm.DGEList <- function(y, normalized.lib.sizes=TRUE, log=FALSE, prior.count=0.25, ...)
#   Counts per million for a DGEList
#   Davis McCarthy and Gordon Smyth.
#   Created 20 June 2011. Last modified 10 July 2017
    lib.size <- y$samples$lib.size
    if(normalized.lib.sizes) lib.size <- lib.size*y$samples$norm.factors

When normalized.lib.sizes=TRUE (the default), the function multiplies lib.size by the norm.factors column (filled in when you run calcNormFactors) and uses the resulting effective library sizes to normalise the raw counts. You were right in your original post, just run the following and you will have TMM normalised counts:

dge <- calcNormFactors(dge, method = "TMM")
tmm <- cpm(dge)
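
As a rough check of what that call computes, here is a minimal sketch on a made-up count matrix. The counts, gene names and sample names are invented for illustration, and it assumes edgeR is installed and that cpm on a DGEList behaves as in the source snippet above, i.e. each column is divided by lib.size * norm.factors and scaled to one million.

```r
library(edgeR)

# Toy counts: 6 genes x 3 samples (values invented for illustration)
counts <- matrix(c(10, 200, 35,  7, 48, 120,
                   12, 180, 40,  9, 60, 150,
                   30, 410, 66, 11, 95, 260),
                 nrow = 6,
                 dimnames = list(paste0("gene", 1:6), c("S1", "S2", "S3")))

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge, method = "TMM")

# TMM-normalised CPM as returned by edgeR
tmm <- cpm(dge)

# The same values by hand: divide each column by its effective library size
eff.lib.size <- dge$samples$lib.size * dge$samples$norm.factors
manual <- t(t(dge$counts) / eff.lib.size) * 1e6

# Compare the two matrices (numeric tolerance, ignoring attributes)
all.equal(tmm, manual, check.attributes = FALSE)
```

If the two matrices agree, the TMM factors are already baked into the cpm output and no extra multiplication is needed.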

OK great, I did think there was something with the cpm function, but I get it now.

Reply written 15 months ago by guillaume.rbt

Just to again ensure that people understand:

cpm() produces normalised counts per million (CPM), and log2 CPM when called with log=TRUE (the default is log=FALSE). These are not the TMM-normalised counts. Take a look at lieven.sterck's answer.

In RNA-seq, the data processing steps usually go:

  1. raw counts
  2. normalised counts
  3. transformed, normalised counts (the transformation varies from program to program)
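
The three stages can be sketched in base R as follows. This is only an illustration: the counts and the normalisation factors are made up (in edgeR the factors would come from calcNormFactors), and the log2 step uses a plain pseudocount of 1, which is a simplification and not what any particular package does.

```r
# 1. Raw counts: 4 genes x 2 samples (values invented for illustration)
raw <- matrix(c(100, 300, 50,  550,
                220, 590, 90, 1100),
              nrow = 4,
              dimnames = list(paste0("gene", 1:4), c("S1", "S2")))

# Hypothetical normalisation factors; in edgeR these come from calcNormFactors()
norm.factors <- c(S1 = 1.05, S2 = 0.95)

# 2. Normalised counts: counts per million using effective library sizes
eff.lib.size <- colSums(raw) * norm.factors
normalised <- t(t(raw) / eff.lib.size) * 1e6

# 3. Transformed, normalised counts: log2 with a pseudocount of 1 (a simplification)
transformed <- log2(normalised + 1)
```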
Reply written 15 days ago by Kevin Blighe