Question: How to normalize TPM with the TMM method?
13 months ago by
xiaoguang20 wrote:

We can use edgeR to normalize raw read counts with the TMM method, but how can we normalize TPM values with the same method?

tmm edger rna-seq tpm • 1.1k views
12 months ago by
Gordon Smyth1.8k wrote:

I don't quite follow your question. TMM is a method for normalizing the library sizes rather than a method for normalizing read counts. As the edgeR User's Guide says (page 15):

normalization in edgeR is model-based, and the original read counts are not themselves transformed.

Which way around is your question? Do you have TPMs and want to compute TMM factors or do you have TMM factors and want to compute TPMs?

If you are asking the first question, then no, TMM factors can only be computed from the raw counts, not from quantities such as TPMs or CPMs from which the library sizes have already been divided out. If you already have TPMs from some software package, then normalization has almost certainly already been applied, so I would be very wary about trying to re-normalize them unless you really know what you're doing.
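A small base-R illustration of why the library sizes are no longer recoverable from CPMs or TPMs. The counts and gene lengths are made-up values, not data from this thread: sample B is simply sample A sequenced three times as deeply.

```r
# Hypothetical toy counts: B has exactly three times the depth of A.
counts <- cbind(A = c(10, 20, 30, 40), B = c(30, 60, 90, 120))
gene_length <- c(1000, 2000, 500, 4000)  # assumed gene lengths in bp

lib_size <- colSums(counts)              # 100 for A, 300 for B

# CPM: divide each column by its library size.
CPM <- t(t(counts) / lib_size) * 1e6

# TPM: length-scaled rates, then each column rescaled to sum to one million.
rate <- counts / gene_length             # each row is divided by its gene length
TPM <- t(t(rate) / colSums(rate)) * 1e6

colSums(CPM)  # every column sums to 1e6
colSums(TPM)  # every column sums to 1e6
```

Both samples end up with identical CPM and TPM columns, so the threefold difference in sequencing depth, which is what library-size normalization works on, is no longer visible in these values.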

If you are asking the second question then, yes, TMM factors can in principle be used to compute TPMs. In edgeR, any downstream quantity that is computed from the library sizes will incorporate the TMM factors automatically, because the factors are considered part of the effective library sizes. TMM normalization factors will be applied automatically when you use

CPM <- cpm(dge)

or

RPKM <- rpkm(dge)

in edgeR to compute CPMs or RPKMs from a DGEList object. I don't necessarily recommend TPM values myself, but if you go on to compute TPMs by

TPM <- t( t(RPKM) / colSums(RPKM) ) * 1e6

then the TMM factors will naturally have been incorporated into the computation.
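For anyone wanting to see the steps end to end, here is a minimal sketch on simulated data. The count matrix, the gene lengths and the names `counts`, `gene_length` and `eff_lib` are all made up for illustration; in practice the gene lengths would come from your annotation.

```r
library(edgeR)

# Hypothetical toy data: 200 genes x 4 samples of raw counts, plus gene lengths in bp.
set.seed(1)
counts <- matrix(rnbinom(800, mu = 100, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:4)))
gene_length <- sample(500:5000, 200, replace = TRUE)

dge <- DGEList(counts = counts)
dge <- calcNormFactors(dge, method = "TMM")  # factors are estimated from the raw counts

# Effective library size = raw library size * TMM normalization factor.
eff_lib <- dge$samples$lib.size * dge$samples$norm.factors

CPM  <- cpm(dge)                             # uses the effective library sizes by default
RPKM <- rpkm(dge, gene.length = gene_length)

# Computing CPM by hand from the effective library sizes gives the same values.
CPM_manual <- t(t(counts) / eff_lib) * 1e6
all.equal(CPM, CPM_manual, check.attributes = FALSE)

# TPM from RPKM, as in the formula above.
TPM <- t(t(RPKM) / colSums(RPKM)) * 1e6
```

The comparison with `CPM_manual` is only there to show that `cpm()` divides by the effective library sizes rather than the raw ones.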


Would that not be an idea if you wanted to do an inter-library normalization of the TPM values? Kind of like what's happening under the hood of edgeR::cpm(), which allows the CPM values to be normalized according to the TMM normalization factors, just for TPM values?

written 12 months ago by kristoffer.vittingseerup3.4k

TMM normalizes library sizes and can only be applied to counts. Any downstream quantity, such as CPMs or TPMs, that is computed from the library sizes will obviously incorporate the TMM normalization, but that is not the same thing as trying to estimate the TMM factors from the CPMs or TPMs.

In my opinion, normalization should be done prior to the computation of TPMs but, even if you wanted to re-normalize TPMs for some reason, one would not input them to the calcNormFactors function in edgeR, which is how I interpreted OP's question.

Or perhaps I have mis-interpreted the question. Perhaps you and the OP are actually asking a simpler question about whether TMM-normalized library sizes can be used to compute TPMs. That is of course true, and I have edited my answer now to say that.

modified 12 months ago • written 12 months ago by Gordon Smyth1.8k

But is the main goal of inter-library normalisation not simply to make sure the expression/abundance distributions from multiple samples are comparable? I agree that it would be a nicer approach to re-calculate the TPM values from the TMM-normalized counts. But if we ignore that possibility for a moment, I'm not sure I see why inter-library normalisation of TPM values is a bad idea. Would it not just produce abundance estimates that are more comparable across conditions? Or does TMM normalisation itself assume something about the data which does not hold true for TPM values?

written 11 months ago by kristoffer.vittingseerup3.4k

Just out of curiosity I tried it and it does look like there is some improvement on a dataset I already had open as can be seen here. Naturally this is not a "bad" dataset which needs it - but sometimes it could be needed (as TPM normalization does not account for e.g. different levels of rRNA contamination).

modified 11 months ago • written 11 months ago by kristoffer.vittingseerup3.4k

I don't know what you are referring to when you say that you "tried it", or what type of data you have applied it to, or what you mean by "some improvement". I am not going to continue this discussion.

I have tried to answer OP's question as completely as I can. I have nothing to add to what I have already written.

modified 11 months ago • written 11 months ago by Gordon Smyth1.8k