Question

Does normalization in edgeR conflict with tximeta-generated offset matrix?

2.8 years ago

skjw1029 ▴ 70

After aligning my raw RNA-Seq reads with Salmon, I imported the quant files into R and created a count matrix with annotations using tximeta (creating a SummarizedExperiment object).

Afterwards I built a DGEList object to pass on to edgeR, using the tximeta function makeDGEList. This creates an offset matrix, which I believe normalizes with respect to average transcript length.

Now, when I tried to normalize the data in edgeR using calcNormFactors, it gave me this warning:

LL_normalized <- calcNormFactors(LL_genefiltered, method = "TMM")

Warning message: In calcNormFactors.DGEList(LL_genefiltered, method = "TMM") : object contains offsets, which take precedence over library sizes and norm factors (and which will not be recomputed).

It appears the offset matrix from tximeta and the normalization attempted by edgeR conflict with one another, and they have different norm factor values.

I would like to normalize by sequencing depth, RNA composition (effective library size), and gene length.

So my question is: does the offset matrix created by tximeta's makeDGEList function account for all of these factors (I don't think it does, only for gene length)? Or do I need to pass on the DGEList object without an offset matrix and let edgeR take care of the normalization (if so, how)? Or is there a way to make them work together?

Of note, the tximport vignette warns not to manually pass the original gene-level counts to downstream methods without an offset. Perhaps this was assuming that no other normalization would be done after tximport/tximeta, and can be disregarded?

Normalization tximeta RNA-Seq edgeR • 1.0k views

updated 2.8 years ago by ATpoint 82k • written 2.8 years ago by skjw1029 ▴ 70

score 3 · Accepted Answer · 2021-07-05

The offset matrix includes both correction for average transcript length and library size / composition.

See the code here: https://github.com/mikelove/tximeta/blob/master/R/dgelist.R#L12
It already calls calcNormFactors.

The tximport vignette has a more thoroughly commented version of what these steps do: https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#edgeR
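For orientation, here is a rough sketch of those steps as I recall them from the vignette, run on simulated data so it is self-contained. The matrices cts and len are made-up stand-ins for the estimated counts and average transcript lengths that tximport/tximeta would give you; treat this as an illustration of the idea, not as the exact tximeta source.

```r
library(edgeR)

set.seed(1)
n_genes <- 200
n_samples <- 4
ids <- list(paste0("gene", seq_len(n_genes)), paste0("s", seq_len(n_samples)))

# Hypothetical stand-ins for the estimated counts and average transcript lengths
cts <- matrix(rnbinom(n_genes * n_samples, mu = 100, size = 10),
              nrow = n_genes, dimnames = ids)
len <- matrix(runif(n_genes * n_samples, min = 500, max = 3000),
              nrow = n_genes, dimnames = ids)

# Per-observation length factors, divided by each gene's geometric mean length
# so that the overall magnitude of the counts is not changed
normMat <- len / exp(rowMeans(log(len)))
normCts <- cts / normMat

# Effective library sizes (TMM factors x library sizes) from length-scaled counts;
# this is where sequencing depth and RNA composition come in
eff.lib <- calcNormFactors(normCts) * colSums(normCts)

# Combine length factors and effective library sizes into log-scale offsets
normMat <- log(sweep(normMat, 2, eff.lib, "*"))

# Store the offsets in the DGEList; edgeR's GLM functions will use them
y <- DGEList(cts)
y <- scaleOffset(y, normMat)
```

So the single offset matrix carries the length correction and the depth/composition correction together, which is why a later calcNormFactors call on the DGEList has nothing left to contribute.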

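That means you don't have to do anything further for normalization; the DGEList from tximeta is ready to go. The warning from calcNormFactors only tells you that the offsets take precedence over library sizes and norm factors, so the extra call has no effect on the downstream model. The next logical step for edgeR would be filterByExpr, followed by the DE testing routine described in the edgeR manual.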
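
A minimal sketch of what those downstream steps could look like. The toy DGEList below only stands in for the output of makeDGEList, and the two-group design (group, three samples each) is an assumption for illustration; adapt both to your experiment.

```r
library(edgeR)

set.seed(2)
# Toy stand-in for the DGEList returned by tximeta's makeDGEList:
# simulated counts plus an arbitrary offset matrix
cts <- matrix(rnbinom(500 * 6, mu = 50, size = 5), nrow = 500)
y <- DGEList(cts)
y <- scaleOffset(y, matrix(rnorm(500 * 6, sd = 0.1), nrow = 500))

# Hypothetical two-group design
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)

# Filter lowly expressed genes; subsetting keeps the matching offset rows
keep <- filterByExpr(y, design)
y <- y[keep, ]

# No calcNormFactors call here: the offsets already carry the normalization
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
res <- glmQLFTest(fit, coef = 2)
topTags(res)
```

I used the quasi-likelihood pipeline here as one common choice; the edgeR manual describes the alternatives.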