Question: edgeR normalization method
2
3.6 years ago by
schelarina30
European Union
schelarina30 wrote:

Hello,

I am using edgeR for DEG analysis and trying different normalization methods: "RLE", "TMM" and "upperquartile". The three methods give similar results, but those results do not make sense for the comparisons I am making: genes that should be down-regulated come out as up-regulated.

When I use the method "none", I obtain the opposite results, and these make a lot of sense. Does choosing the normalization method "none" mean that normalization is performed on total counts?

Does anyone still normalize on total counts with edgeR? I read that this type of normalization is not recommended, but in my case it looks the most appropriate.

Thanks.

rna-seq • 4.0k views
ADD COMMENTlink modified 3.6 years ago by Christian2.8k • written 3.6 years ago by schelarina30

it is inhibition of transcription

I have a similar issue, which is how to check if the inhibition of transcription worked. Could you please share how you did the normalisation and analysis?

ADD REPLYlink written 3.1 years ago by A. Domingues2.1k
2
3.6 years ago by
Christian2.8k
Cambridge, US
Christian2.8k wrote:
It's possible that in your particular setting one major assumption of edgeR's normalization techniques is violated: that the majority of genes are NOT differentially expressed. In this case it might make sense to actually normalize by library size (= total counts) only.
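For reference, here is a minimal base-R sketch of what library-size-only normalization amounts to. It assumes, as the edgeR documentation describes, that the effective library size is lib.size multiplied by norm.factors, and that method="none" leaves every factor at 1. The library sizes are the ones from the script in this thread; the raw count of 100 is a made-up value.

```r
# Library sizes taken from the script posted in this thread.
lib.size <- c(500000, 600000, 500000, 650000)

# Assumption: with method = "none" every normalization factor stays at 1.
norm.factors <- rep(1, length(lib.size))

# Effective library size = library size * normalization factor.
eff.lib.size <- lib.size * norm.factors

# A made-up raw count of 100 in every sample, as counts per million.
cpm <- 100 / eff.lib.size * 1e6
print(round(cpm, 1))
```

Under these assumptions "none" still scales each sample by its total count; it only skips the extra composition adjustment that TMM, RLE and upperquartile apply.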
ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by Christian2.8k
1
3.6 years ago by
Devon Ryan91k
Freiburg, Germany
Devon Ryan91k wrote:

If genes that should be up-regulated are down-regulated then you probably just reversed the comparison (and thus the sign on the log2 foldchange).

You should NOT use "none". If you were to submit a paper using that, it should (and hopefully would) be rejected.
 

ADD COMMENTlink modified 3.6 years ago • written 3.6 years ago by Devon Ryan91k

Hi, thanks for your answer. I did not reverse the comparison! For the type of treatment I have applied, genes are either not affected or down-regulated. Here is how I am doing the analysis.

library(edgeR)

x <- read.delim("counts.txt", stringsAsFactors=FALSE)

group <- c(1,1,2,2) # Group 2 is the treated and group 1 is the control.

genes <- read.delim("genes.txt")

y <- DGEList(counts=x, group=group, genes=genes, lib.size=c(500000,600000,500000,650000))

y <- calcNormFactors(y, method="TMM") # or "RLE", "upperquartile" or "none"

y <- estimateCommonDisp(y)

et <- exactTest(y, pair=c("1", "2")) # 2 (treated) vs 1 (control)

With TMM, upperquartile and RLE I get similar results, but some genes are up-regulated and they should not be. With "none", all genes are either down-regulated or not affected. Since "none" is not acceptable, I then normalized on total counts like this:

norm.factors <- (sum(y$samples$lib.size)/nrow(y$samples)) / y$samples$lib.size

y <- DGEList(counts=x, group=group, genes=genes, lib.size=c(500000,600000,500000,650000), norm.factors=norm.factors)

y <- estimateCommonDisp(y)

et <- exactTest(y, pair=c("1", "2")) # genes are either down-regulated or not affected, as expected

Again, as you can see, I did not reverse the comparison. I don't think there are errors in the script, but maybe I am wrong? I wonder why the results differ so much when I use total counts normalization, and whether this kind of normalization is acceptable.

NB: I get the same results if I do not put the library sizes directly into the DGEList object.
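A quick arithmetic check of the hand-computed norm.factors in the script above, as a base-R sketch. It assumes that edgeR uses lib.size multiplied by norm.factors as the effective library size, which is how its documentation describes it; nothing here calls edgeR itself.

```r
# Library sizes from the script above.
lib.size <- c(500000, 600000, 500000, 650000)

# Same formula as in the script: mean library size divided by each library size.
norm.factors <- (sum(lib.size) / length(lib.size)) / lib.size

# Assumption: effective library size = lib.size * norm.factors.
eff.lib.size <- lib.size * norm.factors
print(eff.lib.size)
```

If that assumption holds, these factors cancel the library sizes and every sample ends up with the same effective library size (the mean). The counts would then be compared with no depth adjustment at all, which is not the same thing as scaling by total counts and may explain why the results differ so much.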

 

ADD REPLYlink written 3.6 years ago by schelarina30
1

1. You seem to have only 2 biological replicates, so your statistical power is low and false positives are to be expected.

2. Gene regulation is a complex network of interactions: when you down-regulate one gene, other genes may be up-regulated in response. Expecting no up-regulated genes at all seems like wishful thinking to me.

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by h.mon26k

It is inhibition of transcription, so I would not expect transcript levels to increase, unless RNA-seq is picking up degradation products. I would exclude that option, though, because the coverage profile is the same in the two conditions; what changes is the abundance, which is lower in the treated sample. This is why I am not convinced by the TMM normalization method, while normalizing by total counts seems more appropriate in this case.

ADD REPLYlink written 3.6 years ago by schelarina30
1

From edgeR manual:

edgeR is concerned with differential expression analysis rather than with the quantification of expression levels. It is concerned with relative changes in expression levels between conditions, but not directly with estimating absolute expression levels.

If you are inhibiting overall expression, but some genes are less prone to inhibition, they will be picked up as up-regulated.
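A toy illustration of this point, in base R with made-up numbers (not real data, and not edgeR's own computation). It assumes each library is sequenced to the same depth, so only each gene's share of the total is observed.

```r
# Hypothetical absolute transcript amounts per cell (made-up values).
control <- c(geneA = 1000, geneB = 1000, geneC = 1000, resistant = 1000)
# The treatment halves three genes; the 'resistant' gene is unchanged.
treated <- c(geneA = 500, geneB = 500, geneC = 500, resistant = 1000)

# Sequencing yields a fixed number of reads, so only proportions are observed.
depth <- 1e6
counts.control <- control / sum(control) * depth
counts.treated <- treated / sum(treated) * depth

# Apparent log2 fold changes (treated vs control) from the read counts.
log2fc <- log2(counts.treated / counts.control)
print(round(log2fc, 2))
```

In this sketch the true change is a halving for three genes and no change for the fourth, yet the read counts show a mild decrease for the three and an apparent increase for the unchanged gene.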

ADD REPLYlink written 3.6 years ago by h.mon26k
1

If you're inhibiting transcription in one group then you need to normalize according to spike-ins.
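A minimal base-R sketch of the idea behind spike-in normalization, with made-up counts (not real data). It assumes the same amount of spike-in RNA was added to every sample, and the row names starting with "spike" are hypothetical. It shows only the arithmetic, not a procedure taken from the edgeR manual.

```r
# Hypothetical counts: rows are features, columns are samples.
counts <- matrix(
  c(250, 250, 200, 200,   # geneA: truly halved by the treatment
    250, 250, 400, 400,   # geneB: truly unchanged
     60,  60,  96,  96,   # spike1
     40,  40,  64,  64),  # spike2
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "spike1", "spike2"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)
is.spike <- grepl("^spike", rownames(counts))

# Equal spike-in input means spike-in totals reflect per-sample scaling only.
spike.total <- colSums(counts[is.spike, , drop = FALSE])
size.factor <- spike.total / mean(spike.total)

# Divide endogenous counts by the per-sample spike-in factor.
normalized <- sweep(counts[!is.spike, , drop = FALSE], 2, size.factor, "/")
lfc <- log2(normalized[, "trt1"] / normalized[, "ctrl1"])
print(round(lfc, 2))
```

Because the scaling comes from the spike-ins rather than from the endogenous genes, a global drop in transcription is not normalized away in this toy case.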

ADD REPLYlink modified 3.6 years ago • written 3.6 years ago by Devon Ryan91k