edgeR normalization method
8.4 years ago
schelarina ▴ 50

Hello,

I am using edgeR for DEG analysis and trying different normalization methods: "RLE", "TMM" and "upperquartile". The three methods give me similar results, but these results do not make sense for the type of comparison I am making: genes that should be down-regulated come out as up-regulated.

When I use the method "none", I obtain the opposite results, and these make a lot of sense. Does choosing the normalization method "none" mean that normalization is performed on total counts only?

Does anyone still normalize on total counts with edgeR? I read that this type of normalization is not recommended, but in my case it looks the most appropriate.

Thanks

RNA-Seq • 11k views

it is inhibition of transcription

I have a similar issue, which is how to check if the inhibition of transcription worked. Could you please share how you did the normalisation and analysis?

8.4 years ago
Christian ★ 3.0k
It's possible that in your particular setting one major assumption of edgeR's normalization methods is violated: that the majority of genes are NOT differentially expressed. In that case it might make sense to normalize by library size (= total counts) only.
8.4 years ago

If genes that should be up-regulated are down-regulated, then you probably just reversed the comparison (and thus the sign of the log2 fold change).

You should NOT use "none". If you were to submit a paper using that it should (and hopefully would) be rejected.


Hi, thanks for your answer. I did not reverse the comparison! For the type of treatment I applied, genes are either not affected or down-regulated. Here is how I am doing the analysis.

library(edgeR)

x <- read.delim("counts.txt", stringsAsFactors=FALSE)
group <- c(1,1,2,2) # Group 2 is the treated and group 1 is the control.
genes <- read.delim("genes.txt")
y <- DGEList(counts=x, group=group, genes=genes, lib.size=c(500000,600000,500000,650000))

y <- calcNormFactors(y, method="TMM") # or "RLE", "upperquartile" or "none"
y <- estimateCommonDisp(y)

et <- exactTest(y, pair=c("1", "2")) # 2 (treated) vs 1 (control)

With TMM, upperquartile and RLE I get similar results, but some genes are up-regulated and they should not be. With "none", all genes are down-regulated or not affected. Since "none" is not acceptable, I then normalized on total counts like this:

norm.factors <- mean(y$samples$lib.size) / y$samples$lib.size

y <- DGEList(counts=x, group=group, genes=genes, lib.size=c(500000,600000,500000,650000), norm.factors=norm.factors)
y <- estimateCommonDisp(y)

et <- exactTest(y, pair=c("1", "2")) # genes are either down-regulated or not affected, as expected

Again, as you can see, I did not reverse the comparison. I don't think there are errors in the script, but maybe I am wrong? I wonder why the results differ so much when I use total-count normalization, and whether this kind of normalization is acceptable.

NB: I get the same results if I do not pass the library sizes directly to the DGEList object.
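
To see what these custom factors amount to, here is a minimal base-R sketch (no edgeR needed). It assumes, as I understand the edgeR documentation, that the effective library size is lib.size * norm.factors and that method="none" sets every factor to 1. The library sizes are the ones from this post; the variable names effective_custom and effective_none are only illustrative.

```r
# Library sizes taken from the post.
lib.size <- c(500000, 600000, 500000, 650000)

# Custom factors from the post: mean library size divided by each library size.
norm.factors <- mean(lib.size) / lib.size

# Assumed edgeR convention: effective library size = lib.size * norm.factors.
effective_custom <- lib.size * norm.factors
print(effective_custom)

# With method="none" every factor is 1, so the effective library size
# is simply the total count of each sample.
effective_none <- lib.size * rep(1, length(lib.size))
print(effective_none)
```

If that convention holds, the custom factors give every sample the same effective library size, so counts are compared without any per-sample depth scaling, whereas "none" is the setting that scales by total counts. That may be worth checking before calling the second approach total-count normalization.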


1. It seems you have only 2 biological replicates per group, so your statistical power is limited and false positives are to be expected.

2. Gene regulation is a complex network of interactions: when you down-regulate one gene, other genes may have their expression up-regulated in response. Expecting no up-regulated genes at all seems like wishful thinking to me.


It is inhibition of transcription, so I would not expect transcripts to increase in level, unless RNA-seq is picking up degradation products. I would exclude that option, though, because the coverage profile is the same between the two conditions; what changes is the abundance, which is lower in the treated sample. This is why I am not very convinced by the TMM normalization method, while normalizing by total counts seems more appropriate in this case.


From the edgeR manual:

edgeR is concerned with differential expression analysis rather than with the quantification of expression levels. It is concerned with relative changes in expression levels between conditions, but not directly with estimating absolute expression levels.

If you are inhibiting overall expression, but some genes are less prone to inhibition, they will be picked up as up-regulated.
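
A toy illustration of that point in base R. All numbers are invented for the example, and the gene names and the depth value are hypothetical; it only shows the arithmetic of relative versus absolute change, not edgeR itself.

```r
# Hypothetical absolute transcript amounts per cell (invented for illustration).
control <- c(geneA = 100, geneB = 100, geneC = 100, geneD = 100)
treated <- c(geneA = 25,  geneB = 25,  geneC = 25,  geneD = 50)  # geneD is less inhibited

# Absolute log2 fold changes: every gene goes down.
abs_lfc <- log2(treated / control)

# Sequencing samples proportions, so assume both libraries reach the same depth.
depth <- 1e6
control_counts <- depth * control / sum(control)
treated_counts <- depth * treated / sum(treated)

# Relative log2 fold changes, as seen when libraries have equal total counts.
rel_lfc <- log2(treated_counts / control_counts)

print(round(abs_lfc, 2))
print(round(rel_lfc, 2))
```

Here geneD gets a positive relative log2 fold change even though its absolute level was halved, simply because the other genes dropped more.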


If you're inhibiting transcription in one group then you need to normalize according to spike-ins.
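
A rough sketch of the idea in base R, not a full edgeR workflow. The counts, the spike-in totals and all object names are hypothetical, and it assumes the same amount of spike-in RNA was added to every sample.

```r
# Hypothetical counts: rows are genes, columns are samples (2 control, 2 treated).
gene_counts <- matrix(
  c(400, 420, 150, 160,
    300, 310, 100, 110,
    200, 190, 140, 150),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("ctrl1", "ctrl2", "trt1", "trt2"))
)

# Hypothetical spike-in read totals per sample. With equal spike-in input,
# a larger total means endogenous RNA made up a smaller share of that library.
spike_totals <- c(ctrl1 = 100, ctrl2 = 105, trt1 = 220, trt2 = 230)

# Scaling factors relative to the mean spike-in total.
scale_factors <- spike_totals / mean(spike_totals)

# Divide each sample (column) by its spike-in-derived factor.
norm_counts <- sweep(gene_counts, 2, scale_factors, "/")

# log2 fold change, treated vs control, on the spike-in-scaled counts.
lfc <- log2(rowMeans(norm_counts[, c("trt1", "trt2")]) /
            rowMeans(norm_counts[, c("ctrl1", "ctrl2")]))
print(round(lfc, 2))
```

Scaling by the spike-ins rather than by the endogenous counts is what lets a global decrease show up as a decrease, instead of being normalized away.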
