How to export RLE-normalized read counts from edgeR?
7.7 years ago
ivivek_ngs ★ 5.1k

I want to create normalized read counts with the RLE method from edgeR or DESeq.

Below are the commands I am using to normalize my raw read counts in edgeR with RLE, but how do I output the matrix of normalized counts? I am unable to find the object from which the table should be extracted.

library(edgeR)

## Filter out low-expression genes: keep only genes whose total read count across all samples is greater than 50

keep <- rowSums(x)>50
x <- x[keep,]
dim(x)

norm_factors=calcNormFactors(as.matrix(x),method="RLE")

group <- c(rep("A",24),rep("B",25),rep("C",23))

y <- DGEList(counts=x,group=group,norm.factors=norm_factors)

design <- model.matrix(~group)


Now I want to extract the normalized count matrix from here. How do I do that?

Also, the default normalization in DESeq is the RLE method. I would be happy to get the normalized matrix from either package. How do I extract it from DESeq? Below is the DESeq code:

cds<-newCountDataSet(x, c(rep("A",25),rep("B",24),rep("C",23)))

disp<-estimateSizeFactors(cds)

disp1<-estimateDispersions(disp)


How do I now extract the matrix of normalized counts for my samples?

Can anyone help me extract the normalized read count matrix from either of the above methods? It would be very helpful.

next-gen edgeR RNA-Seq • 9.6k views
7.7 years ago

With DESeq2 (there's rarely a good reason to keep using the original DESeq):

counts(disp, normalized=TRUE)
#or
counts(disp1, normalized=TRUE)


These will give you the same result. BTW, there's no reason to duplicate the disp object like you did.
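To make concrete what a size-factor-normalized count is, here is a minimal base-R sketch of the median-of-ratios (RLE) idea. It is an illustration under stated assumptions, not the package's actual code: the toy matrix is made up, and rle_size_factors is a hypothetical helper name.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, not the poster's data).
counts <- matrix(c( 10,  20,  40,
                   100, 200, 400,
                     5,  10,  20,
                    50, 100, 200),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Hypothetical helper: median-of-ratios (RLE) size factors.
# For each sample, take the median over genes of
# count / (geometric mean of that gene across all samples).
rle_size_factors <- function(m) {
  log_geo_means <- rowMeans(log(m))
  apply(m, 2, function(col) {
    ok <- is.finite(log_geo_means) & col > 0   # skip genes containing a zero
    exp(median((log(col) - log_geo_means)[ok]))
  })
}

sf <- rle_size_factors(counts)

# Normalized counts: divide every column (sample) by that sample's size factor.
norm_counts <- sweep(counts, 2, sf, "/")
```

In this toy example the samples differ only by sequencing depth, so after dividing by the size factors every gene has the same value in all three samples.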

Edit: The following is wrong. There's no simple way to get normalized counts exactly like DESeq2 in edgeR without recalculating them. Having said that, a CPM can serve a similar purpose.

With edgeR:

1e6*cpm(y, normalized.lib.sizes=T)


Thanks @Devon Ryan, but I see that the two give different normalized read counts. Ideally they should be the same, right? DESeq's default normalization is the RLE method, and I am forcing the same method in edgeR. Can you tell me why the two outputs differ? Is there anything wrong in my edgeR code for getting RLE-normalized read counts?


They implement RLE slightly differently, so the floating-point arithmetic could lead to slight differences. Also, it looks like cpm() multiplies the normalization factor by the library size, which is appropriate for TMM but probably not for RLE. If you do the following in edgeR, are the results more similar to what you get from DESeq(2)?

as.matrix(y$counts)/y$samples$norm.factors

For DESeq with the default RLE, below are the normalized read counts for one gene in just 3 samples from the matrix:

SAMD11 45.635237 399.248992 45.45464

With edgeR, using as.matrix(y$counts)/y$samples$norm.factors:

SAMD11 34.324944  138.813873  15.212080


Below are the raw read counts for the same gene:

SAMD11 42         205            14


The values are somewhat different but fairly close. I was expecting the output to be the same for edgeR and DESeq if the normalization is RLE, but they are only similar. What do you think, @Devon Ryan?
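One thing that may contribute to the mismatch, separate from any difference between the packages: in R, dividing a genes-by-samples matrix by a vector, as in as.matrix(y$counts)/y$samples$norm.factors, recycles the vector down the columns rather than applying one factor per sample. The base-R sketch below uses made-up counts and hypothetical factors to show the difference.

```r
# Toy matrix: 2 genes x 3 samples (made-up numbers).
counts <- matrix(c(10, 20, 30,
                   40, 50, 60),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# Hypothetical per-sample factors, one per column.
factors <- c(1, 2, 10)

# matrix / vector recycles 'factors' in column-major order, so the factors
# end up matched to the wrong cells (no warning, because 6 is a multiple of 3).
recycled <- counts / factors

# Per-sample division: every entry in column j is divided by factors[j].
per_sample <- sweep(counts, 2, factors, "/")

# An equivalent idiom is t(t(counts) / factors).
per_sample_alt <- t(t(counts) / factors)
```

Here recycled and per_sample disagree in most cells, which is why a per-column idiom such as sweep() is the safer choice whenever the divisor has one entry per sample.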


Upon looking at what edgeR is doing internally, I don't see a simple way to get the equivalent of the simple normalized counts you would get from DESeq2. edgeR computes the same size factor/normalization factor, but then it divides it by the library size and recenters it around 1 before storing it, whereas DESeq2 just stores it as is. Having said that, cpm is also useful for graphing, which should be all that you're using normalized counts for anyway.
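The relationship described above can be sketched in base R. This is an illustration that takes the description at face value (size factor divided by library size, then rescaled to a geometric mean of 1); it is not copied from either package. The toy counts and all variable names are assumptions, and it assumes every count is positive.

```r
# Toy count matrix: 4 genes x 3 samples (made-up, all counts > 0).
counts <- matrix(c( 10,  30,  40,
                   100, 150, 500,
                     5,  12,  20,
                    50, 100, 160),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))
lib_size <- colSums(counts)

# DESeq-style size factors: per-sample median of count / gene-wise geometric mean.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2,
                      function(col) exp(median(log(col) - log_geo_means)))

# edgeR-style RLE normalization factors as described above: divide by the
# library size, then rescale so the factors have a geometric mean of 1.
f <- size_factors / lib_size
norm_factors <- f / exp(mean(log(f)))

# Effective library size = library size * normalization factor.
eff_lib <- lib_size * norm_factors

# Rescaling the effective library sizes to a geometric mean of 1 gives the
# size factors back, up to one constant shared by all samples.
rescaled <- eff_lib / exp(mean(log(eff_lib)))
```

Under these assumptions, sweep(counts, 2, rescaled, "/") differs from the size-factor-normalized counts only by a single constant multiplier, so relative differences between samples are the same.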

7.7 years ago
Gordon Smyth ★ 4.4k

Well, if you can explain what you mean by a "normalized read count", then I will be able to tell you how to get it from edgeR.

What do you think a normalized count is? Normalized for what? What would you use it for? Why not use the cpm() or rpkm() functions of edgeR, which normalize the counts for library size, compositional differences and gene length?

I have been trying to dissuade users from using the term "normalized count" since edgeR first went public, because it seems to me to be vague and unhelpful. Better to use specific names for specific quantities. The edgeR User's Guide explains that normalization in edgeR is something that is applied to the fitted models rather than to the counts.
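As a rough illustration of one such specific quantity, a counts-per-million value based on effective library sizes can be sketched in base R as below. The counts and normalization factors are hypothetical, and this is a sketch of the general idea rather than edgeR's own implementation; in practice cpm() does this for you.

```r
# Toy count matrix: 2 genes x 2 samples (made-up numbers).
counts <- matrix(c(100, 200,
                   300, 800),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2")))

lib_size <- colSums(counts)          # raw library sizes: 400 and 1000
norm_factors <- c(1.25, 0.8)         # hypothetical factors whose product is 1

# Effective library size combines sequencing depth and the composition adjustment.
eff_lib <- lib_size * norm_factors   # 500 and 800

# Counts per million relative to each sample's effective library size.
cpm_manual <- sweep(counts, 2, eff_lib, "/") * 1e6
```

Naming the quantity this way (CPM on effective library sizes) makes clear what was adjusted for, which a bare "normalized count" does not.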


I am actually trying to analyze some samples with different tools, namely edgeR, cuffdiff, DESeq and our own stand-alone algorithm, to measure performance. Across all of them I want to work with read counts rather than FPKM, and I want to apply the same normalization as DESeq's default, RLE (relative log expression), for every algorithm. I want a matrix of these RLE-normalized reads to use as input for our algorithm, and then to compare performance across different replicates. RLE normalization can be performed in both edgeR and DESeq, but when I tried it on my samples I got different output for the edgeR RLE read counts versus the DESeq RLE read counts. So I was wondering what edgeR introduces and to what extent the result differs. @Gordon Smyth, can you tell me, from the above results for the gene, why we see this difference? Is it because edgeR additionally divides by the library size?


If you have your own method then you should really be able to go through the edgeR code and follow what it's doing (if you can't, you shouldn't be writing your own method).


@Devon Ryan, I did not write the algorithm, nor did I write the code for the in-house method. I am using it for analysis and performance measurement. It was developed in our lab, but not by me. So I want to test its performance against different algorithms.


There is no difference between RLE in edgeR and the normalization method in DESeq. You are just creating problems by hacking the code in an inappropriate way. There is no such thing as an "RLE read count" in edgeR, so it is meaningless to say that these are different between the packages.

To run edgeR or DESeq correctly, you simply have to follow the documentation.

To make input for your own lab's in-house algorithm, that is your own responsibility.

7.6 years ago

The Biostatistics paper that introduces the method used to estimate the negative binomial dispersion provides a little more background about what is going on than the rather sketchy information scattered through the edgeR manual.