Question: How to export RLE-normalized read counts from edgeR?
0
4.5 years ago by
ivivek_ngs4.7k
Seattle,WA, USA
ivivek_ngs4.7k wrote:

I want to create a matrix of normalized read counts using the RLE method from edgeR or DESeq.

Below are the commands I am using to normalize my raw read counts with RLE in edgeR, but how do I output the matrix of normalized counts? I am unable to find the object that holds the table.

library(edgeR)

x <- read.delim("my_path/raw_counts.txt",row.names="symbol")

## filter out low-expression genes: keep genes whose total raw read count across all samples exceeds 50

keep <- rowSums(x)>50
x <- x[keep,]
dim(x)

norm_factors=calcNormFactors(as.matrix(x),method="RLE")

group <- c(rep("A",24),rep("B",25),rep("C",23))

y <- DGEList(counts=x,group=group,norm.factors=norm_factors)

design <- model.matrix(~group)

Now I want to extract the normalized count matrix from here. How do I do that? 

 

Also, the default normalization in DESeq is the RLE method. I would be happy to get the normalized matrix from either package. How do I extract it from DESeq? Below is my DESeq code:

cds<-newCountDataSet(x, c(rep("A",25),rep("B",24),rep("C",23)))

disp<-estimateSizeFactors(cds)

disp1<-estimateDispersions(disp)

How do I now extract the matrix of normalized counts for my samples?

Can anyone tell me how to extract the normalized read count matrix from either of the above methods? That would be very helpful.

rna-seq edger next-gen • 6.7k views
modified 4.5 years ago by simon rayner0 • written 4.5 years ago by ivivek_ngs4.7k
3
4.5 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

With DESeq2 (there's no reason to use the original DESeq unless you have a good one):

counts(disp, normalized=T)
#or
counts(disp1, normalized=T)

These will give you the same result. BTW, there's no reason to duplicate the disp object like you did.
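
For reference, here is a minimal base-R sketch of what those normalized counts amount to, assuming the median-of-ratios definition of the size factors. The toy matrix and the helper name rle_size_factors are made up for illustration and are not part of DESeq or DESeq2.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers).
counts <- matrix(c(10, 20, 30,
                   100, 210, 290,
                   50, 95, 160,
                   5, 12, 14),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Hypothetical helper: median-of-ratios ("RLE") size factors.
rle_size_factors <- function(m) {
  log_geo_mean <- rowMeans(log(m))   # per-gene log geometric mean
  ok <- is.finite(log_geo_mean)      # drops genes with a zero count in any sample
  apply(m, 2, function(col) exp(median(log(col[ok]) - log_geo_mean[ok])))
}

sf <- rle_size_factors(counts)

# Normalized counts: divide each sample (column) by its own size factor.
norm_counts <- sweep(counts, 2, sf, "/")
```

The result has the same dimensions as the input, with one scaling factor applied per sample.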

 

Edit: The following is wrong. There's no simple way to get normalized counts exactly like DESeq2 in edgeR without recalculating them. Having said that, a CPM can serve a similar purpose.

With edgeR:

1e6*cpm(y, normalized.lib.sizes=T)

 

modified 4.5 years ago • written 4.5 years ago by Devon Ryan88k

Thanks @Devon Ryan, but the two give different normalized read counts. Ideally they should be the same, right? DESeq's default normalization is RLE, and I am forcing the same method in edgeR. Can you tell me why the two outputs differ? Is there anything wrong in my edgeR code for getting normalized read counts with the RLE method?

written 4.5 years ago by ivivek_ngs4.7k

They implement RLE slightly differently, so the floating point arithmetic could lead to slight differences. Also, it looks like cpm() multiplies the normalization factor by the library size, which is appropriate for TMM but probably not RLE. If you do the following in edgeR, are the results more similar to what you get in DESeq(2): as.matrix(y$counts)/y$samples$norm.factors?
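
One caveat with that expression, independent of RLE: R recycles a vector down the columns of a matrix, so matrix / vector pairs factors with rows rather than with samples. A small sketch with made-up numbers (m and f are illustrative, not taken from the data in this thread); sweep() or a double transpose applies one factor per column instead:

```r
# 3 genes x 2 samples, made-up values.
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3,
            dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
f <- c(s1 = 1, s2 = 2)   # made-up per-sample factors

# Plain division recycles f down each column (gene1/1, gene2/2, gene3/1, ...),
# so the factors are not matched to samples.
wrong <- m / f

# One factor per column (sample):
right <- sweep(m, 2, f, "/")
also_right <- t(t(m) / f)
```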

written 4.5 years ago by Devon Ryan88k

For DESeq with the default RLE, below are the normalized read counts for one gene in just 3 samples from the matrix:

SAMD11  45.635237 399.248992  45.45464

With edgeR, using as.matrix(y$counts)/y$samples$norm.factors:

SAMD11 34.324944  138.813873  15.212080 

Below are the raw read counts for the same gene:

SAMD11 42         205            14

It seems a bit different but fairly close. I was expecting the output to be the same for both edgeR and DESeq if the normalization is RLE, but the values are only similar. What do you think, @Devon Ryan?

modified 4.5 years ago • written 4.5 years ago by ivivek_ngs4.7k

Upon looking at what edgeR is doing internally, I don't see a simple way to get the equivalent of simple normalized counts as you would get from DESeq2. edgeR will get the same size factor/normalization factor, but then it divides it by the library size and recenters it around 1 before storing it, whereas DESeq2 just stores it as is. Having said that, cpm is also useful for graphing, which should be all that you're using normalized counts for anyway.
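
A base-R sketch of that relationship, assuming the stored edgeR factor is the median-of-ratios size factor divided by the library size and rescaled so the factors have geometric mean 1, with library size taken as the column sum. The toy matrix and variable names are illustrative only. Under that assumption, library size times norm factor recovers the size factors up to one constant shared by all samples, so CPM values and DESeq-style normalized counts differ only by a global scale.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, all > 0).
counts <- matrix(c(10, 20, 30,
                   100, 210, 290,
                   50, 95, 160,
                   5, 12, 14),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3")))

# Median-of-ratios size factors (DESeq-style).
geo_mean <- exp(rowMeans(log(counts)))
sf <- apply(counts, 2, function(col) median(col / geo_mean))

# edgeR-style factors under the stated assumption.
lib_size <- colSums(counts)
f <- sf / lib_size
norm_factors <- f / exp(mean(log(f)))   # rescale to geometric mean 1

# CPM on effective library sizes vs. counts divided by size factors.
cpm_like   <- sweep(counts, 2, lib_size * norm_factors, "/") * 1e6
deseq_like <- sweep(counts, 2, sf, "/")

# Element-wise ratio: the same constant for every gene and sample.
scale_const <- cpm_like / deseq_like
```

With a real DGEList y, the analogous effective library sizes would be y$samples$lib.size * y$samples$norm.factors.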

written 4.5 years ago by Devon Ryan88k
1
4.5 years ago by
Gordon Smyth690
Australia
Gordon Smyth690 wrote:

Well, if you can explain what you mean by a "normalized read count", then I will be able to tell you how to get it from edgeR.

What do you think a normalized count is? Normalized for what? What would you use it for? Why not use the cpm() or rpkm() functions of edgeR, which normalize the counts for library size, compositional differences and gene length?

I have been trying to dissuade users from using the term "normalized count" since edgeR first went public, because it seems to me to be vague and unhelpful. Better to use specific names for specific quantities. The edgeR User's Guide explains that normalization in edgeR is something that is applied to the fitted models rather than to the counts.

written 4.5 years ago by Gordon Smyth690

I am trying to analyze some samples with different tools (edgeR, cuffdiff, DESeq) and our own stand-alone algorithm, for performance measurement. Across all of them I want to work with read counts rather than FPKM, and I want to apply the same normalization everywhere: DESeq's default, RLE (relative log expression). I want to obtain the matrix of RLE-normalized read counts, use it as input for our algorithm, and then compare performance across replicates. RLE normalization is available in both edgeR and DESeq, but when I tried it on my samples the edgeR RLE read counts differed from the DESeq RLE read counts. So I was wondering what edgeR introduces and how large the difference is. @Gordon Smyth, can you tell me from the results for the gene above why we see this difference? Is it because edgeR additionally divides by the library size?

written 4.5 years ago by ivivek_ngs4.7k
1

If you have your own method then you should really be able to go through the edgeR code and follow what it's doing (if you can't, you shouldn't be writing your own method).

written 4.5 years ago by Devon Ryan88k

@Devon Ryan, I wrote neither the algorithm nor the code for the in-house method. I am using it for analysis and performance measurement. It was developed in our lab, but not by me. So I want to test its performance against the other algorithms.

written 4.5 years ago by ivivek_ngs4.7k
1

There is no difference between RLE in edgeR and the normalization method in DESeq. You are just creating problems by hacking the code in an inappropriate way. There is no such thing as an "RLE read count" in edgeR, so it is meaningless to say that these are different between the packages.

To run edgeR or DESeq correctly, you simply have to follow the documentation.

To make input for your own lab's in-house algorithm, that is your own responsibility.

written 4.5 years ago by Gordon Smyth690
0
4.5 years ago by
China
simon rayner0 wrote:

@ vchris_ngs

The Biostatistics paper that introduces the method used to estimate the negative binomial dispersion provides a little more background about what is going on than the rather sketchy information scattered through the edgeR manual.

here

http://biostatistics.oxfordjournals.org/content/9/2/321.long

written 4.5 years ago by simon rayner0