Question: How does regularized log transformed data from DESeq2 relate to read counts?
2.8 years ago by
amymccarthy9110 wrote:

Hi,

I have used DESeq2 to analyse some RNAseq data and am using regularized log transformed data (rlog) output from DESeq2 to plot heatmaps.

However I am struggling to understand how the rlog output relates to the actual number of reads obtained for that gene. Can anyone explain this to me? For example, if I have a regularized log value of -1 for a particular gene, does this mean that I had less than 1 read for that particular gene in my sample (if so, how can this be)?

The reason I would like to understand this is I want to set a threshold for gene expression, so I am only looking at well expressed genes in my analysis. Is this possible with regularized log transformed data? Or would it be better to use a metric like RPKM? I was under the impression that working with regularized log transformed data is more advisable than RPKM, but perhaps in this instance RPKM is more appropriate.

Many thanks in advance for your help, Amy

rna-seq R • 3.5k views
modified 2.8 years ago by tarek.mohamed260 • written 2.8 years ago by amymccarthy9110
2.8 years ago by
tarek.mohamed260 wrote:

Hi, the rlog function produces values on the log2 scale that have been normalized with respect to library size. So, for example, a P-value of 0.05 is -4.3 on the log2 scale, and a P-value of 0.005 is -7.6. A gene with 100 read counts will be 6.6 on the log2 scale.
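As a quick check of the arithmetic above, here is a minimal Python sketch. It uses plain log2 only and does not reproduce the shrinkage that rlog applies, so real rlog values will generally differ somewhat from log2 of the raw count. The helper name `from_log2` is made up for this illustration.

```python
import math

# Plain log2 of a few example values (no normalization, no shrinkage).
for value in (100, 0.05, 0.005):
    print(f"{value} -> log2 = {math.log2(value):.1f}")

def from_log2(x):
    """Convert a log2-scale value back to the untransformed scale."""
    return 2.0 ** x

# A log2-scale value of -1 corresponds to 0.5 on the untransformed scale.
half = from_log2(-1)
```

Read this way, an rlog value of -1 roughly corresponds to 0.5 on the normalized count scale. That is possible because the transformed values are model-based estimates, not integer read counts.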

Tarek

ADD COMMENTlink modified 2.8 years ago • written 2.8 years ago by tarek.mohamed260

It's not the P-values that are on the log2 scale, it's the number of counts.

written 2.8 years ago by mastal5112.0k

I am just using the P-value as an example! You can replace the word "P-value" with anything. A gene with 100 read counts will be 6.6 on the log2 scale.

modified 2.8 years ago • written 2.8 years ago by tarek.mohamed260

Hi Tarek, thanks for your answer. So is rlog essentially log2(counts/library size)? If so, how can I have positive rlog values, since the number of read counts must always be less than the library size? Amy

written 2.8 years ago by amymccarthy9110

Hi Amy, DESeq2 normalizes using size factors. A geometric mean is calculated for each gene across all samples, and the count for that gene in each sample is then divided by this mean. The median of these ratios within a sample is the size factor for that sample.
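The median-of-ratios procedure described above can be sketched in plain Python. This is an illustration only, not DESeq2's own code: the count matrix is made up and `size_factors` is a hypothetical helper. It also shows why positive log2 values are expected. Counts are divided by a size factor that is typically close to 1, not by the total library size, so normalized counts stay on the scale of the raw counts.

```python
import math
from statistics import median

# Made-up toy count matrix: rows are genes, columns are two samples.
# Sample 2 was sequenced twice as deeply as sample 1.
counts = {
    "geneA": [100, 200],
    "geneB": [50, 100],
    "geneC": [10, 20],
}

def size_factors(counts):
    """Median-of-ratios size factors, one per sample (illustrative sketch)."""
    n_samples = len(next(iter(counts.values())))
    ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c == 0 for c in row):
            continue  # a zero count gives a geometric mean of 0; skip the gene
        # Geometric mean of this gene's counts across all samples.
        geo_mean = math.exp(sum(math.log(c) for c in row) / len(row))
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    # The size factor of a sample is the median of its per-gene ratios.
    return [median(r) for r in ratios]

sf = size_factors(counts)

# Normalized counts: raw counts divided by the sample's size factor.
normalized = {g: [c / s for c, s in zip(row, sf)] for g, row in counts.items()}

# log2 of the normalized counts is positive whenever the value exceeds 1.
log2_normalized = {g: [math.log2(v) for v in row] for g, row in normalized.items()}
```

In this toy example the two size factors come out below and above 1 (about 0.71 and 1.41), and after division both samples give the same normalized count for each gene.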

Tarek

written 2.8 years ago by tarek.mohamed260

Hi Tarek, Okay, that helps explain it. Thanks for your help! Cheers, Amy

written 2.8 years ago by amymccarthy9110