Question

Does DESeq2 analysis include the rlog transformation?

0

6.2 years ago

soojinima ▴ 10

Hi. I am new to R. I recently analysed a GEO dataset related to my research field, and I have a question about the analysis method. Does the DESeq2 analysis include the rlog transformation? To reduce heteroskedasticity, many methods include some way to shrink the variance of low read counts, so people transform raw counts with rlog instead of log2. Does the DESeq2 function already apply the rlog transformation, or should I convert the DESeq2 output to rlog data myself? Thank you. I look forward to your reply.

RNA-Seq DESeq2 rlog raw counts • 2.7k views

2

DESeq2 has a separate rlog() function that you can use to transform your count data. You can supply it with a matrix of counts or a DESeqDataSet. The default DESeq() function will not give you rlog-transformed data.

6.2 years ago by kautilya ▴ 430

0

Thank you for your kind reply. This is a basic concept, but I do not understand it well, so may I ask a follow-up question? If the default DESeq function does not give me rlog-transformed data, and the data used in the DESeq2 analysis are not rlog-transformed, so that heteroskedasticity is high, can I interpret the log2FoldChange values from DESeq2 as they are? (For example, gene A increased 2-fold compared with gene B in a specific condition.)

If so, is the rlog transformation offered not for differential expression estimation but as separate functionality for visualization and clustering?

6.2 years ago by soojinima ▴ 10

score 3 · Answer 1 · 2018-02-26

To normalise and model the data, DESeq2 estimates two things:

Size factors, which help to deal with differences in library sizes across samples
Dispersion parameters, which help to deal with heteroskedasticity, amongst other things
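
To make the size-factor idea concrete, here is a minimal sketch of the median-of-ratios approach that DESeq2 uses for size factors. It is written in plain Python purely as an illustration and is not DESeq2's own code; the function name `size_factors` and the toy counts are made up for this example.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) <= 0:
            continue
        # Log of the geometric mean of this gene across all samples.
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One factor per sample: the median ratio to the per-gene geometric mean.
    return [math.exp(median(r)) for r in log_ratios]

# Toy data (made up): sample 2 was sequenced about twice as deeply as sample 1.
toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(toy)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in toy]
```

Dividing each sample's counts by its size factor puts the samples on a common scale, so in this toy case the first three genes end up with equal normalized counts in both samples.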

DESeq2 models raw RNA-seq counts with a negative binomial distribution, and it is through this model that it derives P values and other statistics via the Wald test (applied to the model for each gene). Log2 fold changes can also be 'shrunk' in order to deal with the biased fold-change estimates that can be observed for low-count transcripts; older DESeq2 versions did this by default inside DESeq(), whereas from version 1.16 onwards it is done separately with lfcShrink().
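
As a rough illustration of the Wald test step: the test statistic is the estimated log2 fold change divided by its standard error (the `log2FoldChange` and `lfcSE` columns of a DESeq2 results table), and it is compared against a standard normal distribution. The sketch below is plain Python, not DESeq2 code; the helper name `wald_test` and the input values are made up.

```python
import math

def wald_test(log2_fold_change, standard_error):
    """Return the Wald statistic and its two-sided normal P value."""
    z = log2_fold_change / standard_error
    # Two-sided tail probability of a standard normal: P(|Z| >= |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Made-up example: an estimated log2 fold change of 1.0 with standard error 0.25.
z, p_value = wald_test(1.0, 0.25)
```

A log2 fold change of 1.0 corresponds to a 2-fold change on the original count scale, so the fold changes themselves can be read directly; the P value says how well supported that estimate is given its standard error.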

Statistics are not derived from the regularised log transformation. This transformation is mainly introduced for downstream plotting functions, like heatmaps, etc.
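
To see why a plain log2 transform is awkward for plots and clustering, here is a small back-of-the-envelope sketch in Python. It assumes the negative binomial variance form, mean + dispersion × mean², and uses the delta method to approximate the variance of log2(count). The approximation is crude at very low counts, and the dispersion value of 0.05 is made up for illustration.

```python
import math

def approx_var_log2(mean, dispersion):
    """Delta-method approximation of Var(log2 X) for a negative binomial X."""
    # Negative binomial variance on the count scale.
    variance = mean + dispersion * mean ** 2
    # Var(log2 X) is roughly Var(X) / (mean * ln 2)^2.
    return variance / (mean * math.log(2)) ** 2

for mu in (5, 50, 5000):
    print(mu, round(approx_var_log2(mu, 0.05), 3))
```

The approximate variance on the log2 scale is largest for the low-count gene and flattens out at high counts, so low-count genes dominate the noise in a heatmap or PCA of log2 counts. Transformations such as rlog are designed to damp this effect.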