Hi. I am new to R, and I recently analysed a GEO dataset from my research field. I have a question about the analysis method: does DESeq2 include the rlog transformation? To reduce heteroskedasticity, many methods shrink the variance of low read counts, so people transform the raw data with rlog rather than log2. Does the DESeq2 function already apply the rlog transformation, or should I convert DESeq2's output to rlog data myself? Thank you. I look forward to your reply.
In order to normalise data, DESeq2 estimates two things:
- size factors, which help to deal with differences in library sizes across samples
- dispersion parameters, which help to deal with heteroskedasticity, amongst other things
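To give a feel for what a size factor is, here is a minimal Python sketch of the median-of-ratios idea that DESeq2's size factor estimation is based on. It is a simplified illustration, not DESeq2's actual code: the function name `size_factors` and the toy count matrix are made up for this example.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (simplified sketch).

    counts: list of genes, each a list of raw counts, one per sample.
    """
    n_samples = len(counts[0])
    # Genes with any zero count are skipped, because their geometric
    # mean across samples would be zero and the ratio undefined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lgm
                      for row, lgm in zip(usable, log_geo_means)]
        # The size factor is the median ratio, taken on the log scale.
        factors.append(math.exp(median(log_ratios)))
    return factors


# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 5],  # contains a zero, so it is ignored when estimating
]
sf = size_factors(toy_counts)
# Dividing raw counts by the size factors puts samples on a common scale.
normalised = [[c / f for c, f in zip(row, sf)] for row in toy_counts]
```

In this toy case the second size factor comes out at twice the first, so the first three genes have equal normalised counts in both samples.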
DESeq2 models the raw RNA-seq counts with a negative binomial distribution, and it is through this model that it derives p-values and other statistics via the Wald test (applied to the fitted model for each gene). Log2 fold changes can also be 'shrunk' to counter the exaggerated fold changes often seen for low-count transcripts; in recent DESeq2 versions this is a separate step, lfcShrink(), rather than part of the default DESeq() run.
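As a rough illustration of the Wald test step, the sketch below divides a log2 fold change estimate by its standard error and compares the result to a standard normal distribution. The function name `wald_test` and the input numbers are hypothetical; in a real analysis the estimate and its standard error come from the fitted negative binomial model.

```python
import math


def wald_test(log2_fold_change, standard_error):
    """Return the Wald statistic and its two-sided p-value."""
    z = log2_fold_change / standard_error
    # Two-sided p-value under a standard normal:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical gene: estimated log2 fold change 1.5, standard error 0.5.
z, p = wald_test(1.5, 0.5)
```

Note that the inputs are model estimates on the log2 scale, not rlog-transformed counts.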
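The statistics are not derived from regularised log (rlog) transformed data; they come from the model fitted to the raw counts. The rlog transformation is a separate function, rlog(), that you apply yourself, and it is mainly intended for downstream uses such as heatmaps and other plots.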