Question

How to retrieve the batch-corrected data frame when using DESeq2 in R?

2.3 years ago

BlueSky ▴ 10

I have merged several RNA-seq data frames of raw counts from different studies. I want to correct the merged data frame for study batch effects without getting negative values (I have already tried the removeBatchEffect function from limma, which produced some negative values).

I have read that DESeq2 can avoid this, so I used design = ~ studyID + condition in the DESeqDataSetFromMatrix function to account for the different studies the RNA-seq data come from.

I am going to use the batch-corrected data frame in another analysis (a pipeline that normalizes the data to the range 0-1), so I need to retrieve it from the DESeq2 output. How do I do this? Do I just call counts(deseq, normalized = TRUE), as in this post: How to recover treated/control count from DESeq2 output? Is that the batch-corrected version of the data frame, or is there another function?

Ideally I want to retrieve the raw counts with batch correction applied, so that I can normalize them later to values between 0 and 1.

Thanks in advance

RNAseq Deseq2 batch-correction normalization • 885 views

updated 2.3 years ago by ATpoint 82k • written 2.3 years ago by BlueSky ▴ 10

score 2 · Answer 1 · 2022-02-04

See vignette: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot

DESeq2 does not return batch-corrected data. The batch term in the design is only used internally by the model, and counts(dds, normalized = TRUE) only divides the counts by the size factors, so it is not batch-corrected either. For downstream use you will need to correct for batch effects externally, e.g. with removeBatchEffect or other tools. I personally like ComBat-Seq (ComBat_seq in the sva package), as it preserves the integer nature of the data and avoids the infamous negative counts.
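For illustration, here is a minimal sketch of the ComBat-Seq route. It assumes a recent sva from Bioconductor (one that provides ComBat_seq). The count matrix and the study and condition vectors are simulated placeholders, not the poster's data, so swap in the merged raw count matrix and the real sample annotation.

```r
# Sketch only: 'counts', 'study' and 'condition' are simulated placeholders.
library(sva)  # Bioconductor package that provides ComBat_seq()

set.seed(1)
n_genes <- 200

# One label per sample (column): the study (batch) and the biological condition
study     <- rep(c("A", "B"), each = 6)
condition <- rep(rep(c("ctrl", "trt"), each = 3), times = 2)

# Simulated raw counts, genes in rows and samples in columns.
# Replace this with the merged raw count matrix.
counts <- matrix(
  rnbinom(n_genes * length(study), mu = 100, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("sample", seq_along(study)))
)

# Adjust for study; passing 'group' asks ComBat-Seq to preserve the
# condition effect while removing the batch effect.
adjusted <- ComBat_seq(counts, batch = study, group = condition)

# The result is a count matrix of the same shape as the input
dim(adjusted)
range(adjusted)
```

The adjusted matrix stays on the count scale, so it can be passed to a downstream pipeline that does its own 0-1 scaling. For the differential expression test itself, keeping the raw counts with studyID in the design, as in the question, remains a valid alternative.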