Question: Batch correction using DESeq2
Rahil wrote, 6 weeks ago:

Hi all,

I have RNA-seq data (read counts) for 96 mouse primary tumors spanning 15 different genotypes. The 96 samples were sequenced on 10 different days, but most samples with the same genotype were sequenced on the same day. I am afraid that if I batch-correct for sequencing day, I will also lose the biological differences between genotypes. Any suggestions?

This is my script. After batch correction I see a large change in the PCA plot:

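library(DESeq2)

dds <- DESeqDataSetFromMatrix(as.matrix(all), colData, design = ~ Batch)

# Variance-stabilize, then plot the PCA colored by batch
vsd <- vst(dds, blind = FALSE)
plotPCA(vsd, intgroup = "Batch")

# Remove the batch term from the transformed values and plot again
assay(vsd) <- limma::removeBatchEffect(assay(vsd), batch = vsd$Batch)
plotPCA(vsd, intgroup = "Batch")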

Part of colData:

   Genotype condition      Batch
1         A   primary 2017-06-29
2         A   primary 2017-06-29
3         A   primary 2017-06-29
4         A   primary 2017-06-29
5         A   primary 2017-06-29
6        AK   primary 2017-11-09
7        AK   primary 2017-11-09
8        AK   primary 2017-11-09
9        AP   primary 2018-04-18
10       AP   primary 2018-04-18
11       AP   primary 2018-04-18
12      AKP   primary 2019-09-12
13      AKP   primary 2019-09-12
14      AKP   primary 2019-09-12

I also looked at these questions:

Batch correction in DESeq2

DESeq2, batch effect correction, multiple conditions

Batch effect problem DEG, DESseq2

But I am still not sure what I should do. I would really appreciate any help!

Tags: vst, rna-seq, deseq2, batch
written 6 weeks ago by Rahil
swbarnes2 (United States) wrote, 6 weeks ago:

If all your samples are primary, that column doesn't belong in the colData. Just drop it.

The dates you have given are completely confounded with your genotype, so you have to drop them too. If they really represent sequencing dates, then they aren't adding any technical artifacts. If they represent the day of RNA extraction or the day of library prep, then you are in deep trouble, because those do affect RNA-seq results, and for tumor types with different dates you will have no way of knowing which changes are due to tumor type and which are due to prep date.
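One way to see the confounding is to check whether a design with both terms is full rank. The following is a minimal base-R sketch that uses only the 14-row colData excerpt from the question (not the full 96 samples); the object names are made up for illustration.

```r
# The colData excerpt posted in the question (14 of the 96 samples).
col_data <- data.frame(
  Genotype = factor(rep(c("A", "AK", "AP", "AKP"), times = c(5, 3, 3, 3))),
  Batch    = factor(rep(c("2017-06-29", "2017-11-09", "2018-04-18", "2019-09-12"),
                        times = c(5, 3, 3, 3)))
)

# Design with both terms: intercept + 3 batch columns + 3 genotype columns.
full_design <- model.matrix(~ Batch + Genotype, data = col_data)
n_coef      <- ncol(full_design)     # 7 coefficients requested
design_rank <- qr(full_design)$rank  # only 4 are estimable

# Each genotype column duplicates a batch column, so the matrix is rank deficient.
is_full_rank <- design_rank == n_coef

# A genotype-only design has no such problem.
genotype_design        <- model.matrix(~ Genotype, data = col_data)
genotype_is_full_rank  <- qr(genotype_design)$rank == ncol(genotype_design)

print(c(coefficients = n_coef, rank = design_rank))
```

With this excerpt the batch and genotype effects cannot be separated, which is why DESeq2 will refuse a `~ Batch + Genotype` design on such data with a "not full rank" error.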

You know your column headers don't have to literally be Condition and Genotype, right?

written 6 weeks ago by swbarnes2

Many thanks swbarnes for your prompt reply! Yes, they are all primary tumors but with different genotypes.

Sorry, I don't get your question. I named the headers myself. What should they be?

written 6 weeks ago by Rahil

You cannot make use of a column where every single sample has the same value. There is no point in it being there.

You cannot get rid of or account for batch effect in the dataset you posted, because it is deeply confounded with genotype. You can't make use of it, except as a guide to which genotype comparisons aren't confounded by batch, and which ones are.
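To use the batch column as that kind of guide, a cross-tabulation is usually enough. Here is a small base-R sketch on the 14-row excerpt from the question; on the full 96-sample colData the same table would show which genotype pairs share at least one date. The object names are hypothetical.

```r
# The colData excerpt posted in the question (14 of the 96 samples).
col_data <- data.frame(
  Genotype = rep(c("A", "AK", "AP", "AKP"), times = c(5, 3, 3, 3)),
  Batch    = rep(c("2017-06-29", "2017-11-09", "2018-04-18", "2019-09-12"),
                 times = c(5, 3, 3, 3))
)

# Samples per genotype (rows) and batch (columns).
tab <- table(col_data$Genotype, col_data$Batch)
print(tab)

# 1 where a genotype is present in a batch, 0 otherwise.
presence <- matrix(as.numeric(tab > 0), nrow = nrow(tab),
                   dimnames = dimnames(tab))

# Entry [i, j] counts the batches that genotypes i and j have in common.
# A zero off the diagonal means that comparison is fully confounded with batch.
shared <- presence %*% t(presence)
print(shared)
```

In the excerpt every off-diagonal entry is zero, so none of these four genotypes can be compared within a batch.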

However, if 1) All the RNA was extracted on the same day 2) All the libraries were prepped on the same day 3) the dates really are just the instrument run date, you can safely ignore that date, because running libraries on different days does not cause a batch effect.

written 6 weeks ago by swbarnes2

If I ignore that date, is it correct to put only genotype in the design formula to account for its effect? Is this script correct for normalizing the data?

dds <- DESeqDataSetFromMatrix(as.matrix(all), colData, design = ~ Genotype)
vsd <- vst(dds, blind = FALSE)

I really appreciate your time and help!

written 6 weeks ago by Rahil

That command line doesn't normalize anything. Normalizing doesn't take your design into account at all. But ~ Genotype is the only design you should be using with that colData.
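To illustrate why normalization does not depend on the design, here is a simplified base-R sketch of median-of-ratios size factors, the kind of library-size normalization DESeq2 uses. It is a toy re-implementation on made-up counts, not DESeq2's own code, and `median_ratio_size_factors` is a hypothetical helper name. Note that the function takes only the count matrix: no colData and no design formula.

```r
# Toy counts: 4 genes x 3 samples; s2 is s1 sequenced exactly twice as deep.
counts <- matrix(
  c( 10,  20,  12,
    100, 200,  90,
     50, 100,  55,
     30,  60,  33),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Simplified median-of-ratios: compare each sample to a per-gene geometric mean.
median_ratio_size_factors <- function(counts) {
  log_counts    <- log(counts)
  log_geo_means <- rowMeans(log_counts)
  usable        <- is.finite(log_geo_means)  # skip genes with a zero count
  apply(log_counts[usable, , drop = FALSE], 2, function(sample_logs) {
    exp(median(sample_logs - log_geo_means[usable]))
  })
}

size_factors <- median_ratio_size_factors(counts)

# Divide each sample (column) by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")
print(size_factors)
```

After scaling, s1 and s2 end up with identical normalized counts, whatever genotype or batch labels the samples carry.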

written 6 weeks ago by swbarnes2

Oh, the second line of my script was left out, sorry. I edited my post. Thanks!

written 6 weeks ago by Rahil