Design in DESeq2: measruing effects of treatment while adjusting for sample type
2
0
Entering edit mode
15 months ago
Ridha ▴ 60

Hello everyone,

I’m doing differential expression analysis using DESeq2 of 6 heart cells(iPSC-CM) that were either treated with iron or not (treatment). The issue is that these 6 samples were taken from 3 different individuals and I am not interested in differences between individuals(sample), but the effect of iron on these heart cells. From the PCA plot, however, it seems that the samples are not separated by the intervention(FE vs AA), but rather by the sample type(https://ibb.co/fvH0xmh) despite that I have controlled for the samples in my design(see code below). From my simple understanding of DESeq2, the code I wrote indicates that the effect of treatment will be measured correcting for the sample type and that's what I want. Even if I switch the design, I still get the same clustering pattern. Am I missing something? I don't think it's because of the data because another study using the same dataset found clustering based on treatment and not individuals.

Thank you very much in advance for your help!

The code

#Overview of the count data
df_io[,1:6]
      X273.AA X273.FE X15.AA X15.FE X2.AA X2.FE
DDX11L1         0       0      0      0     0     0
WASH7P         46      56     64     79    69    68
MIR6859-1       2       0      2      0     2     2
MIR1302-2       0       0      0      0     0     5
FAM138A         0       0      0      0     0     0
OR4G4P          0       0      0      0     0     0

Making my metadata

treatment<-as.factor(c("AA","FE","AA",
                     "FE","AA","FE")) # AA= control, FE= intervention
cell_type<- as.factor(c("HiPSC273","HiPSC273","HiPSC15",
          "HiPSC15","HiPSC2","HiPSC2")) # indicating the sample type(from which individual it came from)
metadata<-data.frame(treatment,cell_type)
# matching the columnames of count  data to rownames of metadata
rownames(metadata)<-colnames(df_io)
#### checking #### 
all(rownames(metadata)==colnames(df_io)) # was true

Now making the model and extracting normalized counts to make PCA plot

df_deseq2_ijzer<-DESeqDataSetFromMatrix(countData = df_io,
                                     colData = metadata,
                                     design = ~cell_type+treatment) 
dd_normalizated<-estimateSizeFactors(df_deseq2_ijzer2)

normalized_counts_ijzer<-counts(dd_normalizated,normalized=T)
qc_log_ijzer2<-varianceStabilizingTransformation(dd_normalizated,blind = T)

Making the PCA plot

plotPCA(qc_log_ijzer2,intgroup=c("treatment","cell_type"))+theme_bw()
RNA-Seq R • 765 views
ADD COMMENT
1
Entering edit mode
15 months ago

I think doing vst with blind = T makes the PCA ignore your design. So I don't think you are correcting for anything.

ADD COMMENT
1
Entering edit mode
15 months ago
ATpoint 60k

The vst is not expected to remove these effects, even if properly used with blind=FALSE

Quoted from http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-after-vst-are-there-still-batches-in-the-pca-plot

The transformations implemented in DESeq2, vst and rlog, compute a variance stabilizing transformation which is roughly similar to putting the data on the log2 scale, while also dealing with the sampling variability of low counts. It uses the design formula to calculate the within-group variability (if blind=FALSE) or the across-all-samples variability (if blind=TRUE). It does not use the design to remove variation in the data. It therefore does not remove variation that can be associated with batch or other covariates (nor does DESeq2 have a way to specify which covariates are nuisance and which are of interest).

It is possible to visualize the transformed data with batch variation removed, using the removeBatchEffect function from limma. This simply removes any shifts in the log2-scale expression data that can be explained by batch. The paradigm for this operation for designs with balanced batches would be:

mat <- assay(vsd)
mat <- limma::removeBatchEffect(mat, vsd$batch)
assay(vsd) <- mat
plotPCA(vsd)
ADD COMMENT
0
Entering edit mode

Thank you very much for your help! I really appreciate it. I did what you said and I got a better PCA plot https://ibb.co/f4zKYDD. However, the RNA-Seq dataset I am using was already used for publication(https://www.sciencedirect.com/science/article/pii/S2211124720308676#bib11) and the PCA plot they reported is different than the one I got (https://ibb.co/RY7HG3s) with also different differentially expressed genes. This makes me wonder whether removing batch effects is the same as adjusting for them since removing the batch effects using limma resulted in a different PCA plot.

ADD REPLY
0
Entering edit mode

Getting the vst or rlog normalized counts using your formula will not output counts that are controlled for batch effect. However, the regression model for differential expression will have batch correctly controlled for. What @ATpoint is showing you how to do is make a PCA plot where batch effect is controlled for, for in appearance it is closer to what your regression model looks like.

ADD REPLY
0
Entering edit mode

Thank you very much for your reply. I think I got what you mean. However, how would you explain the discrepancies between the PCA plot I produced following ATpoint suggestion and the PCA plot produced from the original paper, knowing that they also used DESeq2 for RNA-seq analysis? I was expecting a similar PCA plot and a similar number of DEGs, but that was not the case.

ADD REPLY
0
Entering edit mode

You need the exact code they used and you need to know which data they used for the PCA plus on which genes it is based. Otherwise you cannot make any statements on why it is dissimilar.

ADD REPLY

Login before adding your answer.

Traffic: 1832 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6