Pre-processing of RNA-Seq data for WGCNA
0
0
Entering edit mode
4.6 years ago
gokce.ouz ▴ 70

Hi All,

I have 2 experimental condition and each condition has 18 samples. However, they are sequenced in 3 batches. 1 st batch 6 pair (disease & control), 2nd batch 6 pair (disease & control), 3rd batch 6 pair (disease & control).

Each batch done by different people over different times so I believe we have batch effect. As I am using DESeq2 for my analysis I set my model design as “Batch+Type”. I used removeBatchEffect from limma package and plotted PCA & did hierarchical clustering using spermann correlation (I believe it is better to use Spearman rather than the Pearson??) to see which samples are outliers. Another issue I am facing is 1st & 2nd batches are 76 bp paired end, however the last batch is 101 bp paired end.I did not add this into the design separately.

I would like to do differential gene expression analysis and WGCA. So I have multiple questions:

1- Should I use rlog() for differential gene expression analysis and getVarianceStabilizedData() for WGCNA? or can I use same normalization for both? If yes, which one do you suggest ?

2- Should I remove any samples according to these PCA plots & dendogram or just do analysis as I already included batch effect into the model design as "~Batch+Type". Or is it like for DGE analysis I do not to remove any sample but for WGCNA I should remove ? For both normalization type it looks like I have to remove the Patient-Sph4 as it look has distance from others in PCA plot also (the one at the middletop)

I really appreciate if you give me some insight,

ddsMat.pre<- DESeqDataSetFromMatrix(seMat.pre, sample.table.ERSvsPT.pre, design=~Batch+Type)
dds.pre<-DESeq(ddsMat.pre)
dds.pre <- dds.pre[ rowSums(counts(dds.pre)) > 1, ]
datExpr0<- assay(dds.pre)
gsg = goodSamplesGenes(datExpr0, verbose = 5);
gsg$allOK  RLD rld <- rlog(dds.pre, blind= FALSE) rlogMat <- assay(rld) save(rld, file="RLD_Pre_Type+Batch") design=model.matrix(~sample.table.ERSvsPT.pre$Type)
removedbatch.rld=removeBatchEffect(rlogMat, batch=sample.table.ERSvsPT.pre$Batch, design= design ) write.csv(removedbatch.rld, file="removedbatch.rld_36patient") rld.removed<-rld assay(rld.removed)<-removedbatch.rld par(mfrow = c(1,2)); cex1 = 1.5; plotPCA(rld.batch, intgroup=c("Type", "Batch")) plotPCA(rld.removed, intgroup=c("Type", "Batch")) colData(rld.removed)$Type<-as.factor(colData(rld.removed)$Type) colData(rld.removed)$Batch<-as.factor(colData(rld.removed)$Batch) data.removed<-plotPCA(rld.removed, intgroup=c("Batch","Type"), returnData=TRUE) data.removed$name<-colData(rld.removed)$Patient percentVar.removed <- round(100 * attr(data.removed, "percentVar")) p2<-ggplot(data.removed, aes(PC1, PC2, color=Batch, shape=Type, label= name)) + geom_text(size=5,hjust = 0, nudge_x = 0.05) + geom_point(size=3) + xlab(paste0("PC1: ",percentVar.removed[1],"% variance")) + ylab(paste0("PC2: ",percentVar.removed[2],"% variance")) + ggtitle("PCA of RLD_pre_Type+Batch ") + coord_fixed() mat.rld.removed<- t(assay(rld.removed)) rownames(mat.rld.removed) <- paste( rld.removed$Written.ID)
plot(hcluster(dist(mat.rld.removed), method= "spearman", link="average" ), cex=1.5, cex.main=2, main= "Hierarchical Clustering of  RLD Normalized Count Data ")
dev.copy(png, "Sample clustering (hcluster) of RLD_Pre_Type+Batch.png")
dev.off()


VSD

vsd = getVarianceStabilizedData(dds.pre)
save(vsd, file="VSD_Pre_Type+Batch")

design=model.matrix(~sample.table.ERSvsPT.pre$Type) vsd.removed=removeBatchEffect(vsd, batch=sample.table.ERSvsPT.pre$Batch, design= design )

colnames(vsd.removed) <-  rld.removed\$Written.ID
plot(hcluster(dist(t(vsd.removed)), method= "spearman", link="average" ), cex=1.5, cex.main=2, main= "Hierarchical Clustering of VSD Normalized Count Data ")
dev.copy(png, "Sample clustering (hcluster) of vsd_Pre_Type+Batch.png")
dev.off()


RNA-Seq WGCNA removebatcheffect() DESeq2 • 4.3k views
0
Entering edit mode

What type of data did you feed into removeBatchEffect? - Normalised counts, rlog transformed, vst transformed, voom'd?

0
Entering edit mode

I tried both rlog &vsd. The 1st dendogram = the rlog transformed and batch removed, 2nd dendogram from the vsd transformed and batch removed.

0
Entering edit mode

Try Voom Transforming - That way you know that the removeBatchEffect function is getting what it expects.

0
Entering edit mode

Hi Andrew, thanks for the suggestion; but I read that voom transformed data is not applicable for WGCNA analysis. So if we assume I did the vsd normalization, do you suggest me to remove any samples ?

0
Entering edit mode

Where did you read that? - Voom transforms RNA seq counts to "microarray-like" data that should be fine with WGCNA. If you want to batch correct, and you want to use Limma's removeBatchEffect (which I think is best in your current situation), then try and get the data to a point that Limma expects - Voom is best in that regard. See point 4 here

0
Entering edit mode

Thanks a lot; I will implement it into my analysis immediately.

0
Entering edit mode

Hi Andrew,

Sorry to disturb but may I ask a few quick questions ? While transforming the count data with voom()

1 - For design I am using both "~ Batch + Type", is it the correct approach ?

2 - Should I use TMM-normalization before using voom ()

3 - Should directly input the raw count data to voom ()

4 - Even though I deleted the rows with zero, my voom plots do not look like the plots in this paper. Am I doing something wrong ?

5 - Do you think it is better if I do differential gene expression analysis also using Limma's commands?

I really appreciate if you give me some insight,