Question: Pre-processing of RNA-Seq data for WGCNA
0
gravatar for gokce.ouz
2.9 years ago by
gokce.ouz50
Singapore
gokce.ouz50 wrote:

Hi All,

I have 2 experimental condition and each condition has 18 samples. However, they are sequenced in 3 batches. 1 st batch 6 pair (disease & control), 2nd batch 6 pair (disease & control), 3rd batch 6 pair (disease & control).

Each batch done by different people over different times so I believe we have batch effect. As I am using DESeq2 for my analysis I set my model design as “Batch+Type”. I used removeBatchEffect from limma package and plotted PCA & did hierarchical clustering using spermann correlation (I believe it is better to use Spearman rather than the Pearson??) to see which samples are outliers. Another issue I am facing is 1st & 2nd batches are 76 bp paired end, however the last batch is 101 bp paired end.I did not add this into the design separately.

I would like to do differential gene expression analysis and WGCA. So I have multiple questions:

1- Should I use rlog() for differential gene expression analysis and getVarianceStabilizedData() for WGCNA? or can I use same normalization for both? If yes, which one do you suggest ?

2- Should I remove any samples according to these PCA plots & dendogram or just do analysis as I already included batch effect into the model design as "~Batch+Type". Or is it like for DGE analysis I do not to remove any sample but for WGCNA I should remove ? For both normalization type it looks like I have to remove the Patient-Sph4 as it look has distance from others in PCA plot also (the one at the middletop)

I really appreciate if you give me some insight,

Thanks in advance,

ddsMat.pre<- DESeqDataSetFromMatrix(seMat.pre, sample.table.ERSvsPT.pre, design=~Batch+Type)
dds.pre<-DESeq(ddsMat.pre)
dds.pre <- dds.pre[ rowSums(counts(dds.pre)) > 1, ]
datExpr0<- assay(dds.pre)
gsg = goodSamplesGenes(datExpr0, verbose = 5);
gsg$allOK

RLD

rld <- rlog(dds.pre, blind= FALSE)
rlogMat <- assay(rld)
save(rld, file="RLD_Pre_Type+Batch")

design=model.matrix(~sample.table.ERSvsPT.pre$Type)
removedbatch.rld=removeBatchEffect(rlogMat, batch=sample.table.ERSvsPT.pre$Batch, design= design )
write.csv(removedbatch.rld, file="removedbatch.rld_36patient")
rld.removed<-rld
assay(rld.removed)<-removedbatch.rld

par(mfrow = c(1,2));
cex1 = 1.5;
plotPCA(rld.batch, intgroup=c("Type", "Batch"))
plotPCA(rld.removed, intgroup=c("Type", "Batch"))

colData(rld.removed)$Type<-as.factor(colData(rld.removed)$Type)
colData(rld.removed)$Batch<-as.factor(colData(rld.removed)$Batch)
data.removed<-plotPCA(rld.removed, intgroup=c("Batch","Type"), returnData=TRUE)
data.removed$name<-colData(rld.removed)$Patient
percentVar.removed <- round(100 * attr(data.removed, "percentVar"))
p2<-ggplot(data.removed, aes(PC1, PC2, color=Batch, shape=Type, label= name)) + geom_text(size=5,hjust = 0, nudge_x = 0.05) + geom_point(size=3) + xlab(paste0("PC1: ",percentVar.removed[1],"% variance")) +
  ylab(paste0("PC2: ",percentVar.removed[2],"% variance")) + ggtitle("PCA of RLD_pre_Type+Batch ") + coord_fixed()

mat.rld.removed<- t(assay(rld.removed))
rownames(mat.rld.removed) <- paste( rld.removed$Written.ID)
plot(hcluster(dist(mat.rld.removed), method= "spearman", link="average" ), cex=1.5, cex.main=2, main= "Hierarchical Clustering of  RLD Normalized Count Data ")
dev.copy(png, "Sample clustering (hcluster) of RLD_Pre_Type+Batch.png")
dev.off()

VSD

vsd = getVarianceStabilizedData(dds.pre)
save(vsd, file="VSD_Pre_Type+Batch")

design=model.matrix(~sample.table.ERSvsPT.pre$Type)
vsd.removed=removeBatchEffect(vsd, batch=sample.table.ERSvsPT.pre$Batch, design= design )

colnames(vsd.removed) <-  rld.removed$Written.ID
plot(hcluster(dist(t(vsd.removed)), method= "spearman", link="average" ), cex=1.5, cex.main=2, main= "Hierarchical Clustering of VSD Normalized Count Data ")
dev.copy(png, "Sample clustering (hcluster) of vsd_Pre_Type+Batch.png")
dev.off()

PCA plot after rld() &removebatcheffect() Hierarchical Clustering Dendogram after rld() &removebatcheffect() Hierarchical Clustering Dendogram after vsd() &removebatcheffect()

ADD COMMENTlink written 2.9 years ago by gokce.ouz50

What type of data did you feed into removeBatchEffect? - Normalised counts, rlog transformed, vst transformed, voom'd?

ADD REPLYlink written 2.9 years ago by andrew.j.skelton735.8k

I tried both rlog &vsd. The 1st dendogram = the rlog transformed and batch removed, 2nd dendogram from the vsd transformed and batch removed.

ADD REPLYlink written 2.9 years ago by gokce.ouz50

Try Voom Transforming - That way you know that the removeBatchEffect function is getting what it expects.

ADD REPLYlink written 2.9 years ago by andrew.j.skelton735.8k

Hi Andrew, thanks for the suggestion; but I read that voom transformed data is not applicable for WGCNA analysis. So if we assume I did the vsd normalization, do you suggest me to remove any samples ?

ADD REPLYlink written 2.9 years ago by gokce.ouz50

Where did you read that? - Voom transforms RNA seq counts to "microarray-like" data that should be fine with WGCNA. If you want to batch correct, and you want to use Limma's removeBatchEffect (which I think is best in your current situation), then try and get the data to a point that Limma expects - Voom is best in that regard. See point 4 here

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by andrew.j.skelton735.8k

Thanks a lot; I will implement it into my analysis immediately.

ADD REPLYlink written 2.9 years ago by gokce.ouz50

Hi Andrew,

Sorry to disturb but may I ask a few quick questions ? While transforming the count data with voom()

1 - For design I am using both "~ Batch + Type", is it the correct approach ?

2 - Should I use TMM-normalization before using voom ()

voomplot after TMM normalization Hierarchical clustering after removedBatchEffect

3 - Should directly input the raw count data to voom () voom plot Hierarchical clustering after removedBatchEffect

4 - Even though I deleted the rows with zero, my voom plots do not look like the plots in this paper. Am I doing something wrong ?

5 - Do you think it is better if I do differential gene expression analysis also using Limma's commands?

I really appreciate if you give me some insight,

Thanks in advance,

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by gokce.ouz50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1508 users visited in the last hour