Question: Batch Effect Removal in RNA-Seq Data for DGE & WGCNA. Should I omit any samples according to PCA plots?
0
gravatar for gokce.ouz
2.6 years ago by
gokce.ouz40
Singapore
gokce.ouz40 wrote:

Hi All,

I have 2 experimental condition and each condition has 18 samples. However, they are sequenced in 3 batches. 1 st batch 6 pair (disease & control), 2nd batch 6 pair (disease & control), 3rd batch 6 pair (disease & control).

Each batch done by different people over different times so I believe we have batch effect. As I am using DESeq2 for my analysis I set my model design as “Batch+Type”. I used removeBatchEffect from limma package and plotted PCA & did hierarchical clustering using spermann correlation to see which samples are outliers.(I attached the graphs below). I also used an alternative PCA &plotting (prcomp() & fviz_pca_ind) which actually confused me. The ggplot & fviz_pca_ind graphs look different from each other. Which one should I trust?

Another issue I am facing is 1st & 2nd batches are 76 bp paired end, however the last batch is 101 bp paired end.I did not add this into the design separately. Should I add it to the model design ? If so, how should I desing ? or Should I trim all fastq files to 76 bp to have same size ? What do you suggest?

I would like to do differential gene expression analysis and WGCA. So Should I remove any samples according to these PCA plots or just do analysis as I already included batch effect into the model design as "~Batch+Type"

I really appreciate if you give me some insight,

Thanks in advance,

DESeq2

ddsMat.pre<- DESeqDataSetFromMatrix(seMat.pre, sample.table.ERSvsPT.pre, design=~Batch+Type)

dds.pre<-DESeq(ddsMat.pre)

dds.pre <- dds.pre[ rowSums(counts(dds.pre)) > 1, ]

Normalization

rld.batch <- rlog(dds.pre, blind= FALSE)

rlogMat.batch <- assay(rld.batch)

Removing batch effect

design=model.matrix(~sample.table.ERSvsPT.pre$Type)

removedbatch=removeBatchEffect(rlogMat.batch, batch=sample.table.ERSvsPT.pre$Batch, design= design )

rld.removed<-rld.batch

assay(rld.removed)<-removedbatch

Plotting the PCA

before removing batch effect:

data.batch<-plotPCA(rld.batch, intgroup=c("Batch","Type"), returnData=TRUE)

data.batch$name<-colData(rld.batch)$Patient

percentVar.batch <- round(100 * attr(data.batch, "percentVar"))

ggplot(data.batch, aes(PC1, PC2, color=Batch, shape=Type, label= name)) + geom_text(size=6,hjust = 0, nudge_x = 0.05) + geom_point(size=2) + xlab(paste0("PC1: ",percentVar.batch[1],"% variance")) + ylab(paste0("PC2: ",percentVar.batch[2],"% variance")) + ggtitle("PCA of RLD_Type+Batch ") + coord_fixed()

after removing batch effect:

data.removed<-plotPCA(rld.removed, intgroup=c("Batch","Type"), returnData=TRUE)

data.removed$name<-colData(rld.removed)$Patient

percentVar.removed <- round(100 * attr(data.removed, "percentVar"))

ggplot(data.removed, aes(PC1, PC2, color=Batch, shape=Type, label= name)) + geom_text(size=6,hjust = 0, nudge_x = 0.05) + geom_point(size=3) + xlab(paste0("PC1: ",percentVar.removed[1],"% variance")) + ylab(paste0("PC2: ",percentVar.removed[2],"% variance")) + ggtitle("PCA after removeBatchEffect ") + coord_fixed()

Alternative PCA method : prcomp() & fviz_pca_ind

before.pca <- prcomp(t(rlogMat.batch), scale=T)

after.pca <- prcomp(t(removedbatch), scale=T)

quali.sup <- as.factor(colData(rld.readlength)$Type)

fviz_pca_ind(before.pca,habillage = quali.sup, addEllipses = TRUE, ellipse.level = 0.75) + theme_minimal()

fviz_pca_ind(after.pca,habillage = quali.sup, addEllipses = TRUE, ellipse.level = 0.68) + theme_minimal()

Before removing the batch effect

After removing the batch effect

Alternative PCA plot for Before Batch effect removal

Alternative PCA plot for After Batch effect removal

batch effect rna-seq deseq2 R • 2.5k views
ADD COMMENTlink modified 2.6 years ago by Carlo Yague4.4k • written 2.6 years ago by gokce.ouz40
1
gravatar for Carlo Yague
2.6 years ago by
Carlo Yague4.4k
Belgium
Carlo Yague4.4k wrote:

The ggplot & fviz_pca_ind graphs look different from each other. Which one should I trust?

Those are different graphs (biplot vs graph of indidividuals) so one can expect different output. The first representation is used more often as far as I know and might be a safer pick.

Another issue I am facing is 1st & 2nd batches are 76 bp paired end, however the last batch is 101 bp paired end.I did not add this into the design separately. Should I add it to the model design ? If so, how should I desing ? or Should I trim all fastq files to 76 bp to have same size ? What do you suggest?

I don't think it is necessary to add this in the design since it is already taken into account in the batch factor. You definitely should NOT trim the 101 bp-reads fastq files.

I would like to do differential gene expression analysis and WGCA. So Should I remove any samples according to these PCA plots or just do analysis as I already included batch effect into the model design as "~Batch+Type"

For DEG analysis in DESeq, there is no need to explicitely remove the batch effect because it is already taken into account, as you said. For WGCA I don't know, I never did that kind of analysis.

ADD COMMENTlink written 2.6 years ago by Carlo Yague4.4k

I really appreciate your insights. Thanks a lot Carlos.
So I do not need to remove the P5 &P6 which belong to the left hand side group in first two graphs.

ADD REPLYlink written 2.6 years ago by gokce.ouz40
1

I don't see a reason to remove them. Don't be too scared by the vertical distance between P5/P6 and the others points : the y-axis represents only 8% of total variance.

ADD REPLYlink written 2.6 years ago by Carlo Yague4.4k

In general for WGCNA you should adjust for batch effects prior to performing the analysis (input should be a matrix containing normalized expression). In addition it is recommended to remove low count "genes" and apply a variance stabilizing transformation when using RNAseq as input (from WGCNA FAQ page: https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/faq.html).

ADD REPLYlink written 2.6 years ago by Marge280
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 933 users visited in the last hour