Question: How to use Principal Component Analysis to find batch effects?
3.5 years ago, tolgaturant wrote:

I am going to profile a clinical RNA-seq study with 51 samples for differentially expressed genes. As described in the limma-voom vignettes, I have created a DGEList object:

y1<-DGEList(counts=assays(summarizedExperiment1)$counts, genes=annotations1)


Then, to explore the clustering of the samples, I created PCA plots:

plotMDS(y2, labels=resp, top=50, col=ifelse(resp=="N", "red", "blue"), gene.selection="common", prior.count=5)

[Figure: plot of the first two PCs, colored by response group]

There is a clear separation of samples along PC1, but I don't know which attribute correlates with it. Should I create an attribute, such as batch_1, for the two groups on either side of PC1 and create a model.matrix as:


or should I just model the comparison I am interested in:


Any suggestion would be appreciated.


Tags: voom, rna-seq, limma, pca

Mmmh, in principle adding a batch term would be the way to go.

But are you sure it's a batch effect (let's say something technical) and not something biological that you would want to look at and understand rather than discarding? Just asking since you say that in fact you don't know where the separation is coming from, and I would want to understand what I am about to throw out.
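
To make the two options concrete, here is a minimal base-R sketch of both design matrices. The sample labels are made up for illustration: `resp` mirrors the response factor from the question (the level "Y" is a placeholder), and `batch_1` is a hypothetical label for the two clusters along PC1. Note that a batch label derived from the expression data itself will absorb whatever drives PC1, whether technical or biological.

```r
# Hypothetical annotation for six samples (placeholder values, not the real study).
# 'resp' is the response group; 'batch_1' is an assumed label for the PC1 split.
resp    <- factor(c("N", "N", "Y", "Y", "N", "Y"))
batch_1 <- factor(c("A", "B", "A", "B", "A", "B"))

# Option 1: include the batch term, so the response effect is estimated
# after accounting for the batch_1 difference.
design_batch <- model.matrix(~ batch_1 + resp)

# Option 2: model only the comparison of interest.
design_simple <- model.matrix(~ resp)

colnames(design_batch)
colnames(design_simple)
```

Either matrix could then be passed as the `design` argument to `voom()`; in the first one, the response coefficient is the last column.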

written 3.5 years ago by Marge

Thank you for your answer. I agree that the separation along the PCs might just as well be biological. But there could also be a technical effect that we don't know about. I guess one cannot know without additional information, so I ended up processing the study as is.
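
If some sample annotation is available (processing date, lane, sex, and so on), one way to look for the source of the separation is to check how much of PC1 each annotation explains. The sketch below uses simulated data and base R only; the `lane` and `sex` columns and the injected lane shift are assumptions for illustration. With real data, the log-expression matrix could come from something like edgeR's `cpm(y, log=TRUE, prior.count=5)`.

```r
set.seed(1)
n_samples <- 12
n_genes   <- 200

# Simulated log-expression matrix (genes in rows, samples in columns).
logexpr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)

# Hypothetical sample annotation.
annot <- data.frame(
  lane = factor(rep(c("L1", "L2"), each = 6)),
  sex  = factor(rep(c("F", "M"), times = 6))
)

# Inject an artificial lane effect into the first 50 genes.
is_l2 <- annot$lane == "L2"
logexpr[1:50, is_l2] <- logexpr[1:50, is_l2] + 3

# PCA on samples (prcomp expects samples in rows).
pca <- prcomp(t(logexpr), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, 1]

# Fraction of PC1 variance explained by each annotation column.
r2 <- sapply(annot, function(v) summary(lm(pc1 ~ v))$r.squared)
round(r2, 2)
```

An annotation with a high R-squared against PC1 is a candidate explanation for the split; if none stands out, the cause remains unknown, as noted above.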

written 3.1 years ago by tolgaturant