Question: Quick way to identify batch effects from covariates?
4.0 years ago by James Ashmore (UK/Edinburgh/MRC Centre for Regenerative Medicine):

Given a SummarizedExperiment container, what is the quickest way to identify a batch effect arising from one of the covariates in the colData DataFrame? Right now I plot the principal components, colour the samples by each covariate in turn, and check the first few components for any separation to see which covariate is responsible. I have a large number of libraries to check and was wondering whether there is a Bioconductor package that performs this step. I've looked at svaseq and RUVSeq, but I can't see that they produce any QC plots that would tell me whether an effect is present and which covariate is responsible.


I'm sure it can be done, but it's tricky with PCA since it doesn't tell you the size of the effect in absolute terms. For example, if you hand it 4 samples (2x control, 2x drug) and you get clustering not on control/drug but on batch1/batch2, it might just be that there's no effect due to the drug and a very small batch effect.

So my point is that if you have a large number of libraries and you automate looking at a large number of PCAs, you can't say that experiment A had more or less batch effect than experiment B. Thus you can't quantify the batch effect of A in a meaningful way. All you can say is that it has more or less of an effect than the treatment did. Conversely, if your treatment has a very strong effect, you can have a lot more batch effect before it becomes apparent on the PCA.

The problem basically boils down to the fact that we can see batch effects, but we don't understand the dynamics behind what's causing them, and thus we can't quantify them or normalise them away (in a meaningful way). tl;dr: you're probably better off looking at the PCAs by eye and judging for yourself whether there's a meaningful batch effect, given what you know about the treatment/control/batches.
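
The point about relative versus absolute size can be seen in a toy example. The sketch below is plain Python rather than Bioconductor code, and every name and number in it is made up for illustration: it finds the first principal component by power iteration, then asks how much of the PC1 score variance lies between batches and between treatments. The second data set is the first multiplied by 100, so its batch shift is 100 times larger in absolute terms.

```python
def center_columns(rows):
    """Subtract each gene's (column's) mean across samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def first_pc(rows, iterations=200):
    """Return (PC1 scores, PC1 variance, total variance) by power iteration."""
    x = center_columns(rows)
    n, p = len(x), len(x[0])
    # Gene-by-gene covariance matrix of the centred data.
    cov = [[sum(r[a] * r[b] for r in x) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 + 0.1 * a for a in range(p)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    scores = [sum(r[a] * v[a] for a in range(p)) for r in x]
    pc_var = sum(s * s for s in scores) / (n - 1)
    total_var = sum(cov[a][a] for a in range(p))
    return scores, pc_var, total_var


def r_squared(values, groups):
    """Share of the variance in `values` that lies between the groups."""
    grand = sum(values) / len(values)
    total = sum((v - grand) ** 2 for v in values)
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                  for vs in by_group.values())
    return between / total if total > 0 else 0.0


def summarise(rows, batch, treatment):
    scores, pc_var, total_var = first_pc(rows)
    return {
        "pc1_share": pc_var / total_var,
        "r2_batch": r_squared(scores, batch),
        "r2_treatment": r_squared(scores, treatment),
        "total_var": total_var,
    }


# Hypothetical toy data: 4 samples x 3 genes, no drug effect, small batch shift.
treatment = ["control", "control", "drug", "drug"]
batch = ["b1", "b2", "b1", "b2"]
small = [
    [5.00, 3.02, 7.01],
    [5.10, 3.11, 6.90],
    [5.01, 2.99, 7.00],
    [5.11, 3.10, 6.91],
]
large = [[100 * v for v in row] for row in small]  # same pattern, 100x the size

summary_small = summarise(small, batch, treatment)
summary_large = summarise(large, batch, treatment)
for name, summary in (("small", summary_small), ("large", summary_large)):
    print(name, {key: round(value, 4) for key, value in summary.items()})
```

Both data sets give the same PC1 share and the same batch and treatment R² values, so a PCA plot with autoscaled axes looks the same for both; only the total variance reveals that one shift is far larger than the other.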

Reply written 4.0 years ago by John
4.0 years ago by chris86 (United Kingdom, London):

Two methods work well for analysing batch effects.

  1. PCA with annotation - as you are doing, but this relies on manual visual inspection.
  2. PVCA (principal variance component analysis) - https://www.bioconductor.org/packages/release/bioc/html/pvca.html - this fits a mixed-effects model to the principal components and then estimates, quantitatively, how much of the variance each covariate accounts for.
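
For a rough feel of what a quantitative attribution looks like, here is a minimal plain-Python sketch. It is not the pvca package and not its API: it replaces the mixed-effects model with a one-way sum-of-squares split per covariate, and the function names, covariates, and numbers are all hypothetical. When every principal component is kept, the eigenvalue-weighted average of per-component R² for a covariate equals the share of the total sum of squares lying between that covariate's groups, so the sketch computes that share directly from the expression matrix.

```python
def variance_fraction(matrix, groups):
    """Share of the total sum of squares lying between the groups of one covariate."""
    n = len(matrix)
    between = total = 0.0
    for column in zip(*matrix):  # one column per gene
        grand = sum(column) / n
        total += sum((v - grand) ** 2 for v in column)
        by_group = {}
        for v, g in zip(column, groups):
            by_group.setdefault(g, []).append(v)
        between += sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                       for vs in by_group.values())
    return between / total


def rank_covariates(matrix, covariates):
    """Return (covariate, fraction) pairs, largest fraction first."""
    fractions = {name: variance_fraction(matrix, labels)
                 for name, labels in covariates.items()}
    return sorted(fractions.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 6 samples (rows) x 3 genes (columns).
expression = [
    [5.0, 3.1, 7.0],
    [5.2, 3.0, 7.1],
    [5.1, 3.2, 6.9],
    [6.1, 4.0, 6.0],
    [6.0, 4.2, 6.1],
    [6.2, 4.1, 5.9],
]
# Hypothetical sample annotations, standing in for columns of colData.
covariates = {
    "batch": ["b1", "b1", "b1", "b2", "b2", "b2"],
    "treatment": ["ctrl", "drug", "ctrl", "drug", "ctrl", "drug"],
    "sex": ["F", "M", "M", "F", "F", "M"],
}

ranking = rank_covariates(expression, covariates)
for name, fraction in ranking:
    print(f"{name}: {fraction:.3f}")
```

These fractions are marginal, computed one covariate at a time, so confounded covariates share credit and the values need not sum to 1. Fitting all covariates jointly, as a mixed-effects model does, is what separates them.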

Update: I highly recommend https://github.com/dswatson/bioplotr/blob/master/R/plot_drivers.R. This function makes a great plot for examining batch effects and more.

Modified 12 months ago • written 4.0 years ago by chris86

Looks promising, I'll give this a try, thank you.

Reply written 4.0 years ago by James Ashmore