Given a SummarizedExperiment container, what is the quickest way to identify a batch effect from one of the covariates in the colData DataFrame? Right now I plot the principal components, check the first few for any separation, and then colour the samples by each covariate in turn to see which one is responsible. I have a large number of libraries to check and was wondering whether there is a Bioconductor package that performs this step. I've looked at svaseq and RUVSeq, but I can't see that they produce any QC plots that would tell me whether an effect is present and which covariate is responsible.
I'm sure it can be done, but it's tricky with PCA because PCA doesn't tell you the size of an effect in absolute terms, only relative to the other sources of variation in the data. For example, if you hand it 4 samples (2x control, 2x drug) and they cluster by batch1/batch2 rather than by control/drug, it might just be that the drug has no effect and the batch effect is very small.
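To make the 4-sample example concrete, here is a minimal sketch in plain Python (chosen only to keep it dependency-free; the same logic ports to R). Everything in it is made up for illustration: the `simulate` and `first_pc` helpers, the expression values, and the noise level. It builds two toy data sets with no drug effect at all, one with a tiny batch shift and one with a shift 100 times larger, and extracts PC1 by power iteration.

```python
import random


def simulate(batch_shift, n_genes=6, noise_sd=0.001, seed=1):
    """Toy samples x genes table: 4 samples, no drug effect, one batch shift.

    Sample order is (control, b1), (drug, b1), (control, b2), (drug, b2).
    All numbers are invented for illustration.
    """
    rng = random.Random(seed)
    batch = [0, 0, 1, 1]
    return [[5.0 + batch_shift * b + rng.gauss(0, noise_sd)
             for _ in range(n_genes)]
            for b in batch]


def first_pc(samples):
    """Return (PC1 scores, fraction of total variance on PC1)."""
    n, p = len(samples), len(samples[0])
    # Centre each gene (column) across samples.
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    x = [[row[j] - means[j] for j in range(p)] for row in samples]
    # Sample-by-sample Gram matrix; its top eigenvector gives the PC1 scores.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    rng = random.Random(0)
    v = [rng.random() for _ in range(n)]
    for _ in range(200):  # plain power iteration
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    eigval = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                 for i in range(n))
    total = sum(gram[i][i] for i in range(n))
    scores = [eigval ** 0.5 * c for c in v]
    return scores, eigval / total


for shift in (0.05, 5.0):
    scores, frac = first_pc(simulate(shift))
    print(shift, [round(s, 3) for s in scores], round(frac, 3))
```

In both runs PC1 splits the samples by batch and carries nearly all of the variance, so the plot has the same shape either way; only the axis scale differs. The fraction of variance on PC1 says nothing about whether the batch shift is 0.05 or 5.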
So my point is that if you automate looking at PCAs for a large number of libraries, you can't say that experiment A had more or less batch effect than experiment B, so you can't quantify the batch effect of A in a meaningful way. All you can say is whether the batch had more or less of an effect than the treatment did. Conversely, if your treatment has a very strong effect, a lot more batch effect can be present before it becomes apparent on the PCA.
The problem basically boils down to the fact that we can see batch effects, but we don't understand the dynamics behind what's causing them, so we can't quantify them or normalise them away in a meaningful way. tl;dr: you're probably better off looking at the PCAs by eye and judging for yourself whether there's a meaningful batch effect, given what you know about the treatment, controls and batches.
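If you still want a first-pass screen to decide which PCA plots to look at, one option (a sketch, not an established Bioconductor workflow) is to score every PC against every categorical covariate by the fraction of that PC's variance the covariate explains (eta squared, the one-way ANOVA R²). The helper names and the numbers below are hypothetical; in practice the scores would come from your own PCA of the assay data and the labels from colData.

```python
from collections import defaultdict


def eta_squared(scores, labels):
    """Fraction of the variance in `scores` explained by the grouping in `labels`."""
    grand = sum(scores) / len(scores)
    ss_total = sum((s - grand) ** 2 for s in scores)
    if ss_total == 0:
        return 0.0
    groups = defaultdict(list)
    for s, lab in zip(scores, labels):
        groups[lab].append(s)
    # Between-group sum of squares: group size times squared mean offset.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups.values())
    return ss_between / ss_total


def rank_covariates(pc_scores, covariates):
    """Return (pc, covariate, eta_squared) tuples, strongest association first."""
    table = [(pc, name, eta_squared(scores, labels))
             for pc, scores in pc_scores.items()
             for name, labels in covariates.items()]
    return sorted(table, key=lambda row: row[2], reverse=True)


# Made-up PC scores and sample annotations, for illustration only.
pc_scores = {
    "PC1": [-2.1, -1.9, 2.0, 2.0],
    "PC2": [-0.5, 0.6, -0.4, 0.3],
}
covariates = {
    "batch": ["b1", "b1", "b2", "b2"],
    "treatment": ["control", "drug", "control", "drug"],
}

ranked = rank_covariates(pc_scores, covariates)
for pc, name, r2 in ranked:
    print(pc, name, round(r2, 3))
```

This only tells you which covariate lines up with which component, which is the part you are currently doing by colouring plots. It inherits the limitation above: the score is relative to the rest of the variation in that experiment, so it can't be compared across experiments, and a covariate with many levels and few samples per level will score high by construction. It also only handles categorical covariates; continuous ones would need a regression instead.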