Question

Batch effect correction

2

6.0 years ago

goldie.ed ▴ 30

Hello, I'm very new to RNA-seq analysis and I would love to get some help regarding a batch effect in my data. The data comprise 40 individuals (blood samples) and 3 conditions (control, infection 1, infection 2). Each individual has 3 replicates for each condition, so overall 120 × 3 = 360 samples. The preparation was done in four 96-well plates; each plate comprised of a single condition (+ leftovers from the previous condition). So for example, the first plate is only control, the second is the leftovers of the control plus infection 1, and so on...

After running PCA I observed a batch effect: [image: PCA]

As you can see, there is an obvious plate effect. I would like to correct between the same condition on different plates, but when I try using limma's removeBatchEffect I don't know how to define my batches or my covariates.

Any help will be greatly appreciated. Thanks.

rna-seq • 9.1k views

ADD COMMENT • link 6.0 years ago by goldie.ed ▴ 30

0

Please see How to add images to a Biostars post

each plate comprised of a single condition

Of course, it's too late now, but a design like this is just not a good idea. Randomize your samples as much as possible.

ADD REPLY • link 6.0 years ago by WouterDeCoster 48k

0

Thank you very much for your detailed answer. After posting this question I figured out a way to remove the batch effect, but I wonder whether what I did is valid, and I would be happy to get your opinion. I separated each condition from all plates, then used limma's function to remove the batch effect within each condition, and finally joined all the corrected conditions back together. For example: I took all Lm samples from plate 3 and joined them with the remaining Lm samples (plate 4), creating a matrix comprising only Lm samples. I then ran removeBatchEffect on this matrix with the plate annotation (3 and 4) as the 'batch' argument. I did this for all three conditions, joined the 3 matrices, and ran PCA again. This is the result:

Is this valid?

ADD REPLY • link updated 6.0 years ago by GenoMax 151k • written 6.0 years ago by goldie.ed ▴ 30

0

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.

ADD REPLY • link 6.0 years ago by GenoMax 151k

0

Please do read How to add images to a Biostars post. I added the image to your original question, and genomax probably added the one in your comment above.

The method you used above is not intended to be used for differential expression analysis; it is intended for clustering, PCA, MDS, heatmaps, and other exploratory analyses (read the ?removeBatchEffect help page). The preferred way to deal with batch effects, especially when they are known, as in your case, is to include the batch in the model; the batch effects will then be accounted for and removed when testing the other factors. If you search for batch edgeR or batch DESeq2, you will find plenty of posts discussing how to do this.
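
For the exploratory use case, here is a minimal sketch of how the `batch` and `design` arguments of limma's removeBatchEffect could be set up so that the plate effect is removed while the condition differences are protected. The data are simulated stand-ins: the sample layout, matrix size, effect sizes, and all variable names are assumptions for illustration, not your data. The call is guarded so the snippet still runs if limma is not installed.

```r
set.seed(1)

# Simulated stand-in layout (assumption): control on plates 1 and 2,
# infection 1 on plates 2 and 3, six samples per group.
plate     <- factor(rep(c(1, 2, 2, 3), each = 6))
condition <- factor(rep(c("control", "control", "infection1", "infection1"), each = 6))

# Simulated log-scale expression: 50 genes x 24 samples.
n_genes <- 50
logexpr <- matrix(rnorm(n_genes * 24, mean = 8), nrow = n_genes)
# Add a plate-specific shift to every gene.
logexpr <- sweep(logexpr, 2, c(0, 2, -1)[as.integer(plate)], "+")
# Add a condition effect to the first 10 genes.
inf1 <- condition == "infection1"
logexpr[1:10, inf1] <- logexpr[1:10, inf1] + 3

# 'design' holds what should be preserved (condition); 'batch' is what to remove.
design <- model.matrix(~ condition)

if (requireNamespace("limma", quietly = TRUE)) {
  corrected <- limma::removeBatchEffect(logexpr, batch = plate, design = design)
  pca <- prcomp(t(corrected))  # for plotting only, not for testing
}
```

The corrected matrix is meant for PCA or heatmaps; for testing, the uncorrected data with the batch term in the model is the route described above.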

ADD REPLY • link 6.0 years ago by h.mon 35k

0

Thank you very much guys you have been a great help.

One more question - I heard loess normalization can also be used for dealing with batch effect. Can it be used in my situation?

thanks

ADD REPLY • link 6.0 years ago by goldie.ed ▴ 30

0

If you have a follow up question, please either add a comment to someone's answer or open a new question and reference this question in the new question. Adding an answer is not the right thing to do. I'm moving this "answer" to a comment on the top-level post now.

ADD REPLY • link 6.0 years ago by Ram 45k

score 5 · Answer 1 · 2019-05-29

5

6.0 years ago

predeus ★ 2.1k

Your condition and plate variables are confounded, but luckily not completely confounded, since your plates are 96-well.
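
One way to check "confounded, but not completely" is to cross-tabulate condition against plate and test whether a design with both terms has full column rank. The sketch below uses base R only. The per-plate counts are hypothetical, chosen just to match the description (one condition per plate plus leftovers); substitute the real sample sheet.

```r
# Hypothetical layout (assumption): 360 samples spread over four 96-well plates.
cond <- data.frame(
  donor     = rep(sprintf("D%02d", 1:40), times = 9),
  condition = factor(rep(c("control", "infection1", "infection2"), each = 120)),
  plate     = factor(c(rep(1, 96), rep(2, 24),   # control
                       rep(2, 72), rep(3, 48),   # infection 1
                       rep(3, 48), rep(4, 72)))  # infection 2
)

# Which conditions share a plate?
tab <- table(cond$condition, cond$plate)
print(tab)

# Full column rank means plate and condition effects can be estimated separately.
design <- model.matrix(~ plate + condition, data = cond)
full_rank <- qr(design)$rank == ncol(design)
print(full_rank)

# For contrast: one condition per plate with no leftovers is completely
# confounded, and the same design loses rank.
conf <- data.frame(
  condition = factor(rep(c("control", "infection1", "infection2"), each = 4)),
  plate     = factor(rep(1:3, each = 4))
)
design_conf <- model.matrix(~ plate + condition, data = conf)
rank_deficient <- qr(design_conf)$rank < ncol(design_conf)
print(rank_deficient)
```

In this layout it is the leftovers that link neighbouring plates, which is what keeps the two factors separable.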

First of all, make a condition file with three columns: donor, condition, and plate.

Use something like ComBat (from the sva package): do a DESeq2 rlog transformation, then run ComBat to get a batch-corrected expression matrix.

After this, run a PCA and see if you succeeded in removing most of this effect.

If it looks better after ComBat, incorporate the new design (with the batch term) into DESeq2 and do differential expression with it. R code would look something like this:

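library(DESeq2)  # also attaches SummarizedExperiment, which provides assay()
library(sva)     # provides ComBat()
# exp: raw count matrix (genes x samples); cond: sample table with Batch and Condition columns
dds <- DESeqDataSetFromMatrix(countData = round(exp, 0), colData = cond, design = ~ Batch + Condition)
rld <- assay(rlog(dds, blind = TRUE))  # named rld so the result does not mask the rlog() function
mod <- model.matrix(~ Condition, cond)  # condition is protected during correction
cbat <- ComBat(rld, batch = cond$Batch, mod = mod)  # batch-corrected matrix, for PCA/visualization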

ADD COMMENT • link 6.0 years ago by predeus ★ 2.1k

0

Hi, I did not understand what you mean by incorporating the new design into DESeq2 after I have removed the batch effect. Can I use the new batch-corrected matrix as input for DESeq2? From what I read, DESeq2 input can only be the original count matrix, and it cannot be given a log-transformed matrix.

Is it valid to (1) log-transform the data, (2) remove the batch effect using removeBatchEffect or ComBat, (3) transform the data back to the original count scale, and (4) use this count matrix with DESeq2?

UPDATE: I just found out that after modelling the batch in the design and running the DESeq function, the batch effect was removed like magic.

thanks

ADD REPLY • link 5.9 years ago by goldie.ed ▴ 30

0

Yes, I meant that you incorporate your "batch" variable into the DESeq2 formula and use the NON-batch-corrected data. The authors of ComBat acknowledged that feeding batch-corrected data into the model might over-correct.
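
To illustrate why putting the batch term in the formula works on uncorrected data, here is a toy single-gene sketch. It is not DESeq2: it uses ordinary least squares (`lm`) on simulated log-scale values as a stand-in, and the layout, effect sizes, and variable names are all assumptions. It compares the estimated condition effect with and without the plate term.

```r
set.seed(42)

# Simulated stand-in layout (assumption): control on plates 1 and 2,
# infection 1 on plates 2 and 3, ten samples per group.
plate     <- factor(rep(c(1, 2, 2, 3), each = 10))
condition <- factor(rep(c("control", "control", "infection1", "infection1"), each = 10))

true_effect <- 1.5                               # simulated infection effect
plate_shift <- c(0, 2, -2)[as.integer(plate)]    # simulated plate effect
y <- 5 + plate_shift + true_effect * (condition == "infection1") +
  rnorm(40, sd = 0.3)

# Without the plate term, the condition estimate absorbs part of the plate effect.
naive <- coef(lm(y ~ condition))["conditioninfection1"]

# With the plate term, the shared plate (plate 2) separates the two effects.
adjusted <- coef(lm(y ~ plate + condition))["conditioninfection1"]

print(c(naive = unname(naive), adjusted = unname(adjusted), truth = true_effect))
```

In DESeq2 the analogous step is the `~ Batch + Condition` design on the raw counts, as in the answer above.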

ADD REPLY • link 5.9 years ago by predeus ★ 2.1k

score 2 · Answer 2 · 2019-05-29

I would recommend visualization before and after your correction, to see if there could be over-fitting.

With what you are showing me, it looks like batch-centered expression might also be a useful visualization.
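
A minimal base-R sketch of batch-centered expression for visualization: subtract each gene's per-plate mean, then compare PCA before and after. The data are simulated and every name and size is an assumption. One caveat under this poster's design: because plates differ in their condition mix, plain plate-centering will also remove part of the condition signal, so treat it as a diagnostic view only.

```r
set.seed(7)

# Simulated stand-in: 100 genes x 24 samples on three plates (assumption).
plate   <- factor(rep(1:3, each = 8))
logexpr <- matrix(rnorm(100 * 24, mean = 8), nrow = 100)
logexpr <- sweep(logexpr, 2, c(0, 2, -2)[as.integer(plate)], "+")  # plate shifts

# Center each gene within each plate.
centered <- logexpr
for (p in levels(plate)) {
  idx <- plate == p
  centered[, idx] <- logexpr[, idx] - rowMeans(logexpr[, idx, drop = FALSE])
}

# Share of variance on PC1 before and after centering.
pc1_share <- function(m) {
  s <- prcomp(t(m))$sdev^2
  s[1] / sum(s)
}
prop_before <- pc1_share(logexpr)
prop_after  <- pc1_share(centered)
print(c(before = prop_before, after = prop_after))
```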

For differential expression, I would actually recommend using two variables (as described above, "Condition + Batch"), and then using the two visualization results to assess your results, with multiple methods tested for your project (I would at least test edgeR, DESeq2, and limma-voom).