Hello, I'm very new to RNA-seq analysis and would love some help with a batch effect in my data. The data comprise blood samples from 40 individuals under 3 conditions (control, infection 1, infection 2). Each individual has 3 replicates for each condition (overall 120 × 3 samples). The preparation was done in four 96-well plates. Each plate holds a single condition plus leftovers from the previous condition. For example, the first plate is only control, the second is the leftovers of the control plus infection 1, and so on.
After running PCA I observed a batch effect:
As you can see, there is an obvious plate effect. I would like to correct between samples of the same condition on different plates, but when I try limma's removeBatchEffect I don't know how to define my batches or my covariates.
Any help will be greatly appreciated. Thanks.
Please see How to add images to a Biostars post
Of course, it's too late now, but a design like this is just not a good idea. Randomize your samples as much as possible.
Thank you very much for your detailed answer. After posting this question I figured out a way to remove the batch effect, but I wonder whether what I did is valid, and I would be happy to get your opinion. I separated each condition from all plates, then used limma's function to remove the batch effect, and finally joined all the corrected conditions together. For example, I took all Lm samples from plate 3 and joined them with the rest of the Lm samples (plate 4), creating a matrix comprising only Lm samples. I then ran removeBatchEffect on this matrix with the plate annotation (3 and 4) as the 'batch' argument. I did this for all three conditions, joined the 3 matrices, and ran PCA again. This is the result:
Is this valid?
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.
Please, do read How to add images to a Biostars post. I added the image at your original question, and probably genomax added at your comment above.
The method you used above is not intended to be used for differential expression analysis; it is intended for clustering, PCA, MDS, heatmaps, and other exploratory analyses (read the ?removeBatchEffect help page). The preferred way to handle batch effects, especially when they are known, as in your case, is to include the batch in the model; the batch effects will then be accounted for and removed when testing the other factors. If you search for "batch edgeR" or "batch DESeq2", you will find plenty of posts discussing how to do this.
Thank you very much guys, you have been a great help.
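For anyone landing here later, the advice about putting the plate into the model could look roughly like the sketch below. It is only an illustration on simulated data, not the poster's actual analysis: the plate layout (P1 to P4), the condition labels, the plate shifts and the gene count are all made up, and the pairing of samples by individual is ignored for brevity. It assumes log-scale expression values and uses limma's removeBatchEffect (for plots only) and lmFit/eBayes/topTable (for testing).

```r
library(limma)

set.seed(1)

# Hypothetical layout loosely mirroring the question: each plate holds one
# condition plus leftovers of the previous one, so plates and conditions
# overlap only partially (6 samples per plate/condition cell).
condition <- factor(rep(c("ctrl", "ctrl", "inf1", "inf1", "inf2", "inf2"), each = 6),
                    levels = c("ctrl", "inf1", "inf2"))
plate <- factor(rep(c("P1", "P2", "P2", "P3", "P3", "P4"), each = 6))

# Simulated log-expression matrix (genes x samples); values are invented.
n_genes <- 200
expr <- matrix(rnorm(n_genes * length(condition), mean = 8, sd = 0.3),
               nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", seq_along(condition))))

# Add an artificial plate shift to every gene.
plate_shift <- c(P1 = 0, P2 = 1.5, P3 = -1, P4 = 2)
expr <- sweep(expr, 2, plate_shift[as.character(plate)], "+")

# Add a true infection-1 effect to the first 10 genes.
expr[1:10, condition == "inf1"] <- expr[1:10, condition == "inf1"] + 2

# 1) For PCA/heatmaps only: remove the plate effect while protecting the
#    condition effect via the design argument.
design_cond <- model.matrix(~ condition)
expr_clean <- removeBatchEffect(expr, batch = plate, design = design_cond)

# 2) For differential expression: keep the uncorrected data and put the
#    plate into the model alongside the condition.
design_full <- model.matrix(~ condition + plate)
fit <- eBayes(lmFit(expr, design_full))
top <- topTable(fit, coef = "conditioninf1", number = 10)
```

The corrected matrix `expr_clean` would go into PCA or clustering, while `top` comes from the uncorrected data with plate as a covariate. This only works because the plates share conditions through the leftovers; if every plate held exactly one condition, plate and condition would be completely confounded and no model could separate them.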
One more question: I heard that loess normalization can also be used to deal with batch effects. Can it be used in my situation?
Thanks
If you have a follow up question, please either add a comment to someone's answer or open a new question and reference this question in the new question. Adding an answer is not the right thing to do. I'm moving this "answer" to a comment on the top-level post now.