I'm new to RNA-seq data analysis and I need help removing a batch effect from my RNA-seq data. My data structure is somewhat complex, similar to the example in section 3.5 of the edgeR User's Guide.
In my case, I have 12 RNA samples collected from 6 subjects: 3 subjects received one treatment (Treatment1) and the other 3 received another (Treatment2), and in both groups RNA was extracted before and after treatment. The aim is to test the effect of each treatment relative to the subject's initial state, and the difference between the two treatments, while accounting for the paired structure of the data. The problem is that I have a batch effect caused by using a different RNA extraction technique (both methods gave good-quality samples, with RIN above 8); the batch is clearly visible in a multidimensional scaling (MDS) plot. The sample table looks like this:
Treatment   Patient  Time    batch
Treatment1  1        before  1
Treatment1  1        after   1
Treatment1  2        before  1
Treatment1  2        after   1
Treatment1  3        before  1
Treatment1  3        after   1
Treatment2  4        before  1
Treatment2  4        after   2
Treatment2  5        before  1
Treatment2  5        after   2
Treatment2  6        before  1
Treatment2  6        after   2
The problem is that the batch is completely unbalanced: the second extraction method was used only for the Treatment2 samples taken after treatment. I'm not sure whether any batch correction can be applied so that I can still use those samples, because there are no samples from that group processed with the other method from which to estimate the effect.
I tried to build a model in edgeR like this:
design <- model.matrix(~Treatment+Treatment:Patient+Treatment:Time, data=Data)
But I can't add the batch variable to the model, because its column would be identical to the Treatment2:Time_after column of the design matrix, so including batch as a covariate is not an option. I would like to know whether limma-voom with the duplicateCorrelation() function could adequately correct for the batch, or whether I would run into the same problem.
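The confounding can be confirmed in base R before any counts are involved. This is a minimal sketch, assuming the sample table above with `before` as the reference level of `Time` (matching the `Treatment2:Time_after` column named in the question) and with patients renumbered 1-3 within each treatment. That renumbering is an assumption made here: with IDs 1-6 the nested `Treatment:Patient` term produces all-zero columns and columns that are redundant with the treatment terms. The sketch compares the number of columns with the column rank, first for the nested edgeR-style design and then for a `~ Treatment * Time` fixed-effect design of the kind one might pair with `duplicateCorrelation(block = ...)`. The helper `report()` is hypothetical, written only for this illustration.

```r
# Sample table from the question. Assumptions: patients are renumbered 1-3
# within each treatment so the nested Treatment:Patient term is estimable,
# and "before" is the reference level of Time.
Data <- data.frame(
  Treatment = factor(rep(c("Treatment1", "Treatment2"), each = 6)),
  Patient   = factor(rep(rep(1:3, each = 2), times = 2)),
  Time      = factor(rep(c("before", "after"), times = 6),
                     levels = c("before", "after")),
  batch     = factor(c(1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2))
)

# Hypothetical helper: print number of columns and column rank of a design.
report <- function(label, X) {
  cat(sprintf("%-14s %d columns, rank %d\n", label, ncol(X), qr(X)$rank))
}

# Nested paired design from the question.
design <- model.matrix(~ Treatment + Treatment:Patient + Treatment:Time,
                       data = Data)
report("nested", design)

# Same design plus batch: one more column, but no gain in rank.
design_b <- model.matrix(~ Treatment + Treatment:Patient + Treatment:Time + batch,
                         data = Data)
report("nested+batch", design_b)

# The batch column equals the Treatment2 after-treatment column exactly.
print(identical(unname(design_b[, "batch2"]),
                unname(design[, "TreatmentTreatment2:Timeafter"])))

# Fixed-effect design one might use with voom() and
# duplicateCorrelation(block = <patient ID>): batch is still aliased.
design_dc <- model.matrix(~ Treatment * Time + batch, data = Data)
report("block+batch", design_dc)
```

In both cases the rank is one less than the number of columns, which suggests that treating patient as a random block does not help: the batch column is still indistinguishable from the Treatment2 after-treatment term in the fixed-effect part of the model.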
While searching for other ways to correct the batch effect, I found that one can first estimate the batch effect and remove it, creating "batch-effect-free" data, and then analyse the adjusted data without any further consideration of the batch.
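An "estimate then remove" approach still has to estimate the batch coefficient from a linear model; for example, limma's `removeBatchEffect()` takes a `batch` factor plus a `design` of conditions to preserve and fits them together. The base-R sketch below illustrates why that is unlikely to work with this layout, using one hypothetical gene whose log-expression is built from made-up effects (a Treatment2 after-treatment effect of 1.0 and a batch effect of 1.5, with no noise). These numbers are purely illustrative, not data, and the patient renumbering and `Time` reference level are the same assumptions as before. Fitting the design plus batch leaves one of the two aliased coefficients as `NA` and assigns the combined 2.5 to the other, and which one survives depends only on column order.

```r
# Sample table (assumption: patients renumbered 1-3 within each treatment,
# "before" as the reference level of Time).
Data <- data.frame(
  Treatment = factor(rep(c("Treatment1", "Treatment2"), each = 6)),
  Patient   = factor(rep(rep(1:3, each = 2), times = 2)),
  Time      = factor(rep(c("before", "after"), times = 6),
                     levels = c("before", "after")),
  batch     = factor(c(1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2))
)
X <- model.matrix(~ Treatment + Treatment:Patient + Treatment:Time + batch,
                  data = Data)

# Hypothetical log-expression for one gene (illustrative numbers, no noise):
# baseline 5, a true Treatment2 after-treatment effect of 1.0 and an
# extraction-method (batch) effect of 1.5.
true_treat <- 1.0
true_batch <- 1.5
y <- 5 + true_treat * X[, "TreatmentTreatment2:Timeafter"] +
         true_batch * X[, "batch2"]

# Fit with model.matrix's default column order (batch2 before interactions).
fit1 <- lm.fit(X, y)
# Fit again with the batch column moved to the end.
X2 <- X[, c(setdiff(colnames(X), "batch2"), "batch2")]
fit2 <- lm.fit(X2, y)

# In each fit the later of the two identical columns gets NA and the other
# absorbs the sum of both effects.
print(fit1$coefficients[c("batch2", "TreatmentTreatment2:Timeafter")])
print(fit2$coefficients[c("batch2", "TreatmentTreatment2:Timeafter")])
```

Since the data only identify the sum of the two effects, a "batch-free" matrix built this way would either leave the batch effect in place or subtract the Treatment2 after-treatment response together with it.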
In short, I don't know whether it is possible to include the batch in the model, or to estimate it in some other way, with this data structure. As I understand it, I would need samples from the Treatment2:Time_after group processed with the other extraction method (which I cannot obtain) to estimate the effect of the extraction method without confusing it with the effect of Treatment2.
Any suggestions or opinions about the experiment are welcome.