Hello all,
I have RNA sequencing data from two different runs. The first batch contains samples from three groups (A, B, C) and the second contains samples from three groups (A, C, D). In my PCA, samples cluster by group within each batch, but the two batches are clearly separated, and the batch effect is much stronger than the group differences.
To give you an idea, this is the distribution of the samples:
group <- factor(c(rep("A", 12), rep("B", 7), rep("C", 9), "A", rep("C", 6), rep("D", 16)))
batch <- factor(c(rep(1, 28), rep(2, 23)))
design <- model.matrix(~0 + group + batch)
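For reference, here is a small base R sketch that cross-tabulates this layout and checks that the design matrix is full rank. It uses only the `group` and `batch` factors defined above; nothing here depends on the actual expression data. The table makes the imbalance visible: only A and C appear in both batches, and batch 2 contains a single A sample.

```r
# Sample layout as described in the post
group <- factor(c(rep("A", 12), rep("B", 7), rep("C", 9), "A", rep("C", 6), rep("D", 16)))
batch <- factor(c(rep(1, 28), rep(2, 23)))
design <- model.matrix(~0 + group + batch)

# Cross-tabulate group against batch to see which groups are shared
layout <- table(group, batch)
print(layout)

# The additive model is estimable only if the design is full column rank
full_rank <- qr(design)$rank == ncol(design)
print(colnames(design))
print(full_rank)
```

The design has one column per group plus a single `batch2` column, so the batch term can be separated from the group terms only through the groups that occur in both batches.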
I have two main questions. First, can I estimate a correction (a vector, a value, or something similar) from the group A and C samples, which are present in both batches, and then use it to compensate for the batch effect in all batch 2 samples (A, C and D)?
Second, I have derived a gene signature that discriminates between A, B and C in batch 1, so I need to correct for the batch effect only in batch 2 in order to validate the initial signature.
Is all of this possible? I have been experimenting with the limma package, but without success so far.
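To make the two questions concrete, this is roughly the kind of correction I have in mind, sketched in base R. It is only an illustration under assumptions: `expr` is simulated data standing in for a genes-by-samples matrix of log-expression values, `n_genes` and `true_shift` are made up, and I am not sure the approach is statistically sound. The idea is to fit the additive model per gene, take the `batch2` coefficient as the per-gene shift, and subtract it from the batch 2 samples only, leaving batch 1 untouched.

```r
set.seed(1)

# Sample layout as described in the post
group <- factor(c(rep("A", 12), rep("B", 7), rep("C", 9), "A", rep("C", 6), rep("D", 16)))
batch <- factor(c(rep(1, 28), rep(2, 23)))
design <- model.matrix(~0 + group + batch)

# Simulated placeholder data (genes x samples); NOT real measurements
n_genes <- 100
expr <- matrix(rnorm(n_genes * length(group)), nrow = n_genes)
true_shift <- rnorm(n_genes, mean = 2)  # hypothetical per-gene batch shift
expr[, batch == 2] <- expr[, batch == 2] + true_shift

# Fit the additive model to every gene at once (samples in rows for lm.fit)
fit <- lm.fit(design, t(expr))

# Per-gene estimate of the batch 2 offset relative to batch 1
batch_shift <- fit$coefficients["batch2", ]

# Subtract the offset from batch 2 samples only; batch 1 is left as it is
corrected <- expr
corrected[, batch == 2] <- expr[, batch == 2] - batch_shift

# After correction the refitted batch coefficient should be numerically zero
refit_shift <- lm.fit(design, t(corrected))$coefficients["batch2", ]
print(max(abs(refit_shift)))
```

Because B occurs only in batch 1 and D only in batch 2, the `batch2` coefficient in this model is determined by the A and C samples alone, which is what the first question asks for. With a single A sample in batch 2, the estimate would rest almost entirely on group C.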
Thank you very much in advance,