Hi, we are currently doing RNA-seq analysis and would like to remove batch effects. I would like to compare our data with previously published data, but first the batch effects within our own data need to be removed. The design (stored as "samples") is as follows.
Sample  RNA_extraction  Experiment  treatment
1       1               ours        A
2       1               ours        A
3       1               ours        A
4       1               ours        A
5       1               ours        A
6       1               ours        B
7       1               ours        B
8       1               ours        B
9       1               ours        B
10      1               ours        B
11      2               ours        C
12      1               ours        C
13      2               ours        C
14      2               ours        C
15      2               ours        C
16      1               ours        C
17      3               ref         A
18      3               ref         A
19      3               ref         A
20      3               ref         D
21      3               ref         D
22      3               ref         D
# two batches are included
library(limma)     # provides removeBatchEffect()
library(magrittr)  # provides the %>% pipe
group <- samples$treatment
design <- model.matrix(~ group)
batch <- samples$Experimenter
batch2 <- samples$RNA_extraction
rldData <- dat %>%
  removeBatchEffect(batch = batch, batch2 = batch2,
                    design = design)
When I ran the code above, batch2 was ignored. I suspect this is because the previously published data differ from ours in both $RNA_extraction and $Experiment. So I set batch to samples$Experimenter only, but the PCA shows that the batch effect within our own data remains (plot attached below). My question is therefore: is it possible to first remove the batch effect within our data, and then remove the batch effect that appears when comparing it with the previously published data?
# only one batch is included
group <- samples$treatment
design <- model.matrix(~ group)
batch <- samples$Experimenter
rldData <- dat %>%
  removeBatchEffect(batch = batch,
                    design = design)
PCA plot colors: Black: A, Blue: B, Pink: C, Green: D
Thank you in advance! (I'm not a native English speaker, so please forgive me if I'm not clear.)
Are these two completely independent datasets?
Yes, they are.
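Since the reference samples come from a fully separate dataset, it may help to check how the two batch columns overlap before passing both to removeBatchEffect. The sketch below rebuilds the sample sheet from the table posted in the question (column names taken from the table header, so `Experiment` rather than `Experimenter`) and uses base R only. The names `batch_overlap`, `X_one`, `X_both`, `rank_one` and `rank_both` are made up for illustration. It is a diagnostic under those assumptions, not a recommendation on whether cross-dataset correction is appropriate here.

```r
# Rebuild the sample sheet from the table in the question (base R only).
samples <- data.frame(
  Sample         = 1:22,
  RNA_extraction = factor(c(rep(1, 10), 2, 1, 2, 2, 2, 1, rep(3, 6))),
  Experiment     = factor(c(rep("ours", 16), rep("ref", 6))),
  treatment      = factor(c(rep("A", 5), rep("B", 5), rep("C", 6),
                            rep("A", 3), rep("D", 3)))
)

# Cross-tabulate the two candidate batch columns to see how they overlap.
batch_overlap <- table(samples$Experiment, samples$RNA_extraction)
print(batch_overlap)

# Compare the rank of the model matrix with one batch column vs. both.
# If the rank does not increase when Experiment is added, Experiment carries
# no information beyond RNA_extraction and its effect cannot be estimated
# separately.
X_one  <- model.matrix(~ treatment + RNA_extraction, data = samples)
X_both <- model.matrix(~ treatment + RNA_extraction + Experiment, data = samples)
rank_one  <- qr(X_one)$rank
rank_both <- qr(X_both)$rank
cat("columns / rank with RNA_extraction only:", ncol(X_one), "/", rank_one, "\n")
cat("columns / rank with both batch columns: ", ncol(X_both), "/", rank_both, "\n")

# Assumption: if the two columns turn out to be nested, a single batch factor
# could be passed to limma instead, e.g. (not run here, 'dat' is not available):
#   removeBatchEffect(dat, batch = samples$RNA_extraction,
#                     design = model.matrix(~ treatment, data = samples))
```

With the table as posted, every ref sample has RNA_extraction 3 and none of our samples do, so the Experiment column is fully determined by RNA_extraction. That nesting would explain why one of the two batch terms appears to be ignored when both are supplied.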