RNA-seq unbalanced batch effect correction
7.1 years ago


I'm new to RNA-seq data analysis and I need help removing a batch effect from my RNA-seq data. My data structure is a little complex, similar to some extent to the example in section 3.5 of the edgeR manual.

In my case, I have 12 RNA samples collected from 6 subjects: 3 subjects received one treatment (treatment1) and the other 3 subjects received another treatment (treatment2), and in both cases RNA was extracted before and after treatment. The idea is to test the effect of each treatment relative to the initial state, and the differences between treatments, while accounting for the paired structure of the data. The problem is that I have a batch effect caused by using a different RNA extraction technique (both methods gave good-quality samples, with a RIN above 8); I confirmed the batch effect in a multidimensional scaling plot. The data frame looks like this:

Treatment      Patient     Time       batch
Treatment1     1           before     1
Treatment1     1           after      1
Treatment1     2           before     1
Treatment1     2           after      1
Treatment1     3           before     1
Treatment1     3           after      1
Treatment2     4           before     1
Treatment2     4           after      2
Treatment2     5           before     1
Treatment2     5           after      2
Treatment2     6           before     1
Treatment2     6           after      2

The problem is that my batch effect is completely unbalanced (it is only present in the treatment2 group after treatment), and I'm not sure whether any batch effect correction can be applied so that I can still use the data, because there are no samples from that group extracted with the other method from which to estimate the effect adequately.

I was trying to build a model in edgeR like this:

design <- model.matrix(~Treatment+Treatment:Patient+Treatment:Time, data=Data)

But it's impossible to add the batch variable to the model because it is identical to the Treatment2:Time_after column of the design matrix, so adding the batch as a covariate is not an option. I would like to know whether limma voom with the duplicateCorrelation function can correct the batch adequately, or whether I'm going to have the same problem.
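The confounding described above can be checked directly. The following is a minimal base-R sketch, assuming the sample table is rebuilt exactly as shown in the post; the helper names `batch2`, `t2_after`, `rank_without` and `rank_with` are only illustrative. It shows that the batch indicator coincides with the treatment2-after indicator, so appending it to the design matrix adds no new information.

```r
# Rebuild the sample table from the post (12 samples, 6 patients).
Data <- data.frame(
  Treatment = factor(rep(c("Treatment1", "Treatment2"), each = 6)),
  Patient   = factor(rep(1:6, each = 2)),
  Time      = factor(rep(c("before", "after"), times = 6),
                     levels = c("before", "after")),
  batch     = factor(c(rep(1, 6), rep(c(1, 2), times = 3)))
)

# Design matrix as written in the post.
design <- model.matrix(~Treatment + Treatment:Patient + Treatment:Time, data = Data)

# Indicator for batch 2, and indicator for "treatment2, after treatment".
batch2   <- as.numeric(Data$batch == "2")
t2_after <- as.numeric(Data$Treatment == "Treatment2" & Data$Time == "after")

# The two indicators are the same vector: batch is fully confounded.
print(identical(batch2, t2_after))

# Appending the batch column does not increase the rank of the design,
# so a separate batch coefficient cannot be estimated.
rank_without <- qr(design)$rank
rank_with    <- qr(cbind(design, batch2))$rank
print(c(without = rank_without, with = rank_with))
```

Note that with patients numbered 1 to 6 across both treatments, the design from the post also contains all-zero `Treatment:Patient` columns; the example in the edgeR manual avoids this by renumbering patients within each treatment group.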

Searching for other ways to correct the batch effect, I found that it's possible to first estimate the batch effect and remove it, creating "batch-effect-free" data, and then analyse the adjusted data without further consideration of the batch effect.

In short, I don't know whether it's possible to add the batch to the model, or to estimate the batch effect in some other way, with the data structure that I have. As I understand it, I would need samples from the Treatment2:Time_after group extracted with the other method (it is not possible to obtain those samples) to estimate the effect of the extraction method adequately and not confuse it with the effect of treatment2.

I welcome any suggestion or opinion about the experiment.


limma RNA-Seq edgeR batch-effect • 2.8k views

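Yes, I agree it's a poor design. I don't think duplicateCorrelation from limma will help here: it estimates the correlation between technical replicates (spots on arrays, or technical replicates for RNA-seq if available) or between samples within a blocking factor such as patient, but it cannot separate the batch from the treatment2 after-treatment effect, because the two are completely confounded.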


We have one sample from treatment1:time_after (from the other treatment) that was extracted and sequenced with both methods, and the multidimensional scaling plot shows that this sample clusters with the corresponding batch when the extraction method changes (we couldn't do the same for the other samples because there wasn't enough material).

Would it be possible to use the difference observed in that sample between the two extraction methods to correct the batch effect in the other 3 samples, supposing that the batch effect acts similarly in both treatments?
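To make the idea concrete, here is a rough base-R sketch of what such a correction would amount to. It is not a recommended or validated method, and everything in it is hypothetical: the toy log2-CPM values are simulated, and the names `logcpm_batch2`, `ref_method1`, `ref_method2` and `offset` are made up for illustration. It computes a per-gene difference between the two extractions of the one duplicated sample and subtracts it from the three batch-2 samples.

```r
set.seed(1)
n_genes <- 5
genes   <- paste0("gene", seq_len(n_genes))

# Simulated log2-CPM values (toy data) for the 3 samples extracted with method 2.
logcpm_batch2 <- matrix(rnorm(n_genes * 3, mean = 5), nrow = n_genes,
                        dimnames = list(genes, c("P4_after", "P5_after", "P6_after")))

# The single sample that was extracted with both methods (simulated).
ref_method1 <- setNames(rnorm(n_genes, mean = 5), genes)
ref_method2 <- ref_method1 + c(0.5, -0.2, 0, 1, -1)

# Per-gene offset attributed to the extraction method (ASSUMPTION: the same
# offset applies to every sample, in both treatments).
offset <- ref_method2 - ref_method1

# Subtract the gene-wise offset from every batch-2 sample.
adjusted <- sweep(logcpm_batch2, 1, offset, "-")
print(round(adjusted, 2))
```

Because the offset comes from a single unreplicated pair, it also contains that sample's technical noise, and the assumption that the extraction method acts identically in both treatments cannot be tested with this design. Any downstream test on the adjusted values would also ignore the uncertainty in the offset itself.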

