Question: RNA-Seq data, batch effect source
lu.ne wrote, 20 months ago:

Hi All,

I am currently trying to normalise some RNA-Seq data. The samples came from two different batches, and a PCA plot of the values shows a clear separation between them.

As far as I can tell, the sample-processing protocol was the same for both batches, as was the lab where the analyses were performed.

I tried several normalisations, including limma's removeBatchEffect function, housekeeping-gene normalisation with RUVg, and adding the batch as a term in the model. Either the separation is still there (with removeBatchEffect) or the result looks completely random. Moreover, the idea of using housekeeping genes for normalisation seems to be rather controversial.

Before trying anything else to normalise this dataset, I would like to know where the batch effect comes from (or at least determine whether its cause can be identified), so that I can select the most suitable normalisation method. To do so, I fitted a model (using limma in R), treated the batch as a control/treatment comparison, and extracted the significant GO terms associated with the difference between batches (using the gage function). I obtained terms related to either antigens or viral processes.
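For readers who want to reproduce this kind of check, a minimal sketch of the batch-as-factor fit might look like the following. It runs on simulated log-scale values only; the object names (`expr`, `batch`, `res`) and the size of the simulated shift are assumptions for illustration, not the data or settings from this post, and the gage step is left out.

```r
library(limma)

# Simulated log-expression values: 500 hypothetical genes x 8 samples.
set.seed(1)
expr <- matrix(rnorm(500 * 8), nrow = 500,
               dimnames = list(paste0("gene", 1:500), paste0("s", 1:8)))
batch <- factor(rep(c("batch1", "batch2"), each = 4))

# Shift the first 50 genes in batch2 to mimic a batch effect (made-up size).
expr[1:50, batch == "batch2"] <- expr[1:50, batch == "batch2"] + 3

# Treat batch as if it were the condition of interest.
design <- model.matrix(~ batch)   # columns: (Intercept), batchbatch2
fit <- eBayes(lmFit(expr, design))

# Genes ranked by evidence for a difference between the two batches.
res <- topTable(fit, coef = "batchbatch2", number = Inf)
head(res)
```

The ranked table `res` is what would then be passed to a gene-set method; real RNA-Seq counts would first need to be brought to a log scale (for example with voom or log-CPM values).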

I have two questions. Does this result mean anything in this situation, and could it point to a specific issue? And is this a suitable method for identifying the source of the difference?

Thank you,

Devon Ryan wrote, 20 months ago:

Don't bother trying to interpret GO results from something like this. You can get a batch effect just from preparing the same samples on different days, for instance if the tubes warm up a bit more or less from one day to the next. If you want to check whether the batch effect is being driven by a couple of genes (so you can exclude them), just look at the projections from prcomp() in R. More likely than not, you have a lot of genes each contributing a little, since what you're seeing is some combination of length and GC bias between the batches (plus other things, likely). You might have a look at the CQN package if this ends up being GC-bias based.
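As a rough illustration of the prcomp() check, the sketch below looks at the per-gene loadings of the first component and at how they relate to GC content. Everything in it is simulated: the matrix `expr`, the `gc_frac` values, and the size of the GC-dependent shift are assumptions made up for the example, so treat it as a starting point rather than a recipe.

```r
# Simulated data only: 1000 hypothetical genes x 12 samples, two batches of 6.
set.seed(1)
n_genes <- 1000
batch   <- factor(rep(c("A", "B"), each = 6))
gc_frac <- runif(n_genes, 0.3, 0.7)   # made-up GC fraction per gene

expr <- matrix(rnorm(n_genes * 12), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("s", 1:12)))

# Add a GC-dependent shift to batch B, so the effect is spread over many genes.
expr[, batch == "B"] <- expr[, batch == "B"] + 6 * (gc_frac - 0.5)

# prcomp() expects samples in rows, so transpose the genes x samples matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Sample scores: confirm which component separates the batches.
pc1 <- pca$x[, 1]
tapply(pc1, batch, mean)

# Gene loadings for that component (one value per gene, unit length overall).
load1 <- pca$rotation[, 1]
top   <- sort(abs(load1), decreasing = TRUE)
head(top, 10)

# Share of the squared loading carried by the 10 largest genes.
# A value near 1 would mean a handful of genes drive the component.
sum(top[1:10]^2)

# Rank correlation between loadings and GC content.
cor(load1, gc_frac, method = "spearman")
```

If a few genes carry most of the squared loading, excluding them is an option; if the loadings are spread out and track GC content (or gene length, which can be checked the same way), a GC/length-aware normalisation is the more plausible route.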


lu.ne replied, 20 months ago:

OK, that makes sense. Thanks for the answer, it certainly gives me ideas for what to do next!