Question: Assessing controls for RNASeq
5 weeks ago by
lockdowndog0 wrote:

I have RNASeq datasets composed of multiple treatments vs multiple control batches (i.e., each treatment has its own control). However, all samples come from the same parent cell line, so I believe I should be able to use the controls from other treatments to expand my n for controls and increase the robustness of the treatment vs control DE analysis. The treatments here are CRISPR-introduced mutations in the parent cell line, and I have checked by calling SNPs that each control sample is wt. What is a good way to additionally check that all these controls can indeed be grouped together, besides exploratory PCA (checking that treatment vs control is on PC1 and not the different control batches)? How about doing a DGE analysis on the controls only and checking that the most variable genes there are not the genes identified in the treatment vs control analysis? Any other checks?

rna-seq deseq gene expression • 139 views
modified 5 weeks ago by i.sudbery9.1k • written 5 weeks ago by lockdowndog0

When doing differential expression, you could include batch as an additional variable in your regression model: ~ condition + batch. The model will then both help to control for the batch effect and let you test which genes are differentially expressed between batches.

written 5 weeks ago by rpolicastro1.7k
5 weeks ago by
Sheffield, UK
i.sudbery9.1k wrote:

The problem with conducting DE tests of the control samples against each other is that there will undoubtedly be SOME differences between them. But how many is too many? If it's only 3 genes, is that too many? What if it's 30? Or 300?

Probably the best solution here is, rather than using the pooled controls for a series of separate DE analyses, to encode the whole thing as a single model with one factor for treatment and one factor for "batch", where a batch refers to a pair of treatment and control. So for two treatments with 2 reps each, your design table might look like:

 Sample     Treatment     Batch
   1        Control       batch1
   2        Control       batch1
   3        Treat1        batch1  
   4        Treat1        batch1    
   5        Control       batch2  
   6        Control       batch2    
   7        Treat2        batch2
   8        Treat2        batch2

And the design formula would be ~ 0 + Batch + Treatment. The model will then attempt to correct for any pair-specific differences (that is, effects that are shared between the control and treatment of a particular pair, but not with other pairs).

modified 5 weeks ago • written 5 weeks ago by i.sudbery9.1k
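
As a rough illustration of the design suggested in the answer above, the sketch below dummy-codes ~ 0 + Batch + Treatment for the eight-sample table and checks that the resulting matrix is full rank, so every coefficient can be estimated. It is plain Python rather than DESeq2; the helper names design_matrix and matrix_rank are made up for this example, and treating "Control" as the reference level is an assumption.

```python
# Sample table from the answer above: (sample, treatment, batch).
samples = [
    (1, "Control", "batch1"),
    (2, "Control", "batch1"),
    (3, "Treat1", "batch1"),
    (4, "Treat1", "batch1"),
    (5, "Control", "batch2"),
    (6, "Control", "batch2"),
    (7, "Treat2", "batch2"),
    (8, "Treat2", "batch2"),
]


def design_matrix(rows, reference="Control"):
    """Dummy-code ~ 0 + Batch + Treatment (hypothetical helper).

    One indicator column per batch, plus one indicator column per
    treatment level other than the reference level.
    """
    batches = sorted({b for _, _, b in rows})
    treatments = sorted({t for _, t, _ in rows if t != reference})
    columns = batches + treatments
    matrix = [
        [1.0 if name in (b, t) else 0.0 for name in columns]
        for _, t, b in rows
    ]
    return columns, matrix


def matrix_rank(matrix, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


columns, X = design_matrix(samples)
rank = matrix_rank(X)
print(columns)
print("rank", rank, "of", len(columns), "columns")
```

Note that in this particular layout each treatment occurs in only one batch, so the model has as many coefficients as there are distinct treatment/batch groups. Each treatment is then effectively compared against the controls from its own batch, and the batch coefficients absorb the difference between the two sets of controls.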

I think this is the eventual goal: to conduct signature discovery using treatments vs controls while correcting for batches. But the question is whether to do a pre-analysis step that asks whether lumping the controls together makes sense or whether it creates a signal beyond the treatment vs control signal (ideally it doesn't).

written 5 weeks ago by lockdowndog0

Since you are adding batch to the regression formula, you can use a contrast to test which genes change between certain batches. That will probably be one of the better indicators of batch effect.

written 5 weeks ago by rpolicastro1.7k

Even better might be to use a likelihood ratio test (LRT) to test whether genes vary between any of the batches, rather than looking only at certain batches.

written 5 weeks ago by i.sudbery9.1k
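
To make the LRT idea above concrete, here is a toy sketch for a single gene. It compares a full model (~ 0 + Batch + Treatment) with a reduced model (~ Treatment) using a Gaussian model on log2 expression. That is a deliberate simplification: DESeq2 fits negative binomial GLMs on counts, so this shows only the logic of the comparison, not what DESeq2 computes. The expression values are invented, and the function names rss_by_group and batch_lrt are hypothetical.

```python
import math

# Invented log2 expression values for ONE gene: (treatment, batch, value).
gene = [
    ("Control", "batch1", 5.0),
    ("Control", "batch1", 5.2),
    ("Treat1", "batch1", 6.9),
    ("Treat1", "batch1", 7.1),
    ("Control", "batch2", 6.0),
    ("Control", "batch2", 6.2),
    ("Treat2", "batch2", 6.1),
    ("Treat2", "batch2", 6.3),
]


def rss_by_group(rows, key):
    """Residual sum of squares when every group is fitted by its own mean."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(row[2])
    rss = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)
        rss += sum((v - mean) ** 2 for v in values)
    return rss


def batch_lrt(rows):
    """Gaussian LRT of full (~ 0 + Batch + Treatment) vs reduced (~ Treatment).

    Assumption: in this layout the full model is saturated, so its fitted
    values are the per-(treatment, batch) means; the reduced model fits
    one mean per treatment, pooling the controls across batches.
    Requires some within-group variation (rss_full > 0).
    """
    rss_full = rss_by_group(rows, key=lambda r: (r[0], r[1]))
    rss_reduced = rss_by_group(rows, key=lambda r: r[0])
    n = len(rows)
    # Clamp at zero to guard against tiny negative values from rounding.
    stat = max(0.0, n * math.log(rss_reduced / rss_full))
    # Full model has 4 coefficients, reduced has 3, so 1 degree of freedom.
    # Chi-square survival function for 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = batch_lrt(gene)
print("LRT statistic:", round(stat, 2), "p-value:", p_value)
```

In this toy gene the two sets of controls differ by about one log2 unit, so the batch term improves the fit a lot and the test flags the gene. With only eight samples the chi-square approximation is crude, so read the p-value as illustrative. In DESeq2 itself the analogous comparison would be run with `DESeq(dds, test = "LRT", reduced = ~ Treatment)`.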

The point of adding the batch to the regression like this is that if the batch does create a signal, the model will remove it.

written 5 weeks ago by i.sudbery9.1k