Question

Assessing controls for RNASeq

0

3.7 years ago

lockdowndog • 0

I have RNASeq datasets composed of multiple treatments and multiple control batches (i.e., each treatment has its own controls). However, all samples come from the same parent cell line, so I believe I should be able to use the controls from other treatments to increase my n for controls and make the treatment vs control DE analysis more robust. I have checked each control sample and they are all wt (by calling SNPs); the treatments here are CRISPR-introduced mutations in the parent cell line. What is a good way to additionally check that all these controls can indeed be grouped together, besides exploratory PCA (checking that treatment vs control is on PC1 rather than the different control batches)? How about running a DGE analysis on the controls only and checking that the most variable genes there are not the genes identified in the treatment vs control analysis? Any other checks?

RNA-Seq rna-seq gene expression DESeq • 1.2k views

ADD COMMENT • link updated 3.7 years ago by i.sudbery 19k • written 3.7 years ago by lockdowndog • 0

0

When doing differential expression, you could include batch as an additional variable in your regression model: ~ condition + batch. The model will then control for the batch effect and also let you test which genes are differentially expressed between batches.

ADD REPLY • link 3.7 years ago by rpolicastro 13k

score 1 · Answer 1 · 2020-08-12

1

3.7 years ago

i.sudbery 19k

The problem with running DE tests on the control samples against each other is that there will undoubtedly be SOME differences between them. But how many is too many? If it's only 3 genes, is that too many? What if it's 30? Or 300?

Probably the best solution here is, instead of using the pooled controls in a series of separate DE analyses, to encode the whole thing as a single model with one factor for treatment and one factor for "batch", where a batch is a treatment together with its own controls. So for two treatments with 2 reps each, your design table might look like:

 Sample     Treatment     Batch
   1        Control       batch1
   2        Control       batch1
   3        Treat1        batch1  
   4        Treat1        batch1    
   5        Control       batch2  
   6        Control       batch2    
   7        Treat2        batch2
   8        Treat2        batch2

The design formula is then ~ 0 + Batch + Treatment. The model will attempt to correct for any pair-specific differences (that is, effects that are shared between the control and treatment of a particular pair, but not with other pairs).
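To make the formula concrete, here is a minimal sketch in plain Python (not DESeq2) that builds the design matrix implied by ~ 0 + Batch + Treatment for the table above and checks that it is full rank, i.e. that every batch and treatment coefficient can be estimated. The column names and the choice of Control as the reference level are my assumptions about how the factors would be encoded.

```python
from fractions import Fraction

# Sample sheet copied from the design table: (sample, treatment, batch).
samples = [
    (1, "Control", "batch1"),
    (2, "Control", "batch1"),
    (3, "Treat1", "batch1"),
    (4, "Treat1", "batch1"),
    (5, "Control", "batch2"),
    (6, "Control", "batch2"),
    (7, "Treat2", "batch2"),
    (8, "Treat2", "batch2"),
]

# Assumed encoding: no intercept, one indicator per batch, and one indicator
# per non-reference treatment level (Control is taken as the reference).
columns = ["batch1", "batch2", "Treat1", "Treat2"]


def design_row(treatment, batch):
    """Return the 0/1 indicator row for one sample."""
    return [
        int(batch == "batch1"),
        int(batch == "batch2"),
        int(treatment == "Treat1"),
        int(treatment == "Treat2"),
    ]


design = [design_row(t, b) for _, t, b in samples]


def matrix_rank(rows):
    """Rank by Gauss-Jordan elimination using exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


rank = matrix_rank(design)
print(columns)
for row in design:
    print(row)
print("rank:", rank, "of", len(columns), "columns")
```

If the rank were lower than the number of columns, some treatment effect would be confounded with batch and could not be estimated separately.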

ADD COMMENT • link 3.7 years ago by i.sudbery 19k

0

I think this is the eventual goal: to conduct signature discovery using treatments vs controls while correcting for batches. But the question is whether to first do a pre-analysis step that asks whether lumping the controls together makes sense, or whether it creates a signal beyond the treatment vs control signal (ideally it doesn't).

ADD REPLY • link 3.7 years ago by lockdowndog • 0

0

Since you are adding batch to the regression formula, you can use a contrast to test which genes change between certain batches. That will probably be one of the better indicators of batch effect.

ADD REPLY • link 3.7 years ago by rpolicastro 13k

0

Even better might be to use a likelihood ratio test (LRT) to test whether genes vary between any of the batches, rather than looking only at particular batches.
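As a rough illustration of the idea, here is a toy Gaussian analogue of an LRT for a single gene, in plain Python. It is only a sketch: DESeq2 fits negative binomial GLMs and compares their deviances (as far as I know via `DESeq(dds, test = "LRT", reduced = ~ Treatment)`), the expression values below are invented, and with only 8 samples the chi-square approximation is crude.

```python
import math

# Hypothetical log2 expression of one gene per (treatment, batch) group.
# Values are invented: batch2 sits about 2 units above batch1.
groups = {
    ("Control", "batch1"): [4.9, 5.1],
    ("Treat1", "batch1"): [5.9, 6.1],
    ("Control", "batch2"): [6.9, 7.1],
    ("Treat2", "batch2"): [6.9, 7.1],
}


def rss(cells):
    """Residual sum of squares when each cell is fitted by its own mean."""
    total = 0.0
    for values in cells:
        centre = sum(values) / len(values)
        total += sum((v - centre) ** 2 for v in values)
    return total


# Full model (Batch + Treatment): in this layout it has 4 parameters for
# 4 groups, so each (treatment, batch) group is fitted by its own mean.
rss_full = rss(groups.values())

# Reduced model (Treatment only): batch is dropped, so the controls from
# both batches have to share a single mean.
by_treatment = {}
for (treatment, _batch), values in groups.items():
    by_treatment.setdefault(treatment, []).extend(values)
rss_reduced = rss(by_treatment.values())

# Gaussian likelihood ratio statistic: n * ln(RSS_reduced / RSS_full).
n = sum(len(values) for values in groups.values())
lrt_stat = n * math.log(rss_reduced / rss_full)

# One parameter was dropped (4 vs 3), so compare against chi-square with
# 1 degree of freedom, whose survival function is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lrt_stat / 2))

print("RSS full:", round(rss_full, 4), "RSS reduced:", round(rss_reduced, 4))
print("LRT statistic:", round(lrt_stat, 2), "p-value:", p_value)
```

A small p-value here says that dropping batch makes the fit clearly worse for this gene, i.e. the gene varies between batches. With more than two batches the same comparison tests all batches at once, which is the advantage over picking individual contrasts.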

ADD REPLY • link 3.7 years ago by i.sudbery 19k

0

The point of adding batch to the regression like this is that if the batch does create a signal, it will be removed.
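Here is a small worked example of what "removed" means, as a sketch in plain Python using ordinary least squares on invented log2 values for one gene (DESeq2 itself fits a negative binomial GLM, so this is only an analogy). In the made-up data batch2 sits 2 units above batch1, Treat1 adds 1 unit and Treat2 adds nothing.

```python
from fractions import Fraction

# Hypothetical log2 expression of one gene: (treatment, batch, value).
data = [
    ("Control", "batch1", "4.9"), ("Control", "batch1", "5.1"),
    ("Treat1", "batch1", "5.9"), ("Treat1", "batch1", "6.1"),
    ("Control", "batch2", "6.9"), ("Control", "batch2", "7.1"),
    ("Treat2", "batch2", "6.9"), ("Treat2", "batch2", "7.1"),
]
y = [Fraction(v) for _, _, v in data]


def mean(values):
    return sum(values) / len(values)


# Naive approach: pool every control and ignore batch.
pooled_control = mean([v for (t, _, _), v in zip(data, y) if t == "Control"])
naive = {
    trt: mean([v for (t, _, _), v in zip(data, y) if t == trt]) - pooled_control
    for trt in ("Treat1", "Treat2")
}

# Model approach: least squares on ~ 0 + Batch + Treatment
# (assumed encoding: Control is the reference treatment level).
names = ["batch1", "batch2", "Treat1", "Treat2"]
X = [[Fraction(int(b == "batch1")), Fraction(int(b == "batch2")),
      Fraction(int(t == "Treat1")), Fraction(int(t == "Treat2"))]
     for t, b, _ in data]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Normal equations: (X^T X) beta = X^T y.
p = len(names)
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * v for row, v in zip(X, y)) for i in range(p)]
coef = dict(zip(names, solve(xtx, xty)))

for trt in ("Treat1", "Treat2"):
    print(trt, "naive:", float(naive[trt]), "with batch:", float(coef[trt]))
print("batch2 - batch1:", float(coef["batch2"] - coef["batch1"]))
```

With these toy numbers the pooled comparison reports no change for Treat1 and a one-unit change for Treat2, both wrong, while the model with batch recovers +1 and 0. The batch shift ends up in the batch coefficients, and their difference is the kind of batch contrast mentioned earlier in the thread.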

ADD REPLY • link 3.7 years ago by i.sudbery 19k