I have targeted RNA-seq data from 13 groups of mice (9 mice per group). RNA from each mouse was sequenced using 2 panels (panel 1 has about 3 times as many genes as panel 2). I now want to assess differential expression between groups. The problem is that the library size for each panel x each mouse is different. So, for one mouse, I get 200k reads from panel 1 and 40k from panel 2, while for another mouse, i get 200k reads from panel 1 and 90k reads from panel 2. This means that if I were to just
rbind the counts from each panel, the relative contribution from each panel would differ from mouse to mouse. (What's worse is that some groups were run later and the relative contributions in those groups are different.)
To get differential expression, I can run
limma+voom twice, once for each panel, then correct for multiple testing after combining outputs from both runs. Is there a way to run voom such that the normalization factors know about the difference between the two sets of rows?
I'm not certain this would improve differential expression assessment, but it would certainly make the code cleaner :)
Note: For plotting purposes, I've used the
rlog function from
DESeq2 on each panel's data separately and then simply used
rbind to concatenate the data. The two panels share 54 genes, 9 of which are designated "housekeeping genes"; if you look at the ratio of normalized expression for these duplicate genes, the distribution hovers tightly around +1.5%, indicating both that the technical agreement between panels is excellent and that this normalization method is sound.
> summary(as.numeric(bothPanelRatio)) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.8316 0.9943 1.0168 1.0167 1.0395 1.2952
(NB: This calculation excluded genes that had zero reads for any sample, so you go from 54 to 44 shared genes.)