I am working with RNA-seq data that contain a date-associated batch effect, caused by an error made during size selection before sequencing.
Data with no normalization:
I have been trying to correct this using the RUVSeq approach. The strongest effect is between the wrongly size-selected libraries (red) and the normal ones (green); ideally these should overlap, just like the 2013.08.08 technical replicates (purple).
I first removed unwanted variation with RUVs, specifying these "strange" technical replicates as the replicate groups, and then removed further unwanted variation with RUVg, using 5000 empirically selected non-DE transcripts as negative controls. I added this second normalisation because the RLE plots still weren't similar across samples after the RUVs normalisation. This didn't give good RLE plots, so I abandoned that idea.
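For readers less familiar with RLE (relative log expression) plots, the quantity being plotted can be sketched as follows. This is a minimal illustration in plain Python, not the RUVSeq implementation: the function names, the log2 transform and the pseudocount of 1 are assumptions made for the example. For each transcript, the log count in each sample is compared with that transcript's median log count across all samples; well-normalised samples should then have per-sample distributions centred near zero with similar spread.

```python
import math
from statistics import median


def relative_log_expression(counts, pseudocount=1.0):
    """Return RLE values per transcript.

    counts: dict mapping transcript id -> list of counts, one per sample,
            with samples in the same order for every transcript.
    pseudocount: assumed offset added before taking logs (hypothetical choice).
    """
    rle = {}
    for transcript, row in counts.items():
        logs = [math.log2(c + pseudocount) for c in row]
        centre = median(logs)  # median log count across samples
        rle[transcript] = [value - centre for value in logs]
    return rle


def per_sample_medians(rle):
    """Median RLE per sample; values far from 0 flag a shifted sample."""
    columns = zip(*rle.values())
    return [median(column) for column in columns]


# Toy counts (invented for illustration): the third sample is shifted upwards.
toy_counts = {"t1": [3, 3, 7], "t2": [1, 1, 3]}
toy_rle = relative_log_expression(toy_counts)
toy_medians = per_sample_medians(toy_rle)
```

In this toy input the first two samples have a median RLE of 0 and the third does not, which is the kind of mismatch I mean when I say the RLE plots "weren't similar".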
When I did the double normalisation in the opposite order, starting with RUVg and then applying RUVs, it gave very satisfactory PCA and RLE plots (maybe too good, as this is exactly how I expect my biological model to behave).
PCA and p-values when no technical replicates are combined:
When I test for DE transcripts between my species with edgeR without combining the technical replicates, the p-value distribution is more or less OK. Although it looks OK, it is wrong: because the technical replicate counts weren't combined, the replicates are treated as independent samples, which is pseudo-replication.
When I combine those counts by adding them together, everything comes crumbling down. The p-value distribution looks very strange, as if it were depleted of small (significant) values.
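To be explicit about what "combining" means here, this is a minimal sketch in plain Python rather than my actual R code. The function name, the sample names and the sample-to-library mapping are hypothetical. The counts of all sequencing runs that come from the same library are summed per transcript, so that each library contributes a single column to the DE analysis.

```python
from collections import defaultdict


def combine_technical_replicates(counts, sample_to_library):
    """Sum counts of technical replicates that belong to the same library.

    counts: dict mapping transcript id -> dict of sample name -> count.
    sample_to_library: dict mapping sample name -> library (biological sample) id.
    Returns a dict mapping transcript id -> dict of library id -> summed count.
    """
    combined = {}
    for transcript, per_sample in counts.items():
        totals = defaultdict(int)
        for sample, count in per_sample.items():
            totals[sample_to_library[sample]] += count
        combined[transcript] = dict(totals)
    return combined


# Toy input (invented for illustration): library A was sequenced twice.
toy_counts = {
    "tx1": {"A_run1": 10, "A_run2": 5, "B_run1": 7},
    "tx2": {"A_run1": 0, "A_run2": 2, "B_run1": 4},
}
toy_mapping = {"A_run1": "A", "A_run2": "A", "B_run1": "B"}
toy_combined = combine_technical_replicates(toy_counts, toy_mapping)
```

Note that this sketch assumes integer counts as input. The values I actually summed had already been through the normalisation steps above, which may matter for the result.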
p-values after combining the technical replicates:
I also tried simply removing those strange replicates and normalising with RUVg. This looks OK, but not as nice as the more complicated approach described above. I also realise that my approach is probably a bit flawed, because I sort of know what to expect and I am trying to get to that ...
I need enlightenment: am I doing multiple testing somewhere? Is it valid to apply these two normalisations one after the other, and why do the results differ depending on the order in which they are applied (i.e. the two steps are not commutative, I think would be the term)?
Thank you all for taking the time to consider this issue. I realise it is kind of a vast question.