We have an RNA-seq dataset where PC1 primarily captures library size (54% of variance), even after TMM normalisation. Our proposed solution is to use RUV (k=1) to add a single term (W_1) to the design matrix. After fitting the model, W_1 effectively captures PC1, which is pretty much library size and exactly what we think we want.
Can someone please help me understand why this is a bad idea? Should I just add log10(lib.size) or PC1 to the design matrix instead? The results are near identical.
(In case you're wondering why it's a bad idea: Reviewer #2 says so.)
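
To make the comparison concrete, here is a rough sketch of what I mean by "add PC1 or log10(lib.size) to the design matrix". It is plain NumPy on simulated data, not our actual pipeline and not RUV itself. The data, the two-group factor, the depth-dependent effect and the helper names (first_pc, design_with, group_effects) are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples = 500, 12

# Hypothetical inputs: per-sample library sizes and a two-group factor.
lib_size = rng.uniform(5e6, 5e7, size=n_samples)
group = np.repeat([0, 1], n_samples // 2)

# Simulated log-expression (genes x samples) with an ASSUMED residual
# depth-dependent effect per gene, standing in for what is left after
# normalisation. There is no true group effect in this toy data.
depth = np.log10(lib_size) - np.log10(lib_size).mean()
loading = rng.normal(1.0, 0.3, size=n_genes)
logexpr = (rng.normal(5, 2, size=(n_genes, 1))
           + np.outer(loading, depth)
           + rng.normal(0, 0.1, size=(n_genes, n_samples)))


def first_pc(mat):
    """Sample scores on PC1 after centring each gene across samples."""
    centred = mat - mat.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return vt[0]


def design_with(covariate):
    """Design matrix: intercept, group indicator, one nuisance covariate."""
    return np.column_stack([np.ones(n_samples), group, covariate])


def group_effects(design):
    """Per-gene least-squares estimate of the group coefficient."""
    coef, *_ = np.linalg.lstsq(design, logexpr.T, rcond=None)
    return coef[1]


pc1 = first_pc(logexpr)
# The sign of a principal component is arbitrary, so compare |r|.
r = abs(np.corrcoef(pc1, np.log10(lib_size))[0, 1])

design_pc1 = design_with(pc1)
design_lib = design_with(np.log10(lib_size))
max_diff = np.max(np.abs(group_effects(design_pc1) - group_effects(design_lib)))

print(f"|cor(PC1, log10 lib size)| = {r:.3f}")
print(f"largest difference in group coefficients = {max_diff:.3f}")
```

When PC1 and log10(lib.size) are almost collinear, the two design matrices span nearly the same column space, which would explain why the group coefficients come out near identical in this toy setup.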