I have my RNA-seq counts and the various factors of my experimental setup (condition, strain, pool, etc.). I would like to know which of these effects matter, and by how much, so that I can decide how to model them in a GLM.
I have found a package called PVCA, which seems to show the proportion of variance explained by each factor and each interaction between factors.
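With counts as my count table and met as my metadata table, I use:

    library(Biobase)  # provides ExpressionSet and AnnotatedDataFrame
    library(pvca)

    # Wrap the count matrix and the sample metadata in an ExpressionSet
    eset <- ExpressionSet(as.matrix(counts),
                          new("AnnotatedDataFrame", data = met))

    pvcaobj <- pvcaBatchAssess(eset,
                               batch.factors = c("bias", "diet", "line"),
                               threshold = 0.6)

    # Tabulate each label with its rounded proportion of variance
    df <- data.frame(label = as.character(pvcaobj$label),
                     wmpv = round(as.numeric(pvcaobj$dat), 2))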
This returns something like:

          label wmpv
    1 diet:line 0.04
    2 bias:line 0.02
    3      line 0.02
    4 bias:diet 0.02
    5      bias 0.02
    6      diet 0.02
    7     resid 0.86
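To make clear what I think these numbers represent, here is my rough understanding as a simplified sketch, not the package's actual implementation. It runs a PCA on the samples, keeps enough components to reach the threshold, splits each component's variance among the factors, and averages those splits weighted by the variance each component explains. The data are simulated, the factor names are borrowed from my setup, and I use a plain fixed-effects ANOVA as a stand-in where, as far as I can tell, pvca fits a mixed model that also includes interactions.

```r
# Simulated log-scale expression (genes x samples) with a built-in diet effect.
set.seed(42)
met <- data.frame(
  diet = factor(rep(c("A", "B"), each = 6)),
  line = factor(rep(c("x", "y", "z"), times = 4))
)
expr <- matrix(rnorm(200 * 12), nrow = 200)
expr[1:80, met$diet == "B"] <- expr[1:80, met$diet == "B"] + 3

# PCA on samples; keep enough PCs to reach the threshold of explained variance.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc_var <- pca$sdev^2 / sum(pca$sdev^2)
threshold <- 0.6
n_pc <- which(cumsum(pc_var) >= threshold)[1]

# For each retained PC, split its sum of squares among factors and residual.
# (Assumption: fixed-effects ANOVA, main effects only, as a simplification.)
prop_per_pc <- sapply(seq_len(n_pc), function(i) {
  fit <- aov(pca$x[, i] ~ diet + line, data = met)
  ss <- summary(fit)[[1]][["Sum Sq"]]
  names(ss) <- c("diet", "line", "resid")
  ss / sum(ss)
})

# Weight each PC's proportions by the share of variance that PC explains.
w <- pc_var[seq_len(n_pc)] / sum(pc_var[seq_len(n_pc)])
weighted <- drop(prop_per_pc %*% w)
round(weighted, 2)
```

In this toy example the diet proportion should dominate because the effect was simulated that way. In my real data nearly everything lands in resid.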
So here are my questions.
Which dataset should I use as counts? They all produce different results.
- raw filtered counts
- cpm transformed counts
- cpm log transformed counts
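For concreteness, this is roughly how the three candidate inputs relate to each other. It is a minimal sketch in base R on a made-up count matrix; the filtering cutoff of 10 reads and the pseudocount of 1 are arbitrary choices for illustration. In practice edgeR's cpm() does the same job with more care.

```r
# Toy count matrix (genes x samples); values are simulated for illustration.
set.seed(1)
counts <- matrix(rnbinom(6 * 4, mu = 50, size = 2), nrow = 6,
                 dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# 1. Raw filtered counts: keep genes with at least 10 reads in total
#    (the cutoff of 10 is an arbitrary assumption for this sketch).
keep <- rowSums(counts) >= 10
raw_filtered <- counts[keep, , drop = FALSE]

# 2. CPM: scale each sample (column) so that it sums to one million.
lib_size <- colSums(raw_filtered)
cpm <- sweep(raw_filtered, 2, lib_size, "/") * 1e6

# 3. log-CPM: log2 with a pseudocount of 1 to avoid log(0).
log_cpm <- log2(cpm + 1)
```

Raw counts and CPM keep the strong mean-variance dependence of count data, while the log transform compresses it, which is presumably why the three inputs give different variance proportions.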
What does the
Are there any other tools or methods to assess batch effects?