Question

Working with PVCA

7.0 years ago

firestar ★ 1.6k

I have RNA-Seq counts and several factors describing my experimental setup (condition, strain, pool, etc.). I want to know which of these effects matter, and by how much, so that I can decide how to model them in a GLM.

I found the package pvca (Principal Variance Component Analysis), which appears to report the proportion of variance explained by each factor and by each interaction between factors.
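
For context, PVCA combines PCA with variance-components analysis: it keeps the leading principal components until their cumulative share of the variance reaches the threshold argument, fits a mixed model with the batch factors as random effects to each retained component, and averages the resulting variance components weighted by the components' eigenvalues. The sketch below illustrates only the component-selection step with base R's prcomp(). It is not pvca's internal code, and the function name select_pcs and the simulated matrix expr are hypothetical.

```r
# Sketch only (not pvca's implementation): how a variance threshold
# decides how many principal components are kept.
# `expr` is assumed to be a genes x samples numeric matrix.
select_pcs <- function(expr, threshold = 0.6) {
  # Center each gene (row), then run PCA with samples as observations
  centered <- expr - rowMeans(expr)
  pca <- prcomp(t(centered), center = FALSE)
  # Proportion of total variance carried by each component
  prop <- pca$sdev^2 / sum(pca$sdev^2)
  # Smallest number of components whose cumulative proportion reaches the threshold
  n_keep <- which(cumsum(prop) >= threshold)[1]
  list(prop = prop,
       n_keep = n_keep,
       scores = pca$x[, seq_len(n_keep), drop = FALSE])
}

# Hypothetical example: 200 simulated genes measured in 12 samples
set.seed(1)
expr <- matrix(rnorm(200 * 12), nrow = 200)
res <- select_pcs(expr, threshold = 0.6)
res$n_keep
```

With threshold = 0.6, only the first components that together explain at least 60% of the variance enter the mixed-model step, so raising the threshold brings more components into the estimate.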

With counts as my count table and met as my metadata table, I run:

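library(Biobase)  # provides ExpressionSet and AnnotatedDataFrame
library(pvca)
# rownames(met) must match colnames(counts), one metadata row per sample
eset <- ExpressionSet(as.matrix(counts), phenoData=new("AnnotatedDataFrame", data=met))
pvcaobj <- pvcaBatchAssess(eset, batch.factors=c("bias","diet","line"), threshold=0.6)
# Collect the variance proportion estimated for each factor/interaction
df <- data.frame(label=as.character(pvcaobj$label), wmpv=round(as.numeric(pvcaobj$dat), 2))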

This returns something like:

      label wmpv
1 diet:line  0.04
2 bias:line  0.02
3      line  0.02
4 bias:diet  0.02
5      bias  0.02
6      diet  0.02
7     resid  0.86

So here are my questions.

Which data should I use as counts? Each of the following gives different results:

filtered raw counts
CPM-transformed counts
log-transformed CPM
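
For reference, the three candidate inputs differ only in scaling. PCA-based methods such as PVCA are commonly run on log-scale values, because on raw counts or CPM the variance tends to be dominated by a few highly expressed genes. The base R sketch below shows how the CPM and log-CPM matrices relate to the raw counts; the names raw, to_cpm and to_log_cpm, and the offset prior = 1, are hypothetical choices for illustration. edgeR::cpm(log = TRUE) instead uses a library-size-scaled prior count, so its values will differ slightly.

```r
# Minimal sketch, base R only.
# `raw` is assumed to be a genes x samples matrix of filtered counts.
to_cpm <- function(raw) {
  lib_size <- colSums(raw)                 # total counts per sample
  sweep(raw, 2, lib_size, "/") * 1e6       # counts per million, column by column
}

to_log_cpm <- function(raw, prior = 1) {
  # Simple fixed offset so zero counts stay finite after log2
  log2(to_cpm(raw) + prior)
}

# Hypothetical toy data: 3 genes x 2 samples
raw <- matrix(c(10, 0, 90,
                5, 5, 40), nrow = 3)
cpm <- to_cpm(raw)
logcpm <- to_log_cpm(raw)
colSums(cpm)   # each sample's CPM values sum to one million
```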

What does threshold=0.6 in pvcaBatchAssess() do?

Are there any other tools or methods to assess batch effects?

Tags: RNA-Seq, PVCA, DGE