Question: Bug in RUV spike-in normalization?
SmallChess510 wrote:

RUV is a normalization algorithm for ERCC spike-ins. The Online Methods section of the paper (http://www.nature.com/nbt/journal/v32/n9/pdf/nbt.2931.pd) states that the algorithm performs an SVD of the input gene count matrix, and that the hidden matrix W is estimated from the left singular matrix (U) and the diagonal matrix, both from the SVD.

But the Bioconductor implementation, https://github.com/drisso/RUVSeq/blob/master/R/RUVg-methods.R, does this:

```
svdWa <- svd(Ycenter[, cIdx])
first <- 1 + drop
k <- min(k, max(which(svdWa$d > tolerance)))
W <- svdWa$u[, (first:k), drop = FALSE]  # <--- This line!!!! ONLY the U matrix is used. Bug?
alpha <- solve(t(W) %*% W) %*% t(W) %*% Y
correctedY <- Y - W %*% alpha
```

The R implementation ignores the diagonal matrix and uses only the left singular matrix (U).

The code and the paper don't match. Why?

modified 3.1 years ago by debitboro180 • written 3.6 years ago by SmallChess510
debitboro180 wrote:

Hi student-t,

I noticed the same thing when I tried to run a reduced example. I don't know why Risso used only the U matrix. Code corresponding to what is stated in the article would look like:

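```
k <- min(k, max(which(svdWa$d > tolerance)))
# svd() returns d as a plain vector, so diag() is needed to form the diagonal matrix
W <- svdWa$u %*% diag(svdWa$d, nrow = length(svdWa$d))
```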

By doing that correction, I get a completely different result. I'll have to write to Risso about that.
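For anyone who wants to check this on a toy case, here is a minimal sketch on simulated data that builds W both ways and compares the corrected matrices. Everything in it is an assumption for illustration: the dimensions, the choice of `cIdx` and `k`, and the helper `correct()` are hypothetical and not taken from RUVSeq. Note that `svd()` returns `d` as a vector, so `diag()` is used to turn it into a diagonal matrix.

```r
# Simulated data only; sizes and names are illustrative assumptions.
set.seed(1)
n <- 6
p <- 50
Y <- matrix(rnorm(n * p), nrow = n)   # samples x genes, stand-in for Y in the snippet
cIdx <- 1:10                          # hypothetical control (spike-in) columns
k <- 2

Ycenter <- scale(Y, center = TRUE, scale = FALSE)
svdWa <- svd(Ycenter[, cIdx])

# W from the left singular vectors only, as in the RUVSeq snippet
W_u <- svdWa$u[, 1:k, drop = FALSE]
# W from U times the diagonal matrix of singular values
W_ud <- W_u %*% diag(svdWa$d[1:k], nrow = k)

# Hypothetical helper: least-squares fit of Y on W, then subtract the fitted part
correct <- function(W, Y) {
  alpha <- solve(t(W) %*% W) %*% t(W) %*% Y
  Y - W %*% alpha
}

corrected_u <- correct(W_u, Y)
corrected_ud <- correct(W_ud, Y)

# Compare the two corrected matrices up to numerical tolerance
all.equal(corrected_u, corrected_ud)
```

The comparison is on `correctedY` only. W and alpha themselves do differ between the two versions, because the columns of W are rescaled by the singular values.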

Another thing is the calculation of alpha. We want to calculate it from equation (1), but what I see in the source code is:

```
alpha <- solve(t(W) %*% W) %*% t(W) %*% Y
```
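That line has the form of an ordinary least-squares estimate of alpha given W. A short worked derivation (my own notation, not copied from the paper) may help when comparing the two choices of W. Here U_k denotes the retained left singular vectors, so that U_k^T U_k = I, and D_k denotes the diagonal matrix of the corresponding singular values, assumed non-zero so that D_k is invertible:

```latex
\begin{align*}
\hat{\alpha} &= (W^\top W)^{-1} W^\top Y \\
W = U_k:\quad & W\hat{\alpha} = U_k (U_k^\top U_k)^{-1} U_k^\top Y = U_k U_k^\top Y \\
W = U_k D_k:\quad & W\hat{\alpha} = U_k D_k (D_k U_k^\top U_k D_k)^{-1} D_k U_k^\top Y
  = U_k D_k D_k^{-2} D_k U_k^\top Y = U_k U_k^\top Y
\end{align*}
```

Under those assumptions the fitted term W alpha, and therefore Y - W alpha, comes out the same for both choices, while W and alpha individually differ by the scaling D_k.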