log2 normalization produces NaNs
3 months ago

I want to apply global median normalization to 4 arrays, using the Cy5 foreground values with the Cy5 background subtracted.

> for(i in 1:4){
>   name <- paste("sample", i, sep = ".")
>   bg <- maRb(dat[,i])
>   fg <- maRf(dat[,i])
>   diff <- fg - bg
>   assign(name, log2(diff))
> }
>
> data.prenorm <- cbind(sample.1, sample.2, sample.3, sample.4)
> data.median  <- apply(data.prenorm, 2, median, na.rm = T)
> data.norm    <- sweep(data.prenorm, 2, data.median)
>
> colnames(data.norm) <- c("Array 1", "Array 2", "Array 3", "Array 4")
>
> median(data.norm[ , 1], na.rm = T)
> median(data.norm[ , 2], na.rm = T)
> median(data.norm[ , 3], na.rm = T)
> median(data.norm[ , 4], na.rm = T)


My code produces a warning message in R: In assign(name, log2(diff)) : NaNs produced


I agree with what Mensur Dlakic said. In addition, I would encourage you to use a dedicated package to normalize the data. There are most likely several available: arrays are not a new technology, and extensive analysis methodology has already been developed for them.

3 months ago
Mensur Dlakic ★ 14k

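The logarithm of zero is undefined (R returns -Inf), and the logarithm of a negative number is NaN, which is what the warning reports. Background-subtracted values can be zero or negative wherever the background is at least as large as the foreground. Try adding 1 to all the values before applying the log function; note that this only helps for values greater than -1.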
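To see where the warning comes from, here is a minimal sketch with made-up intensities (the vectors `fg` and `bg` below are hypothetical, not the poster's data). In base R, `log2()` of a negative number gives `NaN` with a warning, while `log2(0)` gives `-Inf` without one, so the warning most likely points at spots where the background exceeds the foreground.

```r
# Hypothetical foreground and background intensities for five spots
fg <- c(120, 80, 50, 200, 64)
bg <- c(100, 80, 70, 72, 0)

diff <- fg - bg                      # 20, 0, -20, 128, 64

# suppressWarnings() only hides the "NaNs produced" warning for this demo
logged <- suppressWarnings(log2(diff))

nan.spots <- is.nan(logged)          # TRUE only for the negative difference
inf.spots <- is.infinite(logged)     # TRUE only for the zero difference

# Adding 1 rescues the zero but not the negative value
shifted <- suppressWarnings(log2(diff + 1))
```

Under these assumptions, a pseudocount of 1 only removes the problem for differences greater than -1; more negative differences still need to be handled separately.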

3 months ago

First, rather than adding 1 to all the values, I decided to set all the negative values to NA.

for(i in 1:4){
  name <- paste("sample", i, sep = ".")
  bg <- maRb(dat[,i])
  fg <- maRf(dat[,i])
  diff <- fg - bg
  diff[diff < 0] <- NA
  assign(name, log2(diff))
}


After that, though, I ran into another issue. I want to apply global median normalization to these 4 arrays using the log2(diff) values, so that every array has a median of 1 after normalization.

data.prenorm <- cbind(sample.1, sample.2, sample.3, sample.4)
data.median  <- apply(data.prenorm, 2, median, na.rm = T)
data.norm    <- sweep(data.prenorm, 2, data.median)

colnames(data.norm) <- c("Array 1", "Array 2", "Array 3", "Array 4")

median(data.norm[ , 1], na.rm = T)
median(data.norm[ , 2], na.rm = T)
median(data.norm[ , 3], na.rm = T)
median(data.norm[ , 4], na.rm = T)


However, all the medians evaluate to 0 instead of 1. Why?
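A likely explanation, sketched with simulated data (the matrix `data.prenorm` below is randomly generated, not the real arrays): `sweep()` uses `FUN = "-"` by default, so subtracting each column's median centres every column at 0. On the log2 scale a median of 0 corresponds to a median of 1 on the linear scale, since 2^0 = 1. If a median of 1 on the log2 scale is really what is wanted, one option is to add 1 back after centring.

```r
set.seed(1)  # reproducible simulated log2 intensities

# 4 arrays x 101 spots (odd count, so each median is an actual data point)
data.prenorm <- matrix(rnorm(404, mean = 8, sd = 2), ncol = 4)

data.median <- apply(data.prenorm, 2, median, na.rm = TRUE)

# Default FUN = "-" subtracts each column's median, so the medians become 0
data.norm <- sweep(data.prenorm, 2, data.median)
med.log <- apply(data.norm, 2, median, na.rm = TRUE)

# Back on the linear scale the same data have median 1, because 2^0 = 1
med.linear <- apply(2^data.norm, 2, median, na.rm = TRUE)

# If a log2-scale median of 1 is wanted instead, shift by 1 after centring
med.shifted <- apply(data.norm + 1, 2, median, na.rm = TRUE)
```

In this sketch `med.log` is 0 for every column, while `med.linear` and `med.shifted` are both 1, so a median of 0 after `sweep()` may simply be the log2-scale form of the result that was expected.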