Hi,
I have a question about normalizing gene RPKM values with ERCC spike-in controls. I found a paper that uses loess.normalize() for this, but I am not sure how to set its parameters. When I set 'mat' to the RPKM matrix of all genes and 'subset' to the RPKM matrix of the 92 ERCC spike-in controls, I got this error:
    > loess.normalize(mat, subset)
    Error in x[index] : invalid subscript type 'list'
    > head(mat)
                   EL11       EL12        EL14      EL16        EL17      EL18
    A1BG      0.1163737  0.1097968  0.00000000  0.000000  0.00000000  0.000000
    A1BG-AS1  0.1447295  0.3641338  0.08168055  0.000000  0.26798862  0.000000
    A1CF      3.2351127  1.1802152  1.97185377  3.495367  5.63088272 39.181522
    A2M      63.2050908 56.7993487 48.27157466 53.446147 77.22512806 81.373673
    A2M-AS1   0.7987587  0.8866083  0.95462046  1.570776  2.08803204  2.410139
    A2ML1     0.0000000  0.0000000  0.00000000  0.000000  0.07982693  0.000000
    > head(subset)
                      EL11      EL12       EL14      EL16      EL17      EL18
    ERCC-00002 185.8553527 166.30548 112.652188 260.00801 391.25896 125.57826
    ERCC-00003   8.5380516  11.56205   9.013605  12.70220  23.29576  22.89986
    ERCC-00004 119.6550648 145.33323 115.764615  91.67239 296.32214 225.08306
    ERCC-00009  23.2876276  27.39052  21.217022  31.72389  45.10257  33.33047
    ERCC-00012   0.0000000   0.00000   0.000000   0.00000   0.00000   0.00000
    ERCC-00013   0.2543514   0.00000   0.000000   0.00000   0.00000   0.00000
Can anyone help with this?
Thanks. Cam
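In affy's normalize.loess(), the subset argument is a vector of row indices into mat (its default is a random sample of row numbers), not a second matrix; passing a data frame there would plausibly produce the "invalid subscript type 'list'" error. Assuming loess.normalize() follows the same convention (not confirmed here), the ERCC rows need to be part of mat, and subset only says which rows they are. A minimal base-R sketch; genes, ercc, mat_all and ercc_idx are hypothetical names, and the values are rounded from the head() output above (first three samples only):

```r
# Toy gene and ERCC RPKM matrices (values rounded from the head() output above)
genes <- matrix(c(0.12, 0.11, 0.00,
                  63.2, 56.8, 48.3,
                  3.24, 1.18, 1.97),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("A1BG", "A2M", "A1CF"),
                                c("EL11", "EL12", "EL14")))
ercc <- matrix(c(185.9, 166.3, 112.7,
                 8.54, 11.56, 9.01),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("ERCC-00002", "ERCC-00003"),
                               c("EL11", "EL12", "EL14")))

# One matrix holding genes and spike-ins, so every row gets adjusted
# while the fit itself can be restricted to the spike-in rows
mat_all <- rbind(genes, ercc)

# Integer positions of the ERCC rows *within mat_all*
ercc_idx <- which(rownames(mat_all) %in% rownames(ercc))
ercc_idx
#> [1] 4 5

# Hypothetical call (not run here; needs the package providing loess.normalize()):
# normed <- loess.normalize(mat_all, subset = ercc_idx)
```

Building the index from the row names, rather than typing positions by hand, keeps it pointing at the spike-ins even if the row order changes.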
Thanks a lot, dpryan. After doing this, I get another error:

    > loess.normalize(mat,subset=which(row.names(subset) %in% row.names(mat)))
    Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, : NA/NaN/Inf in foreign function call (arg 1)

I really have no idea what this error means. Can you give some guidance on it, or suggest another way to normalize by spike-in controls?
What's the output of which(row.names(subset) %in% row.names(mat))?
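For reference, the order of the two sides of %in% decides what which() returns: which(row.names(subset) %in% row.names(mat)) gives positions within subset, not within mat, so used as a row index into mat it would select unrelated gene rows (or nothing meaningful if mat contains no ERCC rows at all). A toy base-R illustration with hypothetical row names:

```r
# Hypothetical row names: mat holds genes followed by spike-ins
mat_rows  <- c("A1BG", "A2M", "ERCC-00002", "ERCC-00003")
ercc_rows <- c("ERCC-00002", "ERCC-00003", "ERCC-00004")

# Positions within ercc_rows: NOT valid row indices for mat
wrong_idx <- which(ercc_rows %in% mat_rows)
wrong_idx
#> [1] 1 2

# Positions within mat_rows: what a row-index 'subset' argument needs
right_idx <- which(mat_rows %in% ercc_rows)
right_idx
#> [1] 3 4

# The reversed version silently points at gene rows
mat_rows[wrong_idx]
#> [1] "A1BG" "A2M"
```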
I get the same error as above.
I found this explanation online: "These can occur if your background subtracted values are <= 0 in one of the channels, leading to a nonsensical log(ratio) value."
But I have no idea how to fix this in my data.
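That quote comes from two-colour microarray background subtraction, but the same arithmetic applies to RPKM. Assuming the function log-transforms its input (affy's normalize.loess() does by default, via log.it = TRUE), every zero RPKM becomes -Inf, and the log ratio between two samples becomes NaN (zero in both) or plus/minus Inf (zero in one). ERCC-00012 above, for example, is zero in every sample. A base-R sketch for spotting such rows; the toy values are rounded from head(subset) above (first three samples only):

```r
# Toy spike-in matrix (values rounded from head(subset) above)
ercc <- matrix(c(185.86, 166.31, 112.65,
                   8.54,  11.56,   9.01,
                   0,      0,      0,
                   0.25,   0,      0),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("ERCC-00002", "ERCC-00003",
                                 "ERCC-00012", "ERCC-00013"),
                               c("EL11", "EL12", "EL14")))

# log2(0) is -Inf, so each zero RPKM becomes a non-finite value
log_ercc <- log2(ercc)

# Log ratio between two samples, the kind of value MA-style loess works with:
# ERCC-00012 gives NaN (-Inf minus -Inf), ERCC-00013 gives Inf (finite minus -Inf)
m <- log_ercc[, "EL11"] - log_ercc[, "EL12"]

# Rows with any non-finite log value; these are what break the loess fit
bad_rows <- rownames(ercc)[rowSums(!is.finite(log_ercc)) > 0]
bad_rows
#> [1] "ERCC-00012" "ERCC-00013"
```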
I've seen some papers that arbitrarily add a very small number to all RPKM values before the log2 transformation, e.g. log2(x + 0.01). Alternatively, you could apply your loess normalization to untransformed RPKM values and simply filter out genes/transcripts with RPKM below a minimum cutoff.
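Both options can be sketched in base R. The pseudocount 0.01 follows the log2(x + 0.01) example; the cutoff of 1 RPKM and the rule "keep a spike-in only if it passes the cutoff in every sample" are assumptions chosen for illustration, not values from this thread. The toy values are rounded from head(subset) above:

```r
# Toy spike-in matrix (values rounded from head(subset) above)
ercc <- matrix(c(185.86, 166.31, 112.65,
                   8.54,  11.56,   9.01,
                   0,      0,      0,
                   0.25,   0,      0),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("ERCC-00002", "ERCC-00003",
                                 "ERCC-00012", "ERCC-00013"),
                               c("EL11", "EL12", "EL14")))

# Option 1: add a small pseudocount before log2
pseudo <- 0.01
log_ercc <- log2(ercc + pseudo)   # zeros become log2(0.01), about -6.64
all(is.finite(log_ercc))
#> [1] TRUE

# Option 2: keep only spike-ins at or above a minimum RPKM in every sample
min_rpkm <- 1   # hypothetical cutoff, for illustration only
keep <- rowSums(ercc >= min_rpkm) == ncol(ercc)
ercc_kept <- ercc[keep, , drop = FALSE]
rownames(ercc_kept)
#> [1] "ERCC-00002" "ERCC-00003"
```

If the normalization function log-transforms internally (as affy's normalize.loess() does with log.it = TRUE), the pseudocount would go onto the matrix passed in rather than taking log2 yourself first; in that function the untransformed route corresponds to log.it = FALSE. Whether loess.normalize() exposes the same argument is an assumption worth checking in its help page. With the filtering route, the row indices for subset should be recomputed after removing rows.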