Question on Normalizing with ERCC Spike-In Controls
8.7 years ago
camelbbs ▴ 680

Hi,

I have a question about normalizing gene RPKM values with ERCC spike-in controls. I found a paper that uses loess.normalize() to do this, but I am not sure how to set its parameters. When I set 'mat' to the RPKM of all genes and 'subset' to the RPKM of the 92 ERCC spike-in controls, I get this error:

> loess.normalize(mat, subset)
Error in x[index] : invalid subscript type 'list'

               EL11       EL12        EL14      EL16        EL17      EL18
A1BG      0.1163737  0.1097968  0.00000000  0.000000  0.00000000  0.000000
A1BG-AS1  0.1447295  0.3641338  0.08168055  0.000000  0.26798862  0.000000
A1CF      3.2351127  1.1802152  1.97185377  3.495367  5.63088272 39.181522
A2M      63.2050908 56.7993487 48.27157466 53.446147 77.22512806 81.373673
A2M-AS1   0.7987587  0.8866083  0.95462046  1.570776  2.08803204  2.410139
A2ML1     0.0000000  0.0000000  0.00000000  0.000000  0.07982693  0.000000

                  EL11      EL12       EL14      EL16      EL17      EL18
ERCC-00002 185.8553527 166.30548 112.652188 260.00801 391.25896 125.57826
ERCC-00003   8.5380516  11.56205   9.013605  12.70220  23.29576  22.89986
ERCC-00004 119.6550648 145.33323 115.764615  91.67239 296.32214 225.08306
ERCC-00009  23.2876276  27.39052  21.217022  31.72389  45.10257  33.33047
ERCC-00012   0.0000000   0.00000   0.000000   0.00000   0.00000   0.00000
ERCC-00013   0.2543514   0.00000   0.000000   0.00000   0.00000   0.00000


Can anyone help with this?

Thanks. Cam

8.7 years ago

You're not giving normalize.loess a vector of indices for the subset= option, which is what it expects. From the names you used, I assume that "subset" is a subset of the rows of mat. In that case, just pass a vector of the row indices of those rows in mat:

loess.normalize(mat,subset=which(row.names(mat) %in% row.names(subset)))


or something like that. You can see this in the example at the bottom of help(normalize.loess), where "subset=1:nrow(x)" is used.
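To make the indexing concrete, here is a minimal base R sketch on made-up toy data (the values and the name `spikes` are illustrative assumptions, chosen so as not to shadow base R's `subset()`). It only shows how the index vector is built and that the order of the `%in%` operands matters; it does not call the normalization function itself.

```r
# Toy stand-ins for the poster's objects (values are made up for illustration).
mat <- matrix(
  c(0.1, 3.2, 63.2, 185.9, 8.5,
    0.1, 1.2, 56.8, 166.3, 11.6),
  ncol = 2,
  dimnames = list(c("A1BG", "A1CF", "A2M", "ERCC-00002", "ERCC-00003"),
                  c("EL11", "EL12"))
)
spikes <- mat[c("ERCC-00002", "ERCC-00003"), , drop = FALSE]

# Positions of the spike-in rows within mat: the kind of vector subset= expects.
idx_in_mat <- which(rownames(mat) %in% rownames(spikes))

# Swapping the operands gives positions within spikes instead, which would
# point at the first rows of mat rather than at the spike-in rows.
idx_in_spikes <- which(rownames(spikes) %in% rownames(mat))

print(idx_in_mat)     # 4 5
print(idx_in_spikes)  # 1 2
```

If the spike-in rows are not the first rows of the full matrix, the two vectors differ, so it is worth checking `rownames(mat)[idx_in_mat]` before passing the indices on.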

Thanks a lot dpryan. After doing this, I get another error:

> loess.normalize(mat,subset=which(row.names(subset) %in% row.names(mat)))
Error in simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, : NA/NaN/Inf in foreign function call (arg 1)

I really have no idea what this error means. Can you give me some guidance on it, or suggest another way to normalize by spike-in controls?

What's the output of which(row.names(subset) %in% row.names(mat))?

> which(row.names(subset) %in% row.names(mat))
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92

I got the same error as the one above.

I found an explanation of this online: these errors can occur if your background-subtracted values are <= 0 in one of the channels, leading to a nonsensical log(ratio) value.

But I have no idea how to fix it.
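One way to check whether zeros are the culprit is to look for non-finite values after a log transform. The sketch below uses made-up RPKM values with an all-zero row, like the ERCC-00012 row in the table above; it assumes the data are log-transformed at some point before the loess fit, which is not confirmed in this thread.

```r
# Toy RPKM matrix with an all-zero row (made-up values for illustration).
rpkm <- matrix(
  c(0.12, 0, 185.9,
    0.11, 0, 166.3),
  ncol = 2,
  dimnames = list(c("A1BG", "ERCC-00012", "ERCC-00002"),
                  c("EL11", "EL12"))
)

# log2(0) is -Inf, which loess cannot handle.
log_rpkm <- log2(rpkm)

# Rows with at least one non-finite value after the transform.
bad_rows <- rownames(rpkm)[rowSums(!is.finite(log_rpkm)) > 0]
print(bad_rows)  # "ERCC-00012"
```

If `bad_rows` is non-empty for the real data, those rows are likely candidates for what triggers the NA/NaN/Inf error.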

I've seen some papers that arbitrarily add a very small number to all RPKM values prior to the log2 transformation, e.g. log2(x + 0.01). Alternatively, you could apply your loess normalization to untransformed RPKM values and just filter out genes/transcripts with RPKM below a minimum cutoff.
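Both options can be sketched in base R as follows. The data are made up, the pseudocount of 0.01 is the one mentioned above, and the cutoff `min_rpkm <- 1` is a hypothetical value chosen only for illustration; a sensible threshold depends on the data.

```r
# Toy RPKM matrix (made-up values for illustration).
rpkm <- matrix(
  c(0.12,  0.11,
    63.2,  56.8,
    0,     0,
    185.9, 166.3),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("A1BG", "A2M", "ERCC-00012", "ERCC-00002"),
                  c("EL11", "EL12"))
)

# Option 1: add a small pseudocount so that zeros stay finite after log2.
pseudocount <- 0.01
log_shifted <- log2(rpkm + pseudocount)

# Option 2: keep only rows whose RPKM reaches the cutoff in every sample.
min_rpkm <- 1  # hypothetical cutoff
keep <- rowSums(rpkm >= min_rpkm) == ncol(rpkm)
filtered <- rpkm[keep, , drop = FALSE]

print(all(is.finite(log_shifted)))  # TRUE
print(rownames(filtered))           # "A2M" "ERCC-00002"
```

Note that filtering changes the row positions, so any index vector for the spike-in rows should be recomputed on the filtered matrix.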