Question

baseline error distribution and missing values

5.6 years ago

vasilislenis ▴ 150

Hi all,

I am trying to apply the LPE approach to a gene expression data set, but I am stuck on the calculation of the baseline error distribution (I am following the steps in the Bioconductor documentation for the LPEadj package). I am trying to run the following command:

var.1 <- baseOlig.error(G13[,2:12],q=0.01)

And I am receiving the following error:

Error in if (sum(A == min(A)) > (q * length(A))) { : 
missing value where TRUE/FALSE needed

I realize that the error is caused by missing values in my data. Is there any way to make the function ignore these missing values? I tried dropping the rows that contain at least one "NA", but each row represents a unique gene, so I lose those genes.

Thank you very much in advance, Vasilis.

gene expression LPE Transcriptomics • 1.1k views

ADD COMMENT • link updated 5.6 years ago by Kevin Blighe 87k • written 5.6 years ago by vasilislenis ▴ 150

score 1 · Answer 1 · 2018-09-16

5.6 years ago

Kevin Blighe 87k

Hi Vasilis,

I looked at the code of the internal functions of baseOlig.error (baseOlig.error.step1, baseOlig.error.step2, am.trans), and it looks like you just have to add na.rm = TRUE to each sum, min, max, and mean call, i.e., edit the functions.
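
To see why the missing values trigger this error and what na.rm = TRUE changes, here is a minimal sketch on a toy vector. The vector A and the value of q are made up for illustration, and the condition only mirrors the one quoted in the error message; it is not the actual LPE source.

```r
# Toy vector with one missing value (illustrative data, not taken from G13)
A <- c(2.1, 3.4, NA, 2.1, 5.0)
q <- 0.01

# Without na.rm, min() returns NA, so the comparison and the sum are NA too
min(A)
sum(A == min(A))

# if() cannot branch on NA; capture the resulting error message
msg <- tryCatch(
  {
    if (sum(A == min(A)) > (q * length(A))) "many ties" else "few ties"
  },
  error = function(e) conditionMessage(e)
)
print(msg)  # "missing value where TRUE/FALSE needed"

# With na.rm = TRUE the missing value is skipped and the condition is defined
n_min <- sum(A == min(A, na.rm = TRUE), na.rm = TRUE)
n_min                    # 2
n_min > (q * length(A))  # TRUE
```

Note that na.rm = TRUE is needed both in min() and in the outer sum(), because the comparison A == min(...) still yields NA at the missing position.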

Just type baseOlig.error at the command prompt and you will see the code.

Also, when running it, you should set the value of the stats parameter as follows:

var.1 <- baseOlig.error(G13[,2:12], stats=function(x) median(x, na.rm=TRUE), q=0.01)

Another option is to set the NAs to 0:

G13[is.na(G13)] <- 0

Kevin

ADD COMMENT • link 5.6 years ago by Kevin Blighe 87k

Thank you very much, Kevin. I can see your point. I will recreate each of these functions with the na.rm option enabled, but what do you mean by stats=function(x)? What would the function(x) be?

Setting the NAs to 0 is much easier, but I believe I would be losing information with this approach. I have also tried replacing the NAs with the mean value of the row (the same gene across samples), but again I am not sure about this.
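
For reference, the row-mean replacement described above could be sketched as follows. The matrix expr and its gene and sample names are made up for illustration; the real data would be the numeric columns of G13.

```r
# Toy expression matrix: rows are genes, columns are samples (made-up values)
expr <- matrix(
  c(5.0, NA,  7.0,
    2.0, 4.0, NA,
    1.0, 1.5, 2.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2", "s3"))
)

# Per-gene mean computed from the observed values only
row_means <- rowMeans(expr, na.rm = TRUE)

# Locate the missing cells and fill each one with the mean of its own row
na_idx <- which(is.na(expr), arr.ind = TRUE)
imputed <- expr
imputed[na_idx] <- row_means[na_idx[, "row"]]

imputed
```

A filled value adds no new information and reduces the apparent spread for that gene, so whether this is acceptable for a variance-based method is a judgment call.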

ADD REPLY • link 5.6 years ago by vasilislenis ▴ 150

I tried it myself (editing the code) and ended up getting new errors, but I believe it is possible to do.

Regarding the function command, I should actually have written (now edited, above):

stats=function(x) median(x, na.rm=TRUE)

This is the same as just doing things like this:

calculateMedian <- function(x) {median(x)}
calculateMedian(c(1,2,3,4,5,6))
[1] 3.5
calculateMedian(c(1,2,3,4,5,6,NA))
[1] NA

Now add na.rm=TRUE:

calculateMedian <- function(x) {median(x, na.rm=TRUE)}
calculateMedian(c(1,2,3,4,5,6,NA))
[1] 3.5
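
How such a function is used once it is passed as an argument can be shown with a small sketch. The helper summarise_rows is hypothetical and is not part of LPE; it only mimics a function that accepts a stats argument and calls it on each row.

```r
# Hypothetical helper (not from LPE): applies whatever function is passed
# as `stats` to each row of the matrix `y`
summarise_rows <- function(y, stats = median) {
  apply(y, 1, stats)
}

# Made-up data with one missing value in the second row
y <- matrix(c(1, 2,  3,
              4, NA, 6),
            nrow = 2, byrow = TRUE)

# Default: plain median, so the row containing NA yields NA
default_result <- summarise_rows(y)

# Anonymous function: x is the row handed over by apply(),
# and na.rm = TRUE makes median() skip the NA
narm_result <- summarise_rows(y, stats = function(x) median(x, na.rm = TRUE))

default_result
narm_result
```

Here x is just a placeholder name for whatever values the calling function passes in, so the anonymous function can be written inline without defining it beforehand.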

ADD REPLY • link 5.6 years ago by Kevin Blighe 87k