Question

Strategies to offset the effects of row-wise correction from missing data

4.0 years ago

strkiky2 • 0

Here's a dataset

data <- t(data.frame(met1 = c(2,2,2,2,2),
                   met2 = c(5,4,NA,2,1),
                   met3 = c(2,2,2,NA,2),
                   met4 = c(2,4,6,8,6),
                   met5 = c(1,3,4,7,2)))

This gives:

     [,1] [,2] [,3] [,4] [,5]
met1    2    2    2    2    2
met2    5    4   NA    2    1
met3    2    2    2   NA    2
met4    2    4    6    8    6
met5    1    3    4    7    2

I often apply a row-wise correction to my dataset: each value is divided by the sum of its row, so all values fall between 0 and 1.

data <- data / rowSums(data, na.rm = TRUE)

This works well when there is no missing data. But compare met1 and met3: both rows hold the same observed values (all 2s), yet each met3 value comes out at 0.25 instead of 0.20, because the met3 row sum covers only four values rather than five.

           [,1]      [,2]      [,3]      [,4]       [,5]
met1 0.20000000 0.2000000 0.2000000 0.2000000 0.20000000
met2 0.41666667 0.3333333        NA 0.1666667 0.08333333
met3 0.25000000 0.2500000 0.2500000        NA 0.25000000
met4 0.07692308 0.1538462 0.2307692 0.3076923 0.23076923
met5 0.05882353 0.1764706 0.2352941 0.4117647 0.11764706

How could I offset this effect? Currently I remove any column with missing data, but I would prefer not to, since important data could be lost.

R • 525 views

updated 4.0 years ago by Biostar 20 • written 4.0 years ago by strkiky2 • 0

You can replace the missing values with the mean of their row before normalizing. See https://stackoverflow.com/questions/6918086/replace-na-values-by-row-means
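As a rough sketch of that suggestion applied to the example matrix from the question (the names `na_idx`, `imputed` and `normalized` are only illustrative, and this is one possible way to do it in base R, not necessarily the approach in the linked thread):

```r
# Example matrix from the question: rows are metabolites, columns are samples.
data <- t(data.frame(met1 = c(2, 2, 2, 2, 2),
                     met2 = c(5, 4, NA, 2, 1),
                     met3 = c(2, 2, 2, NA, 2),
                     met4 = c(2, 4, 6, 8, 6),
                     met5 = c(1, 3, 4, 7, 2)))

# Locate every NA as a (row, col) index pair.
na_idx <- which(is.na(data), arr.ind = TRUE)

# Replace each NA with the mean of the observed values in its own row.
imputed <- data
imputed[na_idx] <- rowMeans(data, na.rm = TRUE)[na_idx[, "row"]]

# Row-wise correction on the imputed matrix: every row now sums to 1.
normalized <- imputed / rowSums(imputed)
print(normalized)
```

With the gaps filled this way, met3 is divided by a sum over five values again, so its entries line up with met1. Keep in mind that the imputed cells are estimates, not measurements, so it may be worth tracking which positions were filled (here, `na_idx`) for downstream analysis.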

4.0 years ago by lessismore ★ 1.3k