check for zeros in every column of R data frame
7.7 years ago

Hi All,

I have a data frame in R with genes as rows and samples as columns, holding an expression value for each gene. I would like to check whether all the samples (columns) for each gene have an expression value > 0, and remove the gene (row) if the number of zeros exceeds a certain number (10 or 15).

Instead of writing a for loop and checking every cell, is there a more direct or easier way to do this?

Thanks a lot.

cheers, Aravind

R dataframe • 21k views

Google for "Pre filtering rna-seq data, deseq2"

7.7 years ago
VHahaut ★ 1.2k

Something like this should do the trick:

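threshold <- 0      # a value must exceed this to count as expressed
min_samples <- 10   # or 15: minimum number of samples above the threshold
index <- rowSums(dataframe > threshold) >= min_samples
dataframe <- dataframe[index, ]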

If this is for a differential expression analysis, be careful in your choice of threshold.
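
The question is phrased in terms of counting zeros per gene, so here is a minimal sketch of that variant. The toy matrix, the names `expr`, `max_zeros`, `zeros_per_gene` and `filtered`, and the cutoff of 3 are all made up for illustration; with real data you would use your own data frame and a cutoff such as 10 or 15.

```r
# Toy expression table: 4 genes (rows) x 6 samples (columns); values are invented.
expr <- data.frame(
  s1 = c(5, 0, 0, 2),
  s2 = c(3, 0, 1, 0),
  s3 = c(8, 0, 0, 4),
  s4 = c(1, 2, 0, 0),
  s5 = c(6, 0, 3, 7),
  s6 = c(2, 0, 0, 9),
  row.names = c("geneA", "geneB", "geneC", "geneD")
)

max_zeros <- 3  # hypothetical cutoff; the question mentions 10 or 15

# expr == 0 gives a logical matrix; rowSums() counts the TRUEs for each gene.
zeros_per_gene <- rowSums(expr == 0)

# Keep only genes with at most max_zeros zero-valued samples.
# drop = FALSE keeps the result a data frame even if a single row survives.
filtered <- expr[zeros_per_gene <= max_zeros, , drop = FALSE]
```

In this toy example geneB and geneC have 5 and 4 zeros respectively, so only geneA and geneD remain in `filtered`.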

3.9 years ago

generate random data

df <- data.frame(matrix(rexp(50, rate=.1), ncol=5))
df
          X1        X2        X3        X4        X5
1   6.740580 8.0732001 11.559373  3.852068  2.867826
2   8.017221 6.5130366  2.704126  8.905308 12.894070
3   6.727714 3.2452073 10.196754  3.307838 25.877438
4   5.965735 9.4551141  5.846683 23.719300 11.091497
5  15.263253 0.1952222 15.024644 21.170202 19.905295
6  22.297499 1.9259003  3.370276  2.223720  5.634376
7  18.492478 8.9295170  1.240304 34.976717  9.909120
8   2.029231 1.2276215  5.696635  4.990737 27.469183
9   3.686119 5.0614293  2.240032 19.733418  1.801601
10  4.641686 5.0175711 16.526846 35.593992 13.176071

check if all samples (columns) have at least 1 zero

all(apply(apply(df, 2, function(x) x==0), 2, any))
[1] FALSE

set one value to zero in 4 of the 5 samples

df[2,1] <- 0
df[3,2] <- 0
df[1,3] <- 0
df[7,4] <- 0

all(apply(apply(df, 2, function(x) x==0), 2, any))
[1] FALSE

okay, set a value to zero in the fifth sample so that every sample now has a zero

df[10,5] <- 0
df
          X1        X2        X3        X4        X5
1   6.740580 8.0732001  0.000000  3.852068  2.867826
2   0.000000 6.5130366  2.704126  8.905308 12.894070
3   6.727714 0.0000000 10.196754  3.307838 25.877438
4   5.965735 9.4551141  5.846683 23.719300 11.091497
5  15.263253 0.1952222 15.024644 21.170202 19.905295
6  22.297499 1.9259003  3.370276  2.223720  5.634376
7  18.492478 8.9295170  1.240304  0.000000  9.909120
8   2.029231 1.2276215  5.696635  4.990737 27.469183
9   3.686119 5.0614293  2.240032 19.733418  1.801601
10  4.641686 5.0175711 16.526846 35.593992  0.000000

all(apply(apply(df, 2, function(x) x==0), 2, any))
[1] TRUE
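
The nested `apply()` call above can probably be written more directly with `colSums()`. The sketch below assumes a fixed seed (not in the original post) so it is reproducible, and the names `has_zero` and `nested` are only illustrative.

```r
set.seed(1)  # assumption: seed added here only for reproducibility
df <- data.frame(matrix(rexp(50, rate = 0.1), ncol = 5))

# Set one value to zero in the first two samples only.
df[2, 1] <- 0
df[3, 2] <- 0

# df == 0 is a logical matrix; colSums() counts the zeros in each sample.
has_zero <- colSums(df == 0) > 0

# TRUE only if every sample contains at least one zero.
all(has_zero)

# The nested apply() version from the answer, kept for comparison.
nested <- all(apply(apply(df, 2, function(x) x == 0), 2, any))
```

Swapping `colSums()` for `rowSums()` gives the per-gene zero counts that the original question asks about.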