Question

How Do I Count The Number Of Values Less Than X In A Column Using R?

5

14.2 years ago

Jason ▴ 940

Hello, I'm working with R and have obtained a table which contains 3 columns and a row for each of my genes in an RNA-seq study. The first two columns contain log concentration (logConc) and log fold change (logFC), respectively, but I'm most interested in the third column and in finding how many of the genes have a p-value less than 0.05. Also, bonus points if you can tell me how to count genes with p-value < 0.05 and logFC > 1.

So can anyone help me write some lines in R which will count the number of genes with values less than 0.05 in the third column? Also my data is in a matrix table.

Thanks so much

(here is an illustration of what I'm working with)

> head(edgeHSMvHSF$table)
        logConc         logFC   p.value
tag.1 -13.67186  -0.009198564 0.9914611
tag.2 -36.72240 -26.587301949 1.0000000
tag.3 -15.82522   0.272339616 0.6033744
tag.4 -15.35435  -0.253093387 0.6161544
tag.5 -18.13806  -0.079021620 1.0000000
tag.6 -15.43403   0.868789217 0.0904064

sorry looks like those numbers won't format well here

rna edger r counts • 90k views

ADD COMMENT • link updated 10.6 years ago by seidel 11k • written 14.2 years ago by Jason ▴ 940

1

where is the column that adjusts for multiple comparisons?
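
On the multiple-comparisons point: the table shown in the question has only raw p-values. A minimal sketch, assuming a Benjamini-Hochberg correction is what is wanted and using base R's p.adjust(); the vector p below is made up for illustration and stands in for edgeHSMvHSF$table$p.value.

```r
# Hypothetical raw p-values; in the question these would come from
# edgeHSMvHSF$table$p.value.
p <- c(0.0001, 0.004, 0.03, 0.04, 0.2, 0.6, 0.9)

# Benjamini-Hochberg adjustment (controls the false discovery rate).
padj <- p.adjust(p, method = "BH")

n_raw <- sum(p < 0.05)     # count using raw p-values
n_adj <- sum(padj < 0.05)  # count after adjustment; never larger than n_raw
```

Counting on the adjusted values is usually the more defensible number to report when thousands of genes are tested at once.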

ADD REPLY • link 14.2 years ago by Jeremy Leipzig 23k

0

Thanks for formatting my post so the values don't look so cluttered.

ADD REPLY • link 14.2 years ago by Jason ▴ 940

Jeremy Leipzig · Answer 1 · 2011-04-12

15

14.2 years ago

Michael 56k

This is a basic R question, but still:

# Accessing the p-values should work like this (adjust to your object if needed):
# p.value <- edgeHSMvHSF$table$p.value # or edgeHSMvHSF$table[,3]
# Fold changes:
# logFC <- edgeHSMvHSF$table$logFC # or edgeHSMvHSF$table[,2]
# or quicker (interactive sessions only):
# attach(edgeHSMvHSF$table) # table should be a data.frame
# or better:
with(edgeHSMvHSF$table, c(
    sum(p.value < 0.05),             # genes with p-value < 0.05
    # bonus points pls ;)
    sum(p.value < 0.05 & logFC > 1)  # genes with p-value < 0.05 and logFC > 1
))
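
To try the with() call without the original object, here is a small self-contained sketch. The data frame toy and its values are made up for illustration; only the column names follow the question. Note that with() expects a data frame or list, so if the table really is a matrix, it is assumed here that indexing by column name (or converting with as.data.frame()) is acceptable.

```r
# Made-up stand-in for edgeHSMvHSF$table (values are hypothetical).
toy <- data.frame(
  logConc = c(-13.7, -36.7, -15.8, -15.4, -18.1),
  logFC   = c(-0.01,  2.6,   0.27,  1.4,  -1.9),
  p.value = c( 0.99,  0.001, 0.60,  0.03,  0.02),
  row.names = paste0("tag.", 1:5)
)

# Same pattern as above: a logical vector is summed, and each TRUE counts as 1.
counts <- with(toy, c(
  sum(p.value < 0.05),
  sum(p.value < 0.05 & logFC > 1)
))
counts  # 3 2

# If the table is a matrix rather than a data frame, index by column name.
m <- as.matrix(toy)
n_matrix <- sum(m[, "p.value"] < 0.05)  # 3
```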

ADD COMMENT • link updated 14.2 years ago by Jeremy Leipzig 23k • written 14.2 years ago by Michael 56k

4

But always remember "attach is considered harmful ..." Use expressions inside a call to with() if you like, e.g.:

with(edgeHSMvHSF$table, sum(p.value < 0.05 & logFC > 1))

ADD REPLY • link 14.2 years ago by Steve Lianoglou 5.2k

3

Thanks, I actually figured out how to do the p.value and logFC separately while I was waiting for a reply, but that little & sign totally helped me out. I used this instead

sum(edgeHSMvHSF$table[,2]>1 & edgeHSMvHSF$table[,3]<0.05)

Which also gave me the same answer yours did. It's good to have a way to double check what I was doing.

Bonus points for you

ADD REPLY • link 14.2 years ago by Jason ▴ 940

1

If sum() seems misleading, you can also use length(which(...)).
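
One practical difference between the two forms shows up when the column contains missing values. A small sketch with a made-up vector p:

```r
p <- c(0.01, 0.20, NA, 0.03)  # made-up p-values with one missing entry

n_sum      <- sum(p < 0.05)                # NA: the comparison yields an NA
n_sum_narm <- sum(p < 0.05, na.rm = TRUE)  # 2
n_which    <- length(which(p < 0.05))      # 2: which() drops the NA
```

So length(which(...)) behaves like sum(..., na.rm = TRUE), while plain sum() propagates the NA.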

ADD REPLY • link 14.2 years ago by Jeremy Leipzig 23k

0

Agreed; attach causes all sorts of issues, avoid it.

ADD REPLY • link 14.2 years ago by Neilfws 49k

0

Sure, attach() is only for interactive sessions and it can cause issues, though it is sometimes very nifty. Don't use it in functions or scripts. The clean way is to assign the columns to variables with <-.

ADD REPLY • link 14.2 years ago by Michael 56k

0

Jeremy, thanks for correcting the syntax error.

ADD REPLY • link 14.2 years ago by Michael 56k

0

It seems that sum() is faster than length(which()) for very large vectors.
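
A rough way to check this on your own machine, sketched with base R's system.time(). The vector size and the random data are arbitrary choices, and timings will vary between runs and systems:

```r
set.seed(1)
p <- runif(1e6)  # one million made-up p-values

t_sum   <- system.time(n_sum   <- sum(p < 0.05))
t_which <- system.time(n_which <- length(which(p < 0.05)))

# Both approaches give the same count; compare the "elapsed" entries.
n_sum == n_which
t_sum["elapsed"]
t_which["elapsed"]
```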

ADD REPLY • link 14.1 years ago by Michael 56k

score 5 · Answer 2 · 2011-04-13

5

14.2 years ago

Manu Prestat 4.1k

You may also be interested in the subset function:

subset(edgeHSMvHSF$table,p.value<0.05 & abs(logFC) > 1)

Emmanuel
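
Since subset() returns the matching rows rather than a count, wrapping it in nrow() gives the number asked for. A sketch on made-up data (the data frame toy is hypothetical); note that abs(logFC) > 1 keeps both up- and down-regulated genes, whereas the question asked for logFC > 1 only:

```r
# Made-up stand-in for edgeHSMvHSF$table.
toy <- data.frame(
  logFC   = c(-0.01, 2.6,   0.27, 1.4,  -1.9),
  p.value = c( 0.99, 0.001, 0.60, 0.03,  0.02)
)

hits_both <- subset(toy, p.value < 0.05 & abs(logFC) > 1)  # up or down
hits_up   <- subset(toy, p.value < 0.05 & logFC > 1)       # up only

nrow(hits_both)  # 3
nrow(hits_up)    # 2
```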

ADD COMMENT • link 14.2 years ago by Manu Prestat 4.1k

Michael · Answer 3 · 2011-04-12

1

14.2 years ago

Joshua ▴ 70

CountMySignificant <- function(x) {
  # Count the elements of x that are below 0.05
  count <- 0
  for (i in seq_along(x)) {  # seq_along() is safe when x is empty
    if (x[i] < 0.05) {
      count <- count + 1
    }
  }
  return(count)
}

Then just pass your column, which I'm assuming would be something like: CountMySignificant(edgeHSMvHSF$table[,3])

ADD COMMENT • link updated 14.2 years ago by Michael 56k • written 14.2 years ago by Joshua ▴ 70

4

In R, for-loops are best avoided where possible.

ADD REPLY • link 14.2 years ago by biobot 0.0.77.a.1099 6.2k

4

I'm afraid this is an example of how not to write R code because it is much slower and far less readable than x<0.05.
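
For comparison, the vectorized form can be wrapped in a function of its own. This is only a sketch; the name count_below and the cutoff argument are made up here, and dropping NAs with na.rm = TRUE is an assumption about what is wanted:

```r
# Hypothetical helper: count values below a cutoff without a loop.
count_below <- function(x, cutoff = 0.05) {
  sum(x < cutoff, na.rm = TRUE)  # TRUE counts as 1; NAs are ignored
}

n_some  <- count_below(c(0.01, 0.2, 0.03, NA))  # 2
n_empty <- count_below(numeric(0))              # 0
```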

ADD REPLY • link 14.2 years ago by Laurent ★ 1.7k

score 0 · Answer 4 · 2014-11-24

This is a bit terse, but would be one way to do it *instead* of using a for loop, and is more the R way. Use apply() on your matrix:

sum(apply(edgeHSMvHSF$table, 1, function(x){ ifelse(x[2] > 1 & x[3] < 0.05,TRUE,FALSE) }))

The apply function takes three arguments: apply(matrix, rows or columns, function). The first is your matrix or data frame, the second says whether to work on rows (1) or columns (2), and the third is a function to apply to each row or column. You can place a custom function there.

In the code above, each row of the matrix is handed to the function as a vector called x (you get to make up the name of the vector; I called it x), so all we have to do is check the 2nd and 3rd positions of x and return TRUE or FALSE. I'm using the ifelse function: ifelse(test, value if true, value if false). So the apply call above returns a boolean vector with one element per row of the matrix, reflecting whether that row has a log fold change > 1 AND a p-value < 0.05.

You can sum a boolean vector to get a count of the elements that are TRUE. If you want an index vector of the rows, you could use the boolean vector directly, or you could replace sum() with which().
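
Here is a small self-contained sketch of the same idea on a made-up matrix (tab and its values are hypothetical; the column order follows the question). The ifelse() wrapper is left out because the comparison already yields TRUE or FALSE, and a fully vectorized version is shown for comparison:

```r
# Made-up stand-in for edgeHSMvHSF$table, stored as a numeric matrix.
tab <- cbind(
  logConc = c(-13.7, -15.4, -12.1, -14.0),
  logFC   = c(-0.01,  0.87,  2.30,  1.50),
  p.value = c( 0.99,  0.09,  0.001, 0.03)
)
rownames(tab) <- paste0("tag.", 1:4)

# One TRUE/FALSE per row: logFC (column 2) > 1 and p.value (column 3) < 0.05.
keep <- apply(tab, 1, function(x) x[2] > 1 & x[3] < 0.05)

n_apply <- sum(keep)    # 2
idx     <- which(keep)  # positions 3 and 4 (tag.3, tag.4)

# The same count without apply(), comparing whole columns at once.
n_vec <- sum(tab[, "logFC"] > 1 & tab[, "p.value"] < 0.05)  # 2
```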