Question: How Do I Count The Number Values Less Than X In A Column Using R ?
7.5 years ago, Jason850 (United States) wrote:

Hello, I'm working with R and have obtained a table that contains 3 columns and one row for each gene in my RNA-seq study. The first two columns contain log concentration (logConc) and log fold change (logFC), respectively, but I'm most interested in the third column: how many of the genes have a p-value less than 0.05? Bonus points if you can also tell me how to count genes with p-value < 0.05 and logFC > 1.

So can anyone help me write a few lines of R that will count the number of genes with values less than 0.05 in the third column? Also, my data is in a matrix.

Thanks so much

(here is an illustration of what I'm working with)

```
> head(edgeHSMvHSF$table)
        logConc         logFC   p.value
tag.1 -13.67186  -0.009198564 0.9914611
tag.2 -36.72240 -26.587301949 1.0000000
tag.3 -15.82522   0.272339616 0.6033744
tag.4 -15.35435  -0.253093387 0.6161544
tag.5 -18.13806  -0.079021620 1.0000000
tag.6 -15.43403   0.868789217 0.0904064
```

Sorry, it looks like those numbers won't format well here.
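To try the answers below without the original edgeR object, the six rows printed above can be rebuilt as a data frame. This is a minimal sketch; the name `tab` is a hypothetical stand-in for `edgeHSMvHSF$table`, and only the six rows shown are included.

```r
# Rebuild the six rows printed by head(edgeHSMvHSF$table) above.
# 'tab' is a hypothetical stand-in for edgeHSMvHSF$table.
tab <- data.frame(
  logConc = c(-13.67186, -36.72240, -15.82522, -15.35435, -18.13806, -15.43403),
  logFC   = c(-0.009198564, -26.587301949, 0.272339616, -0.253093387,
              -0.079021620, 0.868789217),
  p.value = c(0.9914611, 1.0000000, 0.6033744, 0.6161544, 1.0000000, 0.0904064),
  row.names = paste0("tag.", 1:6)
)

sum(tab$p.value < 0.05)  # none of these six rows is below 0.05
sum(tab[, 3] < 0.05)     # same column, selected by position
```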

Tags: R, rna, counts, edger
modified 3.8 years ago by seidel • written 7.5 years ago by Jason850

Where is the column that adjusts for multiple comparisons?
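Raw p-values are not corrected for testing thousands of genes at once. A minimal sketch of adjusting them with base R's `p.adjust()` using the Benjamini-Hochberg method before counting; the p-values here are made up for illustration, not taken from the study.

```r
# Made-up p-values standing in for edgeHSMvHSF$table$p.value
p.value <- c(0.001, 0.01, 0.02, 0.04, 0.2, 0.5)

# Benjamini-Hochberg adjustment controls the false discovery rate
fdr <- p.adjust(p.value, method = "BH")

sum(p.value < 0.05)  # 4 genes pass the raw cutoff
sum(fdr < 0.05)      # 3 genes pass after adjustment
```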

Thanks for formatting my post so the values don't look so cluttered.

7.5 years ago, Michael Dondrup (Bergen, Norway) wrote:

This is a basic R question, but still:

```
# accessing the p-values should work like this, but you may have to fiddle with it yourself:
# p.value <- edgeHSMvHSF$table$p.value   # or edgeHSMvHSF$table[,3]
# fold-change values:
# logFC <- edgeHSMvHSF$table$logFC       # or edgeHSMvHSF$table[,2]
# or quicker:
# attach(edgeHSMvHSF$table)  # table should be a data.frame
# or better:
with(edgeHSMvHSF$table, c(
  sum(p.value < 0.05),
  # bonus points pls ;)
  sum(p.value < 0.05 & logFC > 1)
))
```

But always remember that "`attach` is considered harmful". If you like, use expressions inside a call to `with` instead, e.g.:

```
with(edgeHSMvHSF$table, sum(p.value < 0.05 & logFC > 1))
```
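A runnable sketch of the `with()` approach on a small toy table (the name `tab` and its values are hypothetical, standing in for `edgeHSMvHSF$table`), returning both counts as a named vector:

```r
# Toy data frame; 'tab' is a hypothetical stand-in for edgeHSMvHSF$table
tab <- data.frame(
  logFC   = c(2.1, -1.5, 0.3, 1.2, 3.0),
  p.value = c(0.001, 0.01, 0.03, 0.20, 0.04)
)

# with() evaluates the expressions using the columns of 'tab',
# without putting them on the search path the way attach() does
counts <- with(tab, c(
  p05       = sum(p.value < 0.05),
  p05_logFC = sum(p.value < 0.05 & logFC > 1)
))
counts
```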

Thanks, I actually figured out how to do the p.value and logFC filters separately while I was waiting for a reply, but that little & sign totally helped me out. I used this instead:

`sum(edgeHSMvHSF$table[,2] > 1 & edgeHSMvHSF$table[,3] < 0.05)`

This gave me the same answer as yours. It's good to have a way to double-check what I was doing.

Bonus points for you
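Selecting columns by position (`[,2]`, `[,3]`) gives the same count as selecting them by name, as long as the column order never changes. A small sketch on a hypothetical toy table showing that the two forms agree, and why names are the safer choice if the table is ever reordered:

```r
# Hypothetical toy table laid out like edgeHSMvHSF$table
tab <- data.frame(
  logConc = c(-13.7, -15.8, -15.4, -18.1),
  logFC   = c(1.5, 0.2, 2.4, -3.0),
  p.value = c(0.01, 0.02, 0.30, 0.001)
)

by_position <- sum(tab[, 2] > 1 & tab[, 3] < 0.05)
by_name     <- sum(tab$logFC > 1 & tab$p.value < 0.05)

# Reorder the columns: position-based indexing now reads the wrong
# column (column 3 is logConc), while name-based indexing is unaffected
reordered <- tab[, c("p.value", "logFC", "logConc")]
wrong <- sum(reordered[, 2] > 1 & reordered[, 3] < 0.05)
right <- sum(reordered$logFC > 1 & reordered$p.value < 0.05)

c(by_position = by_position, by_name = by_name, wrong = wrong, right = right)
```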


If sum() seems misleading, you can also use length(which()).
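One practical difference between the two: `sum()` over a logical vector returns `NA` if any comparison is `NA` (for example, a gene with a missing p-value), whereas `which()` silently drops `NA`s. A small sketch with made-up values:

```r
p.value <- c(0.01, NA, 0.03, 0.5)

sum(p.value < 0.05)                # NA, because one comparison is NA
sum(p.value < 0.05, na.rm = TRUE)  # 2, NAs ignored
length(which(p.value < 0.05))      # 2, which() skips NAs
```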

Agreed; attach causes all sorts of issues, so avoid it.

Sure, attach is only for interactive sessions and it can cause issues, though it is sometimes very handy. Don't use it in functions or scripts. The clean way is to assign the columns to variables with `<-`.
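A sketch of one way `attach` goes wrong: the attached table sits behind the global environment on the search path, so a stray global variable with the same name as a column silently wins. The table and values here are hypothetical.

```r
tab <- data.frame(p.value = c(0.01, 0.20, 0.03))

# A stray global variable with the same name as a column
p.value <- c(0.50, 0.50, 0.50)

attach(tab)                      # R prints a note that p.value is masked by .GlobalEnv
n_attach <- sum(p.value < 0.05)  # uses the global p.value, so 0
detach(tab)

n_with <- with(tab, sum(p.value < 0.05))  # uses the column, so 2
```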

Jeremy, thanks for correcting the syntax error.

It seems that sum() is faster than length(which()) for very large vectors.
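A rough way to check this on your own machine is to time both on a large random vector with `system.time()`. This is only a sketch: timings depend on hardware and R version, so no particular result is claimed here, only that both forms return the same count.

```r
set.seed(1)
x <- runif(1e6)  # increase the length to see a clearer difference

t_sum   <- system.time(n_sum   <- sum(x < 0.05))
t_which <- system.time(n_which <- length(which(x < 0.05)))

# Both give the same count; compare the elapsed seconds
c(sum = t_sum[["elapsed"]], which = t_which[["elapsed"]])
```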

7.5 years ago, Manu Prestat (Marseille, France) wrote:

You may also be interested in the subset function:

`subset(edgeHSMvHSF$table, p.value < 0.05 & abs(logFC) > 1)`

Emmanuel
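`subset()` returns the matching rows rather than a count; wrapping it in `nrow()` gives the number. Note that `abs(logFC) > 1` keeps genes changed in either direction, which differs from the `logFC > 1` asked about in the question. A sketch on a hypothetical toy table:

```r
tab <- data.frame(
  logFC   = c(2.0, -1.8, 0.4, 1.3),
  p.value = c(0.01, 0.02, 0.01, 0.30),
  row.names = paste0("tag.", 1:4)
)

either_dir <- subset(tab, p.value < 0.05 & abs(logFC) > 1)  # tag.1 and tag.2
up_only    <- subset(tab, p.value < 0.05 & logFC > 1)       # tag.1 only

nrow(either_dir)
nrow(up_only)
```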

7.5 years ago, Joshua wrote:

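    CountMySignificant <- function(x) {
        count <- 0
        # seq_along() is safe for zero-length input, unlike 1:length(x)
        for (i in seq_along(x)) {
            # skip missing p-values instead of failing on if (NA)
            if (!is.na(x[i]) && x[i] < 0.05) {
                count <- count + 1
            }
        }
        return(count)
    }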

Then just pass it your column, which I'm assuming would be something like `CountMySignificant(edgeHSMvHSF$table[,3])`.


In R, for-loops are best avoided where possible.


I'm afraid this is an example of how not to write R code, because it is much slower and far less readable than `sum(x < 0.05)`.
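For comparison, a sketch checking that the loop and the vectorized expression agree on made-up p-values. The loop is re-stated here under the hypothetical name `count_loop` so the block runs on its own; `x < 0.05` builds a logical vector and `sum()` counts its `TRUE` values.

```r
# Loop version, re-stated here so the block is self-contained
count_loop <- function(x) {
  count <- 0
  for (i in seq_along(x)) {
    if (x[i] < 0.05) count <- count + 1
  }
  count
}

# Vectorized version: compare all values at once, then count the TRUEs
count_vec <- function(x) sum(x < 0.05)

p <- c(0.001, 0.2, 0.04, 0.9, 0.049)
count_loop(p)
count_vec(p)
```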

3.8 years ago, seidel (United States) wrote:

This is a bit terse, but it is one way to do it *instead* of using a for loop, and it is more the R way. Use apply() on your matrix:

`sum(apply(edgeHSMvHSF$table, 1, function(x){ ifelse(x[2] > 1 & x[3] < 0.05, TRUE, FALSE) }))`

The apply function takes three arguments: apply(matrix, margin, function). The first is your matrix or data frame (a data frame is converted to a matrix first), the second says whether to work on rows (1) or columns (2), and the third is the function to apply to each row or column; you can put a custom function there. In the code above, each row of the matrix is handed to the function as a vector called x (you can choose any name; I called it x), so all we have to do is check the 2nd and 3rd positions of x and return TRUE or FALSE.

I'm using the ifelse statement: ifelse(test, value if true, value if false). So the apply statement above returns a logical vector with one element per row of the matrix, indicating whether each row has a log fold change > 1 AND a p-value < 0.05. Summing a logical vector counts the number of TRUE values. If you want the indices of those rows, you can use the logical vector directly or replace sum() with which().
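A small sketch of the `apply()` approach on a hypothetical numeric matrix laid out like the table in the question, checked against the vectorized column comparison; `which()` then gives the row indices instead of a count.

```r
# Hypothetical matrix with the same column layout as edgeHSMvHSF$table
m <- cbind(
  logConc = c(-13.7, -15.8, -15.4, -18.1),
  logFC   = c(1.5, 0.2, 2.4, 1.1),
  p.value = c(0.01, 0.02, 0.30, 0.001)
)

# One TRUE/FALSE per row: logFC > 1 AND p-value < 0.05
hits <- apply(m, 1, function(x) x[2] > 1 & x[3] < 0.05)

sum(hits)    # number of rows passing both cutoffs
which(hits)  # their row indices

# Vectorized equivalent on whole columns
sum(m[, 2] > 1 & m[, 3] < 0.05)
```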