How Do I Count The Number Of Values Less Than X In A Column Using R?
4
5
10.0 years ago
Jason ▴ 900

Hello, I'm working with R and have obtained a table which contains 3 columns and a row for each of my genes in an RNA-seq study. The first two columns contain log concentration (logConc) and log fold change (logFC), respectively, but I'm most interested in the third column and in finding how many of the genes have a p.value less than 0.05. Also, bonus points if you can tell me how to count genes with p.value < 0.05 and logFC > 1.

So can anyone help me write some lines in R which will count the number of genes with values less than 0.05 in the third column? Also my data is in a matrix table.

Thanks so much

(here is an illustration of what I'm working with)

> head(edgeHSMvHSF$table)
        logConc         logFC   p.value
tag.1 -13.67186  -0.009198564 0.9914611
tag.2 -36.72240 -26.587301949 1.0000000
tag.3 -15.82522   0.272339616 0.6033744
tag.4 -15.35435  -0.253093387 0.6161544
tag.5 -18.13806  -0.079021620 1.0000000
tag.6 -15.43403   0.868789217 0.0904064

sorry looks like those numbers won't format well here

rna edger r counts • 72k views
ADD COMMENT
1

where is the column that adjusts for multiple comparisons?
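The table shown above has no such column. As a rough sketch of what an adjusted column could look like in base R, `p.adjust()` can be applied to the raw p-values. The vector `p` below is made up for illustration, and Benjamini-Hochberg ("BH") is just one possible choice of method, not necessarily what the original analysis used.

```r
# Hypothetical raw p-values (not from the original data)
p <- c(0.01, 0.04, 0.03, 0.50)

# Benjamini-Hochberg adjustment from base R (stats package)
padj <- p.adjust(p, method = "BH")

n_raw <- sum(p < 0.05)      # count using raw p-values
n_adj <- sum(padj < 0.05)   # count using adjusted p-values
print(c(raw = n_raw, adjusted = n_adj))
```

The adjusted count can be no larger than the raw count, so it is worth deciding which of the two you actually want to report.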

ADD REPLY
0

Thanks for formatting my post so the values don't look so cluttered.

ADD REPLY
15
10.0 years ago

This is a basic R question, but still:

# Accessing the p-values should work like this (adapt it to your object as needed):
# p.value <- edgeHSMvHSF$table$p.value # or edgeHSMvHSF$table[,3]
# Fcvalues:
# logFC <- edgeHSMvHSF$table$logFC # or edgeHSMvHSF$table[,2]
# or quicker:
# attach(edgeHSMvHSF$table) # table should be a data.frame
# or better:
with(edgeHSMvHSF$table, c(
    sum(p.value < 0.05),
    # bonus points pls ;)
    sum(p.value < 0.05 & logFC > 1)
))
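
For anyone who wants to try this without the original data, here is a minimal sketch on a made-up table. The data frame `tab` and its values are hypothetical stand-ins for `edgeHSMvHSF$table`; only the column names are taken from the question.

```r
# Hypothetical stand-in for edgeHSMvHSF$table (values are made up)
tab <- data.frame(
  logConc = c(-13.7, -15.8, -15.4, -18.1, -12.2),
  logFC   = c(-0.01, 1.80, 2.30, -1.50, 0.40),
  p.value = c(0.99, 0.01, 0.20, 0.03, 0.04),
  row.names = paste0("tag.", 1:5)
)

# A logical vector sums as TRUE = 1 and FALSE = 0,
# so sum() of a comparison counts the rows that pass it
n_sig    <- sum(tab$p.value < 0.05)                   # p-value filter only
n_sig_up <- sum(tab$p.value < 0.05 & tab$logFC > 1)   # both conditions
print(c(n_sig, n_sig_up))
```

The single `&` is the element-wise AND; `&&` would not give a per-row result here.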
ADD COMMENT
4

But always remember "attach is considered harmful ..." Use expressions inside a call to with if you like, eg:

with(edgeHSMvHSF$table, sum(p.value < 0.05 & logFC > 1))
ADD REPLY
3

Thanks, I actually figured out how to do the p.value and logFC separately while I was waiting for a reply, but that little & sign totally helped me out. I used this instead

sum(edgeHSMvHSF$table[,2]>1 & edgeHSMvHSF$table[,3]<0.05)

Which also gave me the same answer yours did. It's good to have a way to double check what I was doing.

Bonus points for you

ADD REPLY
1

If sum() is misleading, you can also use length(which()).
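
One practical difference between the two shows up when the column contains missing values. A small sketch, using a made-up vector `p` with one NA:

```r
# Hypothetical p-values, one of them missing
p <- c(0.01, NA, 0.20, 0.03)

n_sum   <- sum(p < 0.05)                  # NA, because NA < 0.05 is NA
n_narm  <- sum(p < 0.05, na.rm = TRUE)    # drops the NA before summing
n_which <- length(which(p < 0.05))        # which() only keeps TRUE positions
print(c(n_sum, n_narm, n_which))
```

So `length(which())` silently ignores NAs, while plain `sum()` propagates them unless `na.rm = TRUE` is set.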

ADD REPLY
0

Agreed; attach causes all sorts of issues, avoid it.

ADD REPLY
0

Sure, attach is only for interactive sessions and it can cause issues, though it is sometimes very nifty. Don't use it in functions or scripts. The clean way is to assign the columns to variables with <-.

ADD REPLY
0

Jeremy, thanks for correcting the syntax error.

ADD REPLY
0

It seems that sum() is faster than length(which()) for very large vectors.

ADD REPLY
5
10.0 years ago

You may also be interested in the subset function:

subset(edgeHSMvHSF$table,p.value<0.05 & abs(logFC) > 1)

Emmanuel
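
Since subset() returns the matching rows rather than a count, nrow() is needed to answer the original question. A minimal sketch, assuming a made-up data frame `tab` with the same column names as in the question:

```r
# Hypothetical stand-in for edgeHSMvHSF$table (values are made up)
tab <- data.frame(
  logFC   = c(-0.01, 1.80, 2.30, -1.50, 0.40),
  p.value = c(0.99, 0.01, 0.20, 0.03, 0.04)
)

# abs(logFC) > 1 keeps both up- and down-regulated genes
hits   <- subset(tab, p.value < 0.05 & abs(logFC) > 1)
n_hits <- nrow(hits)

# Without abs(), only up-regulated genes are kept, as in the question
n_up <- nrow(subset(tab, p.value < 0.05 & logFC > 1))
print(c(n_hits, n_up))
```

Note that the abs() version answers a slightly different question than `logFC > 1`, so the two counts need not agree.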

ADD COMMENT
1
10.0 years ago
Joshua ▴ 70

CountMySignificant <- function(x) {
  # Count the elements of x that are below 0.05
  count <- 0
  for (i in seq_along(x)) {  # seq_along() is safe even if x is empty
    if (x[i] < 0.05) {
      count <- count + 1
    }
  }
  return(count)
}

Then just pass in your column, which I'm assuming would be something like CountMySignificant(edgeHSMvHSF$table[,3]).

ADD COMMENT
4

In R, for-loops are best avoided where possible.

ADD REPLY
4

I'm afraid this is an example of how not to write R code because it is much slower and far less readable than x<0.05.
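
To make the comparison concrete, here is a small sketch that checks the loop and the vectorized form give the same count. The function name `count_loop` and the random vector `p` are made up for illustration; timings are left for you to measure, e.g. with system.time().

```r
# Loop version, written only for comparison (hypothetical helper name)
count_loop <- function(x) {
  count <- 0
  for (i in seq_along(x)) {
    if (x[i] < 0.05) count <- count + 1
  }
  count
}

set.seed(1)
p <- runif(1e5)            # made-up "p-values" between 0 and 1

n_loop <- count_loop(p)    # element-by-element loop
n_vec  <- sum(p < 0.05)    # one vectorized comparison, then a sum
print(n_loop == n_vec)
```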

ADD REPLY
0
6.4 years ago
seidel 7.6k

This is a bit terse, but would be one way to do it *instead* of using a for loop, and is more the R way. Use apply() on your matrix:

sum(apply(edgeHSMvHSF$table, 1, function(x){ ifelse(x[2] > 1 & x[3] < 0.05,TRUE,FALSE) }))

The apply function takes three arguments: apply(matrix, rows or columns, function). The first is your matrix or data frame, the second is whether to work on rows (1) or columns (2), and the third is a function to apply to each row or column. You can place a custom function there.

In the code above, each row of the matrix is handed to the function as a vector called x (you get to make up the name of the vector, I called it x), so all we have to do is check the 2nd and 3rd positions of x and return TRUE or FALSE. I'm using the ifelse function: ifelse(test, return value if true, return value if false). So the apply call above returns a logical vector with one element per row of the matrix, reflecting whether that row has a log fold change > 1 AND a p-value < 0.05. You can sum a logical vector to get a count of the elements that are TRUE. If you want an index vector of the rows, you could use the logical vector directly, or you could replace sum() with which().
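
Here is a minimal sketch of the same idea on a made-up matrix `m` (hypothetical values, same column order as in the question), alongside the plain vectorized comparison for reference. The comparison already yields TRUE or FALSE, so the sketch leaves out ifelse().

```r
# Hypothetical numeric matrix standing in for edgeHSMvHSF$table
m <- cbind(
  logConc = c(-13.7, -15.8, -15.4, -18.1, -12.2),
  logFC   = c(-0.01, 1.80, 2.30, -1.50, 0.40),
  p.value = c(0.99, 0.01, 0.20, 0.03, 0.04)
)

# Row-wise test: x is one row, x[2] is logFC and x[3] is p.value
keep <- unname(apply(m, 1, function(x) x[2] > 1 & x[3] < 0.05))

n_keep   <- sum(keep)     # how many rows pass
idx_keep <- which(keep)   # which rows pass

# Same test without apply(), operating on whole columns at once
keep_vec <- unname(m[, 2] > 1 & m[, 3] < 0.05)
print(list(n_keep, idx_keep, all(keep == keep_vec)))
```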

ADD COMMENT
