Question: How Do I Count The Number Of Values Less Than X In A Column Using R?
Jason (United States), 7.3 years ago, wrote:

Hello, I'm working with R and have obtained a table which contains 3 columns and a row for each of my genes in an RNA-seq study. The first two columns contain the log concentration (logConc) and log fold change (logFC), respectively, but I'm most interested in the third column and in finding how many of the genes have a p.value less than 0.05. Also, bonus points if you can tell me how to count genes with p-value < 0.05 and logFC > 1.

So can anyone help me write some lines of R which will count the number of genes with values less than 0.05 in the third column? Also, my data is in a matrix table.

Thanks so much

(here is an illustration of what I'm working with)

> head(edgeHSMvHSF$table)
        logConc         logFC   p.value
tag.1 -13.67186  -0.009198564 0.9914611
tag.2 -36.72240 -26.587301949 1.0000000
tag.3 -15.82522   0.272339616 0.6033744
tag.4 -15.35435  -0.253093387 0.6161544
tag.5 -18.13806  -0.079021620 1.0000000
tag.6 -15.43403   0.868789217 0.0904064

sorry looks like those numbers won't format well here

R rna counts edger • 48k views

where is the column that adjusts for multiple comparisons?
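
One possible way to add such a column, sketched on made-up numbers: base R's p.adjust() computes adjusted p-values (Benjamini-Hochberg is used here). The data frame `tab` and all of its values are assumptions for illustration, standing in for edgeHSMvHSF$table; whether BH is the right correction for this dataset is not something the thread settles.

```r
# Hypothetical stand-in for edgeHSMvHSF$table (values are made up)
tab <- data.frame(
    logFC   = c(-0.01, 2.50, 0.27, 1.80, -0.08, 0.87),
    p.value = c(0.99, 0.001, 0.60, 0.02, 1.00, 0.04)
)

# Benjamini-Hochberg adjustment for multiple comparisons
tab$adj.p <- p.adjust(tab$p.value, method = "BH")

# Count on the raw and on the adjusted values to see the difference
n_raw <- sum(tab$p.value < 0.05)
n_adj <- sum(tab$adj.p < 0.05)
```

The adjusted count can be noticeably smaller than the raw one, which is why the distinction matters when reporting "significant" genes.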

Reply written 7.3 years ago by Jeremy Leipzig

Thanks for formatting my post so the values don't look so cluttered.

Reply written 7.3 years ago by Jason

Michael Dondrup (Bergen, Norway), 7.3 years ago, wrote:

This is a basic R question, but still:

# Accessing the p-values should work like this (adjust it to your object):
# p.value <- edgeHSMvHSF$table$p.value  # or edgeHSMvHSF$table[, 3]
# Fold-change values:
# logFC <- edgeHSMvHSF$table$logFC      # or edgeHSMvHSF$table[, 2]
# Or quicker, in an interactive session:
# attach(edgeHSMvHSF$table)             # table should be a data.frame
# Or better:
with(edgeHSMvHSF$table, c(
    sum(p.value < 0.05),                # genes with p.value < 0.05
    # bonus points pls ;)
    sum(p.value < 0.05 & logFC > 1)     # genes with p.value < 0.05 and logFC > 1
))
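
To make the idea easy to try without the original object, here is a small sketch on a made-up data frame (`tab` and its numbers are assumptions, not the poster's data). It shows why sum() counts: a comparison yields a logical vector, and TRUE is treated as 1 when summed.

```r
# Made-up stand-in for edgeHSMvHSF$table
tab <- data.frame(
    logConc = c(-13.7, -36.7, -15.8, -15.4, -18.1, -15.4),
    logFC   = c(-0.01, 2.50, 0.27, 1.80, -0.08, 0.87),
    p.value = c(0.99, 0.001, 0.60, 0.02, 1.00, 0.04)
)

# A comparison returns a logical vector; summing it counts the TRUEs
is_sig <- tab$p.value < 0.05
n_sig  <- sum(is_sig)

# Both conditions combined element-wise with &
n_sig_up <- with(tab, sum(p.value < 0.05 & logFC > 1))
```

If the table is a matrix rather than a data.frame, converting it with as.data.frame() before calling with() is one option.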

But always remember that "attach is considered harmful ...". Use expressions inside a call to with() if you like, e.g.:

with(edgeHSMvHSF$table, sum(p.value < 0.05 & logFC > 1))
Reply written 7.3 years ago by Steve Lianoglou

Thanks, I actually figured out how to do the p.value and logFC separately while I was waiting for a reply, but that little & sign totally helped me out. I used this instead

sum(edgeHSMvHSF$table[,2]>1 & edgeHSMvHSF$table[,3]<0.05)

Which also gave me the same answer yours did. It's good to have a way to double check what I was doing.

Bonus points for you

Reply written 7.3 years ago by Jason

If sum() is misleading, you can also use length(which()).
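
The two forms are not quite interchangeable when the column contains missing values. A minimal sketch with a made-up vector `p` (an assumption for illustration):

```r
# Made-up p-values with one missing entry
p <- c(0.01, 0.20, NA, 0.03)

n_sum       <- sum(p < 0.05)                 # NA: the missing value propagates
n_sum_narm  <- sum(p < 0.05, na.rm = TRUE)   # missing value is skipped
n_len_which <- length(which(p < 0.05))       # which() drops NA on its own
```

So length(which()) silently ignores NAs, while sum() needs na.rm = TRUE to do the same.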

Reply written 7.3 years ago by Jeremy Leipzig

Agreed; attach causes all sorts of issues, avoid it.

Reply written 7.3 years ago by Neilfws

Sure, attach() is only for interactive sessions and it can cause issues, though it is sometimes very nifty. Don't use it in functions or scripts. The clean way is to assign the columns to variables with <-.

Reply written 7.3 years ago by Michael Dondrup

Jeremy, thanks for correcting the syntax error.

Reply written 7.3 years ago by Michael Dondrup

It seems that sum() is faster than length(which()) for very large vectors.
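
A rough way to check this on your own machine, assuming a made-up vector of one million random values as a stand-in for a very long p-value column. Timings vary between systems and R versions, so no particular result is claimed here.

```r
set.seed(1)
x <- runif(1e6)   # made-up stand-in for a very long p-value vector

# Time each approach once; assignments inside system.time() are kept
t_sum   <- system.time(n1 <- sum(x < 0.05))
t_which <- system.time(n2 <- length(which(x < 0.05)))

# Both approaches must agree on the count; only the timings may differ
same_count <- identical(n1, n2)
print(c(sum = t_sum[["elapsed"]], which = t_which[["elapsed"]]))
```

A single run is noisy; repeating the measurement several times gives a fairer comparison.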

Reply written 7.1 years ago by Michael Dondrup

Manu Prestat (Marseille, France), 7.3 years ago, wrote:

You may also be interested in the subset function:

subset(edgeHSMvHSF$table, p.value < 0.05 & abs(logFC) > 1)

Emmanuel
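
Since subset() returns the matching rows rather than a count, wrapping it in nrow() gives the number asked for. A small sketch on made-up data (`tab` is an assumption, not the poster's table); note that abs(logFC) > 1 keeps changes in both directions, whereas the question asked for logFC > 1 only.

```r
# Made-up stand-in for edgeHSMvHSF$table, with one down-regulated hit
tab <- data.frame(
    logFC   = c(-0.01, 2.50, 0.27, -1.80, -0.08, 0.87),
    p.value = c(0.99, 0.001, 0.60, 0.02, 1.00, 0.04),
    row.names = paste0("tag.", 1:6)
)

hits   <- subset(tab, p.value < 0.05 & abs(logFC) > 1)  # both directions
n_hits <- nrow(hits)                                    # rows passing both filters
n_up   <- nrow(subset(tab, p.value < 0.05 & logFC > 1)) # up-regulated only
```

Keeping `hits` around is handy when you want the gene names as well as the count, e.g. via rownames(hits).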

Joshua, 7.3 years ago, wrote:

CountMySignificant <- function(x) {
    count <- 0
    # seq_along() is safe for empty vectors, unlike 1:length(x)
    for (i in seq_along(x)) {
        if (x[i] < 0.05) {
            count <- count + 1
        }
    }
    return(count)
}

Then just pass your column, which I'm assuming would be something like: CountMySignificant(edgeHSMvHSF$table[,3])


In R, for-loops are best avoided where possible.

Reply written 7.3 years ago by iw9oel_ad

I'm afraid this is an example of how not to write R code, because it is much slower and far less readable than the vectorized comparison x < 0.05.
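
For comparison, a sketch of the loop next to a vectorized one-liner. The function names `count_loop` and `count_vec` and the vector `p` are hypothetical, introduced only for this illustration.

```r
# Loop version, in the spirit of the answer above
count_loop <- function(x) {
    count <- 0
    for (i in seq_along(x)) {
        if (x[i] < 0.05) count <- count + 1
    }
    count
}

# Vectorized version: compare the whole vector at once, then sum the TRUEs
count_vec <- function(x) sum(x < 0.05)

p <- c(0.99, 0.001, 0.60, 0.02, 1.00, 0.04)   # made-up p-values
n_loop <- count_loop(p)
n_vec  <- count_vec(p)
```

Both return the same count on this input; the vectorized form simply leaves the iteration to R's internals.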

Reply written 7.3 years ago by Laurent

seidel (United States), 3.6 years ago, wrote:

This is a bit terse, but would be one way to do it *instead* of using a for loop, and is more the R way. Use apply() on your matrix:

sum(apply(edgeHSMvHSF$table, 1, function(x){ ifelse(x[2] > 1 & x[3] < 0.05, TRUE, FALSE) }))

The apply function takes three arguments: apply(matrix, rows or columns, function). The first is your matrix or data frame, the second is whether to work on rows (1) or columns (2), and the third is a function to apply to each row or column. You can place a custom function there. In the code above, each row of the matrix is handed to the function as a vector called x (you get to make up the name of the vector; I called it x), so all we have to do is check the 2nd and 3rd positions of x and return TRUE or FALSE.

I'm using the ifelse function: ifelse(test, return value if true, return value if false). So the apply statement above returns a boolean vector with one element per row of the matrix, reflecting whether that row has a log fold change > 1 AND a p-value < 0.05. You can sum a boolean vector to get a count of the elements that are TRUE. If you want an index vector of the rows, you could use the boolean vector, or you could replace sum() with which().
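
The description above can be tried on a small made-up matrix (`m` and its values are assumptions for illustration). The sketch also shows the sum()/which() swap and, for comparison, the same test done by indexing the columns directly without apply().

```r
# Made-up numeric matrix standing in for edgeHSMvHSF$table
m <- cbind(
    logConc = c(-13.7, -36.7, -15.8, -15.4),
    logFC   = c(-0.01, 2.50, 0.27, 1.80),
    p.value = c(0.99, 0.001, 0.60, 0.02)
)
rownames(m) <- paste0("tag.", 1:4)

# One logical value per row: x[2] is logFC, x[3] is p.value
keep <- apply(m, 1, function(x) x[2] > 1 & x[3] < 0.05)

n_keep   <- sum(keep)     # count of rows passing both tests
idx_keep <- which(keep)   # row indices of those rows

# Same test without apply(), comparing whole columns at once
keep_vec <- m[, "logFC"] > 1 & m[, "p.value"] < 0.05
```

Referring to columns by name, as in the last line, avoids depending on column order.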
