Find the number of rows containing zero raw counts across all samples (columns) using R
1
0
6.9 years ago
Bioinfonext ▴ 460

Hi...

I have a data frame with n columns and n rows. I need to find how many rows contain zero raw read counts across all columns.

Thanks

R • 24k views
ADD COMMENT
3

And what have you tried? Show some effort when asking questions by showing us what you tried and what didn't work.
Have a look at ?rowSums

ADD REPLY
0

I deleted the gene name column:

mydatanew=mydata[,-1]

after that I used:

rowSums(mydatanew)

But it gives output like this:

[57817]    2677     405     443     193     281     229     227     343     752
[57826]     444     434     199     766     616     193     385    3230     251
[57835]    1348     953     176     209     271    2290    1063     476     753
[57844]    1264    2465    1657     200    1978    1916     709    5121    1926

I am trying to learn R... I want the total number of rows that contain zero read counts across all columns.

ADD REPLY
0

help(which)
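
As a pointer to what which does in this context, here is a minimal sketch on made-up data (the data frame `toy` and its column names are hypothetical, not the OP's data). It assumes non-negative counts, so a row sum of 0 means every value in the row is 0; `which()` turns that logical test into row indices.

```r
# Hypothetical toy counts: 4 genes x 3 samples (not the OP's data)
toy <- data.frame(
  s1 = c(0, 5, 0, 2),
  s2 = c(0, 0, 0, 7),
  s3 = c(0, 1, 0, 0)
)

# Row sum is 0 only when every (non-negative) count in the row is 0
zero_rows <- which(rowSums(toy) == 0)

zero_rows          # indices of the all-zero rows
length(zero_rows)  # how many such rows there are
```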

ADD REPLY
0

I am trying to learn R

That's great, but you learn nothing by getting the solution from us. You may not like it, but the only way to learn how to program in R (and any other language) is to fail until you figure it out. It's going to take a long time; sometimes you will spend hours or days on an issue. But every time you will improve. Motivation: Suck until you don't.

ADD REPLY
1

You should write a philosophy book, Wouter. Do you really think it is necessary to answer "try by yourself"? Really? Devon writing help(which) has been more helpful than your two comments.

ADD REPLY
1

"try by yourself" is a winning philosophy in life... Also, the only reason I didn't write that myself is because Wouter already did!

ADD REPLY
0

I'm sorry Devon, my point wasn't to be rude to your kid. My point is: is it really necessary to answer in that way? If it is, why not program an automatic answer for your forum, "Try by yourself"? (Although this would take upvotes away from Wouter.) Not everyone is an "expert"; most people are looking for quick solutions, and it is clear that you will not write their thesis or save their jobs by answering... or even better, if you're an expert and you are not interested in help inexperts, why not just ignore them?

Whatever, this is a very helpful forum and I think that comments like that one make no sense. If you don't want to give a quick solution, don't; give some advice instead (like help(which)). Make this forum one full of good advice, not one full of upvotes for egocentrism.

ADD REPLY
0

I presume that the reference to WouterDeCoster as "my kid" is an error in translation.

Anyway, at least attempting to figure it out yourself is an essential part of science in general and even more so of bioinformatics in particular. The exact same "what have you tried?" procedure is carried out in every wet lab I've ever seen in the world. So it's not just this site where such responses are encouraged, but rather all of science where such replies should be expected. If people cannot demonstrate enough curiosity to at least make an effort at coming up with an answer themselves, then they should not waste their time pursuing a future in science.

ADD REPLY
0

Thanks for the feedback.

ADD REPLY
0

If my answer was helpful, you should upvote it.

ADD REPLY
0

And what if it wasn't?

In contrast to other posts here, you have not contributed at all to this thread. You accuse me of not being helpful, while my first post contained "Have a look at ?rowSums", which is pretty much what OP needs to fix this issue.

I see you made your account quite recently, so welcome to biostars. Perhaps it doesn't make a lot of sense to start criticising others after such a short period of observation. If you had observed for longer, you would know that OP has asked tons of questions on this forum, for every step of the work he is doing. He is not doing himself a favor by copying what we write every time.

Therefore, you will notice that on an "open" question such as this one we will either ask OP to show what they tried, or give just a pointer in the right direction, such as suggesting rowSums() and which, as you can see above.

if you're an expert and you are not interested in help inexperts, why not just ignore them?

The mere fact that we spend hours of our time here volunteering to help people doesn't really suggest that we don't want to help people. And if you don't like it here, feel free to go elsewhere. You'll find that biostars is the friendliest bioinformatics community.

You shouldn't summarize my answer as "try by yourself". Nobody can teach you how to ride a bike by showing you pictures of people on a bike. You have to get on the damn bike yourself and hit the ground often before you acquire the skill of riding a bike.

ADD REPLY
2

Thanks a lot for the encouraging comments, especially to WouterDeCoster. I do not take his advice negatively, because I am at the learning stage and we really do learn things better when we try to resolve problems ourselves, but sometimes help is also needed.

dim(countMatrix)
[1] 57894    33
mydatanew <- countMatrix[, -1]                      # drop the first (gene name) column
nonzero_row <- mydatanew[rowSums(mydatanew) > 0, ]  # keep rows with a total read count above 0
dim(nonzero_row)
[1] 48538    33
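
Going by the dimensions above, 57894 - 48538 = 9356 rows were dropped by the filter, i.e. had a row sum of zero. A minimal sketch of getting that kind of number directly, on made-up data (the matrix `m` is hypothetical and stands in for the real count table):

```r
# Hypothetical count matrix: 5 genes x 3 samples
m <- matrix(
  c(0, 0, 0,
    3, 0, 1,
    0, 0, 0,
    0, 2, 0,
    4, 4, 4),
  ncol = 3, byrow = TRUE
)

nonzero <- m[rowSums(m) > 0, , drop = FALSE]  # rows with at least one read

nrow(m) - nrow(nonzero)  # rows removed by the filter
sum(rowSums(m) == 0)     # same number, without building the subset
```

The second form avoids creating the filtered copy when only the count is needed.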

Thanks

ADD REPLY
1

I'm cleaning up a few comments here as the discussion seems to have gotten heated and out of hand.

ADD REPLY
0

I would prefer the rowSums function that doesn't require any information about your row names. Suppose you have a data frame called df.

You could run this:

keep = rowSums(counts(df)) >= 0
df.new <- df[keep,]

I usually use this with the DESeq2 package to filter out genes with low expression.

ADD REPLY
0
  1. An answer with this approach has already been given by another user. Your answer does not add anything to that.
  2. The results from your code will include rows with 0 counts, which OP wants to exclude, so it is not a correct answer.

I've moved your post to a comment for the reasons above.
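
To illustrate point 2 with a minimal sketch on made-up counts (the matrix `cnt` is hypothetical): with non-negative counts, `>= 0` is true for every row, so nothing is filtered, whereas `> 0` drops the all-zero rows.

```r
# Hypothetical count matrix: 4 genes x 2 samples
cnt <- matrix(c(0, 0,
                1, 0,
                0, 0,
                2, 5),
              ncol = 2, byrow = TRUE)

keep_ge <- rowSums(cnt) >= 0  # TRUE for every row of non-negative counts
keep_gt <- rowSums(cnt) > 0   # FALSE for the all-zero rows

sum(keep_ge)  # number of rows kept with >= 0
sum(keep_gt)  # number of rows kept with > 0
```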

ADD REPLY
2
3.6 years ago
ATpoint 82k

A question with 13k views and no accepted answer, that is unfortunate.

Assuming the object is called y.

## Example data:
y <- sapply(1:4, function(x) rnorm(500,5,1))

Base R function:

isZero <- base::rowSums(y) == 0

If y is a matrix, you can use the matrixStats package, which is often faster (for rowSums the gain is marginal; I just wanted to mention the matrixStats package):

isZero <- matrixStats::rowSums2(y) == 0

If y is a count matrix from e.g. the single-cell world, stored in a sparse format such as dgCMatrix or similar (common when working with Bioconductor tools on single-cell data), then use the function from the Matrix package:

## Matrix package function
isZero <- Matrix::rowSums(y) == 0
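
A minimal sketch of the sparse case, on made-up data: the matrix `sp` below is hypothetical and is built with `Matrix::sparseMatrix()`, which by default returns a dgCMatrix, one common sparse class. It assumes the Matrix package is installed (it ships with standard R installations as a recommended package).

```r
library(Matrix)

# Hypothetical sparse count matrix: 4 genes x 3 cells
sp <- sparseMatrix(
  i = c(1, 1, 4),  # row indices of the non-zero entries
  j = c(1, 3, 2),  # column indices of the non-zero entries
  x = c(2, 1, 6),  # the counts themselves
  dims = c(4, 3)
)

# Row sums are computed on the sparse representation, no dense copy needed
isZeroSparse <- Matrix::rowSums(sp) == 0
isZeroSparse
```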

isZero is a logical vector that contains TRUE for rows where all samples have zero counts and FALSE if not. You can filter your data to remove the only-zero rows with:

y.filtered <- y[!isZero,]
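
One caveat, sketched on made-up data (the matrix `z` is hypothetical): a row sum of 0 identifies all-zero rows only when the values are non-negative, as raw counts are. If the matrix could contain negative values, for example after some transformation, counting the non-zero entries per row is a safer test.

```r
# Hypothetical matrix with one row that sums to 0 without being all-zero
z <- matrix(c( 0, 0, 0,
              -1, 1, 0,
               2, 0, 3),
            ncol = 3, byrow = TRUE)

rowSums(z) == 0       # flags rows 1 and 2, although row 2 is not all-zero
rowSums(z != 0) == 0  # flags only row 1, the truly all-zero row
```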

For the number of rows with only zeros, use:

sum(isZero)

For those with not only zeros:

sum(!isZero)

Or both combined:

summary(isZero)
ADD COMMENT
1

I'd prefer the following table command over summary:

table(ifelse(isZero, "Zero", "Non-zero"))
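
A small variation, sketched under the assumption that one category may be empty (the `isZero` vector below is made up): `table()` on the `ifelse()` result only lists categories that occur, while a factor with fixed levels reports a 0 for the missing one.

```r
isZero <- c(FALSE, FALSE, FALSE)  # hypothetical: no all-zero rows

table(ifelse(isZero, "Zero", "Non-zero"))  # lists only "Non-zero"

# Fixing the levels keeps both categories, with a 0 count where needed
tab <- table(factor(isZero, levels = c(TRUE, FALSE),
                    labels = c("Zero", "Non-zero")))
tab
```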
ADD REPLY
