Question: [R] Removing columns from big.matrix which have only one value
21 months ago, jackarnestad wrote:

I have a very large binary matrix, stored as a big.matrix to conserve memory (as an ordinary matrix it is over 2 GB: 5 million columns and 100 rows).

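n_rows <- 100
n_cols <- 10000  # small stand-in for the real 5 million columns
m4 <- matrix(sample(0:1, n_rows * n_cols, replace = TRUE), n_rows, n_cols)
m4 <- cbind(m4, 1)  # append a constant column that should be removed
m4 <- bigmemory::as.big.matrix(m4)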

I need to remove every column which has only one unique value (in this case, only 0s or only 1s). Because of the number of columns, I want to be able to do this in parallel.

How can I accomplish this while keeping the data stored as a big.matrix? I can convert it into a data frame and loop over the columns counting the unique values, but this takes too much RAM.

Thanks!

EDIT: This is bioinformatics because each column is actually a protein subsequence. I am running Fisher's exact test to select important features, but before that I must remove features that are present in all samples (or in none).
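
One possible approach, as a minimal sketch rather than a tested solution for the full 5-million-column matrix: read the matrix in blocks of columns, flag every column whose entries are not all equal to its first entry, and then copy only the flagged columns into a new big.matrix. The helper name `varying_columns` and the `chunk_size` argument are made up for illustration. The sketch assumes there are no NA values and that `bigmemory::deepcopy()` accepts a `cols` argument, which is worth confirming against the installed version of the package.

```r
# Hypothetical helper: TRUE for each column holding more than one distinct value.
# It only needs ncol() and x[, j, drop = FALSE], so it works on base matrices
# and should work on big.matrix objects as well.
varying_columns <- function(x, chunk_size = 10000L) {
  n <- ncol(x)
  if (n == 0L) return(logical(0))
  keep <- logical(n)
  for (s in seq(1L, n, by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1L, n)
    block <- x[, idx, drop = FALSE]  # only this block becomes a regular matrix
    first_row <- block[1L, ]
    # A column varies if any entry differs from the entry in its first row.
    keep[idx] <- colSums(block != rep(first_row, each = nrow(block))) > 0
  }
  keep
}

# Small example in the spirit of the question.
set.seed(1)
m <- matrix(sample(0:1, 100 * 50, replace = TRUE), 100, 50)
m <- cbind(m, 1)  # constant column that should be dropped
keep <- varying_columns(m, chunk_size = 7L)

# With bigmemory installed, the same helper can drive the column filter.
if (requireNamespace("bigmemory", quietly = TRUE)) {
  bm <- bigmemory::as.big.matrix(m)
  keep_bm <- varying_columns(bm, chunk_size = 7L)
  filtered <- bigmemory::deepcopy(bm, cols = which(keep_bm))
}
```

Each block is independent of the others, so the loop could in principle be split across workers, for example by sharing the matrix through `bigmemory::describe()` and `bigmemory::attach.big.matrix()`. That part is not shown here and would need a shared or file-backed big.matrix.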


This is purely an R question. How is it bioinformatics?

written 21 months ago by RamRS

Hello jackarnestad!

We believe that this post does not fit the main topic of this site.

Please tell us how this is related to bioinformatics and we will reopen the question.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

written 21 months ago by RamRS

I addressed the bioinformatics aspect in my edit. Thanks!

written 21 months ago by jackarnestad

Thanks for clarifying. This is indeed an R question applied to bioinformatics, but questions like this might get a quicker answer on the Bioconductor support site or Stack Overflow. You may still be lucky and find someone here who can help, so let's wait a bit before cross-posting.

written 21 months ago by WouterDeCoster

Could you include in your code the package where big.matrix is defined?

written 21 months ago by russhh

Added it to the code: it is bigmemory.

written 21 months ago by jackarnestad